
Serious AI Bugs Expose Meta, Nvidia and Microsoft Inference

AI inference vulnerabilities in Meta, Nvidia, Microsoft and vLLM exposed through ShadowMQ, alongside a Cursor IDE compromise via rogue MCP servers. ShadowMQ deserialization flaws and Cursor MCP attacks highlight how insecure AI frameworks and developer tools can lead to full remote code execution.

Modern AI stacks shipped a cluster of bugs that look almost custom-built for attackers. Critical remote code execution flaws in inference frameworks from Meta, NVIDIA, Microsoft, and open-source projects like vLLM, SGLang, and Modular Max Server all trace back to the same design mistake: unsafe ZeroMQ messaging combined with Python pickle deserialization. At the same time, research into Cursor's AI-powered IDE shows that attackers can hijack its built-in browser and turn a developer workstation into a fully privileged malware delivery platform.

Together, these issues show how quickly insecure patterns can propagate across AI infrastructure and development tools when teams copy architecture and code without hardening it. For defenders, they are a concrete reminder that "AI security" is not theoretical anymore; it is a classic RCE and supply-chain problem wearing a new label.

๐—ฆ๐—ต๐—ฎ๐—ฑ๐—ผ๐˜„๐— ๐—ค: ๐˜‚๐—ป๐˜€๐—ฎ๐—ณ๐—ฒ ๐—ญ๐—ฒ๐—ฟ๐—ผ๐— ๐—ค + ๐—ฝ๐—ถ๐—ฐ๐—ธ๐—น๐—ฒ ๐—ฎ๐˜€ ๐—ฎ ๐—ฝ๐—ฎ๐˜๐˜๐—ฒ๐—ฟ๐—ป, ๐—ป๐—ผ๐˜ ๐—ฎ ๐˜€๐—ถ๐—ป๐—ด๐—น๐—ฒ ๐—ฏ๐˜‚๐—ด

The first piece of the picture sits inside Meta's Llama Stack. In affected versions, the framework exposed a ZeroMQ socket over the network and then used recv_pyobj() to deserialize incoming traffic via Python's pickle module. In other words, any endpoint that could talk to that socket could send a crafted object and gain remote code execution on the inference node. That flaw shipped as CVE-2024-50050 and was later patched by switching to safer JSON-based serialization.
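The danger of pickle deserialization is easy to demonstrate with nothing but the standard library: any object whose __reduce__ method returns a callable gets that callable executed during pickle.loads(). The sketch below uses a harmless os.getcwd as the payload; the class name is illustrative, not taken from the actual exploit.

```python
import pickle

# Any class can smuggle a callable through pickle's __reduce__ protocol.
# Unpickling runs the callable -- here a harmless os.getcwd, but an
# attacker would reach for os.system, subprocess.call, and the like.
class Exploit:
    def __reduce__(self):
        import os
        return (os.getcwd, ())  # executed at load time, not at dump time

payload = pickle.dumps(Exploit())

# The "server side": blindly deserializing attacker-controlled bytes.
# This is effectively what recv_pyobj() does with data off the socket.
result = pickle.loads(payload)
print(result)  # the attacker's callable has already run by now
```

Note that the victim never has to call anything on the deserialized object; the code runs inside pickle.loads() itself, which is why JSON (data only, no callables) is the standard fix.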

However, the more serious story is what happened next. The same unsafe pattern (ZeroMQ sockets exposed over TCP, unauthenticated endpoints, and direct pickle deserialization) showed up across multiple AI inference frameworks maintained by different vendors and communities. Oligo's research team labeled this propagation pattern ShadowMQ: an architecture bug that quietly spread through the ecosystem as teams copied and adapted reference code.

As the analysis expanded, researchers documented near-identical logic in:

  • ๐˜ƒ๐—Ÿ๐—Ÿ๐—  (๐—–๐—ฉ๐—˜-๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿฑ-๐Ÿฏ๐Ÿฌ๐Ÿญ๐Ÿฒ๐Ÿฑ)

  • ๐—ก๐—ฉ๐—œ๐——๐—œ๐—” ๐—ง๐—ฒ๐—ป๐˜€๐—ผ๐—ฟ๐—ฅ๐—ง-๐—Ÿ๐—Ÿ๐—  (๐—–๐—ฉ๐—˜-๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿฑ-๐Ÿฎ๐Ÿฏ๐Ÿฎ๐Ÿฑ๐Ÿฐ)

  • Modular Max Server (CVE-2025-60455)

  • Microsoft's Sarathi-Serve and the SGLang project, with partial or incomplete fixes at disclosure time.

From a defensive perspective, ShadowMQ matters less as "just a bug" and more as a supply-chain symptom. When core AI frameworks publish vulnerable patterns in their reference designs, that code doesn't stay local; it shows up in forks, wrappers, and vendor products that trust the original implementation. That is exactly what we see here.

๐—›๐—ผ๐˜„ ๐˜๐—ต๐—ฒ๐˜€๐—ฒ ๐—”๐—œ ๐—ฏ๐˜‚๐—ด๐˜€ ๐˜๐˜‚๐—ฟ๐—ป ๐—ถ๐—ป๐˜๐—ผ ๐—ฟ๐—ฒ๐—ฎ๐—น ๐—ฎ๐˜๐˜๐—ฎ๐—ฐ๐—ธ ๐—ฐ๐—ต๐—ฎ๐—ถ๐—ป๐˜€

In practice, exploitation still requires an attacker to reach the ZeroMQ endpoints. However, many AI deployments run inference servers inside flat internal networks, on shared Kubernetes clusters, or behind load balancers that developers treat as "trusted." Once an attacker lands anywhere in that environment (via compromised credentials, a cloud misconfiguration, or a separate application bug), they can:

  • Connect to the exposed ZMQ socket.

  • Ship a malicious pickle object that abuses the unsafe recv_pyobj() logic.

  • Gain code execution on the inference node and pivot across the AI cluster.
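Under lab conditions the chain above can be simulated in a few lines of stdlib Python: a socketpair stands in for the network path to the exposed ZMQ endpoint, and pickle.loads() on the received bytes mirrors what recv_pyobj() does internally. All names and the benign payload here are illustrative.

```python
import os.path
import pickle
import socket

# A socketpair stands in for the network path to the exposed ZMQ socket.
server_sock, attacker_sock = socket.socketpair()

class Payload:
    """Illustrative stand-in for a crafted malicious pickle object."""
    def __reduce__(self):
        # A real attacker would return something like (os.system, ("<cmd>",)).
        return (os.path.basename, ("/tmp/attacker-code-ran",))

# Attacker side: ship the crafted object toward the "inference node".
attacker_sock.sendall(pickle.dumps(Payload()))

# Server side: recv_pyobj()-style handling -- recv followed by a blind loads.
data = server_sock.recv(4096)
result = pickle.loads(data)   # the embedded callable executes right here
print(result)
```

The point of the simulation is that the server never opts in to running attacker code; reachability plus pickle is the entire exploit.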

At that point, the difference between "AI bug" and "standard infrastructure compromise" disappears. Attackers can:

  • Steal model weights and training data.

  • Tamper with responses and silently poison downstream applications.

  • Deploy additional malware such as crypto-miners or lateral-movement tooling.

Because these inference engines often sit at the center of latency-sensitive workloads, defenders rarely instrument them as heavily as traditional front-end services. As a result, a ShadowMQ exploit can look like "just another internal service talking to the model," while in reality it runs arbitrary Python code inside the cluster.

๐—˜๐—ฐ๐—ต๐—ผ๐—ฒ๐˜€ ๐—ผ๐—ณ ๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎโ€™๐˜€ ๐—–๐—ฉ๐—˜-๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿฐ-๐Ÿฑ๐Ÿฌ๐Ÿฌ๐Ÿฑ๐Ÿฌ ๐—ถ๐—ป ๐˜ƒ๐—Ÿ๐—Ÿ๐— , ๐—ง๐—ฒ๐—ป๐˜€๐—ผ๐—ฟ๐—ฅ๐—ง-๐—Ÿ๐—Ÿ๐— , ๐—ฎ๐—ป๐—ฑ ๐—ผ๐˜๐—ต๐—ฒ๐—ฟ ๐—ณ๐—ฟ๐—ฎ๐—บ๐—ฒ๐˜„๐—ผ๐—ฟ๐—ธ๐˜€

Meta's Llama-Stack vulnerability (CVE-2024-50050) was the first widely documented example. The fix replaced pickle with JSON and tightened the messaging interface.

Later, as researchers audited related projects, they found the same unsafe template repeated with only superficial changes.

For defenders, this looks very similar to what we already know from classic OSS supply-chain incidents. Reference code that uses unsafe deserialization with pickle and ZeroMQ becomes a template. Even if each project tweaks names or control flow, the underlying risk remains identical.

Cursor's browser and rogue MCP servers: turning AI IDEs into malware delivery platforms

The second half of the research focuses on Cursor, an AI-powered code editor built on top of the VS Code / Electron stack. Its new built-in browser and Model Context Protocol (MCP) support give AI agents more reach into web apps and internal tools. Unfortunately, they also give attackers a much larger attack surface.

Knostic's analysis shows that a malicious local MCP server can:

  • Register itself via a benign-looking mcp.json config.

  • Inject JavaScript into Cursor's internal browser at runtime.

  • Replace real login pages with phishing pages that steal credentials and send them to an attacker-controlled endpoint.
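A registration of this kind can be as small as the fragment below. The server name and path are hypothetical, and the shape follows the commonly used mcpServers layout; nothing in the file itself signals that the binary behind it is malicious.

```json
{
  "mcpServers": {
    "docs-helper": {
      "command": "node",
      "args": ["./tools/docs-helper/index.js"]
    }
  }
}
```

To the developer approving it, this is indistinguishable from any legitimate local tool, which is exactly why allowlisting and review matter.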

Because Cursor inherits Node.js and Electron's privileges, malicious JavaScript running in that context can access local files, modify extensions, and persist changes that survive restarts. Previous research into Cursor and VS Code already demonstrated similar risks under names like "CurXecute" and "MCPoison," where flawed MCP handling enabled remote or arbitrary code execution against developer machines.

In other words, once an attacker convinces a developer to enable a rogue MCP server or install a compromised extension, the IDE effectively becomes an elevated malware agent. It can read SSH keys, manipulate repos, poison CI pipelines, and push backdoored code into production.

๐—ช๐—ต๐˜† ๐˜๐—ต๐—ฒ๐˜€๐—ฒ ๐—ฏ๐˜‚๐—ด๐˜€ ๐˜€๐—ต๐—ผ๐˜„ ๐—ฎ ๐—ฏ๐—ฟ๐—ผ๐—ฎ๐—ฑ๐—ฒ๐—ฟ ๐—”๐—œ ๐˜€๐˜‚๐—ฝ๐—ฝ๐—น๐˜†-๐—ฐ๐—ต๐—ฎ๐—ถ๐—ป ๐—ฝ๐—ฟ๐—ผ๐—ฏ๐—น๐—ฒ๐—บ

From a seasoned defender's perspective, none of this is really "new." We have seen unsafe deserialization, over-privileged services, and IDE extension abuse for years. What changed is the velocity and blast radius. AI frameworks and MCP servers are:

  • widely reused as reference implementations,

  • packaged into turnkey stacks and SaaS offerings, and

  • deployed deep inside both infrastructure and developer workflows.

As a result, a single design mistake, like treating ZeroMQ + pickle as "good enough" for internal messaging or trusting MCP servers by default, can propagate into hundreds or thousands of downstream environments. When those components sit next to GPUs, production models, or CI/CD secrets, the risk profile escalates quickly.

๐—ฃ๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ฐ๐—ฎ๐—น ๐—ฑ๐—ฒ๐—ณ๐—ฒ๐—ป๐˜€๐—ฒ: ๐˜„๐—ต๐—ฎ๐˜ ๐˜๐—ผ ๐—ฐ๐—ต๐—ฎ๐—ป๐—ด๐—ฒ ๐—ถ๐—ป ๐—”๐—œ ๐—ถ๐—ป๐—ณ๐—ฟ๐—ฎ ๐—ฎ๐—ป๐—ฑ ๐—ฑ๐—ฒ๐˜ƒ ๐˜๐—ผ๐—ผ๐—น๐—ถ๐—ป๐—ด

For AI platform teams, a few moves now pay off later:

First, treat inference engines as threat boundaries, not internal plumbing. That means blocking direct network exposure for ZeroMQ and similar protocols, using mTLS or authenticated channels where possible, and isolating inference nodes into dedicated segments with tight egress controls.

Second, aggressively eliminate unsafe deserialization. If any component uses Python's pickle, Java's native serialization, or similar formats to deserialize untrusted or semi-trusted data from the network, that code should be treated as a vulnerability until proven otherwise. Safer formats (JSON, Protobuf, Cap'n Proto) and strict schema validation should be the baseline.
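As a minimal sketch of that baseline, the pickle call can be replaced with JSON parsing plus an explicit schema check, so anything outside the expected shape is rejected before it reaches business logic. The field names here are hypothetical, not taken from any real framework.

```python
import json

# Only these fields, with these exact types, are accepted off the wire.
# Field names are hypothetical; adapt to the real message schema.
EXPECTED_SCHEMA = {"task_id": str, "prompt": str, "max_tokens": int}

def deserialize_request(raw: bytes) -> dict:
    """Strict, data-only replacement for pickle.loads on network input."""
    obj = json.loads(raw.decode("utf-8"))  # parses data, never executes code
    if not isinstance(obj, dict) or set(obj) != set(EXPECTED_SCHEMA):
        raise ValueError("unexpected message shape")
    for field, expected_type in EXPECTED_SCHEMA.items():
        if not isinstance(obj[field], expected_type):
            raise ValueError(f"bad type for field {field!r}")
    return obj

# A well-formed message passes; anything else raises before use.
msg = json.dumps({"task_id": "abc", "prompt": "hi", "max_tokens": 64}).encode()
print(deserialize_request(msg))
```

Unlike pickle, json.loads can only ever produce dicts, lists, strings, numbers, booleans, and None; the worst a malformed message can do is raise an exception.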

Third, extend ransomware-style hardening to AI clusters. That includes strict identity for machine-to-machine traffic, dedicated backup strategies for model artifacts and training data, and continuous inventory of where each inference framework version runs inside the estate.

On the developer side, organizations should treat AI-powered IDEs like any other high-value endpoint:

  • Disable or constrain Auto-Run behavior for MCP servers and extensions.

  • Maintain a curated allowlist of trusted MCP servers, with code review requirements and explicit ownership.

  • Monitor for new or unapproved MCP endpoints and browser components inside IDEs.

  • Educate developers that "AI helper servers" are remote code execution vectors, not harmless sidecars.

Finally, AI security needs to plug into existing threat-modeling and vulnerability-management programs, not operate as a separate track. ShadowMQ and the Cursor browser bugs are classic examples of why: they are not "prompt injection problems," they are direct RCE and supply-chain issues that just happen to live inside AI-branded software.

๐—™๐—”๐—ค๐˜€

Q: What is ShadowMQ in the context of AI security?
A: ShadowMQ is a pattern where AI frameworks reuse ZeroMQ messaging code that deserializes network data with Python's pickle, creating repeatable RCE conditions across Meta Llama-Stack, vLLM, TensorRT-LLM, and related projects.

Q: How serious are the CVEs tied to these AI inference engines?
A: CVE-2024-50050, CVE-2025-30165, and CVE-2025-23254 all enable code execution on inference servers under realistic conditions, which can lead to model theft, data exposure, and further compromise of connected systems.

Q: Do these vulnerabilities require direct internet exposure of the AI service?
A: Not necessarily. Attackers can often reach ZeroMQ endpoints from inside the network after exploiting other services, misconfigurations, or stolen credentials, then use ShadowMQ-style flaws as a lateral-movement amplifier.

Q: How can a rogue MCP server compromise a developer using Cursor?
A: A malicious MCP server can inject JavaScript into Cursor's built-in browser, replace login flows with phishing pages, and run with the IDE's privileges to read files, modify extensions, and persist malware inside the development environment.

Q: What are the most effective short-term mitigations?
A: Inference teams should disable unsafe serialization, patch to fixed versions, and isolate AI services. Developer teams should lock down MCP usage, vet extensions, and treat AI IDEs as privileged endpoints that require monitoring and hardening.
