
Ollama Under Fire: Code Execution in Popular LLM Framework

Critical parsing flaws in Ollama’s GGUF model handling allow attackers to execute arbitrary code by supplying malicious model files to vulnerable AI servers.

Ollama has quickly become a go-to framework for running large language models locally, yet that also means any flaw in its core model-handling logic becomes extremely dangerous. Recently disclosed vulnerabilities in the way Ollama parses GGUF model files allow attackers to achieve arbitrary code execution on vulnerable servers simply by getting a malicious model processed. In other words, if an attacker can convince an Ollama instance to load a weaponized model, they can pivot from “just sending a file” to running their own code in the context of the server process.

Because teams often deploy Ollama in internal AI platforms, developer environments, or even exposed lab instances, this class of bug has direct implications for supply-chain security, model registries, and any environment where models are pulled from untrusted or semi-trusted sources. When you treat models as data but your inference engine treats them as code, parsing bugs become a straight path to compromise.

𝗪𝗵𝗮𝘁 𝗢𝗹𝗹𝗮𝗺𝗮 𝗗𝗼𝗲𝘀 𝗮𝗻𝗱 𝗪𝗵𝘆 𝗣𝗮𝗿𝘀𝗶𝗻𝗴 𝗜𝘀 𝗦𝗼 𝗖𝗿𝗶𝘁𝗶𝗰𝗮𝗹

Ollama packages models in the GGUF format and uses a straightforward client–server architecture. The server runs locally or in the cloud and handles requests to pull, load, and run models. The client acts as the front end that sends prompts and collects responses. To prepare any model for inference, Ollama parses its metadata, allocates memory structures, and hands control to the high-performance components that execute the model.

Because parsing occurs before anyone sees a token, it forms a critical trust boundary. When a system reads complex binary metadata from attacker-supplied input and uses those values to size vectors, fill structures, or select indices, it must assume that data is hostile. If the parser skips proper validation, a crafted model file can corrupt the heap, alter bits in sensitive structures, or redirect execution by modifying function pointers.

These Ollama vulnerabilities stem from the way the engine handles certain multimodal model variants. The C++ parsing logic pulls metadata fields straight from GGUF files and treats them as valid counts and indices without applying strict bounds. This design flaw gives a malicious model the ability to drive the parser into out-of-bounds writes and deliberate memory corruption.

𝗦𝗼𝗻𝗮𝗿𝗦𝗼𝘂𝗿𝗰𝗲 𝗙𝗶𝗻𝗱𝘀 𝗮𝗻 𝗢𝘂𝘁-𝗢𝗳-𝗕𝗼𝘂𝗻𝗱𝘀 𝗪𝗿𝗶𝘁𝗲 𝗶𝗻 𝗺𝗹𝗹𝗮𝗺𝗮 𝗣𝗮𝗿𝘀𝗶𝗻𝗴

During a targeted code audit of Ollama, SonarSource’s researchers focused on how the engine parses LLM model metadata. They initially noticed a risk pattern around untrusted strings copied into fixed-size buffers, then moved deeper into the multimodal “mllama” handling code, where the real issue surfaced.

For mllama models, Ollama reads metadata such as the number of layers and an array of “intermediate layer” indices. That metadata arrives directly from the GGUF file. The parser stores the declared number of layers, allocates a vector to track which ones are “intermediate,” and then iterates over the indices from the file to mark them. Critically, the code never verifies that each index in the metadata actually fits within the allocated vector’s bounds.
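
To make the pattern concrete, here is a minimal, hypothetical C++ sketch of the flow described above. It is not Ollama’s actual source; the struct and function names are invented, and the choice of vector type matters for reasons the next paragraph explains.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for values decoded from GGUF metadata.
// Both fields come straight from the model file, i.e. they are attacker-controlled.
struct MllamaMetadata {
    uint32_t n_layers;                          // declared layer count
    std::vector<uint32_t> intermediate_layers;  // declared "intermediate" indices
};

// Sketch of the vulnerable pattern: the declared count sizes the vector,
// and the indices from the file are trusted to fall inside it.
std::vector<bool> mark_intermediate_layers(const MllamaMetadata& md) {
    std::vector<bool> is_intermediate(md.n_layers, false);
    for (uint32_t idx : md.intermediate_layers) {
        // Missing check: if (idx >= md.n_layers) -> reject the model.
        is_intermediate[idx] = true;  // out-of-bounds bit write when idx >= n_layers
    }
    return is_intermediate;
}

int main() {
    // A "model" that declares 8 layers but marks index 4096 as intermediate.
    MllamaMetadata malicious{8, {4096}};
    auto flags = mark_intermediate_layers(malicious);  // undefined behaviour here
    (void)flags;
}
```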

Because the implementation uses a space-optimized std::vector<bool> structure, an attacker can not only step beyond the end of the vector but also flip individual bits in adjacent heap structures. With careful model crafting, those bits can be aligned so they modify fields that matter, such as function pointers in nearby structs used later during inference. Ultimately, this gives an attacker a path to steer execution toward attacker-chosen code sequences and build a ROP chain to gain arbitrary code execution.
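
To see why std::vector<bool> turns an out-of-bounds index into a bit-level primitive: each element occupies one bit of a packed backing buffer, so an out-of-range index simply selects a byte offset and bit mask past the end of the allocation. The self-contained simulation below keeps everything inside one buffer so it stays well-defined, and shows how a single attacker-chosen index can flip one bit inside an “adjacent” object holding a function-pointer-sized field. The layout, offsets, and values are illustrative, not Ollama’s real heap layout (real implementations also pack bits into machine words rather than single bytes).

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    // Pretend the parser allocated 5 words (320 bits) of layer flags, and the
    // heap happened to place a struct with a function-pointer-sized field
    // right after that storage.
    struct Adjacent { uint64_t callback; };  // stand-in for a vtable/function pointer
    std::vector<uint8_t> heap(5 * 8 + sizeof(Adjacent), 0);

    Adjacent victim{};
    victim.callback = 0x00007f1234560000ULL;  // fake "legitimate" call target
    std::memcpy(heap.data() + 5 * 8, &victim, sizeof victim);

    // Attacker-chosen "intermediate layer index" far beyond the 320 valid bits:
    // it lands 17 bits into the adjacent object.
    uint64_t idx = 320 + 17;
    heap[idx / 8] ^= static_cast<uint8_t>(1u << (idx % 8));  // the single-bit-flip primitive

    std::memcpy(&victim, heap.data() + 5 * 8, sizeof victim);
    std::printf("callback after bit flip: 0x%016llx\n",
                static_cast<unsigned long long>(victim.callback));
}
```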

Importantly, the vulnerability affects Ollama releases prior to version 0.7.0. In those builds, the vulnerable mllama parsing logic lives in C++, and the exploited memory corruption happens inside the model initialization path. The exploit requires only the ability to load a specially crafted model via the Ollama API or any higher-level workflow that triggers model creation.

𝗔𝘁𝘁𝗮𝗰𝗸 𝗦𝘂𝗿𝗳𝗮𝗰𝗲: 𝗙𝗿𝗼𝗺 𝗠𝗮𝗹𝗶𝗰𝗶𝗼𝘂𝘀 𝗠𝗼𝗱𝗲𝗹 𝘁𝗼 𝗥𝗲𝗺𝗼𝘁𝗲 𝗖𝗼𝗱𝗲 𝗘𝘅𝗲𝗰𝘂𝘁𝗶𝗼𝗻

On paper, the vulnerability requires “access to Ollama’s API.” In practice, that access appears in several realistic scenarios:

• A self-hosted Ollama instance exposed directly or indirectly to the internet, often through rushed lab deployments or misconfigured reverse proxies.
• Internal developer clusters where engineers experiment with community models and “just pull” artifacts from public registries or third-party sources.
• Automated pipelines that fetch, convert, or remix GGUF models without strong provenance checks.

In each case, an attacker who can convince the server to parse a malicious model file can trigger the out-of-bounds write. Once the corrupted memory layout is in place, the next inference request, or any later code path that touches the corrupted structures, runs with attacker-controlled state. SonarSource demonstrated that, under realistic compilation settings (especially when Position Independent Executable hardening is absent), this primitive is enough to hijack function pointers and chain existing instructions into a full remote code execution exploit.

Even when modern mitigations make exploitation more difficult, defenders should assume that skilled adversaries can eventually weaponize such a bug, especially when the environment runs Ollama with broad filesystem access, powerful GPUs, and connectivity into internal networks.

𝗣𝗮𝘁𝗰𝗵 𝗦𝘁𝗮𝘁𝗲: 𝗪𝗵𝗮𝘁 𝗖𝗵𝗮𝗻𝗴𝗲𝗱 𝗶𝗻 𝗢𝗹𝗹𝗮𝗺𝗮 𝟬.𝟳.𝟬

In response to the disclosure, the Ollama maintainers refactored the vulnerable mllama logic. According to the project’s communication and SonarSource’s write-up, Ollama 0.7.0 removes the risky C++ implementation for this path and replaces it with new Go-based handling, which allows for safer memory management and clearer bounds-checking.

Practically, this means:

• The specific out-of-bounds write in the mllama metadata parsing is removed.
• The affected code path no longer relies on fragile vector operations in C++ that trust unvalidated indices.
• Instances running 0.7.0 or later are protected against this particular exploit chain, provided the patched version is what actually gets deployed.

However, the patch does not retroactively fix every misconfigured deployment or every pattern of unsafe use. If an organization still runs older versions, or if it continues to allow arbitrary model uploads from untrusted sources, it remains exposed.
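
For comparison, the kind of strict index validation the pre-0.7.0 C++ path lacked is small. The actual fix moved this handling to Go; the C++ sketch below is only an illustration of rejecting untrusted counts and indices before they are used, with invented names and an arbitrary upper bound.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative only: validate attacker-supplied metadata against the declared
// layer count before using it, and fail closed on any inconsistency.
std::optional<std::vector<bool>> mark_intermediate_layers_checked(
        uint32_t n_layers, const std::vector<uint32_t>& intermediate_layers) {
    // Reject absurd declared counts before allocating anything.
    constexpr uint32_t kMaxLayers = 4096;  // illustrative upper bound
    if (n_layers == 0 || n_layers > kMaxLayers) return std::nullopt;

    std::vector<bool> is_intermediate(n_layers, false);
    for (uint32_t idx : intermediate_layers) {
        if (idx >= n_layers) return std::nullopt;  // the missing bounds check
        is_intermediate[idx] = true;
    }
    return is_intermediate;
}

int main() {
    // A well-formed model is accepted; an out-of-range index is rejected.
    bool ok  = mark_intermediate_layers_checked(8, {3, 7}).has_value();
    bool bad = mark_intermediate_layers_checked(8, {4096}).has_value();
    return (ok && !bad) ? 0 : 1;
}
```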

𝗢𝗹𝗹𝗮𝗺𝗮’𝘀 𝗕𝗿𝗼𝗮𝗱𝗲𝗿 𝗩𝘂𝗹𝗻 𝗟𝗮𝗻𝗱𝘀𝗰𝗮𝗽𝗲

This is not the first time that Ollama’s attack surface has raised concerns. Earlier work on “Probllama” (CVE-2024-37032) showed how remote code execution could be achieved on servers running versions before 0.1.34 by abusing model handling and file operations. More recent reports and CVE entries highlight a pattern of issues around GGUF parsing, ZipSlip-style path traversal in zip handling, and denial-of-service vectors via malformed model metadata.

When you combine those historical flaws with today’s out-of-bounds write in mllama parsing, a clear lesson emerges:

• Any system that treats LLM model files as opaque “content” while giving the parser deep trust can become a remote code execution platform when parsing goes wrong.
• Exposed Ollama servers, especially those discovered on the public internet, represent high-value targets because a single malicious model upload can pivot into full server compromise.

Consequently, defenders should not treat this as a one-off bug. Instead, they should treat model ingestion and GGUF parsing as a privileged boundary with the same seriousness they bring to container registries or package managers.

𝗠𝗶𝘁𝗶𝗴𝗮𝘁𝗶𝗼𝗻 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆 𝗳𝗼𝗿 𝗢𝗹𝗹𝗮𝗺𝗮 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁𝘀

First, upgrade all Ollama instances to 0.7.0 or later, and ensure that the patched version is actually what runs in production containers, VM images, and lab environments. Because it is common for experimental AI infrastructure to lag behind official patches, you should scan for forgotten test servers and old Docker images that still expose vulnerable builds.
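
One practical way to find forgotten instances is to sweep your own address space for the Ollama API and record the version each host reports. The sketch below assumes libcurl is available and that each instance exposes the documented GET /api/version endpoint on the default port 11434; the host list is a placeholder for your own inventory.

```cpp
// Build (assuming libcurl is installed): g++ -std=c++17 ollama_scan.cpp -lcurl
#include <curl/curl.h>
#include <iostream>
#include <string>
#include <vector>

// Append the HTTP response body into a std::string.
static size_t collect(char* data, size_t size, size_t nmemb, void* userp) {
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    // Placeholder inventory: replace with your own host list or scanner output.
    std::vector<std::string> hosts = {"http://10.0.0.5:11434", "http://127.0.0.1:11434"};

    curl_global_init(CURL_GLOBAL_DEFAULT);
    for (const auto& host : hosts) {
        std::string body;
        CURL* curl = curl_easy_init();
        if (!curl) continue;

        std::string url = host + "/api/version";        // version endpoint (per Ollama API docs)
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
        curl_easy_setopt(curl, CURLOPT_TIMEOUT, 3L);     // fail fast on dead hosts

        CURLcode res = curl_easy_perform(curl);
        std::cout << host << " -> "
                  << (res == CURLE_OK ? body : std::string("unreachable")) << "\n";
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
}
```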

Second, restrict model sources. Whenever possible, limit Ollama to pulling models only from trusted registries and internal repositories. If your workflows accept user-supplied GGUF files, treat that interface as a remote code execution boundary and apply strict authentication, authorization, and audit logging around it.
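
If you must accept user-supplied GGUF files at all, a cheap pre-filter can reject obviously malformed uploads before the real parser ever sees them. The header layout read below (a 4-byte "GGUF" magic, a uint32 version, then uint64 tensor and metadata key-value counts) follows the published GGUF specification for versions 2 and 3 and assumes a little-endian host; the size caps are illustrative policy choices, and this check is a sanity filter, not a substitute for running a patched Ollama.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>

// Cheap pre-filter for uploaded GGUF files: check the fixed header fields and
// reject anything with an implausible declared structure. Caps are illustrative.
bool gguf_header_looks_sane(const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;

    char magic[4];
    uint32_t version = 0;
    uint64_t tensor_count = 0, metadata_kv_count = 0;

    f.read(magic, sizeof magic);
    f.read(reinterpret_cast<char*>(&version), sizeof version);
    f.read(reinterpret_cast<char*>(&tensor_count), sizeof tensor_count);
    f.read(reinterpret_cast<char*>(&metadata_kv_count), sizeof metadata_kv_count);
    if (!f) return false;

    if (std::memcmp(magic, "GGUF", 4) != 0) return false;
    if (version < 2 || version > 3) return false;  // v2/v3 use uint64 counts
    if (tensor_count > 65536) return false;        // illustrative cap
    if (metadata_kv_count > 65536) return false;   // illustrative cap
    return true;
}

int main(int argc, char** argv) {
    if (argc != 2) { std::cerr << "usage: gguf_check <file>\n"; return 2; }
    bool ok = gguf_header_looks_sane(argv[1]);
    std::cout << (ok ? "header looks sane\n" : "rejecting file\n");
    return ok ? 0 : 1;
}
```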

Third, harden network exposure. Ollama servers should not be directly accessible from the public internet without strong controls. Place them behind authentication layers, API gateways, or VPNs; use firewall rules to restrict who can reach the API endpoints; and monitor for anomalous model upload and creation patterns.

Finally, assume compromise and design accordingly. Even with patches, consider what would happen if an Ollama host was taken over: what secrets it can access, which internal systems it touches, and whether it can pivot into your core environment. Segment those hosts, minimize privileges, and ensure that your detection stack can spot suspicious activity originating from AI infrastructure.

𝗪𝗵𝗮𝘁 𝗧𝗵𝗶𝘀 𝗠𝗲𝗮𝗻𝘀 𝗳𝗼𝗿 𝗔𝗜 𝗦𝗲𝗰𝘂𝗿𝗶𝘁𝘆 𝗧𝗲𝗮𝗺𝘀

As organizations industrialize their use of local LLMs, tools like Ollama effectively become new runtime platforms. They ingest complex binary inputs, talk to GPUs, manage long-running processes, and often sit close to sensitive data. Because of that, vulnerabilities in model parsing are not merely engineering bugs; they are foundational weaknesses in the infrastructure that runs your AI workloads.

Security teams should therefore:

• Classify LLM inference engines as Tier-1 assets with clear owners and patch SLAs.
• Treat model ingestion endpoints like they treat container registries or artifact repositories.
• Integrate Ollama logs and model events into their SIEM, looking for unusual model uploads or crashes that might indicate exploit attempts.
• Incorporate AI runtimes into regular red-team and penetration testing exercises, rather than assuming they sit safely outside the usual testing scope.

If you are already running Ollama in production, this vulnerability is an opportunity to reassess assumptions and move AI infrastructure up to the same level of scrutiny as your web front ends, databases, and CI/CD pipelines.
