
Serious AI Bugs Expose Vulnerabilities in Meta, Nvidia, and Microsoft Inference Frameworks

  • Nov 15, 2025
  • 2 min read

Key Findings


  • Cybersecurity researchers have uncovered critical remote code execution vulnerabilities in major AI inference engines, including those from Meta, Nvidia, Microsoft, and open-source projects like vLLM and SGLang.

  • The vulnerabilities stem from the unsafe use of ZeroMQ (ZMQ) and Python's pickle deserialization, a pattern dubbed "ShadowMQ."

  • The root cause is a vulnerability in Meta's Llama large language model (LLM) framework (CVE-2024-50050) that the company patched in October 2024.

  • The same unsafe pattern has been discovered in other inference frameworks, such as NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, vLLM, and SGLang.

  • The issues have been assigned CVE identifiers with CVSS scores ranging from 6.3 to 8.8, the most severe of them rated as high severity.

  • Successful exploitation could allow an attacker to execute arbitrary code on the AI infrastructure, escalate privileges, conduct model theft, and deploy malicious payloads.


Background


AI inference engines are crucial components of AI infrastructure, and a compromise of a single node could have far-reaching consequences. The researchers traced the vulnerabilities to a pattern dubbed "ShadowMQ," in which insecure deserialization logic has propagated across several projects through code reuse.


Meta Llama LLM Framework Vulnerability


The root cause of the issue is a vulnerability (CVE-2024-50050; CVSS scores of 6.3 and 9.3, depending on the rating source) in Meta's Llama large language model (LLM) framework, which the company patched in October 2024. The vulnerability involved the use of ZeroMQ's `recv_pyobj()` method to deserialize incoming data with Python's `pickle` module; because the framework also exposed the ZeroMQ socket over the network, it was left open to remote code execution attacks.
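
To make the pattern concrete, here is a minimal, hypothetical sketch of the anti-pattern the researchers describe: a ZeroMQ socket exposed over TCP whose incoming messages are unpickled directly. This is an illustration only, not code taken from Meta's framework or any of the other affected projects.

```python
# Illustrative sketch of the "ShadowMQ" anti-pattern (not code from any named project).
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)

# The socket listens on all interfaces with no authentication or encryption,
# so anyone who can reach this port can send it messages.
sock.bind("tcp://0.0.0.0:5555")

while True:
    # recv_pyobj() unpickles whatever bytes arrive on the socket.
    # pickle can run arbitrary code during deserialization (e.g. via __reduce__),
    # so a crafted message gives the sender code execution on this node.
    task = sock.recv_pyobj()
    print("received task:", task)
```

The core problem is that `pickle` is an object-serialization format rather than a data format: deserializing attacker-controlled bytes can invoke arbitrary callables, which is what turns an exposed socket into a remote code execution primitive.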


Propagation to Other Inference Frameworks


Oligo Security researchers have since discovered the same unsafe pattern recurring in other inference frameworks, such as NVIDIA TensorRT-LLM, Microsoft Sarathi-Serve, Modular Max Server, vLLM, and SGLang. "All contained nearly identical unsafe patterns: pickle deserialization over unauthenticated ZMQ TCP sockets," said Avi Lumelsky, the researcher who discovered the vulnerabilities.
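
As a rough illustration of how this class of issue is typically mitigated (a hedged sketch of the general principle, not the specific patch shipped by any of these projects), the receiver can accept raw bytes, parse them with a data-only format such as JSON, and validate the result before acting on it, while keeping the socket off untrusted networks. The `op` field below is a placeholder for whatever schema a real service would expect.

```python
# Hedged sketch of a hardened variant: data-only serialization instead of pickle.
import json
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind("tcp://127.0.0.1:5555")   # bind to loopback unless remote access is required

while True:
    raw = sock.recv()               # receive raw bytes; nothing is executed
    try:
        task = json.loads(raw)      # JSON can only yield plain data, never live objects
    except ValueError:
        continue                    # drop malformed input
    if not isinstance(task, dict) or "op" not in task:
        continue                    # validate the message shape before acting on it
    print("received task:", task)
```

Authenticating the transport (for example with ZeroMQ's CURVE support) and binding only to trusted interfaces further reduces the exposure.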


Tracing the Origins of the Problem


The researchers found that in at least a few cases, the vulnerable code was the result of a direct copy-paste. For example, the vulnerable file in SGLang says it's adapted from vLLM, while Modular Max Server has borrowed the same logic from both vLLM and SGLang, effectively perpetuating the same flaw across codebases.


Assigned CVE Identifiers and Severity


  • CVE-2025-30165 (CVSS score: 8.0) - vLLM (the vulnerable code path has not been fixed directly, but the issue has been mitigated by making the V1 engine the default)

  • CVE-2025-23254 (CVSS score: 8.8) - NVIDIA TensorRT-LLM (Fixed in version 0.18.2)

  • CVE-2025-60455 (CVSS score: N/A) - Modular Max Server (Fixed)

  • Sarathi-Serve (Remains unpatched)

  • SGLang (Implemented incomplete fixes)


Potential Impact and Risks


With inference engines acting as a crucial component within AI infrastructures, a successful compromise of a single node could permit an attacker to execute arbitrary code on the cluster, escalate privileges, conduct model theft, and even drop malicious payloads like cryptocurrency miners for financial gain.


Sources


  • https://thehackernews.com/2025/11/researchers-find-serious-ai-bugs.html

  • https://x.com/TheCyberSecHub/status/1989357716859736571

  • https://www.linkedin.com/posts/ekiledjian_researchers-find-serious-ai-bugs-exposing-activity-7395195532593340416-5ybt

  • https://www.reddit.com/r/SecOpsDaily/comments/1ox0fo3/researchers_find_serious_ai_bugs_exposing_meta/

  • https://www.instagram.com/p/DRC5N84jWbC/
