You are currently viewing How Prompt Injection Attacks Hijack AI Coding Tools
Featured image for How Prompt Injection Attacks Hijack AI Coding Tools

How Prompt Injection Attacks Hijack AI Coding Tools

  • Post author:
  • Post category:News
  • Post comments:0 Comments

How Prompt Injection Attacks Hijack AI Coding Tools

Image sourced from gbhackers.com
Image sourced from gbhackers.com

AI coding tools speed up development by handling tasks like labeling pull requests or summarizing issues in GitHub workflows. But attackers can slip malicious instructions into user inputs, tricking these tools into running harmful commands. Researchers from Aikido Security showed this in action with vulnerabilities they named PromptPwnd, affecting tools like Google’s Gemini CLI, Anthropic’s Claude Code, OpenAI’s Codex, and GitHub AI Inference. Check Point Research found a related issue in OpenAI’s Codex CLI.

The Core Mechanism: Mixing Data and Instructions

These attacks work because AI tools often pull untrusted inputs—like GitHub issue titles, pull request descriptions, or commit messages—straight into their prompts without cleaning them first. LLMs can’t always tell data from instructions, so a crafted input overrides the tool’s original task.

Aikido explained it like this: an attacker files a public issue or PR with hidden commands. The AI agent, running in a CI/CD pipeline like GitHub Actions, reads the text to analyze it. Instead, the model treats the attacker’s words as new orders from its system prompt. With repo-level privileges, like GITHUB_TOKEN access, the AI then executes shell commands, edits code, leaks secrets, or exfiltrates data.

  • In Aikido’s test on Google’s Gemini CLI repo, a fake issue title tricked the AI into dumping sensitive API keys using its admin tools. Google fixed it within four days after disclosure. Details from GBHackers.
  • Claude Code and OpenAI Codex workflows let attackers override permissions to grab privileged tokens, even if inputs aren’t directly embedded—Aikido withheld full details while notifying companies. CyberScoop report.
  • Same pattern in GitLab CI/CD: user strings fed to prompts lead to unintended repo changes or secret disclosures. CSO Online coverage.

A Twist in OpenAI Codex CLI

Check Point Research spotted another path in Codex CLI, OpenAI’s terminal coding agent. It auto-loads project config files like .codex/config.toml without checks. Attackers with repo access plant MCP server definitions that run arbitrary commands on launch—no user approval needed. Palo Alto on MCP attacks. SecurityWeek on Codex vuln.

For example, setting CODEX_HOME to a malicious folder triggers commands like opening Calculator on macOS. Committers could swap benign configs for payloads later. OpenAI patched this in version 0.23.0. Cybersecurity News.

Fixing the Risks

Aikido confirmed PromptPwnd hits at least five Fortune 500 companies’ workflows and released free scanning tools for GitHub/GitLab YAML files. Their advice:

  • Limit AI to read-only unless essential.
  • Validate or sanitize user inputs before prompts.
  • Review all AI-generated code or commands manually.
  • Use least-privilege tokens and treat AI output as untrusted.

These steps cut the attack surface, but the pattern shows up across tools—vendors patched some cases fast, others need wider fixes.

More stories at letsjustdoai.com

Seb

I love AI and automations, I enjoy seeing how it can make my life easier. I have a background in computational sciences and worked in academia, industry and as consultant. This is my journey about how I learn and use AI.

Leave a Reply