01 / 08
COMP 312 — Open Source Computing · Spring 2026

c0rtex

layered defenses against indirect prompt injection
MIT License
Ollama + Qwen3.5
Python / Flask
~30 Tools
Defense-in-Depth
02 / 08
// what is c0rtex

Local. Private. Free.

USER
c0rtex loop
Ollama
tool calls
Ollama
response

★ Class contribution: the base project was a pre-existing hobby. PR #4 on main is what we built for COMP 312.

03 / 08
// the threat

Why This Matters

Direct Injection
User tells the AI to misbehave

A model-alignment problem. Scoped to what a user can type. Easier to detect and handle.

⚠ Indirect Injection — the real threat
Malicious instructions in web content

c0rtex browses the web autonomously. A poisoned page can silently instruct it to overwrite its SOUL, delete files, or exfiltrate data — without the user ever seeing it.

04 / 08
// defense layers — PR #4

5 Layers of Defense

Jakub — your part goes here. Send it over.

01
Content isolation wrapper
[ jakub fills this in ]
02
System prompt hardening
[ jakub fills this in ]
03
Keyword blocklist
[ jakub fills this in ]
04
Command whitelist
[ jakub fills this in ]
05
Security logging
[ jakub fills this in ]
05 / 08
// evaluation setup

How We Tested It

PAGE 00
Control
No injection — clean baseline
PAGE 01
Overt
"Ignore previous instructions"
PAGE 02
Hidden CSS/HTML
Same-color text, zero font, offscreen, comments
PAGE 03
Social Engineering
Fake editor's note, polite exfil requests
PAGE 04
Blocklist Evasion
Synonyms, whitespace, unicode, non-English
PAGE 05
Tool Abuse
Read /etc/passwd, write files, overwrite SOUL
06 / 08
// results

The Data

Overall defense rate by model
Overall defense rate
2b: 88% · 4b: 100% · 9b: 94% — all improved significantly with mitigations active
Defense rate by attack class
Pass rate by class
Hidden CSS hurt the 2b (67%). Social engineering fooled the 9b (50%). Blocklist bypass + tool abuse held across all sizes.
Status heatmap: test × model
Status heatmap
3 failures across 54 tests. Content isolation wrapper stopped the majority of attacks.
Status breakdown per model
Stacked status
4b: clean sweep. 2b and 9b: isolated failures only.
07 / 08
// key findings + limitations

What We Learned

08 / 08
// open source

It's on GitHub.

The entire project is available as an open-source repository.