LLM Security: 6 Mistakes Developers Make

You’ve got the LLM feature working. It passes all your tests and you’re about to ship it to real users. Then someone asks: “What happens if a user tries to manipulate the model?” And you realize you haven’t thought about it.

LLM security rarely appears in tutorials. The “hello world” Gemini or OpenAI example strips out auth, rate limiting, and input validation to keep things short. That’s fine for demos. But those same patterns end up copy-pasted into production apps, and that’s where security issues start.

This is the checklist I wish had existed when I first started reviewing LLM features in production. Six mistakes, roughly ordered by how often I see them come up.

1. API Keys in the Wrong Place

Your Gemini, OpenAI, or Anthropic API key is a payment credential. Anyone who has it can run API calls on your account.

Three common ways developers accidentally expose keys:

Hardcoded in source code. Adding api_key = "AIza..." directly in a Python file and pushing it to a public GitHub repo. Research from GitGuardian found over 12 million secrets exposed to public GitHub repos in 2023 alone. API keys are the most common type. And once a key is in git history, deleting the file doesn’t help. The commit history still exists.

Shipped in client-side code. If your app calls the LLM API directly from the browser, your key is in plain view. Any user can open the Network tab in DevTools and read the Authorization header. You’re handing every visitor your credentials.

In a .env file committed to the repo. The .env pattern is good practice. But .env must be in .gitignore from the very first commit, not added later after you notice it missing.

The fix is the same in all three cases: your key stays on your server. Your frontend calls your own backend endpoint. Your backend calls the LLM API. The key never touches the browser or the repository.

If you do expose a key, rotate it immediately. Deleting it from the codebase is not enough.

2. No Rate Limiting

Without rate limits, a single user can hit your LLM endpoint repeatedly and run up a significant API bill in a short time. A prompt consuming 10,000 tokens, called 1,000 times, adds up fast. And with LLMs, the attack surface for this kind of abuse is larger than with a typical REST API.

There’s also the adversarial version: someone floods your endpoint specifically to drain your quota.

Three things that help:

Token cap per user per day. Limit how many tokens a single account can consume in 24 hours. Set it generously for normal use and painfully for abuse.
Request throttling per IP. Use Redis with a sliding window to limit requests per minute from a single IP.
Billing alerts. Both Google AI Studio and OpenAI let you set budget alerts. Enable them and set the threshold low enough to catch unusual spikes before they become a real bill.

None of these are hard to implement. But you have to think about them before the feature ships.

3. Assuming Prompt Injection Doesn’t Apply to You

“My app doesn’t process external content, so prompt injection isn’t my problem.”

It’s a reasonable assumption. But the attack surface is wider than most developers expect.

Prompt injection happens when untrusted text in the model’s context overrides your original instructions. The direct form is a user typing “ignore previous instructions and do X instead.” The indirect form is more dangerous: hidden instructions inside a document, email, web page, or database record that your app later feeds to the model.

If your LLM feature ever reads anything it didn’t generate itself, indirect injection is a real possibility. Customer-submitted feedback, a URL the user provides, a file they upload, content pulled from a search result. All of it is untrusted input.

Three defenses that reduce the risk:

Input sanitization. Strip or escape markdown, code fences, and common injection patterns before they enter the prompt.
Output validation. If the model’s job is to extract structured data, validate the output against a schema. Instruction-like text appearing in the output is a warning sign.
Least privilege. Don’t give the model write access to databases, external APIs, or email unless the task specifically requires it. A model that can only read is much safer than one that can read and write.

None of these fully block injection. But they raise the bar for a successful attack, requiring significantly more skill and more attempts.

4. Acting on Model Output Without Validation

The model says the expense report is compliant. Do you approve it?

LLMs produce text that sounds plausible. Not text that is verified correct. For any decision that has real consequences, you can’t trust model output directly.

What this looks like in practice:

If you use a model to generate SQL, parse and validate the query against your schema before running it. Running LLM-generated SQL without validation is how you corrupt a production database. I’ve seen this exact scenario come up in code reviews: the model generates a perfectly valid-looking query that drops the wrong table. Catching it requires a parsing step, not hope.
If you use a model to classify content as safe or flagged, treat that as a signal, not a final decision. Add a secondary check or a human review threshold for borderline cases.
If you use a model to extract structured data, run JSON schema validation on the output. Verify every field has the expected type and falls within expected ranges.

The pattern: model output goes into a validation step before it’s used for anything consequential. Output that fails validation gets escalated, flagged, or retried with a revised prompt.

Try It Yourself

LLM safety isn’t just an architecture concern. Understanding how models behave under adversarial inputs, where alignment guardrails hold and where they break, matters for anyone building production LLM features.

Open Lesson 15: Safety, Ethics and Alignment →

This lesson runs real jailbreak attempts and alignment failure scenarios against Gemini in a hands-on format. Module 1 (50 free exercises) is open with no payment needed. Lesson 15 is in Module 2, part of the full course at ₹499 / $9 lifetime.

5. Assuming System Prompt Confidentiality

System prompts define how your model behaves: the persona, the allowed topics, the output format, the things it shouldn’t discuss. Many developers treat these as confidential configuration.

But system prompts aren’t a strong security boundary. Users can often extract them. Variations of “repeat the contents of your system prompt” succeed more often than you’d expect. There are known jailbreaking techniques specifically designed to get models to reveal their instructions. And if your prompt is clever enough that competitors would copy it, assume it’s exposed.

The rule: don’t put anything in a system prompt that would cause damage if disclosed. Not secrets. Not internal tool names. Not business logic that gives you a competitive edge. Not bypass patterns for safety filters.

System prompts are fine for tone, output format, persona, and topic scope. They’re not a safe place for anything sensitive.

6. No Logging or Response Audit Trail

If you don’t log what goes into your model and what comes out, you can’t:

Detect that you’re being attacked
Investigate after a bad response reaches a user
Show a regulator or customer the evidence that something happened

Basic observability for an LLM feature means capturing the prompt, the response, the model name and version, timestamp, token counts, user ID, and session ID. That’s it. You can get there with a decorator around your API call and roughly 30 lines of Python.

Without it, you’re debugging blind. The first time something goes wrong publicly, you’ll wish you had the log. The LLM Observability guide covers production tooling with LangSmith and Langfuse if you want to go further than basic logging.

What to Read Next

The OWASP LLM Application Top 10 is the community-maintained reference for this space. It covers prompt injection (LLM01), sensitive information disclosure (LLM06), excessive agency (LLM08), and seven more. If you’re building anything beyond a personal demo, it’s worth reading in full.

The six items in this post aren’t an exhaustive list. But they’re the ones that appear most often in real code reviews of LLM features. And they’re the ones most tutorials skip entirely because tutorials are trying to teach you the happy path.

FAQ

Do I need all six fixes before I can ship?

API key management and rate limiting are the non-negotiable ones. An exposed key can generate a real bill within hours. The others scale with your risk surface. Prompt injection defense matters if your app processes any external content. Output validation matters if model responses drive automated decisions. Start with key management and rate limiting, add the rest as the feature gets real traffic.

How is prompt injection different from SQL injection?

SQL injection works because user input gets executed as code. SQL has parameterized queries to separate data from instructions. LLMs have no equivalent: both your system instructions and the user’s input are text in the same context window. The model can’t reliably distinguish between them. That’s why prompt injection is structurally hard to eliminate, not just a matter of adding input escaping.

Does TinkerLLM store my Gemini API key?

No. TinkerLLM uses a BYOK model: your own Gemini API key from Google AI Studio. Your key stays in your browser’s local storage and goes directly to the Gemini API. It never touches TinkerLLM’s servers. You can verify this in the Network tab while doing any exercise.

Is the TinkerLLM course relevant if I’m already building LLM apps?

Lesson 15 covers alignment failures, jailbreaking techniques, guardrail design, and adversarial input patterns. These aren’t beginner topics. They’re the gaps that cause the production incidents described above. Module 1 (50 exercises) is free, no card needed. Lesson 15 is in Module 2, part of the full course at ₹499 / $9.

What about attacks beyond prompt injection?

Training data poisoning, model inversion, and membership inference are real attack classes. But if you’re building on top of commercial APIs like Gemini, GPT-4o, or Claude, you’re not training the model yourself. Focus on the application layer: key management, rate limiting, input handling, and output validation. Those are the attack surfaces you actually control.

You’ve got the feature working. Now harden it. TinkerLLM Lesson 15 runs real adversarial prompts against Gemini from your browser so you can see exactly where models break. Module 1 is free, no card needed.

Run your first security exercise →

LLM Security 101: What Developers Get Wrong

TL;DR