Gemini 3 Pro vs Flash: Which Model to Use

You open the model dropdown in Google AI Studio and see two current options: gemini-3.5-flash and gemini-3.1-pro-preview. Flash is the default. The numbering looks backwards: the Pro model carries a lower version number than the Flash one, and you are not sure which one your project actually needs.

Here is the short version, the one I give when an engineer on our team asks me which to use. Gemini 3.5 Flash is the right default for most work, and this generation it does something earlier ones didn’t: it beats the Pro model on coding and agentic tasks while costing less. Gemini 3.1 Pro earns its place on the hardest reasoning and the longest documents. The boundary between them is specific enough to test in about ten minutes.

This post gives you that test and the decision framework, so you stop guessing at the dropdown.

The Naming Is Backwards on Purpose

Gemini 3.5 Flash shipped after Gemini 3.1 Pro. That is not a typo. Google decoupled the Flash and Pro release cycles, so the newest Flash carries a higher version number than the current Pro. You can confirm the active model IDs on the Gemini models page.

It matters because the old rule of thumb, “Pro is always smarter,” no longer holds this generation. On coding and agentic benchmarks, 3.5 Flash outperforms 3.1 Pro, and it does so at lower cost. So the question stopped being which model is better in the abstract. It became which model is better for the specific thing you are doing.

What Each Model Is Actually For

Gemini 3.5 Flash (gemini-3.5-flash) is the generalist. Fast, strong on code and tool use, and the only one of the two with a free tier. For the large majority of real tasks, it is enough.

Gemini 3.1 Pro (gemini-3.1-pro-preview) is the reasoning specialist. It holds state across long inference chains and supports a larger context window for very long documents. It is paid-only, which we will come back to, because it changes the math.

Both speak the same Gemini API. Switching is one string in your code:

FLASH = "gemini-3.5-flash"        # fast, free tier, strong on code
PRO = "gemini-3.1-pro-preview"    # deepest reasoning, paid only
model = FLASH                     # change to PRO to compare; nothing else changes

Everything else stays the same, so you can test both on the same prompts with zero refactoring.
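
A minimal sketch of that side-by-side test, assuming you wrap your own API call in a function that takes a model ID and a prompt and returns text. The names compare_models, GenerateFn, and fake_generate are hypothetical, and fake_generate is only a stand-in so the example runs offline.

```python
from typing import Callable, Dict, List

FLASH = "gemini-3.5-flash"
PRO = "gemini-3.1-pro-preview"

# Hypothetical signature for your own wrapper: (model_id, prompt) -> response text.
GenerateFn = Callable[[str, str], str]


def compare_models(prompts: List[str], generate: GenerateFn) -> List[Dict[str, str]]:
    """Run every prompt against both model IDs and collect the answers side by side."""
    rows = []
    for prompt in prompts:
        rows.append({
            "prompt": prompt,
            FLASH: generate(FLASH, prompt),
            PRO: generate(PRO, prompt),
        })
    return rows


def fake_generate(model: str, prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs without a key or network."""
    return f"[{model}] answer to: {prompt}"


if __name__ == "__main__":
    for row in compare_models(["Summarize this changelog."], fake_generate):
        print(row[FLASH])
        print(row[PRO])
```

To use it for real, replace fake_generate with a wrapper around your Gemini client. The comparison loop itself does not change.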

Speed

Flash is fast. On a short prompt you see a response in about a second, and on longer generations it stays in the low single-digit seconds. Pro is slower on hard work, because it traces more reasoning steps before answering.

On a simple prompt the gap is small. It widens on tasks that need sustained, multi-step thinking. I have run the same prompts on both across a range of lengths, and that pattern is consistent. If you only test on easy prompts, you will not see it, which is exactly how people end up paying for Pro latency they do not need.
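
If you want numbers instead of impressions, a small timing helper is enough. This is a sketch, not a benchmark harness: median_latency is a hypothetical name, and the two sleep calls stand in for real requests with made-up durations.

```python
import time
from statistics import median
from typing import Callable, List


def median_latency(call: Callable[[], object], runs: int = 5) -> float:
    """Return the median wall-clock seconds over `runs` invocations of `call`."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return median(samples)


if __name__ == "__main__":
    # Stand-ins for real requests; the sleep durations are invented for illustration.
    easy = lambda: time.sleep(0.01)
    hard = lambda: time.sleep(0.05)
    print(f"easy prompt: {median_latency(easy):.3f}s")
    print(f"hard prompt: {median_latency(hard):.3f}s")
```

The median is used rather than the mean so that one slow network round trip does not skew the comparison. Time both easy and hard prompts, since the gap only shows up on the hard ones.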

Where Pro Still Wins, and Where It Doesn’t

Pro earns its cost in a few specific places:

Long-document analysis, where its larger context window lets it hold a whole contract or codebase at once.
Multi-step logic, where the chain runs five, six, seven steps and Flash starts drifting around step four.
Judgment calls with competing constraints, where a defensible answer beats a fast one.

And here is where it does not win this generation: coding and agentic work. Gemini 3.5 Flash matches or beats 3.1 Pro on those, faster and cheaper. When I put a coding task to both, Flash usually lands the same answer and returns sooner. So if you are building a coding assistant or an agent, reaching for Pro by reflex is the wrong move. Test first.

Pricing and the Free Tier (the part that changed)

This is the detail most comparison posts miss. In April 2026, Google removed the Pro models from the free tier. Only the Flash models, Gemini 3.5 Flash and 3.1 Flash-Lite, are free now.

So the practical picture for a learner or a prototype:

Flash has a free tier, roughly 1,500 requests a day, which is plenty for learning and side projects. The current numbers live in the rate-limits docs.
Pro requires billing enabled. There is no free Pro any more.

On paid usage, Pro runs around three to five times the per-token cost of Flash. Pair that with the fact that Flash beats Pro on code, and the default gets very clear. I keep prototypes on Flash and only enable billing once a specific task proves it needs the deeper reasoning. Check Google’s pricing page for current rates, since they move.
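
The arithmetic is worth doing once for a mixed setup. The sketch below expresses average cost as a multiple of an all-Flash baseline, assuming every request uses about the same number of tokens. The function name and the 10% Pro share are hypothetical; the 3x and 5x multiples are just the two ends of the rough range above, not quoted prices.

```python
def blended_cost_multiple(pro_share: float, pro_multiple: float) -> float:
    """Average per-token cost relative to an all-Flash baseline of 1.0.

    pro_share: fraction of traffic sent to Pro, between 0 and 1.
    pro_multiple: Pro's per-token cost as a multiple of Flash's.
    """
    if not 0.0 <= pro_share <= 1.0:
        raise ValueError("pro_share must be between 0 and 1")
    if pro_multiple <= 0:
        raise ValueError("pro_multiple must be positive")
    return (1.0 - pro_share) * 1.0 + pro_share * pro_multiple


if __name__ == "__main__":
    # Hypothetical 10% Pro share at both ends of the rough 3x to 5x range.
    for multiple in (3.0, 5.0):
        print(multiple, round(blended_cost_multiple(0.10, multiple), 2))
```

Under those assumptions, sending a tenth of traffic to Pro costs about 1.2 to 1.4 times an all-Flash setup, while sending everything to Pro costs the full three to five times.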

The Decision Framework

One question covers most of it, and it is the one I reach for on every new project: does the task need reasoning across many steps, or is it generating from a well-defined pattern?

Use Gemini 3.5 Flash when:

You are generating text, formatting output, summarizing, or writing and reviewing code.
Response time is part of your user experience.
You are running high volume, or you are on the free tier.
You are prototyping and have not found a specific Flash failure yet.

Use Gemini 3.1 Pro when:

You have tested Flash on the task and found reasoning failures.
You are solving problems with five or more sequential logical steps.
You are analyzing documents long enough to need the bigger context window.
You can absorb the latency and the cost, with billing enabled.
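
The two lists can be written down as a small function. This is one reading of the framework, not the only one: it treats billing as a hard requirement for Pro and any one of the other three Pro criteria as sufficient. The Task fields and choose_model are hypothetical names.

```python
from dataclasses import dataclass

FLASH = "gemini-3.5-flash"
PRO = "gemini-3.1-pro-preview"


@dataclass
class Task:
    flash_failed_in_testing: bool = False  # you ran Flash and saw reasoning failures
    sequential_steps: int = 1              # rough count of dependent logical steps
    needs_larger_context: bool = False     # document needs the bigger context window
    billing_enabled: bool = False          # Pro is paid-only


def choose_model(task: Task) -> str:
    """Return Pro only when billing is on and at least one Pro criterion is met."""
    if not task.billing_enabled:
        return FLASH
    wants_pro = (
        task.flash_failed_in_testing
        or task.sequential_steps >= 5
        or task.needs_larger_context
    )
    return PRO if wants_pro else FLASH


if __name__ == "__main__":
    print(choose_model(Task()))
    print(choose_model(Task(sequential_steps=6, billing_enabled=True)))
```

Everything that does not trip a Pro criterion falls through to Flash, which matches the default the framework argues for.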

For production, start with Flash everywhere, log the cases where it produces weak answers, and route only that cluster to Pro. Your average cost stays close to Flash, and Pro usage gets justified by real failure data instead of a hunch.
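
A rough sketch of that production pattern, assuming you tag each request with a task category and record weak Flash answers by hand or through an evaluation step. FailureRouter, its method names, and the threshold of three are all hypothetical choices to tune against your own data.

```python
from collections import Counter

FLASH = "gemini-3.5-flash"
PRO = "gemini-3.1-pro-preview"


class FailureRouter:
    """Route a task category to Pro once enough weak Flash answers are logged."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold          # hypothetical cutoff; tune on real data
        self.failures: Counter = Counter()  # category -> count of logged weak answers

    def log_failure(self, category: str) -> None:
        """Record one weak Flash answer for this category."""
        self.failures[category] += 1

    def model_for(self, category: str) -> str:
        """Pick Pro only for categories with enough logged failures."""
        return PRO if self.failures[category] >= self.threshold else FLASH


if __name__ == "__main__":
    router = FailureRouter(threshold=2)
    router.log_failure("contract-review")
    router.log_failure("contract-review")
    print(router.model_for("contract-review"))
    print(router.model_for("summarize"))
```

Categories with no logged failures stay on Flash, so Pro spend grows only as the failure log does.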

What About Gemini 2.5?

If a tutorial still points you at gemini-2.5-flash or gemini-2.5-pro, those models work, but they are the previous generation, and Google has them scheduled for shutdown no earlier than October 2026. Once they are shut down, calls to them fail. The Flash-versus-Pro logic is identical, so the move is simple: switch the model string to the 3.x equivalent and keep going. If you want the older breakdown, see the Gemini 2.5 Pro vs Flash guide.
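
If model IDs are scattered through a codebase, a lookup table keeps the swap in one place. The mapping below is an assumption based only on the model IDs named in this post, pairing each 2.5 model with the 3.x model of the same tier. migrate_model_id is a hypothetical helper.

```python
# Assumed 2.5 -> 3.x equivalents, based on the model IDs named in this post.
MIGRATION = {
    "gemini-2.5-flash": "gemini-3.5-flash",
    "gemini-2.5-pro": "gemini-3.1-pro-preview",
}


def migrate_model_id(model_id: str) -> str:
    """Return the 3.x replacement for a 2.5 ID; leave any other ID unchanged."""
    return MIGRATION.get(model_id, model_id)


if __name__ == "__main__":
    print(migrate_model_id("gemini-2.5-flash"))
    print(migrate_model_id("gemini-3.5-flash"))
```

IDs that are already current pass through untouched, so the helper is safe to call on every request.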

Try It Yourself

The fastest way to feel the difference is the test I run myself before committing to a model: the same prompt on both. Open Google AI Studio, set the model to gemini-3.5-flash, and run a prompt that needs real multi-step reasoning, like a multi-constraint scheduling puzzle. Note the answer and how it shows its work. Then switch to gemini-3.1-pro-preview and run the exact same thing. On easy prompts you will see little difference. On hard ones, you will see where Pro earns its cost, and where it does not.

You will need a Gemini API key for this. It is free from Google AI Studio and takes about two minutes, and the free tier runs gemini-3.5-flash without billing.
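
Once you have the key, keep it out of your source code. A minimal sketch, assuming you store it in an environment variable: the variable name GEMINI_API_KEY is a convention assumed here, so check what your client library expects. load_api_key is a hypothetical helper.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Gemini API key before running.")
    return key
```

Failing at startup with a clear message is easier to debug than an authentication error halfway through a test run.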

TinkerLLM’s Lesson 20 covers LLM APIs in production, including model selection and hands-on exercises that compare Gemini models inside structured prompts. The full Gemini API documentation is the reference once you go deeper.


FAQ

Is Gemini 3 Pro worth it over Flash?

For most tasks, no. Gemini 3.5 Flash is faster, cheaper, free-tier eligible, and it beats 3.1 Pro on coding and agentic work. Pro is worth it specifically for the hardest multi-step reasoning and for documents long enough to need its bigger context window. If you cannot point to a concrete task where Flash fails and Pro succeeds, Flash is the right call. Test your hardest actual prompts on both before paying for Pro.

Is Gemini 3.5 Flash really better than Gemini 3.1 Pro?

On coding and agentic benchmarks, yes, and at lower cost. That is a real inversion from earlier generations, where Pro was the clear top model. On the deepest reasoning and longest-context tasks, 3.1 Pro is still ahead. So neither is strictly better. Flash wins on speed, cost, and code. Pro wins on hard reasoning and long documents.

Is Gemini 3 Pro free?

No. Google removed the Pro models from the free tier in April 2026. To call gemini-3.1-pro-preview you need billing enabled in Google Cloud. The Flash models, including gemini-3.5-flash, still have a free tier of roughly 1,500 requests a day, which is why Flash is the default for learning and prototyping.

How do I switch between Gemini 3 Pro and Flash?

Change one string. Both models use the same Gemini API endpoint, so you swap gemini-3.5-flash for gemini-3.1-pro-preview in the model parameter and nothing else changes. Many production systems run Flash by default and route specific request types to Pro based on a classifier or task category.

Should I still use Gemini 2.5?

Only if you have a reason to stay. The 2.5 models keep running until Google’s shutdown date, which is no earlier than October 2026, and then they stop working. New projects should start on 3.x. If you are maintaining something on 2.5, plan the swap to gemini-3.5-flash before the cutoff. The Gemini 2.5 Pro vs Flash guide covers that generation in detail.

Flash is the right model until your task proves otherwise, and this generation that is truer than ever. The fastest way to know: run your ten hardest prompts on both and compare. TinkerLLM’s first 50 exercises are free, no card needed, and Lesson 20 goes deeper on the Gemini API in production once you are ready.