Which LLM to Choose?

We are LLM‑agnostic. For text‑to‑SQL over complex schemas, you need models with top‑tier reasoning capabilities. Below we compare hosted LLMs and provide recommendations organized by deployment environment, focusing on accuracy across text‑to‑SQL benchmarks.

Compare hosted LLMs (The following models are all considered top tier and comparable)

GPT‑4o / GPT‑4.1 / GPT‑4.1 Mini
Anthropic Claude 4 Sonnet
Google Gemini 2.5 Flash/Pro

For these models, their performances are consistent stat at the top across simple, medium and hard queries.

Open‑source & Self‑hosted LLMs

If you prefer to host your own model, here're some options:

Llama 4 Maverick
Decent model, good for simple queries, but expect 15-35% accuracy drop for medium hard queries.
Deepseek V3, R1 V3 and R1 are on-par with top-tier hosted LLMs from OpenAI, Anthropic and Google.

Recommendations by Environment

AWS

Claude 4 Sonnet (AWS Bedrock) for text-to-SQL generation
Titan Embeddings for semantic search and retrieval

You can set up LLM models in the AWS Bedrock console. Once you request access to a model from AWS, there are two ways to provision a model for use:

Provisioned Throughput:

This ensures dedicated capacity reserved for you.
Higher throughput, higher cost.
https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html

Inference Profiles:

This routes the request to an LLM in an available region.
Shared throughput, lower cost.
Can use the system inference profiles, or create your own application inference profile.
https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html

Google Cloud

Gemini 2.5 Flash for fast queries and lightweight operations
Gemini 2.5 Pro for complex reasoning and hard queries (reasoning only)
embedding 4 for semantic search

Azure

OpenAI Embeddings for semantic search
GPT-4.1 / GPT-4.1 Mini / GPT-4o depending on your latency and accuracy requirements

Self-hosted

Llama 4 Maverick for high-throughput production workloads
DeepSeek V3 for general reasoning tasks
DeepSeek R1 for specialized reasoning and chain-of-thought operations

Compare hosted LLMs (The following models are all considered top tier and comparable)​

Open‑source & Self‑hosted LLMs​

Recommendations by Environment​

AWS​

Provisioned Throughput:​

Inference Profiles:​

Google Cloud​

Azure​

Self-hosted​