Which LLM to Choose?
We are LLM‑agnostic. For text‑to‑SQL over complex schemas, you need models with top‑tier reasoning capabilities. Below we compare hosted LLMs and provide recommendations organized by deployment environment, focusing on accuracy across text‑to‑SQL benchmarks.
Compare hosted LLMs (The following models are all considered top tier and comparable)
- GPT‑4o / GPT‑4.1 / GPT‑4.1 Mini
- Anthropic Claude 4 Sonnet
- Google Gemini 2.5 Flash/Pro
For these models, their performances are consistent stat at the top across simple, medium and hard queries.
Open‑source & Self‑hosted LLMs
If you prefer to host your own model, here're some options:
-
Llama 4 Maverick
Decent model, good for simple queries, but expect 15-35% accuracy drop for medium hard queries. -
Deepseek V3, R1 V3 and R1 are on-par with top-tier hosted LLMs from OpenAI, Anthropic and Google.
Recommendations by Environment
AWS
- Claude 4 Sonnet (AWS Bedrock) for text-to-SQL generation
- Titan Embeddings for semantic search and retrieval
You can set up LLM models in the AWS Bedrock console. Once you request access to a model from AWS, there are two ways to provision a model for use:
Provisioned Throughput:
- This ensures dedicated capacity reserved for you.
- Higher throughput, higher cost.
- https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html
Inference Profiles:
- This routes the request to an LLM in an available region.
- Shared throughput, lower cost.
- Can use the system inference profiles, or create your own application inference profile.
- https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html
Google Cloud
- Gemini 2.5 Flash for fast queries and lightweight operations
- Gemini 2.5 Pro for complex reasoning and hard queries (reasoning only)
- embedding 4 for semantic search
Azure
- OpenAI Embeddings for semantic search
- GPT-4.1 / GPT-4.1 Mini / GPT-4o depending on your latency and accuracy requirements
Self-hosted
- Llama 4 Maverick for high-throughput production workloads
- DeepSeek V3 for general reasoning tasks
- DeepSeek R1 for specialized reasoning and chain-of-thought operations