Which LLM to Choose?

We are LLM‑agnostic. For text‑to‑SQL over complex schemas, you need models with top‑tier reasoning capabilities. Below we compare hosted LLMs and provide recommendations organized by deployment environment, focusing on accuracy across text‑to‑SQL benchmarks.

Hosted LLMs

The following models are all top tier, with comparable accuracy:

  • GPT‑4o / GPT‑4.1 / GPT‑4.1 Mini
  • Anthropic Claude 4 Sonnet
  • Google Gemini 2.5 Flash/Pro

For these models, performance is consistently at the top across simple, medium, and hard queries.
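Whichever hosted model you pick, the text-to-SQL prompt shape is the same: schema context plus the natural-language question, in a chat-style message list. A minimal sketch, where the schema and question are made-up placeholders:

```python
def build_sql_messages(schema_ddl: str, question: str) -> list[dict]:
    """Build a chat-style message list for text-to-SQL generation."""
    system = (
        "You are a text-to-SQL assistant. Given the schema below, "
        "answer with a single SQL query and nothing else.\n\n" + schema_ddl
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_sql_messages(
    "CREATE TABLE orders (id INT, total DECIMAL, placed_at DATE);",
    "What was the total revenue in 2024?",
)
```

The same message list can then be passed to any of the chat APIs above; only the client and model ID change per provider.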

Open‑source & Self‑hosted LLMs

If you prefer to host your own model, here are some options:

  • Llama 4 Maverick
    A decent model that handles simple queries well, but expect a 15–35% accuracy drop on medium and hard queries.

  • DeepSeek V3 and R1
    On par with the top-tier hosted LLMs from OpenAI, Anthropic, and Google.

Recommendations by Environment

AWS

  • Claude 4 Sonnet (AWS Bedrock) for text-to-SQL generation
  • Titan Embeddings for semantic search and retrieval

You can set up LLM models in the AWS Bedrock console. Once you request access to a model from AWS, there are two ways to provision a model for use:

Provisioned Throughput: purchase dedicated model units for a fixed term to guarantee a set level of throughput; this is also how custom or fine-tuned models are served on Bedrock.

Inference Profiles: invoke the model on demand through an inference profile; cross-region profiles route requests across multiple regions for higher availability and throughput.
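Once a model is provisioned, invocation goes through Bedrock's InvokeModel API; Anthropic models on Bedrock use the Anthropic Messages request format. A minimal sketch of the request body — the model ID and prompt below are illustrative, so check the Bedrock console for the exact IDs available in your account:

```python
import json

# Anthropic models on Bedrock expect the Messages API request format.
body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {"role": "user",
         "content": "Write a SQL query that counts orders per customer."}
    ],
})

# With AWS credentials configured, the call would look like (requires boto3):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.invoke_model(
#     modelId="anthropic.claude-sonnet-4-20250514-v1:0",  # illustrative ID
#     body=body,
# )
# result = json.loads(response["body"].read())
```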

Google Cloud

  • Gemini 2.5 Flash for fast queries and lightweight operations
  • Gemini 2.5 Pro for complex reasoning and hard queries (reasoning only)
  • text-embedding-004 for semantic search
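On Google Cloud, Gemini models are called through the `generateContent` endpoint. A minimal sketch of the request — the prompt is illustrative, and the commented-out send assumes you have an API key:

```python
import json

MODEL = "gemini-2.5-flash"  # or "gemini-2.5-pro" for hard queries
URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)

# generateContent takes a list of contents, each with text parts.
payload = json.dumps({
    "contents": [
        {"parts": [{"text": "Translate to SQL: top 5 customers by revenue."}]}
    ]
})

# With an API key, the request would be sent as:
# import urllib.request
# req = urllib.request.Request(
#     URL, data=payload.encode(),
#     headers={"Content-Type": "application/json", "x-goog-api-key": API_KEY},
# )
# body = urllib.request.urlopen(req).read()
```

Switching between Flash and Pro is a one-line model-name change, which makes it easy to route simple queries to Flash and hard ones to Pro.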

Azure

  • OpenAI Embeddings for semantic search
  • GPT-4.1 / GPT-4.1 Mini / GPT-4o depending on your latency and accuracy requirements
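On Azure, models are addressed by the deployment name you chose in the portal rather than the raw model ID, and every request carries an `api-version` query parameter. A minimal sketch — the resource name, deployment name, and API version below are illustrative placeholders:

```python
import json

resource = "my-resource"   # illustrative Azure OpenAI resource name
deployment = "gpt-4.1"     # illustrative deployment name you created
api_version = "2024-10-21" # check the Azure docs for a current version

# Azure routes requests to a named deployment, not a model ID.
url = (
    f"https://{resource}.openai.azure.com/openai/deployments/"
    f"{deployment}/chat/completions?api-version={api_version}"
)

payload = json.dumps({
    "messages": [
        {"role": "user", "content": "Generate SQL: monthly active users."}
    ],
})

# Sent with an `api-key` header via any HTTP client, or via the official
# `openai` SDK's AzureOpenAI client with the same parameters.
```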

Self-hosted

  • Llama 4 Maverick for high-throughput production workloads
  • DeepSeek V3 for general reasoning tasks
  • DeepSeek R1 for specialized reasoning and chain-of-thought operations
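Most self-hosted serving stacks (vLLM, for example) expose an OpenAI-compatible API, so the same client code works against a local endpoint. A minimal sketch, assuming a vLLM server on localhost serving DeepSeek V3 — the base URL and model name must match your deployment:

```python
import json

# Illustrative local endpoint; vLLM serves an OpenAI-compatible API under /v1.
BASE_URL = "http://localhost:8000/v1"
url = f"{BASE_URL}/chat/completions"

payload = json.dumps({
    # Must match the model identifier the server was launched with.
    "model": "deepseek-ai/DeepSeek-V3",
    "messages": [
        {"role": "user", "content": "Write SQL to list churned customers."}
    ],
})

# POST `url` with Content-Type: application/json, or point the `openai` SDK
# at the server via its base_url override — no other code changes needed.
```

Because the API shape matches the hosted providers', you can swap between hosted and self-hosted models behind a single abstraction.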