SQL Query
The Query module contains methods related to SQL query handling.
Initialization & Imports
from waii_sdk_py import WAII
from waii_sdk_py.chat import *
from waii_sdk_py.query import *
from waii_sdk_py.database import *
from waii_sdk_py.semantic_context import *
from waii_sdk_py.chart import *
from waii_sdk_py.history import *
WAII.initialize(url="https://your-waii-instance/api/", api_key="your-api-key")
Important: You need to activate the database connection first before using the methods in this module. Otherwise you may trying to generate query from a wrong database.
WAII.Database.activate_connection("snowflake://...&warehouse=COMPUTE_WH")
Here are some of its methods:
Generate Query
Query.generate(params: QueryGenerationRequest) -> GeneratedQuery
This method generates a SQL query based on the provided parameters.
Parameter fields:
ask: The question you want to ask Waii to generate, such asHow many tables are there?dialect: (optional), by default we use the same dialect which the database you are using, so you DON'T need to specify this field if you want to generate a query in the same dialect as the database. If you want to generate a query in a different dialect, you can specify it here (such assnowflake,postgresql,mysql, etc.)tweak_history: (array ofTweak) We can support both asking new question, or tweak the previous question. If you want to tweak the previous question, you can set this field. Each ofTweakobject looks like:sql: The SQL query you want to tweakask: The previous question you asked.
search_context: List of scopes to limit tables/schemas used to generate query (By default it searches all objects from the database). It's list ofSearchContextobject. Each ofSearchContextobject looks like:db_name: Catalog name (database name), default is*schema_name: Schema name, default is*table_name: Table name, default is*
tags: List of tags to attach to LLM usage stats (which is available for users to query for on-premises version only, SaaS version will be able to query the usage stats in the future)- It's a list of string, such as
["dev", "tenant-2", "team=analytics", ...] user_id=<user_email_login_to_waii>is one of the default tag which is automatically added by the system (you don't need to add it)
- It's a list of string, such as
use_cache: Whether to use cache or not, default is True. If you set it to False, it will always generate a new query by calling LLM.model: Which LLM to be used to generate queries. By default system will choose a model.use_example_queries: Whether to use example queries (aka liked-queries) or not, default is True. If you set it to False, it will not use example queries to generate the query.parameters: Dictionary of additional configuration parameters for query generation:PUBLIC.LIKED_QUERIES.ENABLED: Boolean (True/False) - Enable using liked queries as examples (default is True)PUBLIC.LIKED_QUERIES.LEARNING_MODE: String - Values:"disabled","single","few-shot"(default isfew-shot)PUBLIC.DEEP_THINKING.ENABLED: Boolean - Enable deeper thinking for complex queries, it requires configure "reasoning" model like o3-mini, deepseek-R1, etc. (default is False)PUBLIC.QUERY_GENERATION.ANALYSIS.ENABLE_ALL: Boolean - Enable comprehensive query analysis, for example assumptions, clarify questions, etc. (default is False)PUBLIC.GUARDRAIL.INVALID_QUESTION_CHECKER.ENABLED: Boolean - Additional checks to prevent hallucination (default is False)PUBLIC.REFLECTION.ENABLED: Boolean - Additional checks on generated query to check interpretation of user question (default is False)
additional_context: (optional) List ofSemanticStatementobjects to provide additional context during query generation. These context will be treated like they are part of the system and follow all fields of the context. See Semantic Context module for detailed information aboutSemanticStatementfields and usage patterns.
Examples:
Ask a new question:
>>> WAII.Query.generate(QueryGenerationRequest(ask = "How many tables are there?"))
Ask a complex question:
>>> WAII.Query.generate(QueryGenerationRequest(ask = """
Give me all stores in California that have more than 1000
customers, include the store name, store address, and the number of
customers."""))
Generate query with search context (limited to specific tables, schemas, etc.)
By default, Waii search all tables in the database, if you know which tables you want to use, you can specify the search context to limit the tables used to generate query.
>>> WAII.Query.generate(QueryGenerationRequest(
ask = "How many cars we produced in Q2",
search_context=[SearchContext(db_name="*",
schema_name="schema1",
table_name="table1"),
SearchContext(db_name="*",
schema_name="schema2",
table_name="*")]))
The above query will only search tables from schema1.table1 and schema2.*
Tweak the previous question:
>>> WAII.Query.generate(QueryGenerationRequest(
ask = "only return 10 results",
tweak_history=[Tweak(sql="select * from information_schema.tables",
ask="give me all tables")]))
(The above tweak will return the first 10 results of the previous query)
Query.generate returns GeneratedQuery object, which contains the following fields:
uuid: UUID of the generated queryliked: Whether the query is liked or not (True/False). You can like a query by usingQuery.likemethod.query: The generated query texttables: Tables used in the generated querysemantic_context: Semantic context applied to the generated querydetailed_steps: Detailed steps of how the query work (in natural language)what_changed: If you do a tweak, it will tell you what changed in the querycompilation_error: If there is any compilation error, it will show here. (Waii will try to fix the compilation error automatically, but if it tried multiple times and still cannot fix it, it will show here)confidence_score: returns logprob confidence score based on the tokens from generated queries.llm_usage_stats: token consumption during the query generation.token_total: total token usage (prompt + completed), this doesn't include cached tokens. So if you see the total_total = 0, the query is fetched from the cache.
is_new: whether the query is new or tweaktimestamp_ms: timestamp when the query is generatedelapsed_time_ms: total elapsed time (in milli-seconds) for the query generationaccess_rule_protection_status:AccessRuleProtectionStatusobject showing query protection status regarding access rulesassumptions: a list of assumptions the model made during the query generation. It only got added when ``PUBLIC.QUERY_GENERATION.ANALYSIS.ENABLE_ALLis set totrue`
Tips to use tweak to update existing query
When you tweak a query, you should provide the original query and the original ask. The system will figure out what you want to tweak based on the original query/ask and the new ask.
Waii query generation is stateless, which means it doesn't remember the previously generated query. If you want Waii to remember the previous query, you can provide the original queries as Tweak and the original ask in the tweak.
You can provide a list of tweaks to tweak the query multiple times. It's like a history of tweaks. For example, initial question is Give me all orders, and you want to tweak it to in California, and then order by order_date, you can provide a list of tweaks. Assume the new ask is group by city and include the total order amount, you can provide the following tweaks plus the new ask:
>>> WAII.Query.generate(QueryGenerationRequest(
tweak_history=[Tweak(sql="...",
# original ask
ask="Give me all orders in California"),
Tweak(sql="...",
# tweak #1
ask="in California"),
Tweak(sql="...",
# tweak #2
ask="order by order_date")]),
# new ask
ask = "group by city and include the total order amount")
(tweak_history is ordered by when the question is submitted, from the oldest to the newest)
You don't need to worry about "context window" of the LLM during tweaking, the system will handle it for you. (it will include necessary previous context to try to get the best possible result).
Handle is_new field of GeneratedQuery
When you include tweak_history, the system will automatically decide whether the query is new or tweak. If the system thinks it is a new query, it will set is_new to True, otherwise it will set to False. You can use this field to decide whether you want to include previous asks for future query generation request or not.
For example, if you include tweak_history like ["give me all orders in California", "order by order_date"], and the new ask is give me all customers and stores, the system will set is_new to True because the new ask doesn't have enough relationship with the previous asks. In this case, you may want to clean up the tweak_history for the future query generation request.
Pseudo code to illustrate how to handle is_new field:
tweak_history = []
while True:
ask = get_user_input()
generated_query = WAII.Query.generate(QueryGenerationRequest(ask=ask, tweak_history=tweak_history))
# handle the generated query, check the is_new field
if generated_query.is_new:
# if it is new, clean up the tweak_history, only include the newly generated query as tweak (For the next query generation)
tweak_history = [Tweak(sql=generated_query.query, ask=ask)]
else:
# if it is not new, append the tweak to the tweak_history so that we can keep the context
tweak_history.append(Tweak(sql=generated_query.query, ask=ask))
Handle concurrent query generation
It's possible that you have multiple users generating queries at the same time. Waii will handle the concurrent query generation for you, but you need to make sure you maintain the tweak_history for each session separately.
Choose which LLM model to use
By default, the system will choose the best model to generate the query. If you want to use a specific model, you can specify it in the request. The system will try to use the specified model, if the specified model is not available, it will fail the request.
In order to choose which one to use, you need to get the list of available models first.
WAII.get_models()
Which include a list of ModelType object with the following fields:
name: Name of the modeldescription: Description of the modelvendor: Vendor of the model
The name field is the one you need to use to specify which model to use.
In order to specify which model to use, you can set model field in the request.
>>> WAII.Query.generate(QueryGenerationRequest(
ask = "How many tables are there?",
model="GPT-4o"))
Enable query analysis:
>>> WAII.Query.generate(QueryGenerationRequest(
ask = "Tell me the busiest day",
parameters = {"PUBLIC.QUERY_GENERATION.ANALYSIS.ENABLE_ALL": True}
))
This enables comprehensive query analysis which provides deeper insights into assumptions, clarify questions, etc.
It may return a query returns with assumptions field like The term 'busiest day' refers to the day with the highest number of queries executed.
Combine analysis with other parameters:
>>> WAII.Query.generate(QueryGenerationRequest(
ask = "What products have the highest profit margin by region?",
parameters = {
"PUBLIC.QUERY_GENERATION.ANALYSIS.ENABLE_ALL": True,
"PUBLIC.DEEP_THINKING.ENABLED": True,
"PUBLIC.LIKED_QUERIES.LEARNING_MODE": "few-shot"
},
use_cache = False
))
This combination enables advanced query analysis along with deeper thinking and example-based learning, forcing a fresh generation without using cache.
Async Query Generation
The generate() endpoint carries out the query generation synchronously. This will block until Waii generates a valid query.
To generate a query asynchronously use the pair of methods
WAII.query.submit_generate_query(params: QueryGenerationRequest) -> AsyncObjectResponse
WAII.query.get_generated_query(params: GetObjectRequest) -> GeneratedQuery
The AsyncObjectResponse and GetObjectRequest both contain a single field:
uuid: a unique identifier for the query generation request
Both submit_generate_query and get_generated_query are non-blocking calls. submit_generate_query will immediately return with the uuid, and get_generated_query will return the intermediate generated query.
The QueryGenerationRequest and GeneratedQuery follow the same semantics above.
GeneratedQuery additionally contains a current_step field of type QueryGenerationStep, which is relevant during async query generation. This is updated as the query is generated, along with other fields of the generated query as they are determined. None of these fields are considered finalized until the current_step becomes Completed
QueryGenerationStep values:
- Selecting Tables and Rules
- Generating Query
- Validating Query
- Completed
request = QueryGenerationRequest(
ask="How many tables are in the database"
)
response = WAII.query.submit_generate_query(request)
get_generated_query_request = GetObjectRequest(
uuid=response.uuid
)
generated_query = None
while True:
time.sleep(1)
generated_query = WAII.query.get_generated_query(get_generated_query_request)
if generated_query.current_step == QueryGenerationStep.completed:
return
# Analyze intermediate query here
# Analyze completed query here
Generate Question
You can also generate questions based on your database schema, which can be useful when you want to show to your user what kind of questions can be asked to the database.
>>> Query.generate_question(GenerateQuestionRequest(schema_name='PUBLIC',
n_questions=10,
complexity=GeneratedQuestionComplexity.hard))
Parameter fields:
schema_name: The schema name you want to generate questions. This is must be specified.n_questions: Number of questions you want to generate. Default is 10. You can choose a number between 1 to 30.complexity: Complexity of the questions you want to generate. Default isGeneratedQuestionComplexity.hard. You can chooseeasy,medium,hard.
Output GenerateQuestionResponse, fields:
questions: List of generated questions. For each question it hasquestion(string content of the question),complexity(complexity of the question) and tables (tables used in the question)
Transcode
>>> WAII.Query.transcode(TranscodeQueryRequest(source_dialect="postgres", source_query="select ...;", target_dialect="snowflake", ask="..."))
Parameter fields:
search_context: Same asQuery.generate, you can specify the search context to limit the tables/schemas used to transcode.source_dialect: The dialect of the source query, such assnowflake,postgresql,mongodbsource_query: The source query you want to transcodetarget_dialect: The dialect of the target query, such assnowflake,postgresql,mongodbask: The intructions you want to ask Waii to add when transcode the query, such asplace tables under schema1. It is optional.
Query.transcode returns GeneratedQuery object
Run a query (sync)
Query.run(params: RunQueryRequest) -> GetQueryResultResponse
Request fields:
query: The SQL query you want to runsession_id: Session ID of the query, default is None (which means Waii will generate a new session ID for you)current_schema: current schema to run the query, default is None (which means Waii will use the default schema of the connection). It isSchemaNameobject, which hasdatabase_name(optional),schema_namefields.session_parameter: a key value pairs to set the connection parameters, such as auto_commit, isolation_level, etc. Default is None (which means Waii will use the default connection parameters of the connection)
Response fields:
rows: List of rows returned. Each of row is json object, with column name as key, and column value as value.more_rows: Whether there are more rows or not (True/False), it can be True whenmax_returned_rowsis set and generated rows are more thanmax_returned_rowscolumn_definitions: List of column definitions. Each of column definition has name and typequery_uuid: UUID of the query (it is different from generate query UUID, this UUID is generated by underlying database and Snowflake)
Example:
>>> print(WAII.Query.run(RunQueryRequest(query = "SELECT COUNT(DISTINCT table_name) FROM waii.information_schema.tables")))
rows=[{'COUNT(DISTINCT TABLE_NAME)': 112}] more_rows=0 column_definitions=[ColumnDefinition(name='COUNT(DISTINCT TABLE_NAME)', type='FIXED', comment=None, sample_values=None)] query_uuid='01afbd1e-0001-d31e-0022-ba8700a8209e'
Run with default schema
from waii_sdk_py.database import *
>>> WAII.Query.run(RunQueryRequest(query='select current_schema(), current_database();', current_schema=SchemaName(schema_name='INFORMATION_SCHEMA')))
Async submit a query
In order to async submit query, you need 3 methods: Query.submit, Query.get_results, Query.cancel
Submit
Query.submit(params: RunQueryRequest) -> RunQueryResponse
Request fields same as Query.run
Response field:
query_id: the query id you need to get result of the query, or cancel the query
Get Results
Query.get_results(params: GetQueryResultRequest) -> GetQueryResultResponse
Return query result by providing uuid, same as Query.run, but you need to specify query_id instead of query in the request.
You can optionally provide the max no of rows you want query to return through max_returned_rows parameter. Default value of same is 10000
Cancel
Query.cancel(params: CancelQueryRequest) -> CancelQueryResponse
Cancel a running query by providing query_id, it is a no-op if the query is completed or failed.
Like
Query.like(params: LikeQueryRequest) -> LikeQueryResponse
LikeQueryRequest has the following fields:
query_uuid: Optional[str]
ask: Optional[str]
query: Optional[str]
liked: Optional[bool] = True
rewrite_question: Optional[bool] = False
detailed_steps: Optional[List[str]] = []
# by default, like query request applies to the current user only, there's an option to apply to other users
target_user_id: Optional[str] = None
target_tenant_id: Optional[str] = None
target_org_id: Optional[str] = None
You can specify a query is liked or unliked by set liked to True/False
You can either like an generated query by specifying query_uuid (the uuid from GeneratedQuery, not the id from run query).
Or, you can specify ask and query to like a query.
Important: rewrite_question is optional, if you set it to True (default False), the system will rewrite the ask based on query and the ask. Why we do this because sometimes the ask itself is wrong, ambiguous, etc. and we want to make sure the ask is clear.
If you want to keep the original ask, you can leave the flag False.
You can specify detailed_steps of generating the query in LikedQuery. This is optional.
Examples:
Like a generated query
WAII.Query.like(LikeQueryRequest(query_uuid='01afbd1e-0001-d31e-0022-ba8700a8209e', liked=True))
Like a query by specifying ask and query
WAII.Query.like(LikeQueryRequest(ask='How many tables are there?',
query='SELECT COUNT(DISTINCT table_name) FROM waii.information_schema.tables', liked=True))
Programmatically like a list of queries
# programmatically like a bunch of queries
queries = [
{
"uuid": "q1",
"query": """SELECT...""",
"ask": "Find all the orders"
},
{
# another query
}
]
for q in queries:
WAII.Query.like(
LikeQueryRequest(# SQL part of the query
query=q['query'],
# natural language ask
ask=q['ask'],
# uuid, if same uuid liked multiple times, older-ones will be replaced
uuid=q['uuid'],
# do we want the system to rewrite question based on the query and ask?
# when it is set to False, it will keep the ask as-is
rewrite_question=False,
# When set to True, like it, when set to False, remove it from the liked list
liked=True))
You will get an exception if the call is failed.
Like a query on behalf of another user
WAII.Query.like(LikeQueryRequest(ask='How many tables are there?',
query='SELECT ...', liked=True,
target_user_id='user@example.com'))
This will 'like' the query on behalf of the user user@example.com.
Admin can also specify the target_user_id, target_tenant_id to '*' to like the query on behalf of all users or all tenants.
e.g. ... target_user_id='*', target_tenant_id='my-company.com' will like the query on behalf of all users in my-company.com tenant.
Get Liked Queries
Query.get_liked_query(params: GetLikedQueryRequest) -> GetLikedQueryResponse
This method retrieves previously liked queries from the system.
Parameters:
query_uuid: (Optional) If specified, returns only the liked query with this UUID. If not specified, returns all liked queries the user has access to.
Output:
queries: List ofLikedQueryobjects that contain information about the liked queries.
Example:
# Get all liked queries
all_liked_queries = WAII.Query.get_liked_query(GetLikedQueryRequest())
# Get a specific liked query by UUID
specific_liked_query = WAII.Query.get_liked_query(GetLikedQueryRequest(query_uuid="01afbd1e-0001-d31e-0022-ba8700a8209e"))
# Process the results
for query in all_liked_queries.queries:
print(f"Ask: {query.ask}")
print(f"Query: {query.query}")
print(f"Liked: {query.liked}")
Note: You need the appropriate permissions (PUBLISH_LIKED_QUERIES) to access liked queries.
Describe
If you want to translate SQL to natural language, and explain step-by-step plans, you can use Query.describe method.
Query.describe(params: DescribeQueryRequest) -> DescribeQueryResponse
This method fetches a description of a SQL query.
Parameters:
query: The SQL query you want to describesearch_context: When describe the query, you may want to restrict which table/schema you want to use as context. Normally it is not required, and you should skip it.current_schema: current schema of the query
Output:
summary: A summary of the querydetailed_steps: Detailed steps of how the query work (in natural language)tables: Tables used in the query
Example:
>>> print(WAII.Query.describe(DescribeQueryRequest(query='SELECT COUNT(DISTINCT table_name) FROM information_schema.tables')))
summary='How many distinct table names are there in the information_schema.tables?' detailed_steps=['Step 1: Access the information_schema.tables table.', 'Step 2: Count the number of distinct table names.', 'Step 3: Return the count of distinct table names.'] tables=[TableName(table_name='TABLES', schema_name='INFORMATION_SCHEMA', database_name='WAII')]
Diff
This method generates a description of the difference between two SQL queries
Query.diff(params: DiffQueryRequest) -> DiffQueryResponse
Parameters:
- Similar to
Query.describe, you need to specify two queries you want to diff, and optionally you can specify the search context. query: The SQL query you want to describeprevious_query: The previous SQL query you want to describe
Output:
- Same as
Query.describe, but it will also includewhat_changedfield, which tells you what changed in the query.
Example:
>>> print(WAII.Query.diff(DiffQueryRequest(query='SELECT COUNT(DISTINCT table_name) FROM information_schema.tables', previous_query='SELECT COUNT(DISTINCT table_name) FROM information_schema.tables WHERE table_name LIKE \'%a%\'')))
summary='...' detailed_steps=[...] what_changed="The new query does not have any filter on 'table_name' column."
Apply Table Access Rules
This method accepts a query and applies the necessary table access rules for the user. In addition to returning the query with all table access rules applied, a protection status is returned detailing the status of the returned query and any errors that were encounted while applying the access rules
AccessRuleProtectionState is an string enum with the following values:
protected: All applicable access rules have been enforced within the query, including the case that there are no applicable access rulesunprotected: Cannot guarantee that all applicable access rules have been applied to the query, due to an error while applying access rules
An AccessRuleProtectionStatus is an object with the following fields:
state: anAccessRuleProtectionStatecapturing the state of the returned querymsg: an error message if thestateis unprotected explaining the cause of the error unprotected: Cannot guarantee that all applicable access rules have been applied to the query
An ApplyTableAccessRulesRequest is a request with the following fields:
query:str
An ApplyTableAccessRulesResponse is returned containing the following fields
query:strcontaining the response querystatus: AnAccessRuleProtectionStatusdescribed above containing the status for the response
Example usages: Given a table T with table access rules for the current user, both paths will get queries with the access rule applied
apply_table_access_rules_response = WAII.query.apply_table_access_rules(
ApplyTableAccessRulesRequest(
query = "select * from t"
))
if apply_table_access_rules_response.status.state == AccessRuleProtectionState.protected:
print(apply_table_access_rules_response.query)
generated_query = WAII.query.generate(QueryGenerationRequest(ask = "Can you get me all the data from t?"))
apply_table_access_rules_response = generated_query.apply_table_access_rules()
Auto Complete (Experimental)
This method allows you to automatically complete a partial query, this can be useful when you want to build a query editor with Github co-pilot-like auto complete feature.
This is an experimental feature, it may not work well in some cases.
Query.auto_complete(params: AutoCompleteRequest) -> AutoCompleteResponse
Parameters:
text: (required) The partial query you want to auto completecursor_offset: (required) The cursor offset of the partial query, it can be in the middle of the text (if you want to insert text in the middle. By default you should set it todialect: The dialect of the query, such assnowflake,postgresql,mongodbsearch_context: Same asQuery.describe, you can specify the search context to limit the tables/schemas used to auto complete.max_output_tokens: (required) Maximum number of tokens returned, you should set it to a lower value like 50 to make sure it can return within a few seconds.
Output:
textcompleted text
Example:
>>> print(WAII.Query.auto_complete(AutoCompleteRequest(text='select from', cursor_offset=11, max_output_tokens=50)))
text='WAII.WORLD.CITY WHERE POPULATION > 1000000'
Analyze Performance (Experimental)
Note: this feature currently only support Snowflake
Query.analyze_performance(params: QueryPerformanceRequest) -> QueryPerformanceResponse
Parameters:
query_id: The query id you want to analyze performance (it should be a run query id, not generated query id
Output:
summary: A summary of the runtime of a query as well as recommendations on how to make the query run fasterrecommendations: List of recommendations on how to make the query run fasterquery_text: The query text of the query (based on query_id)execution_time_ms: The execution time of the query (based on query_id)compilation_time_ms: The compilation time of the query (based on query_id)
This method provides a summary of the runtime of a query as well as recommendations on how to make the query run faster
Examples:
>>> print(WAII.Query.analyze_performance(QueryPerformanceRequest(query_id='...')))
'summary': [
'The metadata operation in Step 1 took 1.0 ms, which is 82% of the total execution time.',
],
'recommendations': [
"The query is already optimal as it is a simple count distinct operation on the 'table_name' column of the 'waii.information_schema.tables' table. No further optimization is required.",
...
],
'query_text': "SELECT COUNT(DISTINCT table_name) FROM waii.information_schema.tables",
'execution_time_ms': 312,
'compilation_time_ms': 247