
To configure Waii to use LLMs, you need two files: a model file and an endpoint configuration file.

The endpoint configuration file specifies how to connect to different LLM providers, such as OpenAI, Bedrock, or Fireworks.

The model file lets you give each model a name and description, which is exposed via the API (so you don't have to expose the actual LLM model name).
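
For a quick orientation, here is a minimal sketch of how the two files relate (the names and values are illustrative, not required): an endpoint entry defines the connection, and a model entry maps a user-facing name to it.

# endpoint configuration file: defines how to connect
- name: openai/my-llm
  endpoints:
    - api_base: https://example-llm-host.co   # hypothetical endpoint URL
      model_type: chat

# model file: the name exposed to users, mapped to the entry above
- name: My LLM
  models:
    - openai/my-llm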

Step 1: Endpoint Configuration File

Waii has an LLM endpoint configuration file that allows you to specify the LLMs you want to use. If you use OpenAI (not Azure OpenAI), you don't need to configure this file; you can simply set the OPENAI_API_KEY environment variable.
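
For example, if you run Waii as a container, the key can be passed through the environment. This is a hypothetical docker-compose snippet; your deployment method and image name may differ:

# Hypothetical docker-compose service; only the env var is needed for vanilla OpenAI
services:
  waii:
    image: <waii-image>
    environment:
      - OPENAI_API_KEY=<your_openai_api_key>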

For Azure OpenAI

We recommend using gpt-4o as the default LLM and text-embedding-ada-002 as the default embedding model.

- name: gpt-4o
  endpoints:
    - api_key: <azure_api_key>
      api_base: https://<your-domain>.openai.azure.com/
      deployment_name: gpt-4o
      model_type: chat
- name: text-embedding-ada-002
  endpoints:
    - api_key: <azure_api_key>
      api_base: https://<your-domain>.openai.azure.com/
      deployment_name: ada2
      model_type: embedding

For OpenAI-compatible endpoints

Many vendors now provide OpenAI-compatible endpoints, such as Fireworks and vLLM.

- name: openai/my-llm
  endpoints:
    - api_base: # such as https://hosted-vllm-api.co
      model_type: chat
- name: openai/my-emb
  endpoints:
    - api_base: # such as https://hosted-vllm-api.co
      model_type: embedding

Please note that the model name must be prefixed with openai/; Waii uses this prefix to determine which LLM API to use. api_base must be the base URL of the endpoint, and model_type must be either chat or embedding.

After you have configured the endpoint configuration file, you can use the model name in the model file (see below).
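
For instance, a model file entry for the chat endpoint above might look like this (the display name is illustrative; see Step 2 for details):

- name: My LLM
  models:
    - openai/my-llm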

When you start the server, you should see log lines on stdout that look like:

Initializing chat with args {'api_base': '...', 'model': 'openai/my-llm'}
Initializing embedding with args {'api_base': '...', 'model': 'openai/my-emb'}
Checking health of chat model={'api_base': '...', 'model': 'openai/my-llm'}
Successfully health checked chat model={'api_base': '...', 'model': 'openai/my-llm'}
Successfully health checked embedding model={'api_base': '...', 'model': 'openai/my-emb'}

These lines indicate that the LLM endpoints are connected successfully.

For AWS Bedrock

When using Bedrock, you need to configure both the embedding model and the LLM model.

Note: If you’re using AWS IAM roles, you do not need to include aws_access_key_id or aws_secret_access_key in your configuration.

# If you want to use Claude 3.5 Sonnet
- name: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
  endpoints:
    - api_type: claude
      model_type: chat
      aws_access_key_id: <...>
      aws_secret_access_key: <...>
      aws_region_name: <...>
      anthropic_version: <...>
# If you want to use Mistral Large
- name: bedrock/mistral.mistral-large-2402-v1:0
  endpoints:
    - api_type: bedrock
      model_type: chat
      aws_access_key_id: <...>
      aws_secret_access_key: <...>
      aws_region_name: <...>
- name: bedrock/cohere.embed-english-v3
  endpoints:
    - api_type: cohere
      model_type: embedding
      aws_access_key_id: <...>
      aws_secret_access_key: <...>
      aws_region_name: <...>
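
If you rely on IAM roles instead of static keys, the same entries can simply omit the credentials. For example (a sketch; values are illustrative):

- name: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
  endpoints:
    - api_type: claude
      model_type: chat
      aws_region_name: <...>
      anthropic_version: <...>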

If you’re using AWS IAM roles, here is the IAM policy that needs to be attached to the Waii service to access Bedrock.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "aws-marketplace:Subscribe",
        "aws-marketplace:Unsubscribe",
        "aws-marketplace:ViewSubscriptions"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "aws-marketplace:Subscribe"
      ],
      "Resource": "*",
      "Condition": {
        "ForAnyValue:StringEquals": {
          "aws-marketplace:ProductId": [
            "prod-m5ilt4siql27k",
            "prod-cx7ovbu5wex7g",
            "prod-ozonys2hmmpeu",
            "prod-fm3feywmwrog",
            "a61c46fe-1747-41aa-9af0-2e0ae8a9ce05",
            "216b69fd-07d5-4c7b-866b-936456d68311",
            "prod-tukx4z3hrewle",
            "prod-nb4wqmplze2pm",
            "b7568428-a1ab-46d8-bab3-37def50f6f6a",
            "38e55671-c3fe-4a44-9783-3584906e7cad"
          ]
        }
      }
    }
  ]
}

The above list of ProductId values is copied from https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html

For all the product ID mappings for Bedrock models, you can refer to https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-permissions.html

We recommend using Claude 3.5 Sonnet (or Claude 3.5 Sonnet v2) as the LLM and Cohere Embed (English) as the embedding model.

You can also include guardrail_identifier and guardrail_version if applicable.

For example, guardrail_identifier: arn:aws:bedrock:us-east-1:855284387825:guardrail/h4ykjllg2tl6 and guardrail_version: 3 as below:

# If you want to use Claude 3.5 Sonnet
- name: bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
  endpoints:
    - api_type: claude
      model_type: chat
      aws_access_key_id: <...>
      aws_secret_access_key: <...>
      aws_region_name: <...>
      anthropic_version: <...>
      guardrail_identifier: arn:aws:bedrock:us-east-1:855284387825:guardrail/h4ykjllg2tl6
      guardrail_version: 3
# If you want to use Mistral Large
- name: bedrock/mistral.mistral-large-2402-v1:0
  endpoints:
    - api_type: bedrock
      model_type: chat
      aws_access_key_id: <...>
      aws_secret_access_key: <...>
      aws_region_name: <...>
      guardrail_identifier: arn:aws:bedrock:us-east-1:855284387825:guardrail/h4ykjllg2tl6
      guardrail_version: 3
- name: bedrock/cohere.embed-english-v3
  endpoints:
    - api_type: cohere
      model_type: embedding
      aws_access_key_id: <...>
      aws_secret_access_key: <...>
      aws_region_name: <...>
      guardrail_identifier: arn:aws:bedrock:us-east-1:855284387825:guardrail/h4ykjllg2tl6
      guardrail_version: 3

Step 2: Model File

In addition to the endpoint configuration file, you need a model file that specifies the user-facing model name and which LLM it uses.

If you are using vanilla OpenAI endpoints (not Azure OpenAI), you don't need to specify a model file.

When you use other endpoints, you need to specify the model file.

For example, for the Bedrock models configured above (Claude 3.5 Sonnet and Mistral Large):

# Claude 3.5 Sonnet
- name: Claude 3.5 Sonnet
  models:
    - bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0
# Mistral Large
- name: Mistral Large
  models:
    - bedrock/mistral.mistral-large-2402-v1:0

The model file allows end users to specify which LLM to use during query generation or other API calls that need LLM interaction.

Users can also find the models via API / UI.


LLM rate limit setting

To support multiple users, or to run query generation in parallel, you should make sure the LLM endpoint used by Waii has a sufficient rate limit.

Recommendation:

  • For both the embedding endpoint (such as text-embedding-ada-002) and the LLM endpoint (such as OpenAI gpt-4o, Claude 3.5 Sonnet, etc.), make sure the token-per-minute (TPM) limit is at least 100K (300K is better) and the request-per-minute (RPM) limit is at least 500 (2,000 is better).

Please note that average token usage will be well below the TPM limit, but it is good to have a buffer for peak usage (especially when you run multiple queries in parallel, or when you add a database, since the initial knowledge graph build consumes more tokens for the first few minutes).