Handling PII Data

What is PII?

Personally Identifiable Information (PII) refers to any data that can be used to identify a specific individual. This includes information such as names, addresses, phone numbers, social security numbers, and email addresses. Protecting PII is crucial for maintaining privacy and complying with regulatory requirements.

The Problem in the Context of Text-to-SQL

Waii enables natural language queries over databases using text-to-SQL and conversational BI. While this offers significant benefits for data accessibility and usability, it also introduces challenges in handling PII data. The primary concerns include:

  • Ensuring PII does not leave the user's network.
  • Preventing PII from being sent to external model vendors for training purposes.
  • Preventing exposure of PII to unauthorized users.

Options for Handling PII Safely

Self-Hosted Version of Waii

  • Deployment: The self-hosted version ensures that all data collected by Waii resides within the user's Virtual Private Cloud (VPC).
  • Encryption: The self-hosted version supports encryption at rest and in transit.
  • Outbound Connections: The only outbound connections required are to a database, a vector store, and a Large Language Model (LLM). These components can be configured to remain within the same VPC, ensuring complete control over data.
  • Cloud Integration: If Waii is deployed in the same cloud environment as the services it uses, such as AWS RDS or Azure OpenAI, data stays within that environment and is covered by the cloud provider's data privacy, compliance, and security commitments.

Excluding PII from Waii

  • Database Roles and Column Masking: Another approach is to prevent Waii from accessing PII columns altogether, either by configuring database roles so that Waii has no access to these columns or by applying column masking.

  • User Access Permissions: Waii always passes the user's access permissions through to the database, so setting these permissions appropriately ensures that PII does not enter Waii.

  • Example with Snowflake:

    • Role Configuration:
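      CREATE ROLE waii_read_only;
      GRANT USAGE ON DATABASE my_database TO ROLE waii_read_only;
      GRANT USAGE ON SCHEMA my_database.my_schema TO ROLE waii_read_only;
      -- Snowflake has no column-level GRANT/REVOKE, so expose only non-PII columns through a view.
      -- non_pii_col_a and non_pii_col_b are placeholder column names.
      CREATE VIEW my_database.my_schema.my_table_no_pii AS
        SELECT non_pii_col_a, non_pii_col_b FROM my_database.my_schema.my_table;
      GRANT SELECT ON VIEW my_database.my_schema.my_table_no_pii TO ROLE waii_read_only;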
    • Column Masking:
      CREATE MASKING POLICY pii_mask AS (val STRING)
      RETURNS STRING ->
      CASE
        -- CURRENT_ROLE() returns the stored (uppercase) name of an unquoted role identifier.
        WHEN CURRENT_ROLE() IN ('WAII_READ_ONLY') THEN '****'
        ELSE val
      END;

      ALTER TABLE my_table MODIFY COLUMN pii_column SET MASKING POLICY pii_mask;

Configuring Waii to Identify PII Columns

  • PII Column Identification: Waii can be told which columns contain PII, either when adding a database or by modifying an existing connection.
  • Handling PII:
    • Waii will not sample PII data.
    • Waii will not store any PII values.
    • Waii will not inspect the structure of PII data.
    • Waii will not send any PII values to the LLM.
  • User Access: Users with the necessary permissions can still query and retrieve PII fields, but Waii will otherwise ignore these fields.
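
The effect of marking a column as PII can be pictured with a short sketch. This is not Waii's implementation or API: the `Column` class and `build_llm_context` function are hypothetical names, and the assumption that column names and types are still listed (so that permitted users can query them) is ours.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    """Hypothetical catalog entry for one database column."""
    name: str
    data_type: str
    is_pii: bool = False
    sample_values: List[str] = field(default_factory=list)


def build_llm_context(columns: List[Column]) -> str:
    """Describe columns for a prompt, withholding all values of PII columns."""
    lines = []
    for col in columns:
        if col.is_pii:
            # PII columns: name and type only, never sampled values.
            lines.append(f"{col.name} ({col.data_type}) [PII: values withheld]")
        else:
            samples = ", ".join(repr(v) for v in col.sample_values[:3])
            lines.append(f"{col.name} ({col.data_type}) samples: {samples}")
    return "\n".join(lines)
```

Under these assumptions, a column flagged as PII contributes only its name and type to the prompt, while other columns may carry a few sample values.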

Audit Log for Enhanced Protection

  • Logging Information: Waii maintains an audit log of all information sent to the LLM.
  • Periodic Verification: The audit log can be reviewed periodically to ensure no PII has been inadvertently sent.
  • Investigation Tool: In case of a suspected data breach, the audit log serves as a resource for investigating and understanding the scope and nature of the breach.
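
A periodic review of this kind could be partly automated. The sketch below assumes, purely for illustration, that the audit log has been exported as JSON lines with a `prompt` field; the actual log format is not described here, and the two regular expressions are simple examples, not an exhaustive PII detector.

```python
import json
import re

# Illustrative patterns only; real PII detection needs broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_audit_lines(lines):
    """Return (line_number, pattern_name) for every suspected PII hit."""
    findings = []
    for line_no, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines but keep line numbering intact
        entry = json.loads(raw)
        text = entry.get("prompt", "")  # "prompt" is an assumed field name
        for name, pattern in PII_PATTERNS.items():
            if pattern.search(text):
                findings.append((line_no, name))
    return findings
```

Calling `scan_audit_lines(open("audit.jsonl"))` on a hypothetical export would yield the line numbers worth a manual look.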

Using an LLM Proxy

  • LLM Proxy: Waii supports routing LLM calls through a proxy, either its own built-in proxy or a third-party one, to enhance PII protection.
  • Plugins for PII Detection: These proxies come with plugins designed to scan and detect PII in LLM calls.
  • Flagging Issues: The proxy can catch and flag potential PII issues, providing an additional layer of protection.
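
Conceptually, such a proxy sits between the caller and the LLM and runs each prompt through its detection plugins. The following is a minimal sketch of that pattern, not the built-in proxy or any specific third-party product: `PiiGuardProxy`, `make_regex_detector`, and the `forward` callable are hypothetical, and redacting the match (rather than only flagging it) is an added assumption.

```python
import re
from typing import Callable, List, Tuple

# A detector returns the (possibly redacted) text and the labels it flagged.
Detector = Callable[[str], Tuple[str, List[str]]]


def make_regex_detector(label: str, pattern: str) -> Detector:
    """Build a plugin that flags and redacts matches of one pattern."""
    compiled = re.compile(pattern)

    def detect(text: str) -> Tuple[str, List[str]]:
        flags = [label] if compiled.search(text) else []
        return compiled.sub(f"<{label}>", text), flags

    return detect


class PiiGuardProxy:
    """Runs every prompt through the detectors before forwarding it."""

    def __init__(self, forward: Callable[[str], str], detectors: List[Detector]):
        self.forward = forward      # the real LLM call, supplied by the caller
        self.detectors = detectors
        self.flagged: List[str] = []

    def complete(self, prompt: str) -> str:
        for detect in self.detectors:
            prompt, flags = detect(prompt)
            self.flagged.extend(flags)
        return self.forward(prompt)
```

The `flagged` list plays the role of the proxy's issue report; a stricter variant could refuse to forward a flagged prompt at all.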

Using a Local Model for PII-Sensitive Queries

  • Local Model Configuration: If the LLM must see PII to generate correct SQL (e.g., for filter conditions) and deploying a sufficiently powerful LLM locally is not feasible, Waii can be configured to use a smaller local model.
  • Filter Generation: This local model generates only the filter conditions that involve PII.
  • SQL Compilation: Waii's compiler then includes the filters generated by the local model in the overall SQL statement.
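
One way to picture this split is a placeholder scheme: the external LLM produces SQL containing a marker where the PII filter belongs, and the locally generated filter is substituted in afterwards. The sketch below is an illustration under that assumption, not Waii's compiler; the `{{PII_FILTER_n}}` syntax, `local_filter_model`, and `compile_sql` are all hypothetical.

```python
import re

# Hypothetical placeholder syntax emitted in place of PII-bearing filters.
PLACEHOLDER = re.compile(r"\{\{PII_FILTER_(\d+)\}\}")


def local_filter_model(column: str, value: str) -> str:
    """Stand-in for the small local model: emits one equality filter."""
    escaped = value.replace("'", "''")  # escape quotes for a SQL string literal
    return f"{column} = '{escaped}'"


def compile_sql(template: str, filters: dict) -> str:
    """Replace each {{PII_FILTER_n}} placeholder with its local filter."""

    def substitute(match: re.Match) -> str:
        return filters[int(match.group(1))]

    return PLACEHOLDER.sub(substitute, template)
```

In this arrangement the PII value is only ever handled locally; the external model sees the placeholder. A production version would more likely pass the value as a bound query parameter than splice a literal into the SQL text.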