Server implementation for Google BigQuery integration that enables direct BigQuery database access and querying capabilities
This is a server that lets your LLMs (like Claude) talk directly to your BigQuery data! Think of it as a friendly translator that sits between your AI assistant and your database, making sure they can chat securely and efficiently.
You: "What were our top 10 customers last month?"
Claude: *queries your BigQuery database and gives you the answer in plain English*
No more writing SQL queries by hand - just chat naturally with your data!
This server uses the Model Context Protocol (MCP), which is like a universal translator for AI-database communication. While MCP is designed to work with any AI model, right now it's available as a developer preview in Claude Desktop.
Here's all you need to do:
| | Simple Mode | Protected Mode |
|---|---|---|
| Use when | Personal projects, non-sensitive data | PHI, PII, financial data, HIPAA-regulated environments |
| Install | `npx` — no local setup needed | `npx` or local build with a `config.json` |
| Field restrictions | None | Define `preventedFields` to block sensitive columns |
| Auto-scanner | Not available | Discovers sensitive columns across all datasets automatically |
| Setup | Quick Setup below | Protected Mode Setup below |
Why local deployment matters for sensitive data: LLM inference happens in the cloud. When an AI agent queries BigQuery, the results are sent to the LLM provider's servers (Anthropic, OpenAI, etc.) for processing — they leave your network. BigQuery IAM controls who can reach your data; field restrictions control what the AI agent surfaces into LLM responses. These are different protection boundaries. Configuring preventedFields ensures PHI and PII never enter the LLM conversation context, regardless of how many queries the agent runs autonomously.
Authenticate with Google Cloud:
```bash
gcloud auth application-default login
```
Add to your Claude Desktop config (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "bigquery": {
      "command": "npx",
      "args": [
        "-y",
        "@ergut/mcp-bigquery-server",
        "--project-id",
        "your-project-id"
      ]
    }
  }
}
```
Start chatting! Open Claude Desktop and ask questions about your data.
For sensitive data with field-level restrictions:
Authenticate with Google Cloud (choose one method):
```bash
# Option A: Application Default Credentials
gcloud auth application-default login

# Option B: save your service account key file and pass it with --key-file.
# Keep the key file secure and never commit it to version control.
```
Add to your Claude Desktop config (claude_desktop_config.json):
With Application Default Credentials:
```json
{
  "mcpServers": {
    "bigquery": {
      "command": "npx",
      "args": [
        "-y",
        "@ergut/mcp-bigquery-server",
        "--project-id",
        "your-project-id",
        "--location",
        "us-central1",
        "--config-file",
        "/path/to/config.json"
      ]
    }
  }
}
```
With a service account key file:
```json
{
  "mcpServers": {
    "bigquery": {
      "command": "npx",
      "args": [
        "-y",
        "@ergut/mcp-bigquery-server",
        "--project-id",
        "your-project-id",
        "--location",
        "us-central1",
        "--key-file",
        "/path/to/service-account-key.json",
        "--config-file",
        "/path/to/config.json"
      ]
    }
  }
}
```
Start chatting! Open Claude Desktop and start asking questions about your data.
The server supports an optional config.json file for advanced configuration. Without a config file (i.e., no --config-file flag), the server runs in Simple Mode with safe defaults (1GB query limit, no field restrictions). To enable protection, pass --config-file /path/to/config.json when starting the server.
```json
{
  "maximumBytesBilled": "1000000000",
  "preventedFields": {
    "healthcare.patients": ["first_name", "last_name", "ssn", "date_of_birth", "email"],
    "billing.transactions": ["credit_card_number", "bank_account"]
  },
  "sensitiveFieldPatterns": [
    "%first_name%", "%last_name%", "%email%",
    "%ssn%", "%date_of_birth%", "%password%"
  ],
  "sensitiveFieldScanFrequencyDays": 1
}
```
| Setting | Default | Description |
|---|---|---|
| `maximumBytesBilled` | `"1000000000"` (1GB) | Maximum bytes billed per query |
| `preventedFields` | `{}` | Table-to-columns mapping of restricted fields |
| `sensitiveFieldPatterns` | Built-in set | SQL LIKE patterns for auto-discovery |
| `sensitiveFieldScanFrequencyDays` | `1` | Days between auto-scans (`0` to disable) |
- `--project-id`: (Required) Your Google Cloud project ID
- `--location`: (Optional) BigQuery location, defaults to `US`
- `--key-file`: (Optional) Path to a service account key JSON file
- `--config-file`: (Optional) Path to a configuration file, defaults to `config.json`
- `--maximum-bytes-billed`: (Optional) Maximum bytes billed per query; overrides the config.json value

Example using a service account:
```bash
npx @ergut/mcp-bigquery-server --project-id your-project-id --location europe-west1 --key-file /path/to/key.json --config-file /path/to/config.json --maximum-bytes-billed 2000000000
```
Data warehouses often contain highly sensitive information — patient records, social security numbers, financial data, personal contact details, and authentication secrets. When an AI agent has direct access to query your BigQuery warehouse, there is no human in the loop to prevent it from reading sensitive columns. A simple query like SELECT * FROM patients could expose thousands of PII/PHI records in a single response.
This server gives administrators fine-grained control over which columns an AI agent can access, ensuring sensitive data stays protected while still allowing the AI to perform useful analytical queries on non-sensitive fields.
Important: The field restrictions and table allowlists in this server are designed as cooperative guardrails for AI agents, not as a hard security boundary against adversarial attackers.
The threat model is straightforward: when an AI agent queries your BigQuery warehouse, the query results are sent to the LLM provider's servers. Field restrictions prevent the agent from inadvertently including sensitive columns (PII, PHI, secrets) in those results. When the agent encounters a restriction error, it reads the guidance in the error message and reformulates its query — using aggregate functions, EXCEPT clauses, or simply dropping the restricted field. In practice, AI agents cooperate immediately and consistently.
This system uses regex-based SQL analysis to detect restricted field usage. We performed penetration testing during development and fixed several bypass vectors (struct-alias expansion, comma-join evasion, implicit SELECT *). However, regex-based parsing cannot guarantee coverage of every possible SQL construct — edge cases involving deeply nested CTEs, exotic BigQuery syntax, or adversarial query crafting may exist. The enforcement logic is designed to fail closed (block ambiguous queries rather than allow them), but it is not equivalent to a database-level security policy.
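The fail-closed idea can be sketched roughly as follows. This is a deliberately simplified illustration, not the server's actual implementation — the function and variable names are invented, and the real enforcement handles many more SQL constructs:

```typescript
// Hypothetical sketch of regex-based restricted-field enforcement for a
// single table. Fails closed: a bare SELECT * is blocked unless EXCEPT(...)
// covers every restricted column, and any mention of a restricted column
// outside an allowed aggregate blocks the query.
const AGGREGATES = ["count", "countif", "avg", "sum"];

function isQueryAllowed(sql: string, restricted: string[]): boolean {
  const lower = sql.toLowerCase();

  const star = lower.match(/select\s+\*\s*(except\s*\(([^)]*)\))?/);
  const excepted = (star?.[2] ?? "")
    .split(",")
    .map((c) => c.trim())
    .filter((c) => c.length > 0);

  // SELECT * without an EXCEPT list naming every restricted column → block.
  if (star && !restricted.every((col) => excepted.includes(col))) {
    return false;
  }

  for (const col of restricted) {
    const mentions = (lower.match(new RegExp(`\\b${col}\\b`, "g")) ?? []).length;
    const aggregated = (lower.match(
      new RegExp(
        `\\b(?:${AGGREGATES.join("|")})\\s*\\(\\s*(?:distinct\\s+)?${col}\\b`,
        "g",
      ),
    ) ?? []).length;
    const inExcept = excepted.includes(col) ? 1 : 0;
    // Any mention not explained by an aggregate or EXCEPT → fail closed.
    if (mentions > aggregated + inExcept) return false;
  }
  return true;
}
```

Note that because the whole query text is scanned, restricted columns in WHERE or ORDER BY clauses are blocked too, matching the behavior described below.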
What this is:

- Cooperative guardrails for AI agents querying your warehouse
- Regex-based detection of restricted field usage (`SELECT *`, direct field references, aliases)

What this is not:

- A hard security boundary against adversarial attackers
- A replacement for BigQuery's native column-level security
For environments requiring strict compliance guarantees, combine these guardrails with BigQuery's native column-level security and authorized views.
The server supports three protection modes, configured via protectionMode in config.json:
| Mode | Description | Default when |
|---|---|---|
| `off` | No protection — all tables and fields accessible | No config file exists |
| `allowedTables` | Table allowlist — only listed tables can be queried | Must be explicitly set |
| `autoProtect` | Auto-scans for sensitive fields, enforces `preventedFields` | Config file exists without a `protectionMode` key |
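The defaults in the table above can be sketched as a small resolution function. This is a hypothetical illustration — `resolveMode` and the `Config` shape are invented for the example, not the server's real internals:

```typescript
// Hypothetical sketch of default protection-mode resolution, following the
// table above: no config file → "off"; config file present but missing the
// protectionMode key → "autoProtect" (backward compatibility).
type ProtectionMode = "off" | "allowedTables" | "autoProtect";

interface Config {
  protectionMode?: ProtectionMode;
}

function resolveMode(config: Config | null): ProtectionMode {
  if (config === null) return "off"; // no config file → Simple Mode
  return config.protectionMode ?? "autoProtect"; // key missing → autoProtect
}
```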
### `allowedTables` Mode

Restricts the AI agent to a specific set of tables. Queries against any other table are rejected immediately. Optionally define field restrictions within allowed tables:
```json
{
  "protectionMode": "allowedTables",
  "maximumBytesBilled": "10000000000",
  "allowedTables": [
    "analytics.page_views",
    "analytics.sessions",
    "reporting.daily_summary"
  ],
  "preventedFieldsInAllowedTables": {
    "analytics.page_views": ["user_ip", "user_agent"]
  }
}
```
- `preventedFieldsInAllowedTables` is optional — defaults to `{}` (no field restrictions within allowed tables)
- `INFORMATION_SCHEMA` queries are always allowed for schema discovery

### `autoProtect` Mode (Field-Level Restrictions)

The original protection mode. Auto-scans your BigQuery datasets for sensitive columns and enforces `preventedFields`. Manual entries in `preventedFields` persist through scans (the merge is additive-only). See details below.
Existing config files without protectionMode continue working — they default to autoProtect for backward compatibility.
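The additive-only merge can be sketched as follows. This is a hypothetical illustration — `mergePreventedFields` is an invented name, not the server's actual function:

```typescript
// Hypothetical sketch of an additive-only merge: scan results are unioned
// into preventedFields; manually added tables and columns are never removed.
type PreventedFields = Record<string, string[]>;

function mergePreventedFields(
  existing: PreventedFields,
  scanned: PreventedFields,
): PreventedFields {
  const merged: PreventedFields = { ...existing };
  for (const [table, cols] of Object.entries(scanned)) {
    // Union scanned columns with any existing entry, deduplicating.
    merged[table] = [...new Set([...(merged[table] ?? []), ...cols])];
  }
  return merged;
}
```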
Define preventedFields in your config to block the AI agent from accessing specific columns:
```json
{
  "preventedFields": {
    "healthcare.patients": ["first_name", "last_name", "ssn", "date_of_birth", "email"],
    "billing.transactions": ["credit_card_number", "bank_account"]
  }
}
```
What happens when the AI agent tries to access a restricted field:
```sql
SELECT first_name, last_name, diagnosis FROM healthcare.patients
```
The server blocks the query and returns a clear, instructive error:
```
Restricted fields detected for table "healthcare.patients" columns "first_name", "last_name".
You can only use these columns inside ["count", "countif", "avg", "sum"]
aggregate functions or exclude them with SELECT * EXCEPT (...).
```
The AI agent learns from this error and adjusts its queries automatically. It can still run analytical queries that don't expose individual sensitive values:
```sql
-- Allowed: aggregate functions don't expose individual values
SELECT COUNT(first_name) AS patient_count, diagnosis
FROM healthcare.patients
GROUP BY diagnosis

-- Allowed: explicitly excluding restricted fields
SELECT * EXCEPT(first_name, last_name, ssn, date_of_birth, email)
FROM healthcare.patients
```
Query pattern reference:
| Query Pattern | Behavior |
|---|---|
| `SELECT restricted_col FROM table` | Blocked with error message |
| `SELECT * FROM table` | Blocked (would expose restricted fields) |
| `SELECT * EXCEPT(restricted_cols) FROM table` | Allowed |
| `COUNT(restricted_col)`, `AVG(...)`, `SUM(...)`, `COUNTIF(...)` | Allowed (aggregates don't expose individual values) |
| `MIN(restricted_col)`, `MAX(restricted_col)` | Blocked (returns actual individual values) |
| `SELECT non_restricted_col FROM table` | Allowed |
| `SELECT id FROM table WHERE restricted_col = '...'` | Blocked (see note below) |
| `SELECT id FROM table ORDER BY restricted_col` | Blocked (see note below) |
Note: Restricted fields in WHERE, ORDER BY, and other clauses are blocked, not just fields in SELECT. Even though the query results don't contain the restricted column, the full SQL query text is sent to the LLM provider as part of the conversation. A query like `WHERE email = '[email protected]'` means the restricted value appears in the prompt sent to the cloud. The enforcement checks the entire query to prevent restricted data from leaving your network in any form.
Server-side logging: Every blocked query is logged on the server side, giving administrators visibility into what the AI agent attempted to access:
```
Query tool error: Error: Restricted fields detected for table "healthcare.patients" columns "first_name", "last_name".
```
Manually listing every sensitive column across hundreds of tables is impractical. The server includes an automated scanner that discovers sensitive columns across all your BigQuery datasets by querying INFORMATION_SCHEMA.COLUMNS with configurable SQL LIKE patterns. Discovered fields are automatically added to preventedFields in your config.
- Columns matching patterns like `%first_name%`, `%ssn%`, `%email%` are identified as sensitive
- Matching columns are merged into `preventedFields`

When the MCP server starts, it checks whether the config file is stale based on `sensitiveFieldScanFrequencyDays`. If it is, the server automatically scans and updates the config:
```
Config is stale (scan frequency: 1 day(s)), running sensitive field scan...
Scanning all datasets for sensitive fields...
Found 1166 sensitive column(s) across 278 table(s)
Scan complete: config updated with 278 tables.
```
This means new tables with sensitive columns are automatically protected without any manual configuration. As your data warehouse grows, the scanner keeps up.
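The scan-and-staleness flow can be sketched roughly as below. This is hypothetical code: the helper names and the exact shape of the `INFORMATION_SCHEMA` query are assumptions for illustration, not the server's real internals:

```typescript
// Hypothetical sketch: build a per-dataset discovery query from SQL LIKE
// patterns, and decide whether a scan is due based on the configured
// frequency. The real server's query and bookkeeping may differ.
function buildScanQuery(dataset: string, patterns: string[]): string {
  const likes = patterns
    .map((p) => `LOWER(column_name) LIKE '${p}'`)
    .join(" OR ");
  return (
    `SELECT table_name, column_name ` +
    `FROM \`${dataset}\`.INFORMATION_SCHEMA.COLUMNS ` +
    `WHERE ${likes}`
  );
}

function isConfigStale(lastScanMs: number, frequencyDays: number): boolean {
  if (frequencyDays === 0) return false; // 0 disables auto-scanning
  const ageDays = (Date.now() - lastScanMs) / 86_400_000; // ms per day
  return ageDays >= frequencyDays;
}
```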
Run a scan on demand at any time:
```bash
npm run scan-fields -- --project-id your-project-id --config-file ./config.json
```
The default patterns cover common naming conventions (names, emails, SSNs, dates of birth, medical record numbers, insurance IDs, passwords, API keys, etc.), but every organization has its own. Add custom patterns to match your schema:
```json
{
  "sensitiveFieldPatterns": [
    "%first_name%", "%last_name%", "%email%", "%ssn%",
    "%date_of_birth%", "%password%", "%api_key%",
    "%guardian_name%",
    "%emergency_contact%",
    "%beneficiary%",
    "%next_of_kin%"
  ]
}
```
On the next auto-scan (or manual npm run scan-fields), the scanner picks up columns matching your new patterns and automatically adds them to preventedFields. As your data warehouse grows and new tables are added, any columns matching your patterns are automatically protected without manual intervention.
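To preview locally which column names a given LIKE pattern would flag before adding it to your config, a small helper can translate the pattern into a regex. This is a hypothetical convenience for testing patterns, not part of the server:

```typescript
// Hypothetical helper: translate a SQL LIKE pattern ('%' = any run of
// characters, '_' = any single character) into a case-insensitive regex
// and test it against a column name.
function likeMatches(pattern: string, columnName: string): boolean {
  // Escape regex metacharacters first, then expand the LIKE wildcards.
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const regex = new RegExp(
    "^" + escaped.replace(/%/g, ".*").replace(/_/g, ".") + "$",
    "i",
  );
  return regex.test(columnName);
}
```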
| Setting | Default | Description |
|---|---|---|
| `sensitiveFieldPatterns` | Built-in set covering names, contacts, identity, insurance, and secrets | SQL LIKE patterns to match against column names |
| `sensitiveFieldScanFrequencyDays` | `1` (daily) | Days between automatic scans. Set `0` to disable auto-scanning. |
You'll need one of these:

- `roles/bigquery.user` (recommended)
- `roles/bigquery.dataViewer` and `roles/bigquery.jobUser` together

Run a local build instead of npx — useful for contributing, testing changes, or running a pinned version. Supports both Simple and Protected Mode.
```bash
# Clone and install
git clone https://github.com/ergut/mcp-bigquery-server
cd mcp-bigquery-server
npm install

# Build
npm run build
```
Then update your Claude Desktop config to point to your local build:
Simple Mode (no config file):
```json
{
  "mcpServers": {
    "bigquery": {
      "command": "node",
      "args": [
        "/path/to/your/clone/mcp-bigquery-server/dist/index.js",
        "--project-id",
        "your-project-id",
        "--location",
        "us-central1"
      ]
    }
  }
}
```
Protected Mode (with config file):
```json
{
  "mcpServers": {
    "bigquery": {
      "command": "node",
      "args": [
        "/path/to/your/clone/mcp-bigquery-server/dist/index.js",
        "--project-id",
        "your-project-id",
        "--location",
        "us-central1",
        "--key-file",
        "/path/to/service-account-key.json",
        "--config-file",
        "/path/to/config.json"
      ]
    }
  }
}
```
## Current Limitations ⚠️
- The configuration examples above are shown for Claude Desktop and Claude Code, but any MCP-compatible client can use this server — provide the same JSON configuration to your AI agent and it will adapt to its own setup
- Queries are read-only with configurable processing limits (set in config.json)
- While both tables and views are supported, some complex view types might have limitations
- A config.json file is optional; without one the server uses safe defaults
## Support & Resources 💬
- 🐛 [Report issues](https://github.com/ergut/mcp-bigquery-server/issues)
- 💡 [Feature requests](https://github.com/ergut/mcp-bigquery-server/issues)
- 📖 [Documentation](https://github.com/ergut/mcp-bigquery-server)
## License 📝
MIT License - See [LICENSE](LICENSE) file for details.
## Author ✍️
Salih Ergüt
## Sponsorship
This project is proudly sponsored by:
<div align="center">
<a href="https://www.oredata.com">
<img src="assets/oredata-logo-nobg.png" alt="OREDATA" width="300"/>
</a>
</div>
## Version History 📋
See [CHANGELOG.md](CHANGELOG.md) for updates and version history.