The LangGraph SQL agent: querying databases in plain English
There’s a problem hiding in plain sight at most companies. The database has the answers. The people who need those answers don’t know SQL. So they wait for someone who does, or they make decisions without the data, or they ask the same question via email for the third time this month. The LangGraph SQL agent is a practical fix for this.
Key Takeaway
A LangGraph SQL agent translates natural language questions into SQL queries in real time, allowing non-technical team members to query databases directly without writing code – and handles complex multi-step queries, error correction, and schema awareness automatically.
Sounds simple. The implementation is more interesting. And there are security considerations you need to understand before you deploy this to anyone who shouldn't have unrestricted database access.
How the LangGraph SQL agent actually works
The LangGraph documentation walks through this step by step, and it’s worth going through carefully because each stage in the process is a decision you can tune.
It starts by fetching the available tables in your database. No pre-loaded schema required. The agent queries the database itself to find out what’s there, which means you can point it at a database it’s never seen before and it’ll figure things out. That’s actually more useful in practice than it sounds, especially if your schema evolves over time.
Next it decides which tables are actually relevant to the question being asked. A database with 50 tables doesn’t need all of them in context to answer a question about last month’s signups. Narrowing down the scope at this stage keeps the context window focused and reduces the chance of the model getting confused by irrelevant columns and relationships.
Having identified the right tables, it retrieves their schemas: column names, data types, how things relate to each other. This is where the agent gets the raw material it needs to build a sensible query. Get this step wrong and everything downstream falls apart, so the schema retrieval matters more than it might initially seem.
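Under the hood, these first steps are ordinary introspection queries. Here's a minimal sketch using stdlib sqlite3 rather than the LangChain toolkit (which wraps the same idea behind its list-tables and get-schema tools); the table and column names are illustrative:

```python
import sqlite3

def list_tables(conn: sqlite3.Connection) -> list[str]:
    # SQLite keeps its catalogue in sqlite_master; other engines
    # expose the same information via information_schema.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return [r[0] for r in rows]

def table_schema(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    # PRAGMA table_info returns (cid, name, type, notnull, default, pk)
    # per column; keep just the name and declared type.
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT)")
conn.execute("CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, user_id INTEGER, plan TEXT)")

print(list_tables(conn))            # discovered at runtime, no pre-loaded schema
print(table_schema(conn, "users"))  # column names and types for a relevant table
```

The point is that nothing here is pre-configured: point the same two functions at a database the agent has never seen and they return whatever is actually there.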
Then comes the SQL generation. The LLM takes the schema context plus the original question and tries to produce a query that’ll actually return the right answer. This is the step where most of the heavy lifting happens – translating “how many new customers signed up last month who came from paid search” into a join across three tables with the right date filters.
Before running anything, the agent reviews its own query. It checks the generated SQL against the schema for obvious mistakes: referencing columns that don’t exist, getting a table name slightly wrong, that sort of thing. In my experience this catches a decent proportion of the errors that would otherwise cause a confusing failure.
If the query passes review, it runs. The raw database result then gets translated back into a plain-English answer that addresses what the user actually asked. They asked a question; they get an answer in that same form, not a table of numbers they have to interpret themselves.
What this looks like with real data
Here’s a scenario I’ve seen play out. A founder running a SaaS product has a PostgreSQL database: users, subscriptions, events, payments. The growth team wants to understand things like which features churned users interact with least before they cancel, or what the average time between signup and first meaningful activity looks like for users who eventually convert to paid.
These aren’t simple queries. They involve joins, date calculations, conditional aggregations. Someone who knows SQL can write them, but it takes time, and the answer they need is usually urgent. The growth team doesn’t have a data analyst on standby for every question that comes up on a Tuesday afternoon.
The LangGraph SQL agent handles all of this. It discovers the relevant tables, pulls the schemas, constructs the query, runs it, and comes back with an answer. The SQL itself is invisible unless someone specifically wants to see it. The growth team can just ask questions.
Setup requirements are modest: a database connection, a LangGraph agent configured with the SQL toolkit, and a capable language model. GPT-4o and Claude 3.5 Sonnet both perform well on SQL generation for standard query patterns. The tutorial examples use SQLite, but the same approach works with PostgreSQL, MySQL, and most standard databases via SQLAlchemy.
Setting this up
The LangGraph SQL agent tutorial is the best starting point. A few key components to understand before you begin.
First, the database connection. LangGraph’s SQL tools use SQLAlchemy, so anything SQLAlchemy supports is fair game. Your connection string goes into a SQLDatabase object that the tools use to interact with your data.
Second, the toolset. The standard SQL agent toolkit gives you tools for listing tables, getting schema information, running queries, and checking queries for errors. You can restrict this set. For read-only deployments, strip out anything that could modify data – though honestly, the security controls I’ll cover shortly make this belt-and-braces rather than primary defence.
Third, the agent configuration: the LLM it uses, the system prompt, and the tool list. The system prompt is where you control what the agent will actually do. Tell it which tables it can query, what kinds of questions are in scope, and what it should do if someone asks something outside those boundaries.
Fourth, the interface. The agent becomes an API endpoint. Wrap it in a simple chat UI, a Slack command, or an internal tool, and you have something your team can actually reach.
A developer who knows Python can get a basic implementation running in a few hours using the tutorial. After that, the real work is testing with actual queries from your team, refining the system prompt based on where it struggles, and implementing the security controls below before anyone else touches it.
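Put together, the wiring is small. The sketch below stands a canned stub in for the LLM call (the tutorial wires in a real model via the SQL toolkit); the system prompt, the table, and the `fake_generate_sql` mapping are all hypothetical:

```python
import sqlite3

# Component three: the system prompt that scopes what the agent will do.
SYSTEM_PROMPT = (
    "You answer questions about the app database. "
    "Only the users table is in scope. "
    "If a question is out of scope, say so instead of guessing."
)

def fake_generate_sql(question: str) -> str:
    # Stand-in for the LLM call: a real agent sends SYSTEM_PROMPT,
    # the schema, and the question to a model and gets SQL back.
    canned = {"How many users do we have?": "SELECT COUNT(*) FROM users"}
    return canned[question]

def answer(conn: sqlite3.Connection, question: str) -> str:
    sql = fake_generate_sql(question)
    (count,) = conn.execute(sql).fetchone()
    # Final step: translate the raw result back into plain English.
    return f"You have {count} users."

# Component one: the database connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(1,), (2,), (3,)])
print(answer(conn, "How many users do we have?"))  # You have 3 users.
```

Swap the stub for a model call and the single table for your real schema, and this is the shape of the whole thing.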
The security problem you cannot ignore
AI-generated SQL is a security risk. Not a reason to avoid the pattern, but a reason to be deliberate about how you implement it.
The specific concern is SQL injection from the AI. In a traditional injection attack, a malicious user puts SQL code into an input field that gets executed. With an AI SQL agent, the equivalent is someone phrasing a question in a way that causes the model to generate SQL that does something unintended – deleting records, modifying data, accessing tables the user has no business seeing.
The mitigations stack. First: use a read-only database user. The agent’s credentials should have SELECT permissions only. Even if the model generates a DROP TABLE statement somehow, the database will reject it. This is non-negotiable. Full stop.
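With PostgreSQL or MySQL this means a dedicated role granted SELECT only. SQLite gives you the same guarantee at the connection level by opening the file read-only; a quick demonstration (the file path is throwaway):

```python
import os
import sqlite3
import tempfile

# Create a throwaway database file to demonstrate with.
path = os.path.join(tempfile.mkdtemp(), "app.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE users (id INTEGER)")
rw.commit()
rw.close()

# mode=ro makes the connection read-only: SELECT works, writes raise.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
print(ro.execute("SELECT COUNT(*) FROM users").fetchone())  # (0,)
try:
    ro.execute("DROP TABLE users")
except sqlite3.OperationalError as exc:
    print("rejected:", exc)  # attempt to write a readonly database
```

Whatever SQL the model dreams up, the database itself refuses the write.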
Second: restrict which tables the agent can see. You don’t need to expose your entire schema. Configure the SQLDatabase object with an explicit include list of tables the agent is allowed to work with. This prevents it from accessing sensitive tables that aren’t relevant to what you’ve built it for.
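In LangChain this is the `include_tables` argument when constructing the `SQLDatabase`; conceptually it's a one-line filter on the table-listing step. The allowed table names below are illustrative:

```python
import sqlite3

ALLOWED_TABLES = {"users", "subscriptions"}  # hypothetical include list

def usable_tables(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    # Anything outside the include list is simply invisible to the agent.
    return [name for (name,) in rows if name in ALLOWED_TABLES]

conn = sqlite3.connect(":memory:")
for table in ("users", "subscriptions", "payment_methods"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")
print(usable_tables(conn))  # ['users', 'subscriptions']
```

The `payment_methods` table exists, but as far as the agent is concerned it doesn't.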
Third: validate the generated SQL before execution. The agent has its own query-checking step, but you can add an extra layer. Parse the generated SQL and reject any write statement – DML like INSERT, UPDATE, and DELETE, and DDL like DROP or ALTER – before it reaches the database. Belt and braces, but worth it.
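A deliberately strict allowlist check is a few lines. This version is simplistic on purpose – production code should use a real SQL parser (sqlparse or sqlglot, say) rather than keyword matching:

```python
# Only statements that start with SELECT (or a WITH clause) get through.
BLOCKED = ("insert", "update", "delete", "drop", "alter", "create", "truncate")

def is_safe_select(sql: str) -> bool:
    stripped = sql.strip().lower()
    if not (stripped.startswith("select") or stripped.startswith("with")):
        return False
    # Reject multi-statement payloads like "SELECT 1; DELETE FROM users".
    if ";" in stripped.rstrip(";"):
        return False
    # Whole-word match, so a column called "updated_at" isn't a false positive.
    return not any(word in stripped.split() for word in BLOCKED)

print(is_safe_select("SELECT * FROM users"))          # True
print(is_safe_select("DROP TABLE users"))             # False
print(is_safe_select("SELECT 1; DELETE FROM users"))  # False
```

Even with read-only credentials underneath, rejecting these before execution means failures are clean and loggable rather than database errors.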
Fourth: log every query. Keep a record of every SQL statement the agent generates and runs, alongside the original question that prompted it. You need this for auditing, and you’ll need it for debugging when the agent produces a wrong answer and you can’t figure out why.
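A logging wrapper is small with stdlib logging; the logger name and record fields here are just one reasonable choice:

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("sql_agent.audit")

def run_logged(conn: sqlite3.Connection, question: str, sql: str):
    # Record the question and the generated SQL together, *before*
    # execution, so failed queries are captured too.
    audit.info("question=%r sql=%r", question, sql)
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")
rows = run_logged(conn, "How many users?", "SELECT COUNT(*) FROM users")
print(rows)  # [(0,)]
```

Pairing the question with the SQL in one log line is the part that pays off later: a wrong answer is almost always a translation failure, and you can only see that with both sides in front of you.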
Fifth: rate limit by user. If this is user-facing, an agent that can run arbitrary queries against your database is a potential exfiltration vector if abused. Rate limiting caps the blast radius from any single user.
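A per-user cap can be as simple as a fixed-window counter; the limits below are arbitrary illustrations:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` queries per user in each `window`-second window."""

    def __init__(self, limit: int = 20, window: float = 3600.0):
        self.limit = limit
        self.window = window
        self.counts: dict[str, int] = defaultdict(int)
        self.window_start = time.monotonic()

    def allow(self, user: str) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            self.counts.clear()  # new window: everyone's count resets
            self.window_start = now
        if self.counts[user] >= self.limit:
            return False         # blast radius capped for this user
        self.counts[user] += 1
        return True

limiter = FixedWindowLimiter(limit=2, window=3600.0)
print(limiter.allow("alice"))  # True
print(limiter.allow("alice"))  # True
print(limiter.allow("alice"))  # False
print(limiter.allow("bob"))    # True
```

In production you'd likely want a sliding window backed by Redis or similar, but the principle is the same: one user cannot turn the agent into a bulk-export tool.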
When it’s genuinely useful vs when you should write the query
I want to be straight about the limitations here because the SQL agent is actually useful in some situations and completely overkill in others.
It adds real value when questions are varied and unpredictable, users are non-technical, queries involve complex joins or aggregations that would be tedious to write repeatedly, and the answer needs to come back as plain English rather than a raw result set.
You should just write the query when the question is always the same (parameterise it), the result goes into a dashboard that already has it, performance requirements are tight (the agent adds meaningful latency vs a pre-written query), or the data is sensitive enough that you want a human reviewing every query before anything runs.
The SQL agent doesn’t replace a well-designed data layer. If you have an analytics tool covering 90% of what your team needs, the agent handles the ad-hoc 10%. It’s a complement, not a replacement, and treating it otherwise leads to disappointment.
Where this fits in broader workflows
The agent is most powerful as a component inside a larger system. Query the database, combine those results with external data, generate a report or recommendation – that’s where the real leverage sits.
I’ve seen this work well as part of a reporting agent: the SQL agent queries internal metrics, a web search tool pulls relevant industry benchmarks, and a synthesis step produces a briefing document. Nothing manual. The briefing lands in Slack on schedule, and the humans just read it.
The post on building autonomous workflows with AI agents covers the broader patterns for combining capabilities like this, including how to design the handoffs and how to structure the output so it’s actually useful.
The SQL agent also integrates naturally with RAG patterns. If you have an agent answering questions from a knowledge base, and some questions require current data from your database rather than historical documents, you can combine both under a routing layer. The adaptive RAG patterns post covers how to design that routing well.
Practical takeaways
- Use the LangGraph SQL agent tutorial as your reference implementation. The full multi-step process – discover tables, identify relevant ones, retrieve schema, generate query, check it, execute, synthesise the answer – is the right architecture for anything going near production.
- Read-only database credentials are non-negotiable. The agent does not need write access, and giving it write access for convenience will end badly.
- Restrict table access to what the agent actually needs. An include list is simpler to maintain than trying to monitor query output, and it’s a cleaner security boundary.
- Log everything. You cannot audit what you cannot see, and the log will save you hours when something produces a wrong answer.
- The SQL agent earns its place in ad-hoc, unpredictable question environments. For regular reporting and standard metrics, build the queries properly and put them in a dashboard where they belong.
- Combine with other data sources and a synthesis step to produce actionable outputs rather than raw answers. That’s where the real value is.
The most expensive database at any company is the one full of answers nobody can reach because they can’t write SQL. The LangGraph SQL agent isn’t a perfect solution. But perfect isn’t usually what teams need – practical is, and this is practical when it’s deployed carefully.
Frequently Asked Questions
What is a LangGraph SQL agent and how does it work?
A LangGraph SQL agent is an AI agent built using LangGraph that takes natural language questions, generates appropriate SQL queries, executes them against your database, and returns human-readable answers. It uses a planning and execution cycle to handle complex multi-step queries and corrects errors automatically.
Who can benefit from a SQL agent?
Anyone who needs data insights but cannot write SQL – marketing teams, sales managers, operations staff, executives. It removes the bottleneck of waiting for engineering to run database queries and lets domain experts access data directly using plain language questions.
What are the limitations of natural language SQL agents?
Current limitations include: hallucination risk on complex joins, dependency on accurate schema documentation, performance degradation on very large or poorly indexed databases, and security concerns if the agent is given write permissions. Production deployments should restrict agent access to read-only database roles.
About the Author
Ronnie Huss is a serial founder and AI strategist based in London. He builds technology products across SaaS, AI, and blockchain.