How to build a lead scoring agent (with real code examples)

Ronnie Huss

Manual lead scoring is one of those tasks that sounds simple and turns out to eat an embarrassing amount of a sales team’s time. An AI lead scoring agent doesn’t just hand that time back: it makes the scoring more consistent, more data-driven, and far easier to update when your ideal customer profile shifts.

Key Takeaway

An AI lead scoring agent eliminates manual qualification time by automatically analysing incoming leads against behavioural, demographic, and intent signals — and unlike human scoring, it applies criteria consistently at scale without fatigue or bias.

I’m going to walk through the architecture of a lead scoring agent in a way that’s actually useful if you’re going to build one, using CrewAI’s lead-score-flow as the reference implementation. This isn’t a code tutorial where I paste snippets and explain syntax line by line. It’s a blueprint for the thinking that needs to happen before you write production code, with short illustrative sketches along the way to make the ideas concrete.

In this article

  • What a lead scoring agent actually does
  • The data inputs you need to define first
  • Defining scoring criteria the agent can apply consistently
  • The agent architecture: who does what
  • Integrating with your CRM, the output format, and when to run the agent

What a lead scoring agent actually does

Before getting into implementation, let’s be precise about the job. A lead scoring agent takes a lead – typically a name, company, job title, email domain, and whatever behavioural data you have – and outputs a score plus a rationale. The score tells your sales team where to focus. The rationale tells them what to say when they reach out.

The score without the rationale is a number without a story. Sales reps who don’t understand why a lead scored highly can’t use that context in their outreach. Reps who do understand it are immediately better equipped. The rationale isn’t optional; it’s half the value of the whole system.

A well-built agent also flags gaps: leads that scored medium because the data was incomplete, not because the fit was weak. That distinction is operationally important. A lead with missing data is a different follow-up action from a lead with complete data that scored average. Conflating them costs you opportunities.
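To make that output contract concrete, here’s a minimal sketch using Pydantic. The field names are mine, not from any particular framework, and the explicit data_gaps field is what keeps the incomplete-data case visible instead of silently folded into a medium score:

```python
from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field

class DataConfidence(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class LeadScoreResult(BaseModel):
    """What the agent hands back: a score, a story, and honest gaps."""
    score: int = Field(ge=0, le=100, description="Total across all rubric dimensions")
    tier: str = Field(description="hot / warm / cold, derived from score bands")
    rationale: str = Field(description="Plain-language explanation a rep can act on")
    data_confidence: DataConfidence = DataConfidence.MEDIUM
    data_gaps: list[str] = Field(default_factory=list, description="Inputs that were missing or unconfirmed")
    recommended_action: Optional[str] = None
```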

The data inputs you need to define first

The most common mistake when building lead scoring agents is starting with the scoring logic before defining the data. Your scoring criteria can only be as good as your inputs. Get this wrong and you build an agent that confidently produces scores based on insufficient or irrelevant data – which is worse than no scoring at all, because it breeds false confidence.

For B2B lead scoring, the typical input set includes:

  • Firmographic data: Company size, industry, geography, estimated revenue, funding stage if relevant. This usually comes from your CRM, enriched via tools like Clearbit, Apollo, or Hunter.
  • Contact data: Job title, seniority, department, years in role. Seniority is particularly important — a decision-maker at a mid-size company is a different opportunity from an individual contributor at a large enterprise.
  • Behavioural data: What pages they visited, how long they spent on pricing, whether they downloaded a resource, which emails they opened. This is intent data, and it significantly improves scoring accuracy when it’s available.
  • Source context: How did this lead come in? Inbound from a high-intent page, referral from a customer, outbound cold contact, event attendee. Source carries predictive weight that most scoring models undervalue.

Map your available data against this list before you start building. You’ll almost certainly have gaps. Those gaps either become data enrichment tasks for the agent to run, or they become explicit unknowns in the scoring rationale. Don’t paper over them.
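One way to force that mapping exercise is to write the input schema down as code before any agent exists. A sketch with illustrative field names: every Optional field is a permitted gap, and the agent should treat None as unknown rather than guess:

```python
from typing import Optional
from pydantic import BaseModel, Field

class LeadInput(BaseModel):
    """Everything the agent is allowed to score on. A None is a legitimate,
    visible gap, never something for the agent to paper over."""
    # Firmographic
    company_name: str
    company_domain: Optional[str] = None
    company_size: Optional[int] = None
    industry: Optional[str] = None
    geography: Optional[str] = None
    funding_stage: Optional[str] = None
    # Contact
    job_title: Optional[str] = None
    seniority: Optional[str] = None
    # Behavioural
    pages_visited: list[str] = Field(default_factory=list)
    pricing_page_visits: int = 0
    resources_downloaded: list[str] = Field(default_factory=list)
    # Source context: e.g. "inbound", "referral", "outbound", "event"
    source: Optional[str] = None
```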

Defining scoring criteria the agent can apply consistently

Scoring criteria need to be specific enough for an LLM to apply consistently across hundreds of leads. Vague criteria produce inconsistent scores. If your ideal customer profile says something like “good cultural fit” or “innovative company,” that’s not a scoring criterion – it’s a feeling. Turn it into something measurable before you hand it to an agent.

A well-structured rubric looks something like this:

  • Company size 10–200 employees: +20 points. Under 10: +5. Over 200: +10.
  • Industry in target list (SaaS, professional services, e-commerce): +25 points. Adjacent industries: +10. Unrelated: 0.
  • Decision-maker title (Founder, CEO, CTO, Head of, Director of): +20 points. Manager level: +10. Individual contributor: +5.
  • Visited pricing page: +15 points. Visited blog only: +5.
  • Inbound lead: +15 points. Referral: +20. Cold outbound: 0.

The numbers are illustrative – your actual weights should come from looking at which characteristics correlate with closed deals in your historical CRM data. If you don’t have enough historical data yet, start with reasonable assumptions and adjust quarterly as evidence accumulates.
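Several of those dimensions don’t need an LLM at all. Where the input is structured, the rubric can be plain code, which is consistent by construction. A sketch using the illustrative weights above; the adjacent-industry list is an assumption you’d replace with your own:

```python
def score_company_size(size: int | None) -> int:
    """Company size dimension, using the illustrative weights above."""
    if size is None:
        return 0  # missing data scores nothing and should be flagged as a gap
    if 10 <= size <= 200:
        return 20
    return 5 if size < 10 else 10

TARGET_INDUSTRIES = {"saas", "professional services", "e-commerce"}
ADJACENT_INDUSTRIES = {"fintech", "martech"}  # assumption: define your own

def score_industry(industry: str | None) -> int:
    """Industry dimension: target list +25, adjacent +10, unrelated or unknown 0."""
    if industry is None:
        return 0
    key = industry.strip().lower()
    if key in TARGET_INDUSTRIES:
        return 25
    return 10 if key in ADJACENT_INDUSTRIES else 0
```

Reserve the LLM for the genuinely fuzzy dimensions, like deciding whether “Head of Revenue Operations” counts as a decision-maker title.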

The CrewAI lead-score-flow example (at github.com/crewAIInc/crewAI-examples) implements this with a structured scoring flow where each agent handles a specific component: one researches the company, one evaluates firmographic fit, one evaluates intent signals, and an orchestrating agent combines the results into a final score and rationale. The separation of concerns is worth studying even if you adapt the implementation significantly.
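Stripped to a skeleton, that flow shape looks roughly like the sketch below, assuming CrewAI’s Flow API with its @start/@listen decorators. This is my simplification of the structure, not the repo’s actual code, so treat the repo as the source of truth:

```python
# Simplified sketch of the flow shape; method bodies elided.
from crewai.flow.flow import Flow, and_, listen, start

class LeadScoreFlow(Flow):
    @start()
    def research_company(self):
        ...  # gather firmographic facts for the lead

    @listen(research_company)
    def score_fit(self):
        ...  # evaluate firmographic fit against the rubric

    @listen(research_company)
    def score_intent(self):
        ...  # evaluate behavioural and source signals

    @listen(and_(score_fit, score_intent))
    def combine(self):
        ...  # merge dimension scores into a final score and rationale
```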

The agent architecture: who does what

A lead scoring agent isn’t a single LLM call. It’s a workflow with multiple specialised agents, each responsible for a specific part of the evaluation. Here’s the architecture that works:

Research agent: Takes the lead’s company name and domain, calls enrichment APIs (Clearbit, Apollo, or similar), and returns structured firmographic data. This agent’s output is facts, not judgements. Company size: 45 employees. Industry: professional services. Funding: Series A in 2023.
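A sketch of what that might look like against a hypothetical enrichment endpoint; the URL, parameters, and response fields here are placeholders, not any real provider’s API:

```python
import os
import requests

def enrich_company(domain: str) -> dict:
    """Fetch firmographic facts for a domain. The endpoint is illustrative;
    swap in your enrichment provider's real API."""
    resp = requests.get(
        "https://api.example-enrichment.com/v1/companies",  # hypothetical
        params={"domain": domain},
        headers={"Authorization": f"Bearer {os.environ['ENRICHMENT_API_KEY']}"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # Facts only, no judgements; scoring happens downstream
    return {
        "company_size": data.get("employee_count"),
        "industry": data.get("industry"),
        "funding_stage": data.get("funding_stage"),
    }
```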

Scoring agent: Takes the enriched lead data plus your scoring rubric and applies the criteria systematically. Outputs a numerical score for each dimension and a total. This agent should be prompt-engineered to apply the rubric literally, not interpret it creatively. Consistency matters more than nuance at this stage.
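The prompt is where that literalness gets enforced. Something in this spirit, filled in with your rubric via .format() and run at temperature 0 for consistency:

```python
# Illustrative scoring-agent prompt. The point is the constraint language:
# apply the rubric literally, flag gaps, return structured output only.
SCORING_PROMPT = """You are a lead scoring engine. Apply the rubric below EXACTLY
as written. Never invent criteria or adjust weights. If an input a dimension
needs is missing, score that dimension 0 and list it under data_gaps.

Rubric:
{rubric}

Lead data:
{lead_json}

Respond with JSON only:
{{"dimension_scores": {{}}, "total": 0, "data_gaps": []}}
"""
```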

Rationale agent: Takes the scored data and writes a two- to three-sentence summary explaining the score in plain language. This is what goes into your CRM and what your sales rep reads before picking up the phone. It should highlight the top two or three factors – positive or negative – and flag any data gaps.

Output formatter: Structures the final result as a CRM-ready payload. Score, tier (hot/warm/cold based on score band), rationale, data confidence, and recommended next action.

This separation means you can upgrade or replace any single agent without rebuilding the whole system. If your enrichment API changes, only the research agent needs updating. If your scoring rubric evolves, only the scoring agent’s prompt changes. Clean boundaries make iteration much easier.
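In code, that separation is just a pipeline of stages with typed contracts between them. The stage functions below are hypothetical stand-ins for whichever agents or plain functions you end up building:

```python
# Hypothetical stage functions; each could be an LLM agent or plain code.
def research_agent(lead): ...              # returns enriched facts
def scoring_agent(facts, rubric): ...      # returns dimension scores + total
def rationale_agent(scores): ...           # returns a plain-language summary
def format_output(scores, rationale): ...  # returns a CRM-ready payload

RUBRIC = {}  # your rubric, however you encode it

def score_lead_pipeline(lead):
    """Each stage is swappable on its own; the contracts are the boundary."""
    facts = research_agent(lead)
    scores = scoring_agent(facts, RUBRIC)
    rationale = rationale_agent(scores)
    return format_output(scores, rationale)
```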

Integrating with your CRM

The agent produces output. The CRM integration is what makes it operationally useful. Without it, you have a scoring tool whose results someone has to copy over by hand. With it, you have a workflow that closes the loop automatically.

Most CRMs – HubSpot, Pipedrive, Salesforce – have REST APIs that accept lead updates. Your output formatter agent should produce a JSON payload matching the CRM’s expected format, and a simple function call pushes it. In a CrewAI flow, this is a tool that the final agent calls as part of its task.

What to write back to the CRM: the numerical score, the tier label, the rationale text (in a notes field or custom property), and a timestamp. Don’t overwrite existing fields your sales team maintains manually – score fields should be clearly labelled as AI-generated so reps understand the source and can apply their own judgement on top of it.

One pattern worth implementing: write the score to a separate AI Score field rather than your existing Lead Score field. This lets sales reps compare the AI score to their own assessment, which is genuinely useful data. When AI scores and rep scores consistently diverge, that tells you something – either about the scoring criteria or about how the rep is evaluating leads. This connects to the broader value of AI agents for sales follow-up – the scoring is the first step in an intelligent follow-up sequence, not a standalone data point.
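A sketch of that write-back for HubSpot, assuming the CRM v3 contacts endpoint and that you’ve already created ai_lead_score, ai_lead_tier, ai_score_rationale, and ai_scored_at as custom properties (the property names are mine), reusing the LeadScoreResult model from earlier:

```python
import os
from datetime import datetime, timezone
import requests

def push_score_to_hubspot(contact_id: str, result: "LeadScoreResult") -> None:
    """Write the AI score to dedicated AI-labelled properties only;
    never touch the rep-maintained lead score field."""
    resp = requests.patch(
        f"https://api.hubapi.com/crm/v3/objects/contacts/{contact_id}",
        headers={"Authorization": f"Bearer {os.environ['HUBSPOT_TOKEN']}"},
        json={"properties": {
            "ai_lead_score": result.score,
            "ai_lead_tier": result.tier,
            "ai_score_rationale": result.rationale,
            "ai_scored_at": datetime.now(timezone.utc).isoformat(),
        }},
        timeout=10,
    )
    resp.raise_for_status()
```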

What the output should look like

A lead scoring output that actually gets used by sales teams looks like this:

  • Score: 72/100
  • Tier: Warm
  • Rationale: Fits ideal company size (38 employees, professional services) and has a decision-maker contact (Founder). Visited pricing page twice in the last seven days. Score is capped by missing behavioural data for two recent sessions and unconfirmed budget indicator.
  • Data confidence: Medium (firmographic data confirmed, behavioural data partial)
  • Recommended action: Personalised email referencing pricing interest within 24 hours

That’s a complete, actionable output. The rep doesn’t need to do any additional thinking to decide what to do next. They read the rationale, understand the context, and execute the recommended action. That is the value – removing the friction between a signal and a response.
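Serialised with the output model from the start of this article, that same result becomes the CRM-ready payload:

```python
result = LeadScoreResult(
    score=72,
    tier="warm",
    rationale=(
        "Fits ideal company size (38 employees, professional services) and has a "
        "decision-maker contact (Founder). Visited pricing page twice in the last "
        "seven days. Capped by missing behavioural data and unconfirmed budget."
    ),
    data_confidence=DataConfidence.MEDIUM,
    data_gaps=["behavioural data for two recent sessions", "budget indicator"],
    recommended_action="Personalised email referencing pricing interest within 24 hours",
)
print(result.model_dump_json(indent=2))  # what gets pushed to the CRM
```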

When to run the agent

Trigger options:

  • On lead creation: immediate scoring the moment a new lead enters your CRM.
  • On enrichment completion: score once enrichment APIs have returned data.
  • On a scheduled re-score: weekly re-evaluation of leads that haven’t been contacted, to catch intent signals that emerged after initial scoring.
  • On a behavioural event: re-score when a lead visits pricing again or downloads a high-intent resource.

Most teams start with on-creation scoring and add the behavioural triggers once the basic system is running reliably. Don’t over-engineer the triggers upfront. Get the scoring logic right first, then optimise when to apply it.
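On-creation scoring usually means a webhook. A minimal sketch, assuming your CRM can POST when a lead is created (most can) and reusing the pipeline and push functions from earlier; FastAPI is one option, not a requirement:

```python
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhooks/lead-created")
async def on_lead_created(request: Request):
    payload = await request.json()
    # Payload shape depends entirely on your CRM's webhook format
    lead = LeadInput(**payload["lead"])
    result = score_lead_pipeline(lead)
    push_score_to_hubspot(payload["contact_id"], result)
    return {"status": "scored", "score": result.score}
```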

The real value is iteration, not the first version

The first version of your lead scoring agent will be wrong in specific, predictable ways. Some leads that scored high won’t convert. Some that scored low will. That’s not a failure – it’s the data you need to refine the rubric.

Build a review process into your workflow from day one. Monthly, a sales leader looks at closed-won deals and compares AI scores against actual outcomes. Adjust the rubric. Re-score historical leads. Track whether accuracy improves. A lead scoring agent that gets reviewed and updated quarterly will be dramatically more accurate at month twelve than a static scoring model that never changes.
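The review itself can be a very small script. A sketch assuming you can export leads with their AI score and a closed-won outcome column; the 50/80 band thresholds are an assumption, consistent with the 72-scores-warm example above:

```python
import pandas as pd

# Export from your CRM: one row per lead, ai_score (0-100), closed_won (0/1)
df = pd.read_csv("leads_with_outcomes.csv")
df["band"] = pd.cut(df["ai_score"], bins=[0, 50, 80, 100], labels=["cold", "warm", "hot"])

# Conversion rate per band: if hot doesn't beat warm, the weights need work
print(df.groupby("band", observed=True)["closed_won"].agg(["count", "mean"]))
```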

The compound value of a system that improves over time is real. Plan-and-execute agent patterns include exactly this kind of feedback loop – the output of an execution phase becomes the input for the next planning phase. The same logic applies to lead scoring. Run, evaluate, improve. That’s the only path to a scoring model that actually moves your close rate.

Frequently Asked Questions

How does an AI lead scoring agent work?

An AI lead scoring agent ingests lead data from CRM and marketing tools, applies a scoring model based on predefined criteria (company size, intent signals, engagement behaviour, fit attributes), assigns a score and recommended next action, and routes the lead to the appropriate sales workflow automatically.

What data sources should a lead scoring agent use?

Effective lead scoring agents combine: CRM data (company size, industry, past interactions), website behaviour (pages visited, time on site, content downloads), email engagement, social signals, and third-party intent data from providers like Bombora or G2. More sources generally improve accuracy, but only when each one is current and reliable; stale or noisy inputs degrade scores rather than sharpen them.

What ROI should you expect from automating lead scoring?

Teams typically see 40–60% reduction in time spent on manual qualification, improved conversion rates as sales focus shifts to higher-quality leads, and faster response times since AI scoring happens in real time rather than in weekly review cycles. Measuring before and after pipeline conversion rates is the clearest ROI indicator.


About the Author

Ronnie Huss is a serial founder and AI strategist based in London, with four product launches across SaaS, AI tools, and blockchain. He writes on AI agents, GEO, RWA tokenisation, and building AI-multiplied teams.