NextFlow - Lead Management & CRM for Growing Businesses

Why three models

When we first built sentiment scoring, we sent the full conversation to a large model and asked for a structured JSON output with sentiment, conversion likelihood, open questions, and suggested action. It worked well. It took 3.2 seconds. In an inbox product, 3.2 seconds feels like an eternity.

We now run three smaller, specialised calls in parallel. Each returns a single score or a short string. Total latency: 18ms at the 95th percentile.
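A minimal sketch of the parallel fan-out, assuming each call is an async function. `classifySentiment`, `scoreConversion`, `pickNextAction` and `scoreLead` are hypothetical names, and the stubs below stand in for the real model requests.

```javascript
// Hypothetical stubs standing in for the three model calls.
// In production each would be a request to a small, specialised model.
async function classifySentiment(messages) {
  return "neutral";
}

async function scoreConversion(features) {
  return 50;
}

async function pickNextAction(messages) {
  return "send_quote"; // placeholder action name
}

// Start all three independent calls at once. Promise.all resolves when the
// slowest one finishes, so total latency is roughly the maximum of the three
// call latencies rather than their sum.
async function scoreLead(messages, features) {
  const [sentiment, conversion, action] = await Promise.all([
    classifySentiment(messages),
    scoreConversion(features),
    pickNextAction(messages),
  ]);
  return { sentiment, conversion, action };
}
```

Because no call depends on another's output, nothing forces them to run in sequence; if one did, it would have to be awaited first and its latency would add to the total.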

18ms: P95 latency now
3,200ms: original latency
177x: speed improvement

The three calls

Call 1 is a sentiment classifier that returns positive, neutral, or negative. Its input is only the last 3 messages, not the full thread.

Call 2 is a regression scorer that returns a 0-100 conversion likelihood from 14 features extracted client-side, such as message length, response time, question count, budget mention, and timeline mention.

Call 3 is an action classifier that picks one of 8 fixed next actions.
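Because each call returns a single constrained value, preparing its input and checking its output stay simple. The sketch below assumes each model replies with raw text. The helper names are hypothetical, and since the article does not name the 8 actions, `action_1` through `action_8` are placeholders.

```javascript
// Call 1 only sees the most recent messages, not the full thread.
const SENTIMENT_WINDOW = 3;

function sentimentInput(thread) {
  // slice(-3) returns the last 3 items, or the whole array if it is shorter.
  return thread.slice(-SENTIMENT_WINDOW);
}

const SENTIMENTS = new Set(["positive", "neutral", "negative"]);

// Placeholder identifiers: the real 8 fixed actions are not listed.
const ACTIONS = new Set(
  Array.from({ length: 8 }, (_, i) => `action_${i + 1}`)
);

// Each parser turns raw model text into a valid value or null, so the
// caller can fall back gracefully instead of crashing on bad output.
function parseSentiment(raw) {
  const label = raw.trim().toLowerCase();
  return SENTIMENTS.has(label) ? label : null;
}

function parseScore(raw) {
  const n = Number.parseFloat(raw);
  if (Number.isNaN(n)) return null;
  // Round and clamp into the 0-100 range.
  return Math.min(100, Math.max(0, Math.round(n)));
}

function parseAction(raw) {
  const action = raw.trim();
  return ACTIONS.has(action) ? action : null;
}
```

Validating a single label or number is a set lookup or a clamp, which is much less fragile than parsing a multi-field JSON object from one large model.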

// Call 2: conversion features (extracted client-side, no LLM needed)
// Takes the latest message text and returns the feature object.
function extractFeatures(msg) {
  return {
    hasTimeline: /june|july|next month|asap/i.test(msg),
    hasBudget: /budget|spend|afford|around/i.test(msg),
    questionCount: (msg.match(/\?/g) || []).length, // number of "?" characters
    messageLength: msg.length,
    // ... 10 more
  };
}
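The article calls Call 2 a regression scorer but does not say which kind. One plausible choice is logistic regression, sketched below over a features object like the one above. The weights and bias are hypothetical placeholders, not fitted values, and only 4 of the 14 features are shown.

```javascript
// Hypothetical weights for illustration only; a real model would be
// fitted on historical leads.
const WEIGHTS = {
  hasTimeline: 1.2,
  hasBudget: 0.9,
  questionCount: 0.3,
  messageLength: 0.002,
};
const BIAS = -2.0;

// Logistic regression: weighted sum of features, squashed into (0, 1)
// with a sigmoid, then scaled to a 0-100 score.
function conversionScore(features) {
  let z = BIAS;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    // Booleans become 1 or 0; missing features count as 0.
    z += weight * Number(features[name] ?? 0);
  }
  const p = 1 / (1 + Math.exp(-z));
  return Math.round(p * 100);
}
```

For example, a message with a timeline, a budget mention, 2 questions and 100 characters scores 71 under these placeholder weights, while an empty message with no signals scores 12. Evaluating a model like this takes microseconds, which fits the key insight below: the bulk of the conversion signal can come from cheap features rather than model inference.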

Key insight

Most of the signal for conversion likelihood comes from simple regex features, not language model inference. The LLM is only needed for the nuance.

The prompt we use to score lead sentiment in 18ms.
