Why three models
When we first built sentiment scoring, we sent the full conversation to a large model and asked for a structured JSON output with sentiment, conversion likelihood, open questions, and suggested action. It worked well. It took 3.2 seconds. In an inbox product, 3.2 seconds feels like an eternity.
We now run three smaller, specialised calls in parallel. Each returns a single score or a short string. Total latency: 18ms at the 95th percentile.
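The parallel fan-out can be sketched as follows. This is a minimal illustration, not the production code: `classifySentiment`, `scoreConversion`, `classifyAction`, and `scoreConversation` are hypothetical names, the stand-ins return canned values after a short timer instead of calling a model, and passing the messages to the action call is an assumption.

```javascript
// Hypothetical stand-ins for the three calls. Each resolves with a canned
// value after a short delay; real versions would call a model or scorer.
const delay = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

const classifySentiment = (messages) => delay(10, "neutral");
const scoreConversion = (features) => delay(5, 50);
const classifyAction = (messages) => delay(8, "follow_up");

// Start all three calls at once and wait for them together.
// Wall-clock time is roughly that of the slowest call, not the sum.
async function scoreConversation(messages, features) {
  const [sentiment, conversion, action] = await Promise.all([
    classifySentiment(messages),
    scoreConversion(features),
    classifyAction(messages),
  ]);
  return { sentiment, conversion, action };
}

scoreConversation(["Can you start next month?"], { questionCount: 1 })
  .then((result) => console.log(result));
```

`Promise.all` rejects as soon as any one call fails. If a partial result is still useful in the inbox, `Promise.allSettled` is one alternative that reports each call's outcome separately.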
The three calls
Call 1 is a classifier: positive, neutral, or negative. Its input is the last 3 messages only, not the full thread. Call 2 is a regression scorer: a 0-100 conversion likelihood based on 14 features extracted client-side (including message length, response time, question count, budget mention, and timeline mention). Call 3 is an action classifier: it picks one of 8 fixed next actions.
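Because each call returns a single value from a small, fixed space, the inputs and outputs are easy to constrain. The sketch below shows one way to trim the thread for Call 1 and to validate what Calls 1 and 2 return. The helper names (`sentimentInput`, `parseSentiment`, `parseConversion`) are hypothetical, and clamping out-of-range scores rather than rejecting them is an assumption.

```javascript
const SENTIMENT_LABELS = ["positive", "neutral", "negative"];

// Keep only the most recent messages; the classifier never sees the full thread.
// For threads shorter than the window, slice() simply returns everything.
function sentimentInput(thread, windowSize = 3) {
  return thread.slice(-windowSize);
}

// Accept only the three fixed labels so callers never handle free text.
function parseSentiment(raw) {
  const label = String(raw).trim().toLowerCase();
  if (!SENTIMENT_LABELS.includes(label)) {
    throw new Error(`Unexpected sentiment label: ${raw}`);
  }
  return label;
}

// Clamp the regression output into the 0-100 range (an assumption; the
// scorer may already guarantee this).
function parseConversion(raw) {
  const score = Number(raw);
  if (!Number.isFinite(score)) {
    throw new Error(`Unexpected conversion score: ${raw}`);
  }
  return Math.min(100, Math.max(0, score));
}

const thread = ["Hi", "Do you do weddings?", "Yes, we do.", "Great, what's the price?"];
console.log(sentimentInput(thread)); // the last three messages
console.log(parseSentiment(" Positive "));
console.log(parseConversion(120));
```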
// Call 2: conversion features (extracted client-side, no LLM needed)
const features = {
  hasTimeline: /june|july|next month|asap/i.test(msg),
  hasBudget: /budget|spend|afford|around/i.test(msg),
  questionCount: (msg.match(/\?/g) || []).length,
  messageLength: msg.length,
  // ... 10 more
};

Most of the signal for conversion likelihood comes from simple regex features, not language model inference. The LLM is only needed for the nuance.
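To make the snippet above runnable on its own, the four features shown can be wrapped in a function and applied to a sample message. `extractFeatures` and the sample text are hypothetical; the remaining ten features are elided in the original and are not guessed at here.

```javascript
// Wraps the four features shown in the post. The other ten are omitted
// because the post does not list them.
function extractFeatures(msg) {
  return {
    hasTimeline: /june|july|next month|asap/i.test(msg),
    hasBudget: /budget|spend|afford|around/i.test(msg),
    // match() returns null when there are no matches, hence the fallback.
    questionCount: (msg.match(/\?/g) || []).length,
    messageLength: msg.length,
  };
}

const sample =
  "Our budget is around 5k. Can you start next month? Is onboarding included?";
console.log(extractFeatures(sample));
// hasTimeline: true, hasBudget: true, questionCount: 2
```

Patterns this loose trade precision for speed. For instance, `around` also matches "I'll be around tomorrow", so `hasBudget` is best treated as a weak signal for the scorer rather than a fact about the message.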