AI · 9 min read

The prompt we use to score lead sentiment in 18ms.

It is not a single prompt. It is three small models run side by side, and the latency math is interesting.

Apr 05, 2026

Why three models

When we first built sentiment scoring, we sent the full conversation to a large model and asked for a structured JSON output with sentiment, conversion likelihood, open questions, and suggested action. It worked well. It took 3.2 seconds. In an inbox product, 3.2 seconds feels like an eternity.

We now run three smaller, specialised calls in parallel. Each returns a single score or a short string. Because the calls overlap, total latency is set by the slowest of the three rather than by their sum: 18ms at the 95th percentile.
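A minimal sketch of the parallel fan-out, assuming each call is exposed as an async function. The names `classifySentiment`, `scoreConversion`, and `pickNextAction`, the stubbed delays, and the action label `send_pricing` are hypothetical stand-ins, not the production code:

```javascript
// Hypothetical stand-in for a model call: resolves with `value` after `ms` milliseconds.
const fakeCall = (value, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

// Stubs for the three specialised calls (delays and return values are made up).
const classifySentiment = (messages) => fakeCall("positive", 12);
const scoreConversion = (features) => fakeCall(73, 8);
const pickNextAction = (messages) => fakeCall("send_pricing", 15);

async function scoreLead(messages, features) {
  const started = Date.now();
  // Start all three calls before awaiting any of them, so their waits overlap.
  const [sentiment, conversion, action] = await Promise.all([
    classifySentiment(messages),
    scoreConversion(features),
    pickNextAction(messages),
  ]);
  return { sentiment, conversion, action, elapsedMs: Date.now() - started };
}

// Usage: elapsedMs should land near the slowest stub, not the sum of all three.
scoreLead(["Hi", "What does it cost?", "We need it by June"], {}).then(console.log);
```

Awaiting each call in turn instead of using `Promise.all` would serialise the three waits, which is the pattern this sketch is meant to avoid.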

P95 latency now: 18ms
Original latency: 3,200ms
Speed improvement: 177x

The three calls

Call 1 is a classifier: positive, neutral, or negative. Its input is the last 3 messages only, not the full thread. Call 2 is a regression scorer: a 0-100 conversion likelihood based on 14 features extracted client-side (including message length, response time, question count, budget mention, and timeline mention). Call 3 is an action classifier: it returns one of 8 fixed next actions.
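As a rough illustration of how narrow these inputs and outputs are, here is a sketch of the trimming and validation around Calls 1 and 2. The helper names `lastMessages` and `normalizeResult` are hypothetical, and the clamping behaviour is an assumption rather than something the post describes:

```javascript
// The three sentiment labels named above.
const SENTIMENTS = ["positive", "neutral", "negative"];

// Keep only the most recent messages for the sentiment classifier.
// slice(-count) returns the whole thread when it has fewer than `count` messages.
function lastMessages(thread, count = 3) {
  return thread.slice(-count);
}

// Reject unknown labels and force the conversion score into the 0-100 range.
function normalizeResult({ sentiment, conversion }) {
  if (!SENTIMENTS.includes(sentiment)) {
    throw new RangeError(`Unknown sentiment label: ${sentiment}`);
  }
  const clamped = Math.min(100, Math.max(0, Math.round(conversion)));
  return { sentiment, conversion: clamped };
}

// Usage with a made-up five-message thread.
const thread = ["m1", "m2", "m3", "m4", "m5"];
console.log(lastMessages(thread));
console.log(normalizeResult({ sentiment: "neutral", conversion: 104.2 }));
```

Because every output is a label from a fixed set or a bounded number, checks like these stay trivial compared with parsing a large structured JSON response.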

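// Call 2: conversion features (extracted client-side, no LLM needed)
// `msg` is the message text as a string.
const features = {
  hasTimeline: /june|july|next month|asap/i.test(msg),  // mentions a date or urgency
  hasBudget: /budget|spend|afford|around/i.test(msg),   // mentions money or a price range
  questionCount: (msg.match(/\?/g) || []).length,       // match() returns null when nothing matches
  messageLength: msg.length,
  // ... 10 more
};

Key insight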

Most of the signal for conversion likelihood comes from simple regex features, not language model inference. The LLM is only needed for the nuance.
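To make the idea concrete, here is one simple way features like these could be turned into a 0-100 score: a logistic model over the four features shown above. This is a sketch under stated assumptions, not necessarily the scorer the post describes; the weights and bias are invented for illustration and would normally be fitted on labelled leads.

```javascript
// Hypothetical weights and bias, for illustration only (not fitted on real data).
const WEIGHTS = {
  hasTimeline: 1.2,
  hasBudget: 0.9,
  questionCount: 0.3,
  messageLength: 0.002,
};
const BIAS = -2.0;

function conversionScore(features) {
  let z = BIAS;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    // Booleans coerce to 0 or 1; missing features count as 0.
    z += weight * Number(features[name] ?? 0);
  }
  // Squash the weighted sum into (0, 1) with the logistic function, then scale to 0-100.
  const probability = 1 / (1 + Math.exp(-z));
  return Math.round(probability * 100);
}

// Usage: a message with a timeline, a budget, and two questions scores higher than an empty one.
const rich = { hasTimeline: true, hasBudget: true, questionCount: 2, messageLength: 200 };
console.log(conversionScore(rich), conversionScore({}));
```

A model this small is a handful of multiplications and one exponential, so it can run on the client without a network round trip.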
