Guide

How to A/B Test Outreach Campaigns

Most outreach teams make decisions based on gut feel rather than data — they change subject lines, scripts, and CTAs simultaneously and have no idea what actually moved the needle. Systematic A/B testing replaces guesswork with evidence, turning every campaign into a learning opportunity that compounds over time. This guide walks you through designing valid experiments, interpreting results, and building a testing program that continuously improves your outreach performance.

Before you start

  • An active outreach campaign with at least 200 contacts per test variant for statistical validity
  • An email or video outreach platform with split-testing or variant tracking capabilities
  • A clearly defined primary metric you are optimizing for (reply rate, meeting booking rate, or positive reply rate)

Step-by-step guide

1. Define What You Are Testing and Why

Every A/B test should start with a hypothesis: 'I believe that changing [variable] from [A] to [B] will increase [metric] because [reason].' Write this hypothesis down before running the test. Without a hypothesis, you are generating noise rather than insight. Good A/B testing candidates include subject lines, opening sentences, CTA phrasing, email length, send day/time, personalization depth, and for video outreach, script angle or video thumbnail style.

Prioritize testing variables with the highest potential impact first. Subject lines affect whether anyone reads your email at all — test those before testing CTA button color or email footer copy.

2. Test One Variable at a Time

The cardinal rule of A/B testing: change only one element between variants A and B. If you change the subject line, the opening sentence, and the CTA simultaneously, you cannot attribute any change in performance to a specific variable. The temptation to 'fix everything at once' is understandable but produces tests that cannot be learned from. Choose the single variable most likely to move your primary metric and isolate it.

3. Determine Your Sample Size Before Starting

Running a test on 20 prospects per variant and declaring a winner is not valid testing — it is noise. Use a sample size calculator (many are free online) to determine how many contacts you need per variant to detect a meaningful difference with 95% confidence. As a rule of thumb, testing email subject lines typically requires 200-500 contacts per variant; testing CTA phrasing may require fewer if the expected effect size is large.

If your list is small, run tests over a longer time period rather than splitting a small list in half. A test run over three weeks with 150 contacts per variant is more reliable than a test run over three days with the same total contacts.
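
Both cases come down to the same arithmetic. If you want to sanity-check a calculator's output, the sketch below runs the standard two-proportion sample-size formula using only Python's standard library; the 30% to 40% open-rate example is hypothetical, not a benchmark.

    # Minimal sample-size estimate for a two-proportion test, standard
    # library only. The 30% -> 40% open-rate example is hypothetical.
    import math
    from statistics import NormalDist

    def contacts_per_variant(p1: float, p2: float,
                             alpha: float = 0.05, power: float = 0.80) -> int:
        """Contacts needed per variant to detect a change from rate p1 to rate p2."""
        z = NormalDist()
        z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 at 95% confidence
        z_power = z.inv_cdf(power)          # 0.84 at 80% power
        p_bar = (p1 + p2) / 2
        n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
             / (p1 - p2) ** 2)
        return math.ceil(n)

    print(contacts_per_variant(0.30, 0.40))  # -> 356, inside the 200-500 rule of thumb

Notice how quickly the requirement grows for smaller effects: detecting a reply-rate lift from 5% to 8% with the same function needs over 1,000 contacts per variant, which is why low-baseline metrics demand larger lists or longer run times.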

4. Set Up Your Test With Proper Controls

Split your prospect list randomly into two equally sized groups — not by alphabetical order, company size, or any other structured variable that could introduce bias. Send both variants simultaneously (or within the same time window) to control for day-of-week and time-of-day effects. If your platform does not support true random splitting, manually shuffle your list before dividing it.
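
If you need to do the split yourself, shuffle first and then cut the list in half. A minimal sketch, assuming your export is a plain list of contact emails; the fixed seed is only there so the split can be reproduced for your records.

    # Minimal random split: shuffle first so no structured ordering
    # (alphabetical, company size, signup date) leaks into the variants.
    import random

    def split_in_half(prospects: list[str], seed: int = 2024) -> tuple[list[str], list[str]]:
        shuffled = prospects[:]                # copy; leave the source list untouched
        random.Random(seed).shuffle(shuffled)  # fixed seed = reproducible split
        mid = len(shuffled) // 2
        return shuffled[:mid], shuffled[mid:]

    # Toy example; substitute your real export.
    variant_a, variant_b = split_in_half(["a@x.com", "b@y.com", "c@z.com", "d@w.com"])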

5. Choose the Right Metrics for Each Test Type

Different tests should be evaluated on different primary metrics:

  • Subject line tests: open rate
  • Opening sentence tests: reply rate (since open rate will be held constant)
  • CTA tests: positive reply rate or meeting booking rate
  • Video thumbnail tests: click-through rate
  • Video script angle tests: view-through rate and reply rate combined

Avoid using a single metric for all tests; it leads to optimizing the wrong thing.

Track secondary metrics alongside your primary metric. A subject line that dramatically increases open rate but decreases reply rate is not actually winning — it is misleading people into opening an email they are not interested in.
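
To keep reporting consistent, this mapping can live in a shared config rather than in people's heads. A sketch with illustrative metric names; match them to whatever your platform actually reports.

    # The metric map from this step as a config sketch; names are illustrative.
    PRIMARY_METRIC = {
        "subject_line":       "open_rate",
        "opening_sentence":   "reply_rate",
        "cta":                "positive_reply_rate",  # or meeting_booking_rate
        "video_thumbnail":    "click_through_rate",
        "video_script_angle": ("view_through_rate", "reply_rate"),
    }
    SECONDARY_METRICS = ["reply_rate", "meeting_booking_rate"]  # always tracked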

6. Let the Test Run to Completion

Checking your test results hourly and pausing the losing variant as soon as it falls behind is one of the most common testing mistakes. It is called 'peeking' and it produces false positives. Commit to a minimum run time (typically one to two full weeks to account for day-of-week variation) and a minimum sample size before evaluating results. Stop the test only after both conditions are met.
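
Once both conditions are met, evaluate the result once. The sketch below is a standard two-proportion z-test with made-up reply counts; a p-value under 0.05 means the gap between variants is unlikely to be noise.

    # Minimal two-proportion z-test, run once after the minimum run time
    # and sample size are both met. Counts below are hypothetical.
    from statistics import NormalDist

    def two_sided_p_value(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
        rate_a, rate_b = wins_a / n_a, wins_b / n_b
        pooled = (wins_a + wins_b) / (n_a + n_b)
        se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
        z = (rate_a - rate_b) / se
        return 2 * (1 - NormalDist().cdf(abs(z)))

    # 38 replies from 400 contacts (A) vs 61 replies from 400 contacts (B)
    print(two_sided_p_value(38, 400, 61, 400))  # ~0.014 -> unlikely to be noise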

7. Document Results and Build a Testing Roadmap

Record every test in a shared document: the hypothesis, the variants, the sample sizes, the results, and the conclusion. Over time, this creates an institutional knowledge base that accelerates future testing and prevents teams from re-testing things that have already been answered. Use test results to inform the next round of hypotheses — a winning subject line strategy suggests a new angle worth testing in the opening sentence.
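
A shared spreadsheet works fine, but if results feed dashboards or scripts, a structured record keeps every entry complete. One possible shape, with illustrative field names:

    # One possible shape for a test-log entry; field names are illustrative.
    from dataclasses import dataclass

    @dataclass
    class TestRecord:
        hypothesis: str      # "Changing [variable] from [A] to [B] will increase [metric] because [reason]"
        variable: str        # the single element that changed
        primary_metric: str
        n_per_variant: int
        rate_a: float
        rate_b: float
        p_value: float
        conclusion: str      # "B wins", "null result", "inconclusive"

    log = [TestRecord(
        hypothesis="Shorter subject lines will raise open rate because they survive mobile truncation",
        variable="subject_line", primary_metric="open_rate",
        n_per_variant=400, rate_a=0.31, rate_b=0.38, p_value=0.03,
        conclusion="B wins",
    )]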

Share test results with the whole revenue team in a monthly review. Test learnings from outbound outreach often apply to inbound emails, sales sequences, and even ad copy — the insight has value beyond the specific campaign where it was tested.

Common mistakes to avoid

Declaring a winner after a few dozen contacts because one variant looks better

Fix: Wait until you hit your predetermined sample size and run time before evaluating. Early results are dominated by randomness, not real signal. Patience is the most underrated A/B testing skill.

Running multiple simultaneous tests on the same audience segment

Fix: If your audience receives both Test A (subject line experiment) and Test B (CTA experiment) at the same time, the two tests contaminate each other. Run tests sequentially or on clearly separated audience segments to maintain clean data.

Testing only tactical variables (subject lines, button text) and never testing strategic ones (value proposition, persona targeting, channel mix)

Fix: The biggest performance gains come from testing strategic hypotheses — whether a different pain point resonates more, whether a different persona converts better, or whether adding video to an email sequence materially changes meeting rates. Do not spend all your testing capacity on minor optimizations.

Key takeaways

  • Valid A/B testing requires a written hypothesis, a single changed variable, a predetermined sample size, and the discipline to let the test run to completion before evaluating results — skip any of these steps and the data is meaningless.
  • Different campaign elements require different primary metrics: subject lines map to open rate, opening sentences to reply rate, and CTAs to meeting booking rate — using one metric for all tests leads to optimizing the wrong outcomes.
  • A documented test history is a compounding asset — over time, your library of test results tells you more about your market and buyer than any best-practice guide, because it is based on your actual prospects rather than averages.

Frequently asked questions

How many contacts do I need for a valid A/B test?

A minimum of 100 contacts per variant is a reasonable floor, but 200-500 per variant is better for detecting modest effect sizes at 95% confidence. For very high-impact variables (like a completely different value proposition), you may see a large enough effect to detect significance with fewer contacts. Use a free sample size calculator to determine the right number for your expected effect size.

How long should an outreach A/B test run?

At minimum, one full week to capture variation across different days of the week. Two weeks is better for most outreach tests. Do not cut a test short because one variant is 'clearly' winning early — early leaders frequently regress to the mean as sample size increases and statistical noise decreases.

Can I A/B test video scripts or just email elements?

Absolutely. Video script angle, video length, thumbnail style, and opening hook are all testable elements. For video outreach, the primary metric is usually view-through rate combined with reply rate. With Outvid, you can generate both video variants from the same AI clone without re-recording, making video A/B testing as practical as email testing.

What is the most impactful element to test first?

Test your core value proposition before testing tactical elements. If your entire message is built around the wrong pain point or the wrong angle for your audience, no subject line optimization will save it. Once you have validated that your core message resonates, move to tactical optimizations like subject lines, CTAs, and send timing.

How do I handle tests where neither variant performs significantly better?

A null result is still a result — it tells you that the variable you tested does not meaningfully affect performance for your audience. Document the null result and move on to a more impactful variable. Roughly half of all well-designed A/B tests produce null results; this is normal and expected in rigorous experimentation.

Run Your First Video vs. Email A/B Test

Add personalized AI video to one variant of your next outreach campaign and measure the lift. Create your AI clone on Outvid and start testing today.
