Context:
- When developing a new version of an ML model, we want to test it in a real use case with real users and compare it against the previous model (given a metric of success or failure)
- A/B Testing is the process of dividing the users into groups and serving each group a different model while tracking evaluation metrics
	- e.g., with ads we track the click rate in each group
- If n (the number of users per group) is too small:
	- Inconclusive Results: The evaluation metrics are too noisy to support a conclusive decision
- If n is too large:
	- Lost Opportunity: Too many users are allocated to the group with the less performant model (leading to potential performance/financial losses)
- Exploration: Testing different alternatives to gather enough data for a reliable conclusion.
- Exploitation: Focusing on the best-performing variant to maximize outcomes (e.g., clicks).
A/B Testing

Diagram Code
```mermaid
flowchart TD
    A[Start A/B Testing] --> B[Randomly Split Users into Two Groups]
    B --> C1[Group A: Show Variant A]
    B --> C2[Group B: Show Variant B]
    C1 --> D1[Collect Data: Clicks, Impressions, etc.]
    C2 --> D2[Collect Data: Clicks, Impressions, etc.]
    D1 --> E1[Calculate Conversion Rate for Variant A]
    D2 --> E2[Calculate Conversion Rate for Variant B]
    E1 --> F[Compare Conversion Rates]
    E2 --> F
    F --> G[Determine Winning Variant]
    G --> H[Deploy Winning Variant to All Users]
    H --> I[End A/B Testing]
```
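The flow above can be sketched as a minimal simulation. The true click rates below are assumptions chosen purely for illustration; in a real test they are unknown:

```python
import random

# Hypothetical true click rates per variant (assumed for this sketch).
TRUE_RATES = {"A": 0.04, "B": 0.05}

def run_ab_test(n_users_per_group=10_000, seed=0):
    """Assign n users to each group, simulate clicks, and return click rates."""
    rng = random.Random(seed)
    clicks = {"A": 0, "B": 0}
    for variant in ("A", "B"):
        for _ in range(n_users_per_group):
            # Each user clicks with the variant's (unknown) true probability.
            if rng.random() < TRUE_RATES[variant]:
                clicks[variant] += 1
    return {v: clicks[v] / n_users_per_group for v in clicks}

rates = run_ab_test()
winner = max(rates, key=rates.get)
```

Note that every user is committed to a fixed group up front, which is exactly what creates the exploration-exploitation tension discussed next.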
Problem with A/B Testing
This is an example of the Exploration-Exploitation Trade-off problem
Trade-off Between Exploration and Exploitation
Solution: Thompson Sampling
- Thompson Sampling uses full probability distributions to represent the uncertainty of each variant's performance
- Exploration: When there is high uncertainty, the system explores by randomly selecting different variants to show to users.
- Exploitation: As confidence in the data increases, the system increasingly favors the best-performing variant, thus optimizing the outcome.

Diagram Code
```mermaid
flowchart TD
    A[Start Thompson Sampling] --> B[Initialize Beta Distribution for Each Variant]
    B --> C[For Each User, Draw a Sample from Each Beta Distribution]
    C --> D[Select the Variant with the Highest Sample Value]
    D --> E[Show the Selected Variant to the User]
    E --> F[Record User's Interaction Click or Miss]
    F --> G[Update Beta Distribution of the Selected Variant Based on Interaction]
    G --> H{Continue Experiment?}
    H -->|Yes| C
    H -->|No| I[Pause or Restart Experiment]
```
Thompson Sampling Algorithm (e.g., ads; eval metric: click rate)
1. Initialize Distributions: For each variant (e.g., ads), start with a Beta distribution, which is defined by two parameters: α (number of successes, e.g., clicks) and β (number of failures, e.g., misses).
2. Draw Samples: For each user, draw a random sample from the Beta distribution of each variant.
3. Select Variant: Choose the variant with the highest sampled value and show it to the user.
4. Update Distribution: After the user interacts (click or miss), update the corresponding Beta distribution by increasing α for a success or β for a failure.
5. Repeat: Continue this process, with the system gradually focusing more on the variant that shows the highest probability of success.
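The steps above can be sketched in a few lines of Python. The true click rates are hypothetical, used only to simulate user interactions; the algorithm itself never sees them:

```python
import random

# Hypothetical true click rates per ad variant (unknown to the algorithm).
TRUE_RATES = [0.04, 0.05, 0.03]

def thompson_sampling(n_users=5_000, seed=0):
    """Run Thompson Sampling and return the (alpha, beta) counts per variant."""
    rng = random.Random(seed)
    # Step 1: start each variant at Beta(1, 1), i.e. a uniform prior.
    alpha = [1] * len(TRUE_RATES)
    beta = [1] * len(TRUE_RATES)
    for _ in range(n_users):
        # Step 2: draw one sample from each variant's Beta distribution.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(len(TRUE_RATES))]
        # Step 3: show the variant with the highest sampled value.
        chosen = samples.index(max(samples))
        # Step 4: simulate the click/miss and update that variant's counts.
        if rng.random() < TRUE_RATES[chosen]:
            alpha[chosen] += 1
        else:
            beta[chosen] += 1
        # Step 5: repeat for the next user.
    return alpha, beta

alpha, beta = thompson_sampling()
```

Early on, all three Beta distributions are wide, so the `max(samples)` draw picks variants almost at random (exploration); as counts accumulate, the best variant's distribution tightens around its true rate and wins most draws (exploitation).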
Notes:
- For this example we use the Beta distribution since we deal with successes and failures, but for each use case we choose a suitable distribution
- In general, we can default to using the Normal distribution (here with the sample mean being the click rate)
- Reason: CLT (Central Limit Theorem)
	- As the number of i.i.d. random variables increases, the distribution of their sample mean becomes approximately normal
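A quick sketch of the CLT at work: each click rate is the mean of many i.i.d. Bernoulli draws, so repeated measurements cluster around the true rate with spread sqrt(p(1-p)/n). The rate and sample sizes below are assumptions for illustration:

```python
import random
import statistics

def observed_click_rate(n_users, p, rng):
    """Sample mean of n_users Bernoulli(p) draws, i.e. one measured click rate."""
    return sum(rng.random() < p for _ in range(n_users)) / n_users

rng = random.Random(42)
# Measure the click rate 2,000 times, each over 500 users.
means = [observed_click_rate(500, 0.05, rng) for _ in range(2_000)]

# CLT: the measured rates are approximately Normal(p, sqrt(p*(1-p)/n)).
mu = statistics.mean(means)     # close to the true rate 0.05
sd = statistics.pstdev(means)   # close to sqrt(0.05 * 0.95 / 500) ~ 0.0097
```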
Key Benefits of Thompson Sampling
- Adaptive Learning: The method dynamically adjusts to new data, reducing the inefficiencies inherent in static A/B testing.
- Better Performance: In the example provided in the article (Ads selection), Thompson Sampling achieved 7.4% more clicks than a traditional A/B test after 1,000 iterations, demonstrating its efficiency in real-time optimization.