Context:
- When developing a new version of an ML model, we want to test it in a real use case with real users and compare it against the previous model (given a metric of success or failure)
- A/B Testing is the process of dividing the users into groups and serving each group a different model while tracking evaluation metrics
	- e.g., with ads we track the click rate in each group
- If n (the number of users per group) is too small:
	- Inconclusive Results: The evaluation metrics are too noisy to support a conclusive decision
- If n is too large:
	- Lost Opportunity: Too many users are allocated to the group with the less performant model (leading to potential performance/financial losses)
- Exploration: Testing different alternatives to gather enough data for a reliable conclusion.
- Exploitation: Focusing on the best-performing variant to maximize outcomes (e.g., clicks).
A/B Testing

Diagram Code
```mermaid
flowchart TD
    A[Start A/B Testing] --> B[Randomly Split Users into Two Groups]
    B --> C1[Group A: Show Variant A]
    B --> C2[Group B: Show Variant B]
    C1 --> D1[Collect Data: Clicks, Impressions, etc.]
    C2 --> D2[Collect Data: Clicks, Impressions, etc.]
    D1 --> E1[Calculate Conversion Rate for Variant A]
    D2 --> E2[Calculate Conversion Rate for Variant B]
    E1 --> F[Compare Conversion Rates]
    E2 --> F
    F --> G[Determine Winning Variant]
    G --> H[Deploy Winning Variant to All Users]
    H --> I[End A/B Testing]
```
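The flow above can be sketched as a minimal simulation. The true click rates below are assumptions chosen purely for illustration; in a real test they are unknown:

```python
import random

# Hypothetical true click rates per variant (assumed for this sketch).
TRUE_RATES = {"A": 0.04, "B": 0.05}

def run_ab_test(n_users_per_group=10_000, seed=0):
    """Assign n users to each group, simulate clicks, and return click rates."""
    rng = random.Random(seed)
    clicks = {"A": 0, "B": 0}
    for variant in ("A", "B"):
        for _ in range(n_users_per_group):
            # Each user clicks with the variant's (unknown) true probability.
            if rng.random() < TRUE_RATES[variant]:
                clicks[variant] += 1
    return {v: clicks[v] / n_users_per_group for v in clicks}

rates = run_ab_test()
winner = max(rates, key=rates.get)
```

Note that every user is committed to a fixed group up front, which is exactly what creates the exploration-exploitation tension discussed next.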
Problem with A/B Testing
This is an example of the Exploration-Exploitation Trade-off problem
Trade-off Between Exploration and Exploitation
Solution: Thompson Sampling
- Thompson Sampling uses full probability distributions to represent the uncertainty of each variant's performance
- Exploration: When there is high uncertainty, the system explores by randomly selecting different variants to show to users.
- Exploitation: As confidence in the data increases, the system increasingly favors the best-performing variant, thus optimizing the outcome.

Diagram Code
```mermaid
flowchart TD
    A[Start Thompson Sampling] --> B[Initialize Beta Distribution for Each Variant]
    B --> C[For Each User, Draw a Sample from Each Beta Distribution]
    C --> D[Select the Variant with the Highest Sample Value]
    D --> E[Show the Selected Variant to the User]
    E --> F[Record User's Interaction Click or Miss]
    F --> G[Update Beta Distribution of the Selected Variant Based on Interaction]
    G --> H{Continue Experiment?}
    H -->|Yes| C
    H -->|No| I[Pause or Restart Experiment]
```
Thompson Sampling Algorithm (e.g., ads; eval metric: click rate)
1. Initialize Distributions: For each variant (e.g., ads), start with a Beta distribution, which is defined by two parameters: α (number of successes, e.g., clicks) and β (number of failures, e.g., misses).
2. Draw Samples: For each user, draw a random sample from the Beta distribution of each variant.
3. Select Variant: Choose the variant with the highest sampled value and show it to the user.
4. Update Distribution: After the user interacts (click or miss), update the corresponding Beta distribution by increasing α for a success or β for a failure.
5. Repeat: Continue this process, with the system gradually focusing more on the variant that shows the highest probability of success.
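The steps above can be sketched in a few lines of Python. The true click rates are hypothetical, used only to simulate user interactions; the algorithm itself never sees them:

```python
import random

# Hypothetical true click rates per ad variant (unknown to the algorithm).
TRUE_RATES = [0.04, 0.05, 0.03]

def thompson_sampling(n_users=5_000, seed=0):
    """Run Thompson Sampling and return the (alpha, beta) counts per variant."""
    rng = random.Random(seed)
    # Step 1: start each variant at Beta(1, 1), i.e. a uniform prior.
    alpha = [1] * len(TRUE_RATES)
    beta = [1] * len(TRUE_RATES)
    for _ in range(n_users):
        # Step 2: draw one sample from each variant's Beta distribution.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(len(TRUE_RATES))]
        # Step 3: show the variant with the highest sampled value.
        chosen = samples.index(max(samples))
        # Step 4: simulate the click/miss and update that variant's counts.
        if rng.random() < TRUE_RATES[chosen]:
            alpha[chosen] += 1
        else:
            beta[chosen] += 1
        # Step 5: repeat for the next user.
    return alpha, beta

alpha, beta = thompson_sampling()
```

Early on, all three Beta distributions are wide, so the `max(samples)` draw picks variants almost at random (exploration); as counts accumulate, the best variant's distribution tightens around its true rate and wins most draws (exploitation).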
Notes:
- For this example we use the Beta distribution since we deal with successes and failures, but for each use case we choose a suitable distribution
- In general, we can default to using the Normal distribution (here with the sample mean being the click rate)
- Reason: CLT (Central Limit Theorem)
	- As the number of i.i.d. random variables increases, the distribution of their sample mean becomes approximately normal
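A quick sketch of the CLT at work: each click rate is the mean of many i.i.d. Bernoulli draws, so repeated measurements cluster around the true rate with spread sqrt(p(1-p)/n). The rate and sample sizes below are assumptions for illustration:

```python
import random
import statistics

def observed_click_rate(n_users, p, rng):
    """Sample mean of n_users Bernoulli(p) draws, i.e. one measured click rate."""
    return sum(rng.random() < p for _ in range(n_users)) / n_users

rng = random.Random(42)
# Measure the click rate 2,000 times, each over 500 users.
means = [observed_click_rate(500, 0.05, rng) for _ in range(2_000)]

# CLT: the measured rates are approximately Normal(p, sqrt(p*(1-p)/n)).
mu = statistics.mean(means)     # close to the true rate 0.05
sd = statistics.pstdev(means)   # close to sqrt(0.05 * 0.95 / 500) ~ 0.0097
```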
Key Benefits of Thompson Sampling
- Adaptive Learning: The method dynamically adjusts to new data, reducing the inefficiencies inherent in static A/B testing.
- Better Performance: In the example provided in the article (Ads selection), Thompson Sampling achieved 7.4% more clicks than a traditional A/B test after 1,000 iterations, demonstrating its efficiency in real-time optimization.