A/B Testing vs Thompson Sampling

Context:

  • When developing a new version of an ML model, we want to test it in a real use case with real users and compare it with the previous model, given a metric of success or failure.

A/B Testing

    • A/B testing is the process of dividing users into groups and serving each group a different model while tracking evaluation metrics.
      • E.g., with ads, we track the click rate in each group.
        Diagram Code
        flowchart TD
          A[Start A/B Testing] --> B[Randomly Split Users into Two Groups]
          B --> C1[Group A: Show Variant A]
          B --> C2[Group B: Show Variant B]
          C1 --> D1[Collect Data: Clicks, Impressions, etc.]
          C2 --> D2[Collect Data: Clicks, Impressions, etc.]
          D1 --> E1[Calculate Conversion Rate for Variant A]
          D2 --> E2[Calculate Conversion Rate for Variant B]
          E1 --> F[Compare Conversion Rates]
          E2 --> F[Compare Conversion Rates]
          F --> G[Determine Winning Variant]
          G --> H[Deploy Winning Variant to All Users]
          H --> I[End A/B Testing]
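The A/B testing flow can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: the function names `assign_group` and `two_proportion_z_test` are hypothetical, the 50/50 split is an assumption, and the two-proportion z-test is one common way to carry out the "compare conversion rates" step rather than the only one.

```python
import math
import random


def assign_group(user_id: int, seed: int = 0) -> str:
    """Assign a user to group 'A' or 'B' at random, reproducibly per user id."""
    rng = random.Random(user_id * 1_000_003 + seed)
    return "A" if rng.random() < 0.5 else "B"


def two_proportion_z_test(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both variants share one click rate."""
    rate_a = clicks_a / n_a
    rate_b = clicks_b / n_b
    # Pooled click rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (all clicks or all misses): nothing to compare.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Same observed rates (10% vs 15%) at two made-up sample sizes.
    print(two_proportion_z_test(10, 100, 15, 100))
    print(two_proportion_z_test(1000, 10_000, 1500, 10_000))
```

With the same observed click rates, the test tends to be inconclusive for small groups and decisive for large ones, which is why the number of users matters so much in a fixed-split experiment.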

Problem with A/B Testing

    • When the number of users n is small:
      • Inconclusive results: the evaluation metrics are too noisy to support a conclusive decision.
    • When the number of users n is large:
      • Lost opportunity: too many users end up allocated to the group with the worse-performing model, leading to potential performance or financial losses.
      💡
      This is an example of the exploration-exploitation trade-off.
      Trade-off Between Exploration and Exploitation
      • Exploration: Testing different alternatives to gather enough data for a reliable conclusion.
      • Exploitation: Focusing on the best-performing variant to maximize outcomes (e.g., clicks).

Solution: Thompson Sampling

  • Thompson Sampling uses full probability distributions to represent the uncertainty of each variant's performance
    • Exploration: When there is high uncertainty, the system explores by randomly selecting different variants to show to users.
    • Exploitation: As confidence in the data increases, the system increasingly favors the best-performing variant, thus optimizing the outcome.
        Diagram Code
        flowchart TD
          A[Start Thompson Sampling] --> B[Initialize Beta Distribution for Each Variant]
          B --> C[For Each User, Draw a Sample from Each Beta Distribution]
          C --> D[Select the Variant with the Highest Sample Value]
          D --> E[Show the Selected Variant to the User]
          E --> F[Record User's Interaction Click or Miss]
          F --> G[Update Beta Distribution of the Selected Variant Based on Interaction]
          G --> H{Continue Experiment?}
          H -->|Yes| C
          H -->|No| I[Pause or Restart Experiment]

Thompson Sampling Algorithm (e.g., ads; evaluation metric: click rate)

1. Initialize Distributions: For each variant (e.g., each ad), start with a Beta distribution, which is defined by two parameters: a (number of successes, e.g., clicks) and b (number of failures, e.g., misses).
2. Draw Samples: For each user, draw a random sample from the Beta distribution of each variant.
3. Select Variant: Choose the variant with the highest sampled value and show it to the user.
4. Update Distribution: After the user interacts (click or miss), update the Beta distribution of the variant that was shown by increasing a for a success or b for a failure.
5. Repeat: Continue this process, with the system gradually focusing more on the variant that shows the highest probability of success.
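The steps of the algorithm can be sketched as a small simulation. This is a hedged illustration rather than a reference implementation: the function name `thompson_sampling` and the `true_click_rates` input (which stands in for real user behavior) are hypothetical, and starting every variant at a = b = 1 (a uniform prior) is an assumption made here because Beta parameters must be strictly positive.

```python
import random


def thompson_sampling(true_click_rates, n_users, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling over several ad variants.

    true_click_rates is a simulation-only assumption: in production the
    click or miss would come from the real user instead.
    """
    rng = random.Random(seed)
    n_variants = len(true_click_rates)
    # Step 1: one Beta(a, b) per variant, starting from Beta(1, 1),
    # so a = 1 + clicks and b = 1 + misses.
    a = [1] * n_variants
    b = [1] * n_variants
    shows = [0] * n_variants

    for _ in range(n_users):
        # Step 2: draw one sample from each variant's Beta distribution.
        samples = [rng.betavariate(a[i], b[i]) for i in range(n_variants)]
        # Step 3: show the variant with the highest sampled value.
        chosen = max(range(n_variants), key=samples.__getitem__)
        shows[chosen] += 1
        # Simulated user interaction (click or miss).
        clicked = rng.random() < true_click_rates[chosen]
        # Step 4: update only the distribution of the variant that was shown.
        if clicked:
            a[chosen] += 1
        else:
            b[chosen] += 1
        # Step 5: repeat for the next user.
    return a, b, shows


if __name__ == "__main__":
    a, b, shows = thompson_sampling([0.05, 0.20], n_users=2000)
    print("times shown per variant:", shows)
    print("posterior mean click rate:", [x / (x + y) for x, y in zip(a, b)])
```

Early on the samples are widely spread, so both variants get shown; as clicks and misses accumulate, the samples for the better variant win more often and it receives most of the traffic.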
Notes:
  • For this example we use the Beta distribution because we are dealing with successes and failures; for other use cases, choose a distribution that suits the outcome being measured.
  • In general, we can default to the Normal distribution (here, with the sample mean being the click rate).
    • Reason: the Central Limit Theorem (CLT)
      • As the number of i.i.d. random variables increases, the distribution of their sample mean becomes approximately normal.
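For the click-rate case, the CLT argument can be written out as follows. The notation is an assumption introduced here for illustration: n is the number of times one variant has been shown, X_i is 1 for a click and 0 for a miss, and p is that variant's true click rate.

```latex
% X_i = 1 if impression i was clicked, 0 otherwise; p = true click rate.
X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p),
\qquad
\hat{p}_n = \frac{1}{n} \sum_{i=1}^{n} X_i

% CLT: for large n, the sample mean (the observed click rate) is approximately normal.
\hat{p}_n \;\approx\; \mathcal{N}\!\left(p,\ \frac{p(1-p)}{n}\right)
```

The variance shrinks as 1/n, so the distribution narrows as a variant collects more impressions. The approximation is rough when n is small or p is close to 0 or 1, which is one reason the Beta distribution is the more natural choice for click data.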

Key Benefits of Thompson Sampling

  • Adaptive Learning: The method dynamically adjusts to new data, reducing the inefficiencies inherent in static A/B testing.
  • Better Performance: In the example provided in the article (Ads selection), Thompson Sampling achieved 7.4% more clicks than a traditional A/B test after 1,000 iterations, demonstrating its efficiency in real-time optimization.