{"id":1103,"date":"2026-05-17T09:00:00","date_gmt":"2026-05-17T14:00:00","guid":{"rendered":"https:\/\/tolinku.com\/blog\/?p=1103"},"modified":"2026-03-07T03:34:43","modified_gmt":"2026-03-07T08:34:43","slug":"bayesian-ab-testing","status":"publish","type":"post","link":"https:\/\/tolinku.com\/blog\/bayesian-ab-testing\/","title":{"rendered":"Bayesian A\/B Testing for Mobile Experiments"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">You run an A\/B test on two deep link destinations. After three days, Variant B has a 6.1% conversion rate versus Variant A&#39;s 5.4%. Your frequentist test says &quot;not significant&quot; because you haven&#39;t hit the required sample size yet. But you need to make a decision by Friday.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is where Bayesian A\/B testing changes the game. Instead of a binary &quot;significant or not&quot; answer, it tells you: &quot;There is an 87% probability that Variant B is better, and the expected upside is 0.6 percentage points.&quot; That&#39;s a statement you can actually act on.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For foundational A\/B testing concepts, see <a href=\"https:\/\/tolinku.com\/blog\/ab-testing-deep-links-landing-pages\/\">A\/B Testing Deep Links and Landing Pages<\/a>. For the frequentist approach, see <a href=\"https:\/\/tolinku.com\/blog\/statistical-significance-ab-tests\/\">Statistical Significance for A\/B Tests<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/tolinku.com\/blog\/wp-content\/uploads\/2026\/03\/platform-ab-tests.png\" alt=\"Tolinku A\/B testing dashboard for smart banners\">\n<em>The A\/B tests list page showing test names, status, types, and variant counts.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Bayesian vs. 
Frequentist: The Core Difference<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Frequentist A\/B testing asks: &quot;If there were no real difference, how likely is this data?&quot; It produces a p-value, which is the probability of seeing your results (or more extreme results) under the assumption that both variants are identical. If that probability is below a threshold (usually 5%), you reject the null hypothesis.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Bayesian A\/B testing asks a more direct question: &quot;Given the data I&#39;ve observed, what&#39;s the probability that Variant B is better than Variant A?&quot; It produces a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Posterior_probability\" rel=\"nofollow noopener\" target=\"_blank\">posterior distribution<\/a> for each variant&#39;s true conversion rate, then compares them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The practical differences matter:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th><\/th>\n<th>Frequentist<\/th>\n<th>Bayesian<\/th>\n<\/tr>\n<\/thead>\n<tbody><tr>\n<td><strong>Output<\/strong><\/td>\n<td>p-value, confidence interval<\/td>\n<td>Probability of being best, credible interval<\/td>\n<\/tr>\n<tr>\n<td><strong>Interpretation<\/strong><\/td>\n<td>&quot;Reject or fail to reject the null&quot;<\/td>\n<td>&quot;82% chance B is better&quot;<\/td>\n<\/tr>\n<tr>\n<td><strong>Sample size<\/strong><\/td>\n<td>Must be fixed in advance<\/td>\n<td>Can check results anytime<\/td>\n<\/tr>\n<tr>\n<td><strong>Early stopping<\/strong><\/td>\n<td>Inflates false positive rate<\/td>\n<td>Built-in; posterior updates continuously<\/td>\n<\/tr>\n<tr>\n<td><strong>Prior knowledge<\/strong><\/td>\n<td>Not incorporated<\/td>\n<td>Can encode prior beliefs<\/td>\n<\/tr>\n<\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">For mobile deep link experiments, the Bayesian approach has a significant practical advantage: mobile traffic is often limited, campaigns are 
short-lived, and you need answers fast. Bayesian methods let you make informed decisions with less data.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Beta-Binomial Model<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">For conversion rate experiments (did the user convert or not?), the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Beta-binomial_distribution\" rel=\"nofollow noopener\" target=\"_blank\">Beta-Binomial model<\/a> is the standard Bayesian approach. It works because conversions are binary outcomes and the Beta distribution is the conjugate prior for binomial data, so the posterior has a simple closed form.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How It Works<\/h3>\n\n\n\n<ol class=\"wp-block-list\">\n<li><p><strong>Start with a prior.<\/strong> Before seeing any data, you express your belief about the conversion rate using a Beta distribution. A common uninformative prior is Beta(1, 1), which says &quot;any conversion rate between 0% and 100% is equally likely.&quot;<\/p><\/li>\n<li><p><strong>Observe data.<\/strong> You collect conversions (successes) and non-conversions (failures) for each variant.<\/p><\/li>\n<li><p><strong>Compute the posterior.<\/strong> The posterior distribution is also a Beta distribution (this is what makes the model elegant). If your prior is Beta(alpha, beta) and you observe <code>s<\/code> successes and <code>f<\/code> failures, the posterior is Beta(alpha + s, beta + f).<\/p><\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">For example, if Variant A gets 54 conversions out of 1,000 visitors:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prior: Beta(1, 1)<\/li>\n<li>Posterior: Beta(1 + 54, 1 + 946) = Beta(55, 947)<\/li>\n<li>The posterior mean is 55 \/ (55 + 947) = 5.49%<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The posterior captures your full uncertainty about the true conversion rate. 
It&#39;s not a single number; it&#39;s a distribution showing which values are plausible and how plausible each one is.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Computing Key Metrics<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Once you have posterior distributions for each variant, you can compute everything you need to make a decision.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Probability of Being Best<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">This is the headline metric: &quot;What&#39;s the probability that Variant B has a higher true conversion rate than Variant A?&quot; You compute it by sampling from both posteriors and counting how often B beats A.<\/p>\n\n\n\n<pre><code class=\"language-javascript\">\/\/ Monte Carlo estimate of P(B &gt; A) from the two Beta posteriors\nfunction probabilityBIsBest(alphaA, betaA, alphaB, betaB, samples = 100000) {\n  let bWins = 0;\n\n  for (let i = 0; i &lt; samples; i++) {\n    const sampleA = betaSample(alphaA, betaA);\n    const sampleB = betaSample(alphaB, betaB);\n    if (sampleB &gt; sampleA) bWins++;\n  }\n\n  return bWins \/ samples;\n}\n\n\/\/ Beta(alpha, beta) draw as a ratio of Gamma variates: X \/ (X + Y),\n\/\/ with X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1)\nfunction betaSample(alpha, beta) {\n  const gammaA = gammaSample(alpha);\n  const gammaB = gammaSample(beta);\n  return gammaA \/ (gammaA + gammaB);\n}\n\nfunction gammaSample(shape) {\n  \/\/ Marsaglia and Tsang&#39;s method (valid for shape &gt;= 1);\n  \/\/ shapes below 1 are boosted to shape + 1 and scaled back down\n  if (shape &lt; 1) {\n    return gammaSample(shape + 1) * Math.pow(Math.random(), 1 \/ shape);\n  }\n  const d = shape - 1 \/ 3;\n  const c = 1 \/ Math.sqrt(9 * d);\n  while (true) {\n    let x, v;\n    do {\n      x = randn();\n      v = 1 + c * x;\n    } while (v &lt;= 0);\n    v = v * v * v;\n    const u = Math.random();\n    \/\/ Cheap acceptance check first, exact log check second\n    if (u &lt; 1 - 0.0331 * (x * x) * (x * x)) return d * v;\n    if (Math.log(u) &lt; 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;\n  }\n}\n\n\/\/ Standard normal draw via the Box-Muller transform\nfunction randn() {\n  const u1 = 1 - Math.random(); \/\/ in (0, 1], so Math.log(u1) stays finite\n  const u2 = Math.random();\n  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);\n}\n<\/code><\/pre>\n\n\n\n<p 
class=\"wp-block-paragraph\">With 1,000 visitors per variant, conversion rates of 5.4% vs. 6.1%, and a Beta(1, 1) prior (posteriors Beta(55, 947) and Beta(62, 940)), this returns roughly 0.75, meaning there&#39;s about a 75% probability that Variant B is genuinely better.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Expected Loss<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Probability of being best alone isn&#39;t enough. You also want to know: &quot;If I pick B and I&#39;m wrong, how much do I lose?&quot; This is the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Bayes_estimator\" rel=\"nofollow noopener\" target=\"_blank\">expected loss<\/a> (also called risk): the conversion rate you give up, averaged over the posterior, by choosing a variant that turns out to be the worse one.<\/p>\n\n\n\n<pre><code class=\"language-javascript\">\/\/ Relies on betaSample() from the previous snippet\nfunction expectedLoss(alphaA, betaA, alphaB, betaB, samples = 100000) {\n  let lossA = 0;\n  let lossB = 0;\n\n  for (let i = 0; i &lt; samples; i++) {\n    const sampleA = betaSample(alphaA, betaA);\n    const sampleB = betaSample(alphaB, betaB);\n\n    \/\/ Loss of choosing A when B might be better\n    lossA += Math.max(0, sampleB - sampleA);\n    \/\/ Loss of choosing B when A might be better\n    lossB += Math.max(0, sampleA - sampleB);\n  }\n\n  \/\/ Losses are in conversion-rate units (0.001 = 0.1 percentage points)\n  return {\n    lossIfChooseA: lossA \/ samples,\n    lossIfChooseB: lossB \/ samples,\n  };\n}\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Expected loss gives you a risk threshold. 
You might decide: &quot;I&#39;ll declare a winner when the expected loss of choosing it is below 0.1 percentage points.&quot; This is a more nuanced stopping rule than &quot;wait for p &lt; 0.05.&quot;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Credible Intervals<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Bayesian credible intervals are the intuitive analog of confidence intervals, but they mean what you probably thought confidence intervals meant all along.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A 95% credible interval for a conversion rate means: &quot;There is a 95% probability that the true conversion rate falls within this range.&quot; (A 95% confidence interval does not mean this, despite the common misconception. It means that 95% of such intervals, constructed from repeated experiments, would contain the true value.)<\/p>\n\n\n\n<pre><code class=\"language-javascript\">\/\/ Equal-tailed credible interval for a Beta(alpha, beta) posterior\nfunction credibleInterval(alpha, beta, level = 0.95) {\n  const lower = (1 - level) \/ 2;\n  const upper = 1 - lower;\n\n  return {\n    lower: betaQuantile(alpha, beta, lower),\n    median: betaQuantile(alpha, beta, 0.5),\n    upper: betaQuantile(alpha, beta, upper),\n  };\n}\n\n\/\/ Monte Carlo estimate of a Beta quantile, reusing betaSample() from above.\n\/\/ An exact version would invert the regularized incomplete beta function.\nfunction betaQuantile(alpha, beta, p, samples = 100000) {\n  const draws = Float64Array.from({ length: samples }, () =&gt; betaSample(alpha, beta));\n  draws.sort(); \/\/ typed arrays sort numerically by default\n  return draws[Math.min(samples - 1, Math.floor(p * samples))];\n}\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">For a variant with a Beta(55, 947) posterior, the 95% credible interval is approximately [4.2%, 7.0%]. You can say with 95% probability that the true conversion rate is between 4.2% and 7.0%.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Choosing a Prior<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The prior encodes what you know before the experiment starts. For most deep link experiments, you have three reasonable options:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Uninformative prior: Beta(1, 1).<\/strong> This is the uniform distribution. It says you have no prior knowledge. 
Use this when you&#39;re testing something genuinely new and have no baseline data.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Weakly informative prior: Beta(2, 38).<\/strong> This encodes a belief that the conversion rate is &quot;probably around 5%&quot; without being too confident. It&#39;s equivalent to having seen 2 conversions out of 40 visitors in a hypothetical previous experiment. Use this when you have a rough sense of your baseline but want the data to dominate quickly.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Empirical prior.<\/strong> If you&#39;ve run similar experiments before, use that data. If your deep link routes typically convert at 4-7%, you could set a prior like Beta(10, 170), which centers around 5.6% with moderate confidence. The data will still override the prior as observations accumulate.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In practice, the prior matters less than you might think. With even a few hundred observations per variant, the posterior is almost entirely determined by the data. Priors only matter in very small samples.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">When to Use Bayesian vs. Frequentist<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Bayesian A\/B testing is not universally better. 
Each approach has its strengths.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Use Bayesian when:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Your traffic is limited (fewer than 5,000 visitors per variant)<\/li>\n<li>You need to make decisions quickly, before reaching a traditional sample size<\/li>\n<li>You want intuitive probability statements for stakeholders (&quot;78% chance B is better&quot;)<\/li>\n<li>You&#39;re running continuous experiments and checking results regularly<\/li>\n<li>You want to incorporate prior knowledge from previous experiments<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Use frequentist when:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>You have high traffic and can afford to wait for fixed sample sizes<\/li>\n<li>Regulatory or organizational standards require p-values and confidence intervals<\/li>\n<li>You need results that are straightforward to audit and reproduce<\/li>\n<li>You&#39;re running a one-off test with a clear stopping point<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For mobile deep link experiments on <a href=\"https:\/\/tolinku.com\/features\/ab-testing\">Tolinku<\/a>, Bayesian methods are often the better fit. Campaign windows are finite, mobile traffic splits across platforms and devices, and product teams want actionable answers, not abstract statistical statements.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A Complete Bayesian Test Workflow<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Here&#39;s how a Bayesian A\/B test flows for a deep link experiment:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><p><strong>Define the experiment.<\/strong> You&#39;re testing two deep link destinations: Variant A sends users to the app home screen; Variant B sends them to a personalized recommendations page.<\/p><\/li>\n<li><p><strong>Set the prior.<\/strong> Based on previous campaigns, you know your baseline conversion rate is around 5%. 
You set Beta(5, 95) as the prior for both variants.<\/p><\/li>\n<li><p><strong>Run the experiment.<\/strong> Traffic splits 50\/50. After three days, Variant A has 48 conversions from 900 visitors. Variant B has 63 conversions from 920 visitors.<\/p><\/li>\n<li><p><strong>Compute posteriors.<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li>Variant A: Beta(5 + 48, 95 + 852) = Beta(53, 947). Mean: 5.3%<\/li>\n<li>Variant B: Beta(5 + 63, 95 + 857) = Beta(68, 952). Mean: 6.7%<\/li>\n<\/ul><\/li>\n<li><p><strong>Evaluate.<\/strong> These figures are approximate; Monte Carlo estimates vary slightly from run to run.<\/p>\n<ul class=\"wp-block-list\">\n<li>Probability B is best: about 90%<\/li>\n<li>Expected loss of choosing B: about 0.05 percentage points<\/li>\n<li>Expected loss of choosing A: about 1.4 percentage points<\/li>\n<li>95% credible interval for lift (B minus A): roughly [-0.7, 3.4] percentage points<\/li>\n<\/ul><\/li>\n<li><p><strong>Decide.<\/strong> With roughly a 90% probability that B is better and an expected loss of choosing B below your threshold (say, 0.1 pp), you can deploy Variant B. The credible interval for the lift still includes zero, so B is not certain to win, but the cost of being wrong is small.<\/p><\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">The key advantage here is that you didn&#39;t need to wait for a predetermined sample size. You evaluated the evidence as it came in, and the math accounted for the uncertainty at every step.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Best Practices<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Set a loss threshold, not a probability threshold.<\/strong> &quot;Probability of being best &gt; 95%&quot; sounds rigorous, but it ignores magnitude. A variant that&#39;s 0.01% better will eventually reach 95% probability with enough data. An expected loss threshold (e.g., &quot;less than 0.1 pp&quot;) lets you act as soon as the cost of a wrong choice is small enough to ignore.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Run tests for at least one full business cycle.<\/strong> Even with Bayesian methods, day-of-week effects can skew results. 
Running for at least seven days ensures your posterior reflects the full pattern of user behavior.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Don&#39;t ignore practical significance.<\/strong> A 92% probability that Variant B is better sounds compelling. But if the expected improvement is 0.05 percentage points, it might not be worth the engineering effort to deploy. Always pair statistical evidence with business impact.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Use the same prior for all variants.<\/strong> If you give one variant a more optimistic prior, you&#39;re biasing the comparison. Start every variant from the same prior and let the data differentiate them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Document your priors and thresholds.<\/strong> Before starting the test, write down your prior choice, your loss threshold, and your decision criteria. This prevents post-hoc rationalization and keeps your experimentation process honest.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For more on running effective experiments, see <a href=\"https:\/\/tolinku.com\/blog\/ab-testing-deep-links-landing-pages\/\">A\/B Testing Deep Links and Landing Pages<\/a> and the <a href=\"https:\/\/tolinku.com\/features\/ab-testing\">Tolinku A\/B testing documentation<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Use Bayesian methods for A\/B testing deep link experiments. Get intuitive probability statements, handle small samples, and make faster decisions.<\/p>\n","protected":false},"author":2,"featured_media":1102,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"rank_math_title":"Bayesian A\/B Testing for Mobile Experiments","rank_math_description":"Use Bayesian methods for A\/B testing deep link experiments. 
Get intuitive probability statements, handle small samples, and make faster decisions.","rank_math_focus_keyword":"Bayesian A\/B testing","rank_math_canonical_url":"","rank_math_facebook_title":"","rank_math_facebook_description":"","rank_math_facebook_image":"https:\/\/tolinku.com\/blog\/wp-content\/uploads\/2026\/03\/og-bayesian-ab-testing.png","rank_math_facebook_image_id":"","rank_math_twitter_title":"","rank_math_twitter_description":"","rank_math_twitter_image":"https:\/\/tolinku.com\/blog\/wp-content\/uploads\/2026\/03\/og-bayesian-ab-testing.png","footnotes":""},"categories":[13],"tags":[60,37,191,20,225,69,256,258],"class_list":["post-1103","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-growth","tag-ab-testing","tag-analytics","tag-conversions","tag-deep-linking","tag-experimentation","tag-mobile-development","tag-optimization","tag-statistics"],"_links":{"self":[{"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/posts\/1103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/comments?post=1103"}],"version-history":[{"count":2,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/posts\/1103\/revisions"}],"predecessor-version":[{"id":2245,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/posts\/1103\/revisions\/2245"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/media\/1102"}],"wp:attachment":[{"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/media?parent=1103"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/categories?post=1103"},{"taxonomy":"post_tag","embeddable":true,"href":
"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/tags?post=1103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}