{"id":835,"date":"2026-04-17T17:00:00","date_gmt":"2026-04-17T22:00:00","guid":{"rendered":"https:\/\/tolinku.com\/blog\/?p=835"},"modified":"2026-03-07T03:33:51","modified_gmt":"2026-03-07T08:33:51","slug":"ab-test-measurement","status":"publish","type":"post","link":"https:\/\/tolinku.com\/blog\/ab-test-measurement\/","title":{"rendered":"Measuring A\/B Test Results for Deep Link Campaigns"},"content":{"rendered":"\n<p>Running an A\/B test is easy. Measuring it correctly is where most teams get it wrong. They call tests too early, measure the wrong metric, or miss confounding variables that invalidate the results.<\/p>\n\n\n\n<p>This guide covers how to properly measure A\/B tests for deep link campaigns, from choosing the right metric to knowing when a result is real.<\/p>\n\n\n\n<p><img decoding=\"async\" src=\"https:\/\/tolinku.com\/blog\/wp-content\/uploads\/2026\/03\/platform-ab-test-form.png\" alt=\"Tolinku A\/B test creation form with type, goal, and variant configuration\">\n<em>The A\/B test creation form with test type, goal metric, route picker, and variant builder.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What to Measure<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Primary Metric<\/h3>\n\n\n\n<p>Every A\/B test needs one primary metric that determines the winner. Not three metrics. 
Not &quot;whichever one looks best.&quot; One metric, defined before the test starts.<\/p>\n\n\n\n<p>For deep link tests, common primary metrics:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Test Type<\/th>\n<th>Primary Metric<\/th>\n<\/tr>\n<\/thead>\n<tbody><tr>\n<td>Link preview optimization<\/td>\n<td>Click-through rate (CTR)<\/td>\n<\/tr>\n<tr>\n<td>Landing page test<\/td>\n<td>Install rate or signup rate<\/td>\n<\/tr>\n<tr>\n<td>Onboarding flow test<\/td>\n<td>Activation rate<\/td>\n<\/tr>\n<tr>\n<td>CTA copy test<\/td>\n<td>Click-through rate<\/td>\n<\/tr>\n<tr>\n<td>Fallback page test<\/td>\n<td>Web-to-app conversion rate<\/td>\n<\/tr>\n<tr>\n<td>Referral link test<\/td>\n<td>Referral conversion rate<\/td>\n<\/tr>\n<\/tbody><\/table><\/figure>\n\n\n\n<p>Choose the metric closest to the behavior you&#39;re trying to change. If you&#39;re testing different link previews, CTR is the primary metric, not downstream purchase rate (too far removed from what you changed).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Secondary Metrics<\/h3>\n\n\n\n<p>Track 2-3 secondary metrics as guardrails. These ensure your winning variant isn&#39;t improving the primary metric at the expense of something else.<\/p>\n\n\n\n<p>Example: A link preview test with CTR as the primary metric should also track:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Install rate (are the clicks converting downstream?)<\/li>\n<li>Bounce rate on the landing page (are clicks high quality?)<\/li>\n<li>D1 retention of users who clicked (are these good users?)<\/li>\n<\/ul>\n\n\n\n<p>If variant B has 20% higher CTR but 40% lower install rate, the CTR win is misleading.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Sample Size and Duration<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Why Sample Size Matters<\/h3>\n\n\n\n<p>Small samples produce unreliable results. 
With 50 clicks per variant, random noise can easily make one variant look 30% better when there&#39;s no real difference. With 5,000 clicks per variant, a difference of a few percentage points is reliably detectable.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Calculating Required Sample Size<\/h3>\n\n\n\n<p>The sample size you need depends on:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Baseline conversion rate<\/strong>: Your current CTR or conversion rate<\/li>\n<li><strong>Minimum detectable effect (MDE)<\/strong>: The smallest improvement worth detecting<\/li>\n<li><strong>Statistical significance level<\/strong>: Typically 5% (alpha = 0.05, often described as 95% confidence)<\/li>\n<li><strong>Statistical power<\/strong>: Typically 80%<\/li>\n<\/ol>\n\n\n\n<p>Rough guidelines for 95% confidence and 80% power, using a two-sided test on two proportions:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Baseline Rate<\/th>\n<th>Detect 10% Relative Change<\/th>\n<th>Detect 20% Relative Change<\/th>\n<\/tr>\n<\/thead>\n<tbody><tr>\n<td>2%<\/td>\n<td>~80,000 per variant<\/td>\n<td>~21,000 per variant<\/td>\n<\/tr>\n<tr>\n<td>5%<\/td>\n<td>~31,000 per variant<\/td>\n<td>~8,000 per variant<\/td>\n<\/tr>\n<tr>\n<td>10%<\/td>\n<td>~15,000 per variant<\/td>\n<td>~3,800 per variant<\/td>\n<\/tr>\n<tr>\n<td>20%<\/td>\n<td>~6,500 per variant<\/td>\n<td>~1,700 per variant<\/td>\n<\/tr>\n<\/tbody><\/table><\/figure>\n\n\n\n<p>These are per-variant numbers. For a two-variant test, you need twice this much traffic in total.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Test Duration<\/h3>\n\n\n\n<p>Don&#39;t stop a test the moment it reaches the required sample size. 
Run it for at least 7 full days to account for:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Day-of-week effects<\/strong>: User behavior varies by day (weekday vs weekend)<\/li>\n<li><strong>Time-of-day effects<\/strong>: Morning clicks behave differently than evening clicks<\/li>\n<li><strong>Novelty effects<\/strong>: Early results may be inflated by novelty (especially for UI changes)<\/li>\n<\/ul>\n\n\n\n<p>A one-week minimum ensures you&#39;ve captured a full cycle of user behavior patterns.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Statistical Significance<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What It Means<\/h3>\n\n\n\n<p>Statistical significance tells you how surprising the observed difference would be if the variants actually performed the same. A p-value below 0.05 means that, with no real difference, a gap at least this large would show up less than 5% of the time. It is not the probability that the difference is real.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to Check<\/h3>\n\n\n\n<p>Most <a href=\"https:\/\/tolinku.com\/features\/ab-testing\">A\/B testing<\/a> platforms calculate significance automatically. If you&#39;re calculating manually, the chi-squared test works for conversion rate comparisons:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Create a 2&#215;2 contingency table (variant A\/B by converted\/not converted)<\/li>\n<li>Calculate the chi-squared statistic<\/li>\n<li>Look up the p-value for 1 degree of freedom<\/li>\n<li>If p &lt; 0.05, the result is statistically significant at 95% confidence<\/li>\n<\/ol>\n\n\n\n<p>Or use an online calculator. Input your sample sizes and conversion counts for each variant.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Common Significance Mistakes<\/h3>\n\n\n\n<p><strong>Peeking<\/strong>: Checking results daily and stopping when p &lt; 0.05 inflates false positive rates dramatically. If you check a test 10 times during its run and stop at the first significant result, your effective false positive rate is close to 20%, not 5%, and it keeps climbing with every extra look.<\/p>\n\n\n\n<p><strong>Solution<\/strong>: Define the sample size and duration before starting. 
Check results at the end, not continuously. If you must monitor during the test, use sequential testing methods (such as group sequential designs with O&#39;Brien-Fleming or Pocock boundaries) that account for multiple looks.<\/p>\n\n\n\n<p><strong>Multiple comparisons<\/strong>: Testing 5 variants against a control means 5 comparisons. With a 5% false positive rate per comparison, there&#39;s about a 23% chance that at least one is a false positive even when no variant is truly better (1 - 0.95^5 is roughly 0.23).<\/p>\n\n\n\n<p><strong>Solution<\/strong>: Apply a correction (like Bonferroni: divide your significance threshold by the number of comparisons, so 5 comparisons require p &lt; 0.01 each). Or limit tests to 2-3 variants maximum.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Interpreting Results<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Significant Winner<\/h3>\n\n\n\n<p>If variant B has a statistically significantly higher primary metric and secondary metrics aren&#39;t worse, implement variant B. Document the result and the magnitude of improvement.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">No Significant Difference<\/h3>\n\n\n\n<p>If neither variant wins after reaching the required sample size, any true difference is probably smaller than your minimum detectable effect. This is a valid result, not a failure. Implement whichever variant is simpler or more maintainable.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Significant But Small Effect<\/h3>\n\n\n\n<p>A result can be statistically significant but practically insignificant. With enough traffic, even a 0.1 percentage point CTR improvement can reach significance, yet on 100,000 impressions it adds only 100 clicks. Consider whether the improvement is worth the implementation complexity.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Conflicting Metrics<\/h3>\n\n\n\n<p>Sometimes variant B wins on the primary metric but loses on a secondary metric. 
This requires judgment:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If variant B has 15% higher CTR but 5% lower install rate, it&#39;s probably still a win (with install rate measured per click, net installs rise about 9%: 1.15 &#215; 0.95 is roughly 1.09)<\/li>\n<li>If variant B has 10% higher CTR but 30% lower revenue per user, it&#39;s probably a loss (you&#39;re attracting more low-quality clicks, and total revenue falls about 23%: 1.10 &#215; 0.70 = 0.77)<\/li>\n<\/ul>\n\n\n\n<p>Calculate the net impact on your bottom-line metric (revenue, active users, or whatever matters most).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A\/B Testing Deep Link Elements<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Link Preview Tests<\/h3>\n\n\n\n<p>Test different Open Graph (OG) metadata on the same route:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Variant A: Product image with price in the preview<\/li>\n<li>Variant B: Lifestyle image with benefit statement<\/li>\n<\/ul>\n\n\n\n<p>Split traffic using your deep linking platform&#39;s <a href=\"https:\/\/tolinku.com\/features\/ab-testing\">A\/B testing feature<\/a> and measure CTR per variant.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Landing Page Tests<\/h3>\n\n\n\n<p>For users without the app installed, test different fallback web pages:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Variant A: Simple page with app store badges<\/li>\n<li>Variant B: Interactive preview of the app&#39;s content with a download CTA<\/li>\n<\/ul>\n\n\n\n<p>Measure install rate (the share of clicks that result in an app install).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">CTA Copy Tests<\/h3>\n\n\n\n<p>Test different call-to-action text in emails, social posts, or landing pages:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>&quot;Download free&quot; vs &quot;Start your trial&quot; vs &quot;Get the app&quot;<\/li>\n<\/ul>\n\n\n\n<p>Keep everything else identical. 
Measure CTR on the link.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Redirect Flow Tests<\/h3>\n\n\n\n<p>Test different redirect strategies for new users:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Variant A: Direct to app store<\/li>\n<li>Variant B: Intermediate landing page before app store<\/li>\n<\/ul>\n\n\n\n<p>Measure install rate and first-session activation rate.<\/p>\n\n\n\n<p>For more on setting up A\/B tests for deep links, see <a href=\"https:\/\/tolinku.com\/blog\/ab-testing-deep-links-landing-pages\/\">A\/B Testing Deep Links and Landing Pages<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Documenting and Learning<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Test Log<\/h3>\n\n\n\n<p>Maintain a test log that records:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table>\n<thead>\n<tr>\n<th>Field<\/th>\n<th>Example<\/th>\n<\/tr>\n<\/thead>\n<tbody><tr>\n<td>Test name<\/td>\n<td>Link preview image test &#8211; Summer campaign<\/td>\n<\/tr>\n<tr>\n<td>Hypothesis<\/td>\n<td>Product images drive higher CTR than lifestyle images<\/td>\n<\/tr>\n<tr>\n<td>Primary metric<\/td>\n<td>CTR<\/td>\n<\/tr>\n<tr>\n<td>Secondary metrics<\/td>\n<td>Install rate, D1 retention<\/td>\n<\/tr>\n<tr>\n<td>Start date<\/td>\n<td>2026-04-15<\/td>\n<\/tr>\n<tr>\n<td>End date<\/td>\n<td>2026-04-22<\/td>\n<\/tr>\n<tr>\n<td>Sample size (A\/B)<\/td>\n<td>5,200 \/ 5,100<\/td>\n<\/tr>\n<tr>\n<td>Result<\/td>\n<td>Variant B won: 4.8% vs 3.9% CTR (p = 0.012)<\/td>\n<\/tr>\n<tr>\n<td>Decision<\/td>\n<td>Implement variant B for all summer campaign links<\/td>\n<\/tr>\n<tr>\n<td>Notes<\/td>\n<td>Install rate was not significantly different between variants<\/td>\n<\/tr>\n<\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Building Institutional Knowledge<\/h3>\n\n\n\n<p>Over time, your test log becomes a knowledge base:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>&quot;Lifestyle images consistently outperform product images in social media 
previews&quot;<\/li>\n<li>&quot;Shorter CTAs (under 4 words) perform better in push notifications&quot;<\/li>\n<li>&quot;Users who land on personalized deep link destinations retain 2x better than generic landing pages&quot;<\/li>\n<\/ul>\n\n\n\n<p>This knowledge prevents re-testing things you&#39;ve already proven and helps new team members understand what works.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Common Testing Mistakes<\/h2>\n\n\n\n<p><strong>No hypothesis<\/strong>: Running a test without a clear hypothesis means you won&#39;t know how to interpret the result. Always start with &quot;We believe [change] will improve [metric] because [reason].&quot;<\/p>\n\n\n\n<p><strong>Testing too many things at once<\/strong>: Changing the image, title, description, and CTA simultaneously means you don&#39;t know which change caused the result. Test one variable at a time.<\/p>\n\n\n\n<p><strong>Stopping too early<\/strong>: Seeing variant B ahead after 200 clicks doesn&#39;t mean it&#39;s the winner. Wait for statistical significance and the minimum test duration.<\/p>\n\n\n\n<p><strong>Ignoring segment differences<\/strong>: A variant might win overall but lose for a specific segment (iOS users, users from a particular channel). Check segment-level results before rolling out broadly.<\/p>\n\n\n\n<p><strong>Not implementing winners<\/strong>: A surprising number of teams run tests, find winners, and never implement them. Build a process for acting on test results.<\/p>\n\n\n\n<p>For a comprehensive view of analytics for deep links, see <a href=\"https:\/\/tolinku.com\/blog\/deep-link-analytics-measuring-what-matters\/\">Deep Link Analytics: Measuring What Matters<\/a>. For funnel analysis that complements A\/B testing, see <a href=\"https:\/\/tolinku.com\/blog\/conversion-funnel-analysis-deep-links\/\">Conversion Funnel Analysis for Deep Links<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Measure A\/B test results correctly for deep link experiments. 
Understand statistical significance, sample sizes, and when to call a test.<\/p>\n","protected":false},"author":2,"featured_media":834,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"rank_math_title":"Measuring A\/B Test Results for Deep Link Campaigns","rank_math_description":"Measure A\/B test results correctly for deep link experiments. Understand statistical significance, sample sizes, and when to call a test.","rank_math_focus_keyword":"A\/B test measurement","rank_math_canonical_url":"","rank_math_facebook_title":"","rank_math_facebook_description":"","rank_math_facebook_image":"https:\/\/tolinku.com\/blog\/wp-content\/uploads\/2026\/03\/og-ab-test-measurement.png","rank_math_facebook_image_id":"","rank_math_twitter_title":"","rank_math_twitter_description":"","rank_math_twitter_image":"https:\/\/tolinku.com\/blog\/wp-content\/uploads\/2026\/03\/og-ab-test-measurement.png","footnotes":""},"categories":[14],"tags":[60,37,38,39,20,29],"class_list":["post-835","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analytics","tag-ab-testing","tag-analytics","tag-campaign-tracking","tag-conversion","tag-deep-linking","tag-mobile-marketing"],"_links":{"self":[{"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/posts\/835","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/comments?post=835"}],"version-history":[{"count":2,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/posts\/835\/revisions"}],"predecessor-version":[{"id":2165,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/posts\/835\/revisions\/2165"}],"wp:featuredmedia":[{"embeddable":tru
e,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/media\/834"}],"wp:attachment":[{"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/media?parent=835"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/categories?post=835"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tolinku.com\/blog\/wp-json\/wp\/v2\/tags?post=835"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}