Three Dimensions of Analytics and Optimization

clearhead.me
  • Performance Measurement — there is tremendous value in doing this well. It means capturing the right key performance indicators (KPIs), establishing targets for them, putting them onto a dashboard (with a few contextual/supporting metrics — emphasis on “few”), and automating the bejeezus out of it. Low-latency data, cleanly structured, with big, blaring alert indicators for any KPI that is going awry. And very few ongoing human tasks to maintain it.
  • Hypothesis Validation — this is where “analysis” occurs. It requires having a clearly articulated idea and qualification of that idea up front to confirm that, if the idea is validated (through historical data analysis, A/B testing, primary or secondary research, or some combination), action will be taken.
  • Support — this covers a smorgasbord of activity: end-user training, site tagging, quick data pulls, data governance, and so on. It’s all of the tasks that support the first two items (we have versions of the diagram that break these functions out into separate boxes…but they’re all lumped together here for a reason).

Hey, digital marketer -- you've got a new job.

clearhead.me
Beginning many years ago with concepts of cost per impression, cost per click, like counts or reach, email open or click rates, search rank, etc., marketers have always been tasked with optimizing the value they are getting from their campaigns. So, to imply that digital marketers are not data-driven would be naive. However, digital marketers are often accused of allowing creative instincts or their “gut” to drive digital design or experience decisions.

So, here’s how marketers can help and, in the process, become more connected to and driven by data:

1. Marketers should help define KPIs so they can connect their media and CRM campaigns to the entire funnel, including conversion success.

2. Marketers should be the leading source of hypotheses for testing and analysis. These hypotheses should come from their sense of brand, the market, design and customer relationships. 

3.  Marketers should help prioritize the hypotheses based on high level goals and themes they are pursuing for the brand and business.

4. Marketers should work with the analysts to help articulate and present the insights back to the business so that change is most likely to be adopted.

5. Marketers should become great customers for their analysts, defining requirements at the outset and providing context that might not be apparent, including events and promotions that could taint data.

In short, marketers should increasingly think of themselves as optimizers.

Gilligan’s Unified Theory of Analytics (Requests) | Gilligan on Data by Tim Wilson

www.gilliganondata.com
  • A misperception that “getting the data” is the first step in any analysis — a belief that surprising and actionable insights will pretty much emerge automagically once the raw data is obtained.
  • A lack of clarity on the different types and purposes of analytics requests — this is an education issue (and an education that has to be 80% “show” and 20% “tell”)
    • Analysts need to build strong partnerships with their business stakeholders
    • Analysts have to focus on delivering business value rather than just delivering analysis
    • Analysts have to stop “presenting data” and, instead “effectively communicate actionable data-informed stories.”

    All of these are 100% true! But, that’s a focus on how the analyst should develop their own skills, and this post is more of a process-oriented one.

    Hands-down, testing and validation of hypotheses is the sexiest and, if done well, highest value way for an analyst to contribute to their organization.

    The bitch when it comes to getting really good hypotheses is that “hypothesis” is not a word that marketers jump up and down with excitement over. Here’s how I’m starting to work around that: by asking business users to frame their testing and analysis requests in two parts:

    Part 1: “I believe…[some idea]”

    Part 2: “If I am right, we will…[take some action]”

    This construct does a couple of things:

  • It forces some clarity around the idea or question. Even if the requestor says, “Look. I really have NO IDEA if it’s ‘A’ or ‘B’!” you can respond with, “It doesn’t really matter. Pick one and articulate what you will do if that one is true. If you wouldn’t do anything different if that one is true, then pick the other one.”
  • It forces a little bit of thought on the part of the requestor as to the actionability of the analysis.

    Analysts can provide a lot of value by setting up automated (or near-automated) performance measurement dashboards and reports. These are recurring (hypothesis testing is not — once you test a hypothesis, you don’t need to keep retesting it unless you make some change that makes it sensible to do so).

    There’s a gross misperception when it comes to “quick” requests that there is a strong correlation between the amount of time required to make the request and the amount of time required to fulfill the request. Whenever someone tells me they have a “quick question,” I playfully warn them that the length of the question tends to be inversely correlated to the time and effort required to provide an answer.

    First, there is how the request should be structured — the information I try to grab as the request comes in (a rough code sketch of this intake structure follows the list):

    • The basics – who is making the request and when the data is needed; you can even include a “priority” field…the rest of the request info should help vet out if that priority is accurate.
    • A brief (255 characters or so) articulation of the request — if it can’t be articulated briefly, it probably falls into one of the other two categories above. OR…it’s actually a dozen “quick requests” trying to be lumped together into a single one. (Wag your finger. Say “Tsk, tsk!”)
    • An identification of what the request will be used for. There are basically three options, and, behind the scenes, those options are an indication as to the value and priority of the request:
      • General information — Low Value (“I’m curious,” “It would be interesting — but not necessarily actionable — to know…”)
      • To aid with hypothesis development — Medium Value (“I have an idea about SEO-driven visitors who reach our shopping cart, but I want to know how many visits fall into that segment before I flesh it out.”)
      • To make a specific decision — High Value
    • The timeframe to be included in the data — it’s funny how often requests come in that want some simple metric…but don’t say for when!
    • The actual data details — this can be a longer field; ideally, it would be in “dimensions and metrics” terminology…but that’s a bit much to ask for many requestors to understand.
    • Desired delivery format — a multi-select with several options:
      • Raw data in Excel
      • Visualized summary in Excel
      • Presentation-ready slides
      • Documentation on how to self-service similar data pulls in the future
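
To make the intake structure concrete, here is a minimal sketch of how such a request form could be modeled in code. The field and enum names are illustrative, not taken from the post.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List


class Purpose(Enum):
    GENERAL_INFORMATION = "general information"        # low value
    HYPOTHESIS_DEVELOPMENT = "hypothesis development"  # medium value
    SPECIFIC_DECISION = "specific decision"            # high value


class DeliveryFormat(Enum):
    RAW_EXCEL = "raw data in Excel"
    VISUALIZED_EXCEL = "visualized summary in Excel"
    SLIDES = "presentation-ready slides"
    SELF_SERVICE_DOC = "documentation for self-service pulls"


@dataclass
class QuickDataRequest:
    requester: str
    needed_by: date
    summary: str                       # brief articulation (~255 characters)
    purpose: Purpose                   # behind the scenes, this drives value/priority
    timeframe: str                     # e.g. "last 90 days" -- often forgotten
    data_details: str                  # ideally in dimensions-and-metrics terms
    delivery_formats: List[DeliveryFormat] = field(default_factory=list)
    stated_priority: str = "normal"    # requester's view; vetted against the rest

    def __post_init__(self):
        if len(self.summary) > 255:
            raise ValueError("Summary too long -- probably not a quick data request")
```
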

    The next step is to actually assess the request. This is the sort of thing, generally, an analyst needs to do, and it covers two main areas:

    • Is the request clear? If not, then some follow-up with the requestor is required (ideally in a system that allows this to happen as comments or a discussion linked to the original request — Jira, SharePoint, Lotus Notes, etc.)
    • What will the effort be to pull the data? This can be a simple High/Medium/Low with hours ranges assigned as they make sense to each classification.

    If the analytics and optimization organization is framed across these three main types of services, then conscious investment decisions can be made:

    • What is the maximum % of the analytics program cost that should be devoted to Quick Data Requests? Hopefully, not much (20-25%?).
    • How much goes to performance measurement? Also, hopefully, not much — this may require some investment in automation tools, but once smart analysts are involved in defining and designing the main dashboards and reports, that is work that should be automated. Analysts are too scarce to be doing weekly or monthly data exports and formatting.
    • How much investment will be made in hypothesis testing? This is the highest-value work.

    Putting a process in place to capture all three types of effort in a discrete and trackable way enables reporting back out on the value delivered by the organization:

    • Hypothesis testing — reporting is the number of hypotheses tested and the business value delivered from what was learned
    • Performance measurement — reporting is the level of investment; this needs to be done…and it needs to be done efficiently
    • Quick data requests — reporting is output-based: number of requests received, average turnaround time. In a way, this reporting is highlighting that this work is “just pulling data” — accountability for that data delivering business value really falls to the requestors. Of course, you have to gently communicate that or you won’t look like much of a team player, now, will you?
    ADAPT to Act and Learn

    clearhead.me
    Excel version
    Ten (Not So Obvious) Digital Optimization Truths

    clearhead.me
    Digital Optimization is not the same thing as A/B testing. Rather, Digital Optimization is the continuous validation of hypotheses in the interest of making more fruitful decisions and actions. Testing is just one tool in the toolbox of a Digital Optimization program.
    Digital Optimization is driven as much (if not more) by the UX, Creative, Engineering, Product Management and Marketing teams as it is by the Analytics team

    A Digital Optimization program requires skills and roles that are not native to most Digital organizations

    A Digital Optimization program is not only about “winning” tests and improved conversion rates. It is equally about realizing when hypotheses are invalid and projects should not be resourced or prioritized as defined.

    Digital Optimization requires us to re-imagine our ideas, assumptions and opinions into hypotheses that connect beliefs to change.

    There is no single “right” way to prioritize hypothesis testing.

    the real “magic” is in the organizational re-alignment around consumer data and validated learnings.

    Digital Optimization programs have two high level, potential benefits: (1) performance impact and (2) validated learnings.

    A Digital Optimization program has the dual consequence of (a) ensuring that the Analyst’s work is aligned with business context and priorities and (b) that the business stakeholders are increasingly learning and posing more informed hypotheses for validation.

    Multi-armed bandit isn't an algorithm, it's a model of how to view the problem. ... | Hacker News

    news.ycombinator.com
    Multi-armed bandit isn't an algorithm, it's a model of how to view the problem. Like it or not, the problem web designers face fits the multi-armed bandit model pretty well. The algorithm called "MAB" in the article is one of many that have been developed for multi-armed bandit problems. Traditionally, the "MAB" of this article is known as "epsilon-greedy".
    I should say also that multi-armed bandit algorithms also aren't supposed to be run as a temporary "campaign" - they are "set it and forget it". In epsilon-greedy, you never stop exploring, even after the campaign is over. In this way, you don't need to achieve "statistical significance" because you're never taking the risk of choosing one path for all time. In traditional A/B testing, there's always the risk of picking the wrong choice.
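
The "epsilon-greedy" strategy named in the comment is simple enough to sketch; this is a generic illustration of the idea, not code from the thread.

```python
import random

def epsilon_greedy(counts, conversions, epsilon=0.1):
    """Pick a variation index: explore with probability epsilon, exploit otherwise.

    counts[i]      -- times variation i has been shown
    conversions[i] -- conversions observed for variation i
    """
    if random.random() < epsilon:
        return random.randrange(len(counts))                 # explore: random arm
    rates = [c / n if n else 0.0 for c, n in zip(conversions, counts)]
    return max(range(len(rates)), key=rates.__getitem__)     # exploit: best observed rate

# "Set it and forget it": call this on every page view, then increment
# counts[arm] and, if the visitor converts, conversions[arm].
```
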
    Lessons learned A/B testing with GAE/Bingo

    bjk5.com
    Lesson learned, tool updated: lean on historical graphs, not stat sig

    Lesson learned, tool updated: leave a trail of your past experiments

    As our team has grown, it’s become more and more important to figure out how to easily save and share past test results.
    Lesson still being learned, tool not updated: interpreting results is very hard
    The Ultimate Guide To A/B Testing | Smashing Magazine

    www.smashingmagazine.com

    Even though every A/B test is unique, certain elements are usually tested:

    • The call to action’s (i.e. the button’s) wording, size, color and placement,
    • Headline or product description,
    • Form’s length and types of fields,
    • Layout and style of website,
    • Product pricing and promotional offers,
    • Images on landing and product pages,
    • Amount of text on the page (short vs. long).

    When doing A/B testing, never ever wait to test the variation until after you’ve tested the control.
    Don’t conclude too early. There is a concept called “statistical confidence” that determines whether your test results are significant (that is, whether you should take the results seriously).
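
In practice, "statistical confidence" for a two-variation test usually comes down to something like a two-proportion z-test. A rough sketch of that calculation (the standard formula, not anything specific to the article):

```python
from math import sqrt, erf

def ab_significance(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided p-value for the difference between two conversion rates,
    using a pooled two-proportion z-test (normal approximation)."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = ab_significance(200, 5000, 240, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")   # call it significant at 95% only if p < 0.05
```
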
    Don’t surprise regular visitors. If you are testing a core part of your website, include only new visitors in the test. You want to avoid shocking regular visitors, especially because the variations may not ultimately be implemented.

    Don’t let your gut feeling overrule test results

    Know how long to run a test before giving up.

    Show repeat visitors the same variations

    Make your A/B test consistent across the whole website.

    Do many A/B tests.
    7 A/B Testing Resources for Startups and Solo Developers

    mashable.com
    Finally, make sure that your website is functional and optimized for excellent, fast, cross-browser performance before you commit to testing. After all, no one will care whether the button is green or red if the page takes a minute and a half to load.

    Why Multi-armed Bandit algorithms are superior to A/B testing (with Math) | Hacker News

    news.ycombinator.com
    A prerequisite for making any of this work is having a statistically significant number of visits to your site on a daily basis, right?

    I think many people first have to figure out how to cross that bridge before they start to worry about optimizing what's on the other side.
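
What counts as "enough" traffic depends on the baseline conversion rate and the smallest lift worth detecting. A rough per-variation estimate, using the standard approximation and my own assumptions of 95% confidence and 80% power:

```python
def visitors_per_variation(baseline_rate, relative_lift, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per variation to detect a relative lift
    at ~95% confidence and ~80% power."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# A 2% baseline conversion rate and a hoped-for 20% relative lift:
print(visitors_per_variation(0.02, 0.20))   # roughly 21,000 visitors per variation
```
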

    Which method is “best” depends on what you know about the problem. Does its optimal solution look a certain way? change over time? and so on. If you’re willing to bet on your answers to those questions, you can choose a method that’s biased toward your answers, and you’ll converge more rapidly on a solution. The risk, however, is that you’ll bet wrong and converge on a poor solution (because your biases rule out better solutions).
    Multi-Armed Bandit - A more efficient way to do A/B Tests | Growth Giant

    www.growthgiant.com
    during the test you would have lost half of your potential conversions. This method is costly and inefficient and poses the risk of losing conversions and large sums of revenue.
    A more efficient way to A/B test is to take a multi-armed bandit approach, just like with the slot machine example above. This helps maximize the conversion rate during the test period, mitigating the risk of losing revenue.
    The multi-armed bandit approach to split testing allows marketers to maximize conversions (by decreasing their cost or regret) during the split testing period. It allows you to continuously split test different variations and theories on an ongoing basis, always ensuring that you’re not wasting valuable traffic on poorly performing web page variations.
    Why your CRO tests fail | distilled

    www.distilled.net

    I’d been trying not to overthink the problem (I often get called a stats wonk) and simply to trust the tools at my disposal and follow common practices:

    • I had a user objection I was targeting
    • I had a reason for believing my B might outperform my A
    • I trialled a significant change
    • I ran standard software
    • I sought a 95% confidence level

    It was only after getting some strange results and digging deeper into the statistics that I discovered how dangerous this was.

    Do you pay any attention to the blend of traffic hitting your A/B tests?

    We are seeking a 95% confidence level. What I found was that even if you are:

    1. Correctly setting out to run a test for the recommended length of time
    2. Avoiding peeking at the result part way through
    3. Calling a test successful if it achieves 95% confidence

    As many as one in five of your “successful” results may in fact come from having accidentally (randomly) sent more high-converting (“email”) traffic to one variant or the other. (This explains why people sometimes find “successful” results when they are actually comparing two identical pages).

    The glimpse of light at the end of the tunnel is that the longer you run a test for, the more the channel distribution converges (by the law of large numbers) to be the same for each variant. This means that we can fix the problem by running our tests for longer.

    I’ve seen people recommend running A/A tests to detect setup errors or to set the sample size [PDF]. But since traffic blend can change over time, by running A/A/B/B and only accepting a result when the As and Bs have converged, I think we can avoid running tests forever.

    Next time you set up an A/B test, set up two identical versions of each variant – let’s call them A1 and A2, B1 and B2. (Our hypothesis is that the conversion rate of B is better than the conversion rate of A).

    This methodology removes the need to run exceptionally long tests to be confident in rarely seeing skewed results. Instead, it focuses on discarding tests that appear to have been skewed by uneven traffic mixes, leaving only real results we can confidently put live.
    A/A/B/B
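
One way to operationalize the A/A/B/B idea: only trust a result when the duplicate variants agree with each other. A minimal sketch; the convergence tolerance is an illustrative choice, not a figure from the article.

```python
def aabb_verdict(rate_a1, rate_a2, rate_b1, rate_b2, tolerance=0.05):
    """Accept a result only if the duplicate variants have converged.

    rate_* are observed conversion rates; `tolerance` is the maximum relative
    gap allowed between A1/A2 and between B1/B2 before the test is discarded.
    """
    def converged(x, y):
        return abs(x - y) <= tolerance * max(x, y)

    if not (converged(rate_a1, rate_a2) and converged(rate_b1, rate_b2)):
        return "discard: the traffic mix looks skewed, keep running or rerun"
    a = (rate_a1 + rate_a2) / 2
    b = (rate_b1 + rate_b2) / 2
    return "B wins" if b > a else "no lift from B"

print(aabb_verdict(0.040, 0.041, 0.047, 0.048))   # duplicates agree: trust the result
print(aabb_verdict(0.040, 0.048, 0.047, 0.048))   # A1 and A2 disagree: discard
```
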
    Google Experiments: Is The Multi Armed Bandit Stealing Your CRO Success?

    3doordigital.com
    The bandit uses sequential Bayesian updating to learn from each day’s experimental results, which is a different notion of statistical validity than the one used by classical testing. A classical test starts by assuming a null hypothesis. For example, “The variations are all equally effective.” It then accumulates evidence about the hypothesis, and makes a judgement about whether it can be rejected. If you can reject the null hypothesis you’ve found a statistically significant result.
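
Sequential Bayesian updating for conversion rates is commonly done with Beta posteriors and Thompson sampling; the sketch below illustrates that general idea, and is not Google's actual implementation.

```python
import random

# One Beta(conversions + 1, non-conversions + 1) posterior per variation.
posteriors = {"original": [1, 1], "variation_1": [1, 1], "variation_2": [1, 1]}

def update(arm, converted):
    """Fold one observed outcome into that variation's posterior."""
    posteriors[arm][0 if converted else 1] += 1

def choose_arm():
    """Thompson sampling: draw a plausible conversion rate from each posterior
    and serve the variation whose draw is highest."""
    draws = {arm: random.betavariate(a, b) for arm, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)

# Re-weighting traffic (e.g. twice per day) means running update() over the day's
# results and letting choose_arm() shift traffic toward the stronger variations.
```
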
    For tests with a lot more traffic this bandit method may seem sound. In the event that you do not have a lot of traffic and you receive results as seen above, I would recommend taking advice from Critchlow’s article and running multiple variants of the same tests and comparing the results. In some cases this may help avoid the situation which I described above where the runaway “winning page” was implemented and underperformed in real life.
    4 Ways to A/B Split Test Your Facebook Advertising - Business 2 Community

    www.business2community.com
    Headline & Text:
    Photo: If

    Demographics: While your message may appeal to a broad demographic, certain segments of your a

    Destination Landing Page
    What are the most unexpected things people have learned from A/B tests?

    www.quora.com
    If the element you're testing hurts page load speed, you could be missing the entire reason that your results are surprising you. It's not because your assumptions about what will work better were wrong, it's because you tested your users' collective patience and didn't realize it
    1) That both the A and B options are rubbish and that a fundamental rethink is needed. A very sobering and useful experience.

    2) As no site is like yours and no site has the exact make-up of audience that yours does, you should always challenge good practice/best practice.
    That men and women actually respond quite differently to different copy.
    UI matters:

    Cluttered vs. neat

    Account for progress:

    Account for semantic reasoning

    Use what is learned in previous experiments

    What works for others may not work for you
    Make sure you track the funnel all the way to your goal. We’ve had a number of tests that led more people to the signup page but resulted in significantly fewer signups. Generally the learning was that the more you make it clear what they are doing on that page (aka signing up), the more likely they were to sign up if they got to that page. Alternatively you could try to just lead more people to the signup page and do a better job converting/informing there. As always, your mileage may vary.
    We usually find out there are some browser/os bugs we haven't noticed before. For example, we test to see if a new button will perform better but we find out there is a problem on the site's other sections.

    Usually something more surprising is testing the same functionality in two very similar sites (let’s say 2 real estate sites, with very similar design, same data) and seeing one performs fantastic while the other just doesn’t work.
    I would say the second thing is how different a better option is, say the thing everyone wanted to do, than the best option, which almost never has anything that people want to test and was thrown in just to make others see how “dumb” that idea was. There seems to be an almost inverse correlation between what people think will win and what really does win.
    I was surprised to just learn from an A/B test last week that the winning recipe for a page prompting users to download our browser extension was not a gift card giveaway but a version promoting the time and money saving benefit of our extension. We had assumed giveaways would lead to the highest Click Thru.
    Unexpected? Hmm...I encountered the ugliest layout turning out to be the winner, and the marketer actually did not choose the winning recipe but the 2nd highest, to match the overall site layout/UI for consistency.
    Optimizing the Web Experience: A/B Testing, Multivariate Testing or Both?

    www.cmswire.com
    Back then, of course, testing the efficiency of one element at a time was sufficient. Now, online marketers need to measure multiple variables — all at once and in relation to one another — to coax more sales and conversions from their websites. This can’t be achieved by A/B testing. Hence the emergence of multivariate (MVT) testing.
    If you simply need to know which headline on your landing page will generate more click-throughs, downloads or sign-ups, A/B testing might be the right choice for you. But if you need to know that, as well as which price presentation delivers higher sales, and with what headline the winning price presentation performs better, you’re going to have a lot more success (and save a lot of time) using multivariate testing.

    Marketers who live and die by their online results, however, need a more complex solution and therefore generally rely on multivariate testing. Multivariate testing applies to a multitude of page combinations, opening up many more possibilities for understanding how your site works for visitors, and how to continually refine it.

    MVT Reveals How Your Site Elements Interact
    it’s entirely possible that the optimum solution is in fact your original headline but with a particular CTA and graphic combination. A/B testing can miss that entirely. Or take forever to figure out.
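
The combinatorial point is easy to see by enumerating the full grid of combinations a multivariate test covers; the element values below are placeholders.

```python
from itertools import product

headlines = ["original headline", "new headline"]
ctas      = ["Buy now", "Start free trial"]
graphics  = ["hero photo", "product screenshot"]

# A full-factorial MVT allocates traffic across every combination, so an
# interaction like "original headline + new CTA + new graphic" can surface
# as the winner even if no single element wins its own isolated A/B test.
for combo in product(headlines, ctas, graphics):
    print(combo)

print(len(headlines) * len(ctas) * len(graphics), "combinations in total")
```
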
    MVT is Better for Exploring

    MVT Fosters a Test-and-Learn Culture
    How To Get Started In A/B Testing: 6 Tips For Success - Games Brief

    www.gamesbrief.com

    if you want to embed a testing culture in your organization, you’re looking for quick wins and a demonstrated ability to get results. That means adopting a ‘lean’ approach, getting quick validation of the concept, rather than building a large, complex structure before a single test has taken place.

    By de-coupling the testing framework from engineering release cycles, we can create short test cycles, often run by product management or even marketing teams. In order to do this, we need to build our game to be as ‘data-driven’ as possible, with a clear understanding and agreement as to what data points are ‘open’ to testing. When we’ve reached that point, setting up a test can become as simple as changing a value in a spreadsheet or testing platform (which is why marketing can do it!)
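
A hypothetical illustration of that "data-driven" setup: the values open to testing live in a config that the game reads at run time, so launching a test means editing a value rather than shipping code. The file format and keys here are invented for the example.

```python
import json
import random

# experiments.json -- editable by product or marketing, no engineering release needed
EXPERIMENTS = json.loads("""
{
  "tutorial_length": {"variants": {"short": 5, "long": 9}, "weights": [0.5, 0.5]},
  "starter_pack_price": {"variants": {"control": 4.99, "test": 2.99}, "weights": [0.9, 0.1]}
}
""")

def assign(user_id, experiment):
    """Deterministically bucket a user into a variant, then return the tested value."""
    cfg = EXPERIMENTS[experiment]
    names = list(cfg["variants"])
    rng = random.Random(f"{user_id}:{experiment}")   # stable assignment per user
    name = rng.choices(names, weights=cfg["weights"])[0]
    return name, cfg["variants"][name]

print(assign("player-42", "tutorial_length"))
```
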
    In fact, when we’re encouraging ‘failure’ and creating an agile and adaptive data-driven culture, they often do. So make sure to minimize the impact of failure whilst maximizing learnings from negative results.
    So it bears repeating: always design your tests to test one thing and one thing only.
    When testing, you’ll normally define the criteria for success in advance. In fact this step (and recording it) is important. But whilst we should define a conversion event, such as completion of the tutorial or a specific purchase, that is closely linked to the test itself, we should also take the time to examine the longitudinal impacts of the test.
    A word of caution. Always understand what you are looking for, and note in advance any KPIs you fear may be adversely affected. If we look at multiple metrics for multiple tests, it stands to reason that sooner or later we will see what may appear to be significant results. Chances are, however, that these lie within normal, expected variation, and on that basis are not meaningful. Limiting ourselves to a specific set of KPIs that we might expect to change minimizes the risk of a ‘false positive’.
    Instead, you should limit these types of tests to new users to the game who have not yet “learned” the UI, and where you get a valid assessment of how effective each of the UI variations is on fresh users. This is known as the “primacy effect” in psychology literature and relates to our natural pre-disposition to more effectively remember the first way we’ve experienced something.  Your testing framework should allow you to restrict the test to new users only.
    Do A/B Testing Easily

    www.organizedthemes.com
    For $19 a month, Optimizely will give you up to 2,000 monthly visitors. Now you’re saying, “I have way more traffic than that, so it’s going to cost a fortune.” If you’re running an experiment and someone visits your site, but doesn’t encounter the experiment, they aren’t counted against your total. Optimizely defines a visitor this way:

    11 Obvious A/B Tests You Should Try

    www.quicksprout.com
    If you are looking to squeeze more dollars out of your existing traffic, you need to start running A/B tests. If you have at least 10,000 monthly visitors, you should consider running 1 new A/B test every other month, if not once a month.
    Test #1: Add the word FREE in your ads
    Test #2: Create an explainer video
    Test #3: Have your signup button scroll with the visitor
    Test #4: Removing form fields
    Test #5: Create a two-step checkout process
    Test #6: Show a live version of your product instead of using screenshots
    Test #7: Free trial versus money back guarantee
    Test #8: Trial length
    Test #9: Offer time based bonuses
    Test #10: Add a dollar value to your free offers
    Test #11: Button colors
    Test #12: Tell people to come up and talk to you
    Multi-armed Bandit Experiments - Analytics Blog

    analytics.blogspot.com
    Twice per day, we take a fresh look at your experiment to see how each of the variations has performed, and we adjust the fraction of traffic that each variation will receive going forward. A variation that appears to be doing well gets more traffic, and a variation that is clearly underperforming gets less.
    The multi-armed bandit’s edge over classical experiments increases as the experiments get more complicated. You probably have more than one idea for how to improve your web page, so you probably have more than one variation that you’d like to test. Let’s assume you have 5 variations plus the original. You’re going to do a calculation where you compare the original to the largest variation, so we need to do some sort of adjustment to account for multiple comparisons.
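
A Bonferroni correction is the simplest version of that adjustment: with five variations each compared against the original, every comparison has to clear a stricter per-test threshold. A generic sketch with made-up p-values:

```python
def bonferroni_cutoff(alpha=0.05, comparisons=5):
    """Per-comparison significance threshold after a Bonferroni correction."""
    return alpha / comparisons

p_values = {"var_1": 0.004, "var_2": 0.03, "var_3": 0.2, "var_4": 0.6, "var_5": 0.04}
cutoff = bonferroni_cutoff(0.05, len(p_values))          # 0.01 instead of 0.05
winners = [name for name, p in p_values.items() if p < cutoff]
print(f"per-comparison cutoff: {cutoff}, still significant: {winners}")
```
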
    Google’s New Content Experiments API Turns Google Analytics Into A Full-Blown A/B Testing Platform | TechCrunch

    techcrunch.com
    new Content Experiments API, a tool that allows developers to easily test their sites’ content with programmatic optimization. The new API is deeply integrated with Google Analytics, so developers can use all of Analytics’ power to measure their different optimizations
    The service uses a multi-armed bandit approach to A/B testing, which automatically adjusts how often users see a given experiment based on how well every variation performs.

    StatProb: The Encyclopedia Sponsored by Statistics and Probability Societies

    statprob.com
    “All we know about the world teaches us that the effects of A and B are always different - in some decimal place - for any A and B. Thus asking ‘Are the effects different?’ is foolish.” (Tukey, 1991, page 100)

    It is certainly not a good scientific practice, where one is expected to present arguments that support the hypothesis in which one is really interested. The real problem is to obtain estimates of the sizes of the differences.
    Bandit: An A/B Testing Alternative for Rails

    findingscience.com
    There are no good answers for what you should do when A performs just as well as B. Was the sample size just too small (implying you should try again with a large sample)? Go with A? Go with B? Does it matter? The reality is it may matter - but you won’t know.
    Imagine you have a multitude of possible alternatives, and you want to make a decent choice between alternatives you know perform well and alternatives you haven’t tried very often each time a user requests a page. With each page load, pick the best alternative most of the time and an alternative that hasn’t been displayed much some of the time. After each display, monitor the conversions and update what you consider the “better” alternatives to be. This is the basic method of a solution to what is called the multi-armed bandit problem.
    With a bandit solution, there is no concept of a “test”. At no point does the system announce a winner and a loser. Alternatives can be added or removed at any time.
    Go ahead and try something crazy. If it performs poorly, it won’t be shown very often.

    There’s no “test”, and no minimal sample size needed before optimization can start.

    Designers and developers can add alternatives or remove them at any time.

    If one alternative performs the same as another, they will both be displayed with the same regularity. There would be no need to choose one over the other or remove either of them.
    Why multi-armed bandit algorithm is not "better" than A/B testing - Visual Website Optimizer Blog

    visualwebsiteoptimizer.com
    What we found out was that the reality is not as simple as that blog post claimed. In short, multi-armed bandit algorithms do not “beat” A/B testing. In fact, a naive understanding of this algorithm can lead you to wonder what’s happening under the hood.
  • For 10% of the time, you split your traffic equally between the two versions (called the exploration phase)
  • For the remaining 90% of the time, you send traffic to the currently best-performing version (called the exploitation phase)
    Not so soon! Actually, if you just talk about average conversion rates, multi-armed bandit algorithms usually perform better than A/B testing. But a fundamental point missing here is the concept of statistical significance.
    To find out whether a variation is performing better or worse, we use statistical tests such as a Z-test or a chi-square test, and mathematically (and intuitively) you need to have tested at least a certain number of visitors before you can say with any certainty that the variation really is performing badly (or whether its current numbers are just due to chance).
    What the multi-armed bandit algorithm does is aggressively (and greedily) optimize for the currently best-performing variation, so the worse-performing versions end up receiving very little traffic (mostly in the explorative 10% phase). That little traffic means that, when you try to calculate statistical significance, there is still a lot of uncertainty about whether the variation is “really” performing worse or whether its current worse performance is due to random chance. So, in a multi-armed bandit algorithm, it takes a lot more traffic to declare statistical significance than with the simple randomization of A/B testing. (But, of course, in a multi-armed bandit campaign, the average conversion rate is higher.)
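
The intuition is just standard-error arithmetic: the losing variation's small sample dominates the uncertainty, so a heavily skewed split needs far more total traffic than a 50/50 split to reach the same confidence. An illustrative calculation with made-up rates and splits:

```python
from math import sqrt

def se_of_difference(p_a, p_b, total_visitors, share_a):
    """Standard error of the observed conversion-rate difference for a given
    traffic split (share_a of visitors see A, the rest see B)."""
    n_a = total_visitors * share_a
    n_b = total_visitors * (1 - share_a)
    return sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

p_a, p_b = 0.040, 0.048           # true rates; the difference is 0.008
for loser_share in (0.50, 0.05):  # 50/50 A/B split vs. ~5% to the loser under the bandit
    n = 1000
    while (p_b - p_a) / se_of_difference(p_a, p_b, n, loser_share) < 1.96:
        n += 1000
    print(f"loser gets {loser_share:.0%} of traffic: ~{n:,} total visitors for 95% confidence")
```
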
    There’s a clear tradeoff between average conversion rate and the time it takes to detect statistical significance. Moreover, it is also clear that any advantages of multi-armed bandit algorithms vanish if the conversion rates of the different versions are similar. The only scenario where multi-armed bandit algorithms would work best is if the performance of the different versions is massively different (which is infrequent in practice). Even in such cases, since the difference is massive, simple randomization would detect statistical significance quite early on, so for the rest of the time you can simply use the best-performing version.

    So, comparing A/B testing and multi-armed bandit algorithms head to head is wrong because they are clearly meant for different purposes. A/B testing is meant for strict experiments where the focus is on statistical significance, whereas multi-armed bandit algorithms are meant for continuous optimization where the focus is on maintaining a higher average conversion rate.

    The A/B Testing and Multi-Armed Bandit kerfuffle - Michael Leo

    michaelthinks.typepad.com
    Testing, whether through A/B, multivariate or multi-armed approaches, is always* better than not testing/optimizing. Get off your duff and learn enough to make it happen or hire someone if you're not a technical person.

    Drawing incorrect conclusions from a test is always worse than not testing
    Not all platforms are built the same. I don't care what Visual Website Optimizer or Google Website Optimizer tell you. Most consumer platforms are built to increase the conversion rates of simple funnels. For anything more complex, e.g. long-term impact of major site redesign A vs. B, you are going to have to dig in and build your own solution to properly cohort your web visitors and gather the non-conversion metrics that will help you judge success over time (e.g. Time on Site, Page Views, engagement rate of X, etc. – whatever it is that makes your business move)
    You don't know your users as well as you think. It's a hard lesson but the sooner you learn it, the sooner you'll start making changes that will truly move the needle for your business. Unburden yourself from the fallacy that you are like the majority of your users.