The Wisdom of Trivia Crowds

In the fall of 1906, people across Britain traveled to Plymouth for its annual West of England Fat Stock and Poultry Exhibition, a country fair focused on the appraisal of farm animals. In attendance was the statistician Francis Galton, a cousin of Charles Darwin as well as the coiner of the phrase “nature versus nurture”. As the 84-year-old strolled past the stalls, he became interested in a competition to guess the weight of a particular ox. Nearly 800 visitors had wagered a guess on its size, and prizes would be awarded to the best estimators.

Galton was astonished when the middlemost guess ended up being a mere 11 pounds less than the true weight of 1,197 pounds. (Later analysis actually showed that the mean estimate was exactly 1,197 pounds, but maybe Galton was biased in favor of reporting the middlemost - he did, after all, also coin the term “median”.) His observation inspired interest in a phenomenon that later became known as the wisdom of the crowd. Today you may see it at play in prediction markets or in a contest at your local library to guess the number of gumballs in a jar. Another place to observe its effect? Water Cooler Trivia tiebreaker questions.

What’s the data we’re dealing with?

Each Water Cooler Trivia quiz ends with a tiebreaker question, which we don’t expect anyone to know the precise answer to. For example:

“The average human head has about 100,000 hair follicles. About how many individual hairs will each follicle grow in a person's lifetime?”

Come up with a guess. Unless you’re a savant who pores over dermatology textbooks, you probably haven’t memorized such a particular fact. When multiple people get the same score on the rest of the quiz, the winner is whoever submits the closest guess on the tiebreaker. Here are some quick stats on our tiebreakers:

215 unique questions across 7 trivia categories
307,859 total responses
57.8% of responses are overestimates
38.6% of responses are underestimates
3.6% of responses are exactly right!

Let’s take a deeper look at the collective wisdom of the tens of thousands of WCT participants. P.S. No, we do not use Price Is Right scoring; we consider overestimates and underestimates equally.

Which metric best approximates the answer?

As we saw with Galton’s ox (okay, technically it wasn’t his ox...), some definitions of “average” approximate the true answer better than others. We looked at four different measures of central tendency:

Mean = how we use “average” in everyday speech. Sum up everyone’s guesses and divide by the total number of guesses.
Median = the “middlemost” guess. Rank the guesses in descending order, and choose the one in the middle of the list.
Mode = the most common guess.
Geometric mean = multiply all guesses together and take the Nth root of the product, where N = the number of guesses. Yeah, this one’s the most complex, and it’s often used with values that are exponential in nature.

As a snappy review, imagine we’re guessing how many family members were actually in The Jackson 5, and the guesses were {2,3,3,4,5,6,7}.

Mean = 4.29
Median = 4
Mode = 3
Geometric mean = 3.95
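If you’d like to check those numbers yourself, here is a minimal sketch using Python’s standard `statistics` module. It is only an illustration of the four definitions above, not the code behind our analysis, and the variable names are our own.

```python
import statistics

# The seven Jackson 5 guesses from the example above.
guesses = [2, 3, 3, 4, 5, 6, 7]

mean = statistics.mean(guesses)        # sum of guesses / number of guesses
median = statistics.median(guesses)    # middle value once the guesses are sorted
mode = statistics.mode(guesses)        # most common guess
# Nth root of the product of all N guesses (requires Python 3.8+
# and strictly positive values).
geometric_mean = statistics.geometric_mean(guesses)

print(f"Mean = {mean:.2f}")
print(f"Median = {median}")
print(f"Mode = {mode}")
print(f"Geometric mean = {geometric_mean:.2f}")
```

One caveat: the geometric mean is only defined for positive numbers, so a single guess of zero would make `statistics.geometric_mean` raise an error.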

Okay, so back to those 200ish tiebreakers. For each tiebreaker question we ranked these metrics in terms of which was closest to the true answer. Here’s how they compared:

How to interpret: on 46 of the questions, Mean was the best measure of centrality.
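The per-question ranking can be sketched roughly like this. It is a simplified illustration rather than our actual pipeline: the function names are made up for this example, ties keep the order in which the measures are listed (an assumption), and the “true answer” of 5 simply continues the Jackson 5 example.

```python
import statistics


def central_measures(guesses):
    """Compute the four measures of central tendency for one question."""
    return {
        "mean": statistics.mean(guesses),
        "median": statistics.median(guesses),
        "mode": statistics.mode(guesses),
        "geometric_mean": statistics.geometric_mean(guesses),
    }


def rank_measures(guesses, true_answer):
    """Return measure names ordered from closest to farthest from the truth."""
    measures = central_measures(guesses)
    # sorted() is stable, so tied measures keep their listing order.
    return sorted(measures, key=lambda name: abs(measures[name] - true_answer))


ranking = rank_measures([2, 3, 3, 4, 5, 6, 7], true_answer=5)
print(ranking)  # ['mean', 'median', 'geometric_mean', 'mode']
```

Repeating this for every question and counting how often each measure lands in first, second, third, or fourth place gives the kind of tally described in the chart.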

Being (a) mean is the worst

First off, the loser of the bunch was the mean. In addition to having the worst average ranking, it absolutely dominated the last-place position (across the 211 tiebreaker questions, it was the worst of the four measures 51% of the time). A few extreme outlier responses reliably yanked the mean way above the true value. This really shouldn’t come as a surprise - the floor for underestimates is zero, while for overestimates, the sky’s the limit. And some Water Cooler Trivia participants ventured wayyyyy up into the stratosphere with their guesses.

Sometimes this was clearly due to them holding down the ‘9’ key in resignation. But for certain questions, it can be really tough just to predict the magnitude of the correct answer, let alone the exact value. Often it was hard to tell the difference between an “I give up” answer and an earnest but truly dreadful attempt. In some cases, we specifically ask participants to “answer in thousands”, and inevitably, some will ignore our request.

One respondent even explicitly refused to answer in imperial units, which some of our questions ask for. While their steadfast allegiance to the metric system may be admirable, it was, uhh, less than desirable from a data cleaning perspective.

We decided to ignore all responses more than 850 times the actual answer to control for this somewhat, but there was definitely some unavoidable classification error here. These extreme guesses amounted to just over 1% of our total responses. After removing these and other aberrations (non-numeric responses, etc.), we were left with 298,774 total guesses across 211 questions.
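For the curious, that cleaning rule looks something like the sketch below. The 850x cutoff and the removal of non-numeric responses come from our description above; everything else (the function name, stripping thousands separators, dropping negative values, the sample responses) is an assumption made for illustration.

```python
import math


def clean_guesses(raw_responses, true_answer, max_ratio=850):
    """Keep only numeric guesses that are at most max_ratio times the true answer."""
    cleaned = []
    for raw in raw_responses:
        try:
            # Assumption: treat "1,200" as 1200 rather than discarding it.
            value = float(str(raw).replace(",", "").strip())
        except ValueError:
            continue  # non-numeric response such as "a lot" or an empty string
        if not math.isfinite(value) or value < 0:
            continue  # assumption: discard nan, inf, and negative guesses
        if value > max_ratio * true_answer:
            continue  # more than 850 times the actual answer
        cleaned.append(value)
    return cleaned


# Hypothetical responses to a question whose true answer is 100.
raw = ["150", "1,200", "a lot", "", "85000", "99999999"]
kept = clean_guesses(raw, true_answer=100)
print(kept)  # [150.0, 1200.0, 85000.0]
```

Note that a guess of exactly 850 times the answer survives the filter, since the rule only drops responses that are more than 850 times too large.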

Medians rule them all

The median just barely edged out the mode in terms of average ranking to claim the title of best metric. But you’ll notice that the mode cleaned up when it came to the first-place position. Again, not surprising. This simply represents the chunk of participants who actually knew the exact answer that we expected them to have to guess. While this is impressive, it does sort of go against the spirit of “wisdom of the crowd”.

Geometric mean was mainly included to spice up the analysis a bit. Since it performed so dismally, feel free to forget we ever explained it to you.

Conclusion: Median wins.

How does crowd-wisdom compare across categories?

Now that we’ve determined median to be the best metric for measuring trivia crowd wisdom, let’s break it down by category.

Context on the chart you’re about to see: we’re looking at the mean of the median errors for each question within each category here. Meta-averages!

The match-up between these three brains in a battle of wits would be quite the spectacle.
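Here is a rough sketch of how a “mean of the median errors” can be computed. The category names are real, but the questions, answers, and guesses below are invented for the example, and the function names are ours rather than anything from our actual analysis code.

```python
import statistics
from collections import defaultdict


def signed_pct_error(guess, true_answer):
    """Negative for underestimates, positive for overestimates."""
    return 100 * (guess - true_answer) / true_answer


def mean_of_median_errors(questions):
    """Average, within each category, the percent error of each question's median guess."""
    errors_by_category = defaultdict(list)
    for question in questions:
        median_guess = statistics.median(question["guesses"])
        error = signed_pct_error(median_guess, question["answer"])
        errors_by_category[question["category"]].append(error)
    return {cat: statistics.mean(errs) for cat, errs in errors_by_category.items()}


# Made-up data: two Personalized questions and one Fine Arts question.
questions = [
    {"category": "Personalized", "answer": 100, "guesses": [50, 80, 90, 120, 400]},
    {"category": "Personalized", "answer": 200, "guesses": [100, 150, 260]},
    {"category": "Fine Arts", "answer": 40, "guesses": [30, 50, 60]},
]
result = mean_of_median_errors(questions)
print(result)  # {'Personalized': -17.5, 'Fine Arts': 25.0}
```

For a positive true answer, the percent error of the median guess is the same as the median of the individual percent errors, so either reading of “median error” gives the same number.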

Normally Water Cooler Trivia features 9 different categories. You’ll notice that two of those, Word Play and Current Events, don’t make an appearance in any of our tiebreakers. Of the 7 remaining categories, the only one in which participants have a tendency to overestimate the correct value is Fine Arts.

This deserves a bit of an asterisk. Here’s a count of the number of questions we looked at in each category.

Looks like we need more Fine Arts tiebreaker questions

With only 5 Fine Arts tiebreaker questions, our result here is probably spurious. We’re inclined to say that most trivia respondents underestimate tiebreaker answers across any category. This is in stark contrast to the mean, which massively overestimates the answer to basically every question because of outliers, as mentioned before.

Somewhat surprising is the difficulty our participants have in answering questions in the Personalized category. After all, this category is just what it sounds like. When a group signs up for Water Cooler Trivia, they can opt for a category with questions tailored to their organization. These are usually related to their industry or geographic location, but they can be whatever you request! The mean of the median response error for Personalized questions is a stunning 21.1% below the true answer - worse than any other category. Now that’s what you’d call a home-court disadvantage! *ba-dum-tss*

A final look at every question

Overestimates end up dominating because you can be more than 100% wrong in that direction.

Here is a fun figure to send you home with: the median error for every single Water Cooler Trivia tiebreaker. Some final observations/facts:

For the majority of questions (62.6%), most respondents underestimate the true answer
The median error of all individual responses was -7.7%. Impressive!
The mean error of all individual responses was 254.9%. Pitiful!
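Why can the median error be slightly negative while the mean error is hugely positive? A small sketch with invented numbers shows the asymmetry: an underestimate can never be worse than -100%, but an overestimate has no ceiling. The guesses and the true answer of 100 are hypothetical.

```python
import statistics


def signed_pct_error(guess, true_answer):
    """Negative for underestimates, positive for overestimates."""
    return 100 * (guess - true_answer) / true_answer


true_answer = 100                   # hypothetical question
guesses = [0, 50, 90, 110, 5000]    # one wild overestimate among sensible guesses

errors = [signed_pct_error(g, true_answer) for g in guesses]
median_error = statistics.median(errors)
mean_error = statistics.mean(errors)

print(errors)        # [-100.0, -50.0, -10.0, 10.0, 4900.0]
print(median_error)  # -10.0
print(mean_error)    # 950.0
```

A single stratospheric guess is enough to drag the mean error to 950% even though most of this imaginary crowd guessed low.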

Now you might be wondering about that huge spike towering over the others. Remember the hair follicle question?

The answer is 20.

Our median guess was 150, which is 650% above the true answer. Only 19.6% of participants guessed low.

How close were you?
