Rating Inflation: It’s Not Real
If you’ve spent much time on IMDb, Netflix, or any other site that lists ratings for media, you’ve noticed that, whatever scale they’re using, the average is never in the middle: it’s skewed noticeably toward the high end. Mediocre media often gets a 6 or 7 out of 10; even the worst trash rarely falls below a 2 or 3. Rotten Tomatoes, in an apparent attempt to compensate, sets its Fresh threshold at 60%.
Many people conclude that we’re suffering from rating inflation: We’re giving bad things okay reviews and okay things good reviews and everything ends up clustered at the top, thus making ratings meaningless. For instance, from Tevis Thompson’s masterpiece of obfuscatory writing:
The review scale is one of the most embarrassing aspects of the videogame community. Where else is an 8 the acceptable level at which to criticize a failure as colossal as BioShock Infinite? The score that won’t cause too many waves, since anything in the 7’s is average at best, and below that: no man’s land. Where else do you see these numbers? School, that’s where. There is perhaps no clearer admission that videogames have not escaped their adolescence than grading them on a high school curve.
This is an old problem, but one that even relatively new sites show no inclination to address. When Polygon launched last year and began putting out higher caliber feature stories, I had some hope that they might approach reviews differently as well. I read their review policy and saw a lot of fuss about updating reviews over time but nothing new when it came to the scale. Worse, the scale they put forward actually validated and reinforced our current low standards, only gussied up with professional language. 9’s “may not innovate or be overly ambitious but are masterfully executed.” 7’s are good but “have some big ‘buts’”. A 5 “indicates a bland, underwhelming game that’s functional but little else.” Not 5 as average, as commonplace, the middle instead of the bottom of the scale. (Their 2’s, 3’s, & 4’s list some silly trinity of ‘complete’ failures to justify their existence.)
The concept of rating inflation suggests that reviews were lower on average in the past, which I don’t think anyone has ever provided evidence for, but let’s assume that the inflation we’re talking about is compared to a theoretical “correct” rating, rather than a past rating. Even by this metric, there’s no real evidence that rating inflation exists. To see why, first we need to separate consumer ratings from critic ratings and look at the differences between them.
I’m defining consumer ratings as ratings given by general users on sites where the primary focus is media consumption. This includes sites like Netflix and Goodreads. Most of these sites allow you to write reviews, but user reviews aren’t the main focus and are usually less important to other users than the aggregate rating. These sites often have extremely high average ratings, with hardly anything dipping below a 3 out of 5. But what does the rating actually mean?
Netflix spells it out for us: Hated it, didn’t like it, liked it, really liked it, loved it. These user ratings aren’t meant to be complex analyses; they’re only meant to describe how you felt about the movie. If you liked it, you should give it a three. If most people who watched it liked it, then it will have an average score of three. That’s a “true” rating.
So why does practically everything have a three or above, even things that were really bad? It has nothing to do with inaccurate ratings; instead, it’s due to inaccurate sampling. Movies are rated by people who have seen them. People usually see movies that interest them. If Transformers is rated 4/5, that doesn’t mean that the general population really liked it; it just means that people who watch Michael Bay movies really liked it. So when user ratings skew high, all that means is “People mostly like things they think they are going to like.” A truism, to be sure, but truisms are true. All attempts to correct the skew are less true.
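The self-selection effect above is easy to demonstrate with a toy simulation. This is a hypothetical sketch, not real ratings data: I'm assuming a population whose true enjoyment of some movie averages out to "liked it" territory, and assuming that only people who would score it a 3 or better bother to watch and rate it. The distributions and the cutoff are invented for illustration.

```python
import random

random.seed(0)

# Each person has a true enjoyment score for the movie, on Netflix's 1-5
# scale, centered on 3 ("liked it"). Clip to the scale's bounds.
population = [min(5, max(1, round(random.gauss(3, 1)))) for _ in range(100_000)]

# Self-selection: people mostly watch things they expect to like, so assume
# only those who'd score it 3 or higher buy a ticket and leave a rating.
raters = [score for score in population if score >= 3]

pop_mean = sum(population) / len(population)
rater_mean = sum(raters) / len(raters)

print(f"population mean: {pop_mean:.2f}")    # close to 3.0 by construction
print(f"raters' mean:    {rater_mean:.2f}")  # noticeably higher
```

The raters' average comes out well above the population average even though every individual rating is perfectly honest, which is the whole point: the skew is in who shows up, not in what they say.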
I’ll define critic ratings as ratings given by professional movie critics, leaving out hobby critics like myself for the moment. Critic ratings don’t suffer from the sampling problem, because they review a wide variety of media, not just what they expect to like. However, ratings still tend to end up on the higher side of the spectrum–not so much with movies, but definitely with video games. Does this mean that video-game reviewers are a bunch of softies who don’t want to hurt anyone’s feelings? Not at all. It depends on one’s philosophy of what the rating means.
Rating inflation generally implies that ratings ought to form a vague bell curve: If you’re rating from 1 to 5, 10% of films should get a 5, 10% should get a 1, and 40% should get a 3. But, as you know if you’ve ever taken a class graded on a curve, bell curves are relative: No matter how well you did, you fail if everyone else did better than you. That’s not a very useful way to grade a class, because your grade doesn’t actually reflect whether or not you know the material, and it’s not a useful way to rate media, either.
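The problem with curved grading can be made concrete. In this sketch, a grade depends only on your percentile rank within your cohort, never on your absolute score; the cohorts and cutoff percentiles are invented for illustration.

```python
def curved_grade(score: float, cohort: list[float]) -> str:
    """Assign a letter grade from the score's percentile rank in the cohort."""
    rank = sum(1 for s in cohort if s < score) / len(cohort)
    if rank >= 0.90:
        return "A"
    if rank >= 0.60:
        return "B"
    if rank >= 0.30:
        return "C"
    return "F"

weak_cohort = [55, 60, 62, 65, 68, 70, 72, 75, 78, 80]
strong_cohort = [85, 88, 90, 91, 92, 93, 94, 95, 97, 99]

# The same raw score of 82 is top of one class and bottom of the other:
print(curved_grade(82, weak_cohort))    # "A"
print(curved_grade(82, strong_cohort))  # "F"
```

An identical performance gets opposite grades depending on the company it keeps, which is exactly why a curve tells you nothing about whether anyone actually knows the material.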
Instead, media can and should be rated based on theoretically objective criteria. Evaluate the story, characters, themes, writing, cinematography, gameplay, whichever criteria are applicable to that medium, and then give the work an overall rating based on all of those. At no point in the process do you need to compare the work to other works. Are the characters believable, relatable, and three-dimensional? Then the work is well-done in that respect, regardless of how well-written characters in other works may be.
Thus, while these ratings might still fall into a bell curve, they don’t need to. Maybe most stories have well-written characters. Maybe most stories have poorly-written characters. Either way, it would be misleading to present them as mostly in the middle if they aren’t.
So why do the ratings end up mostly at the high end of the spectrum? It makes sense if you think about it. Theatrically released movies, books published by large presses, primetime television, and video games from major studios all have a great deal of work put into them by competent professionals, so it stands to reason that most big-name media achieves a basic degree of competence. Complaints about bad things getting okay reviews often forget just how bad it’s possible to be. Boom mics in the frame, spelling errors, game-breaking bugs: These are all major mistakes and they’re all usually absent from important media. So reviews skew high in acknowledgment of how much worse things could be, and they should skew high so that exceptionally bad works that do make those basic mistakes can correctly be rated lower than those that don’t.
Video game ratings show the most skew because video games are the medium that combines the most elements and, therefore, they have the most axes along which they can fail. The worst possible game would have to fail along every possible axis. Most games don’t–they usually succeed in at least one respect and therefore deserve higher ratings. Some games have terrible graphics but fascinating stories, bad writing but great gameplay, or clumsy interfaces but gorgeous visuals, and they all might deservedly fall in the 60%-75% range, despite their flaws.
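The arithmetic behind this is simple: when a rating is built from several axes, one failure can't drag the overall score to the bottom. Here's a sketch using a plain unweighted mean; the axis names and numbers are invented for illustration, not any real outlet's rubric.

```python
def overall(scores: dict[str, int]) -> float:
    """Unweighted mean of per-axis scores, each on a 0-100 scale."""
    return sum(scores.values()) / len(scores)

# Terrible graphics, but a fascinating story and solid gameplay:
flawed_gem = {"graphics": 30, "story": 85, "writing": 70, "gameplay": 80}

# The worst possible game has to fail along every axis at once:
total_failure = {"graphics": 10, "story": 5, "writing": 10, "gameplay": 5}

print(f"flawed gem:    {overall(flawed_gem):.0f}%")     # lands in the 60-75% band
print(f"total failure: {overall(total_failure):.0f}%")  # the rare true bottom-dweller
```

To score in the teens, a game has to be bad at everything simultaneously, which is why scores down there are rare and deserve to be.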
The asymmetrical rating scheme isn’t just an alternative to the bell-curve scheme: It’s better. It portrays each individual work as good, okay, or bad based on its own merits, so the rating can help a reader decide if zie will like something. The bell-curve scheme forces most media into the okay category even if it’s really better or worse, so while it may appear less skewed, it’s not useful to actual readers.
It’s time to put to rest the idea of rating inflation. Ratings should reflect how good a work actually is, and that includes the possibility that most of it actually is pretty good.
Comic from XKCD.