Introduction

The score of an anime on MyAnimeList (MAL) is computed as a weighted average of the scores given by users, as described by the formula on this page. The global rankings resulting from that calculation are displayed on this page.

Every scoring/ranking system has its pros and cons; in the case of MAL rankings, one con is that users do not score anime using the same pattern: the numerical vote has a different meaning for each of them.
Some users use the full scale from 1 to 10, some consider 6-7 a low score and assign most scores in the 6-10 range, some may give either 10 (liked) or 1 (disliked) with no in-between, and so on.
Furthermore, not every viewer has watched and rated all the anime.

This small project was created out of curiosity about what happens when using a different method that ignores discrepancies in scoring patterns and is based only on the relative scores assigned by each individual user.
For example, if a user scored A = 10, B = 10, C = 7, D = 3, the only information the model retains is A = B > C > D, ignoring the absolute scores.
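To make the example above concrete, here is a minimal sketch of how one user's scores could be reduced to relative preferences. The function name `pairwise_preferences` and the dictionary layout are illustrative assumptions, not taken from the project's code.

```python
from itertools import combinations


def pairwise_preferences(scores):
    """Return (preferred, other) pairs implied by one user's scores.

    Anime with equal scores produce no pair, so only the relative order
    of the scores matters, never their absolute values.
    """
    pairs = []
    for (a, score_a), (b, score_b) in combinations(scores.items(), 2):
        if score_a > score_b:
            pairs.append((a, b))
        elif score_b > score_a:
            pairs.append((b, a))
    return pairs


# The user from the example above: A = B > C > D.
user = {"A": 10, "B": 10, "C": 7, "D": 3}
prefs = pairwise_preferences(user)

# A user with a compressed scale but the same ordering gives the same pairs.
compressed = {"A": 9, "B": 9, "C": 8, "D": 7}
same_prefs = pairwise_preferences(compressed)
```

Both users yield identical preference pairs, which is the sense in which the absolute scores are ignored.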

Methodology

We chose the Bradley-Terry model and used a random sample of 50,027 MAL users who scored at least 5 completed or dropped anime. Given two anime, the model predicts the probability that a random user will prefer one over the other, based on the data we collected.
An intuitive way of explaining how we construct the model table is to treat it as a sports tournament with all the anime as competitors. If two anime appear in one user's list, they 'play' against each other by comparing how the user scored them: a win is declared (1) if one is scored strictly higher than the other, or (2) otherwise, if one is completed and the other is dropped. All other results are considered ties and do not contribute to the results.
The table is then used to approximate the parameters of the model.
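The tournament analogy and the fitting step can be sketched as follows. This is only an illustration under stated assumptions, not the project's implementation: the names `match_result`, `build_win_table` and `fit_bradley_terry` are hypothetical, every list entry is assumed to carry a score and a status of "completed" or "dropped", a completed anime is assumed to beat a dropped one when scores are equal, and the parameters are estimated with the classic iterative (minorization-maximization) update for Bradley-Terry, which may differ from the method actually used.

```python
from collections import defaultdict
from itertools import combinations


def match_result(entry_a, entry_b):
    """Return 1 if a wins, -1 if b wins, 0 for a tie; entries are (score, status)."""
    (score_a, status_a), (score_b, status_b) = entry_a, entry_b
    if score_a != score_b:
        return 1 if score_a > score_b else -1
    if status_a != status_b:
        # Assumed direction: with equal scores, completed beats dropped.
        return 1 if status_a == "completed" else -1
    return 0


def build_win_table(user_lists):
    """wins[(a, b)] counts how many users preferred a over b; ties are skipped."""
    wins = defaultdict(int)
    for entries in user_lists:
        for (a, entry_a), (b, entry_b) in combinations(entries.items(), 2):
            result = match_result(entry_a, entry_b)
            if result > 0:
                wins[(a, b)] += 1
            elif result < 0:
                wins[(b, a)] += 1
    return dict(wins)


def fit_bradley_terry(wins, iterations=200):
    """Iterative (MM) estimate of the parameters, normalised to sum to 1."""
    total_wins = defaultdict(float)
    games = defaultdict(lambda: defaultdict(float))  # games[i][j]: matches between i and j
    for (winner, loser), n in wins.items():
        total_wins[winner] += n
        games[winner][loser] += n
        games[loser][winner] += n
    p = {anime: 1.0 for anime in games}
    for _ in range(iterations):
        updated = {
            i: total_wins[i] / sum(n / (p[i] + p[j]) for j, n in games[i].items())
            for i in games
        }
        norm = sum(updated.values())
        p = {i: value / norm for i, value in updated.items()}
    return p


# Toy sample of five users (made-up data, for illustration only).
user_lists = [
    {"A": (10, "completed"), "B": (8, "completed"), "C": (8, "dropped")},
    {"B": (9, "completed"), "A": (7, "completed"), "C": (7, "completed")},
    {"C": (9, "completed"), "A": (6, "completed")},
    {"A": (8, "completed"), "B": (6, "completed")},
    {"A": (9, "completed"), "C": (5, "dropped")},
]
wins = build_win_table(user_lists)
params = fit_bradley_terry(wins)
```

In this toy sample A ends up with the largest parameter and C with the smallest. The sketch assumes every anime both wins and loses at least once; an anime that never wins would be driven to a parameter of 0, which is one reason sparsely compared anime need special handling.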

Disclaimer: All the data used to compute the results was gathered from MyAnimeList via their official API between 14 and 18 January 2023. The code used for the process is available on GitHub.

Results

One evident problem is that some anime were watched by a very small number of users, sometimes in the single digits in our sample. Since such anime do not have many points of comparison, two issues surface: (1) their parameters converge much more slowly; (2) such outliers are not placed at an appropriate rank: if only three users watched anime A and rated it higher than the #1 anime on MAL, would it be fair to say that A should be at the top of the rankings?
To limit this issue, we have multiple datasets: for each, we removed the anime that have fewer than a given number of comparisons, shown next to the slider. Anime with 0 comparisons are automatically excluded.
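As a rough sketch of this kind of pre-filter, the snippet below counts the comparisons of each anime in a win table and drops every match involving an anime below a threshold. The names `comparison_counts`, `filter_by_comparisons` and `min_comparisons`, the table layout, and the exact boundary handling are assumptions made for illustration; the toy numbers are invented.

```python
from collections import Counter


def comparison_counts(wins):
    """Total matches 'played' by each anime, given wins[(winner, loser)] = count."""
    counts = Counter()
    for (winner, loser), n in wins.items():
        counts[winner] += n
        counts[loser] += n
    return counts


def filter_by_comparisons(wins, min_comparisons):
    """Keep only matches between anime with at least min_comparisons comparisons."""
    counts = comparison_counts(wins)
    keep = {anime for anime, c in counts.items() if c >= min_comparisons}
    return {
        (winner, loser): n
        for (winner, loser), n in wins.items()
        if winner in keep and loser in keep
    }


# Toy table: Z was watched by only three users, all of whom rated it above A.
toy_wins = {
    ("A", "B"): 40, ("B", "A"): 25,
    ("A", "C"): 30, ("C", "A"): 20,
    ("B", "C"): 22, ("C", "B"): 18,
    ("Z", "A"): 3,
}
filtered = filter_by_comparisons(toy_wins, min_comparisons=10)
```

Here Z has only 3 comparisons, so its matches are removed before any parameters are estimated, while A, B and C are left untouched.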

Note: This is a limited selection of filters that were applied before the computation of the parameters.
Exclude anime with at most the following number of comparisons:

Some anime with extremely low viewership still slip through the cracks: the next filter excludes the anime that appear in a limited number of users' lists, shown next to the slider.

Note: This filter is applied after the computation of the parameters.
Exclude anime that appear in at most the following number of lists:

Reading the table

# of comparisons counts the total number of matches 'played' by the anime, i.e. how many times it was compared to some other anime.
Sample popularity counts how many users had the anime in their completed or dropped list; it also shows this value as a percentage of the sample size.
Parameter is the parameter assigned to the anime in the probability model.
What does it mean? If we take two anime A and B with parameters p_A and p_B, respectively, the probability that a user rates A higher than B is P(A>B) = p_A / (p_A + p_B).
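As a quick worked example of this formula, the sketch below evaluates it for made-up parameter values; the function name `win_probability` and the numbers are illustrative assumptions, not values from the table.

```python
def win_probability(p_a, p_b):
    """Bradley-Terry probability that A is rated higher than B."""
    return p_a / (p_a + p_b)


# Hypothetical parameters: A's parameter is three times B's.
prob_a_over_b = win_probability(3.0, 1.0)  # 3 / (3 + 1) = 0.75
prob_b_over_a = win_probability(1.0, 3.0)  # 1 / (1 + 3) = 0.25

# Only the ratio matters: rescaling both parameters changes nothing.
rescaled = win_probability(30.0, 10.0)
```

Because only the ratio of the two parameters enters the formula, the parameters are meaningful relative to one another rather than as absolute values.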

Note: The table may take a few seconds to load when opening the page or changing filters, especially on slower devices or connections.