The dataset I chose for my project is called “Pokémon for Data Mining and Machine Learning,” which I found on Kaggle. I originally picked this dataset because it contains many variables, both categorical and quantitative, that I could attempt to correlate in a number of ways. In the end, however, I decided to compare the numerous Pokémon types. Each Pokémon has one or two “types” which, in part, determine that Pokémon’s various stats, the moves it can use, and its appearance. There are currently eighteen types of Pokémon: Bug, Dark, Dragon, Electric, Fairy, Fighting, Fire, Flying, Ghost, Grass, Ground, Ice, Normal, Poison, Psychic, Rock, Steel, and Water.
In most Pokémon games, the player can carry only six Pokémon at a time, so it is always difficult to decide which Pokémon to bring along and which to leave behind; oftentimes this decision is influenced by the Pokémon’s type. It is always a good idea to maintain a diverse team of Pokémon, and this analysis of the Pokémon types is designed to help players do just that.
In the Pokémon universe there are a number of Legendary Pokémon that are significantly rarer and more powerful than the average Pokémon. Of the 721 Pokémon in my dataset, 46 are legendary.
To find out which type of Pokémon yields the most legendaries, I created a bar chart comparing the number of legendary Pokémon that belong to each type. Note that the total number of Pokémon shown in this bar graph is 73, not 46, because I am looking at both primary and secondary types. If a legendary is of the types Psychic and Flying, for example, it is counted once for the Psychic type and once for the Flying type, which means that every Pokémon with two types is counted twice. I chose to do this because I set out to compare which Pokémon you would have available should you commit yourself to a particular type. Including secondary types inflates the overall count of legendaries, but for each individual type it fairly and completely represents all of the Pokémon that have that type, even those that have it only secondarily.
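The counting rule described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used to make the chart: the column names `Type_1`, `Type_2`, and `isLegendary` are assumptions about how the CSV is laid out, and the four rows are a hand-typed sample rather than the full dataset.

```python
from collections import Counter

# Hand-typed sample rows for illustration only. The column names
# (Name, Type_1, Type_2, isLegendary) are assumptions about the CSV layout.
rows = [
    {"Name": "Mewtwo", "Type_1": "Psychic", "Type_2": "", "isLegendary": True},
    {"Name": "Lugia", "Type_1": "Psychic", "Type_2": "Flying", "isLegendary": True},
    {"Name": "Rayquaza", "Type_1": "Dragon", "Type_2": "Flying", "isLegendary": True},
    {"Name": "Pikachu", "Type_1": "Electric", "Type_2": "", "isLegendary": False},
]


def legendary_counts_by_type(rows):
    """Count legendaries per type, crediting both primary and secondary types."""
    counts = Counter()
    for row in rows:
        if not row["isLegendary"]:
            continue  # only legendary Pokémon appear in the chart
        for column in ("Type_1", "Type_2"):
            if row[column]:  # skip an empty secondary type
                counts[row[column]] += 1
    return counts


counts = legendary_counts_by_type(rows)
print(counts)
```

In this sample there are three legendaries but the counts add up to five, because Lugia and Rayquaza each contribute to two types. The same effect is why the bars in the chart add up to 73 rather than 46.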
From the data we can see that Psychic is the type with the most legendaries, followed closely by Dragon, Flying, and Fire. Curiously, there are no legendary Pokémon of either the Bug or Poison type. The three legendary Pokémon of the Fighting type have it only as a secondary type.
If you are seeking prestige of legendary proportions, then the Psychic, Dragon, Flying, and Fire types may be for you. If, on the other hand, you are a fan of the Bug or Poison types, you will have to settle for a team without legendary Pokémon.
If you couldn’t care less about fame and legendary Pokémon and instead want only the strongest for your team, then you will want to know which Pokémon can deal the most damage, take the most hits, and respond the quickest. A Pokémon’s combat abilities can be summarized by six numbers: HP, Attack, Special Attack (Sp_Atk), Defense, Special Defense (Sp_Def), and Speed. A Pokémon’s overall combat rating, the Total stat, is the sum of these six stats. I plotted the Total stat for all 721 Pokémon in a histogram and showed the type distributions using different colors. This histogram can be used to see the overall distribution of the Total stat before dividing the Pokémon by type.
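As a quick worked example of how the overall rating is built from the six stats, here is a small Python sketch. The dictionary keys mirror the stat names used above and are an assumption about the dataset's column names; the two Pokémon are the ones with the highest and lowest totals, entered by hand with their standard base stats.

```python
# The six stats that make up the overall rating. The key names are
# assumed to match the dataset's column names.
STAT_COLUMNS = ("HP", "Attack", "Defense", "Sp_Atk", "Sp_Def", "Speed")


def total_stat(pokemon):
    """Return the overall combat rating: the sum of the six individual stats."""
    return sum(pokemon[column] for column in STAT_COLUMNS)


# Arceus has 120 in every stat; Sunkern has 30 in every stat.
arceus = {column: 120 for column in STAT_COLUMNS}
sunkern = {column: 30 for column in STAT_COLUMNS}

print(total_stat(arceus))   # 720
print(total_stat(sunkern))  # 180
```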
The distribution is clearly multimodal, with modes appearing roughly every 100 points along the x-axis. This could suggest that Pokémon are designed to fit loosely into different classes of strength according to their Total stat. Also, for the curious among you, the strongest Pokémon by Total stat is Arceus and the weakest is Sunkern. Here is a summary table showing numerical information about the spread and the center of the Total stat.
Minimum | Q1 | Median | Q3 | Maximum | Mean | SD |
---|---|---|---|---|---|---|
180 | 320 | 424 | 499 | 720 | 418 | 110 |
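For readers who want to build a summary like this themselves, the sketch below shows one way to do it with Python's built-in `statistics` module. The `totals` list is made-up placeholder data, not values from the dataset, so its output will not match the table. The quartile method (`"inclusive"`) and the use of the sample standard deviation are also assumptions, since different tools compute these slightly differently.

```python
import statistics


def summarize(values):
    """Return the five-number summary plus the mean and standard deviation."""
    # method="inclusive" treats the data as a full population when
    # interpolating quartiles; this choice is an assumption.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Minimum": min(values),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        "SD": statistics.stdev(values),  # sample standard deviation
    }


# Placeholder Total values for illustration only (NOT the real dataset).
totals = [180, 200, 300, 350, 400, 450, 500, 600, 720]

summary = summarize(totals)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

To reproduce the table above, `totals` would need to be replaced with the Total stat of all 721 Pokémon from the CSV.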
As stated before, the histogram, regardless of the colors, was mainly intended to show the overall distribution of the Total stat, not the individual type distributions. For a clearer picture of the Total stat for each of the types individually, we can turn to a box-and-whisker plot.