Exploring Famous Soccer Players on Wikipedia

by Adam Duan

 

An image of Famous Soccer Players.

My general topic for this final project is famous soccer players. I have been a big fan of soccer since I was a kid and played for school teams in middle school and high school. In fact, I might have spent more time on the field than in the classroom. Soccer has always been an important part of my life: I learned many things that I could not have learned from textbooks, and I made some good friends. Because of an injury, I am not able to play in college, but I still follow soccer games closely and am always motivated by the inspiring stories of great soccer stars such as Cristiano Ronaldo and Arjen Robben. Their stories interest me and motivate me to look up to them and to put in the work.

I chose famous soccer players for this learning project because, over the semester, this course has taught us many ways to collect data from public websites. We have also learned how to create clusters, taking all of the data we have grabbed from the Internet and sorting it into different groups. With the ability to create clusters, we can analyze the correlations among web pages. I found this topic really interesting because, by building a website as we did for Project Four, I can create a chronological web page for each selected soccer player. From this chronological page we can follow a player's growth and experiences from his debut to his current career stage, which is exciting! Another very interesting feature on the website is the “Top Words” statistic: the 10 words that appear most often across his pages throughout his career. It is really fun to analyze these words; we can even come up with a basic summary of a player’s career by interpreting his “Top Words”. Take my favorite player, Cristiano Ronaldo, as an example: his “Top Words” include “madeira”, “Portugal”, “rivalry”, “bust” and so on. Ronaldo was born on the island of Madeira, and since becoming world famous he has done charity work in his hometown and used his influence to promote it to the world. That is why the word “madeira” shows up on his page. The word “rivalry” appears because throughout his career he has been compared with Lionel Messi, and people have enjoyed watching their duels. It is almost impossible to mention Ronaldo without mentioning Messi, so it is no surprise that this word shows up. The word “bust” has an even more interesting story. To honor his contributions to his hometown, Madeira honored him with a bust. However, the bust looked so little like Ronaldo that it went viral on the Internet and became one of the hottest memes.
This is how we connect these words and interpret them.
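The write-up does not say exactly how the “Top Words” are computed. A minimal sketch, assuming they are simply the 10 most frequent words on a page after dropping common English stopwords, might look like the following; the `STOPWORDS` set and the `top_words` function are hypothetical. A weighting such as TF-IDF, which favors words that are frequent on one player's page but rare on the others, would also be a plausible choice and could explain distinctive words like “bust”.

```python
import re
from collections import Counter

# Hypothetical stopword list; the project's actual list (if any) is not documented.
STOPWORDS = {
    "the", "a", "an", "and", "of", "in", "to", "for", "on", "at", "he",
    "his", "was", "is", "as", "with", "by", "from", "that", "has", "had",
}


def top_words(text: str, n: int = 10) -> list[str]:
    """Return the n most frequent non-stopword words in text, lowercased."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # most_common keeps first-seen order among words with equal counts.
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    sample = (
        "Ronaldo was born on Madeira. Madeira honored Ronaldo with a bust. "
        "The rivalry with Messi defined an era; the rivalry continues."
    )
    print(top_words(sample, 3))  # -> ['ronaldo', 'madeira', 'rivalry']
```

Running this over every yearly version of a page would give one “Top Words” list per year.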

Another interesting feature on the website is the column listing the other players each player is closely associated with. Taking Lionel Messi as an example, his column includes “Diego Maradona”, “Ronaldinho”, and, not surprisingly, “Cristiano Ronaldo”. The logic behind this is that Messi has been regarded as the best player since Diego Maradona, so the media constantly compare the two and debate his chances of surpassing Maradona. “Ronaldinho” appears often because they were teammates at Barcelona when Messi was young, and Messi eventually took over Ronaldinho’s role as the club’s star. “Cristiano Ronaldo” appears because of the rivalry between the two.
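The article does not describe how the associated-players column is built. One simple approach, sketched here as an assumption, is to count how often each other player in the data set is mentioned on a page and keep the most-mentioned names; the `associated_players` function and the short roster below are hypothetical.

```python
def associated_players(page_text: str, roster: list[str], self_name: str,
                       top_n: int = 3) -> list[str]:
    """Rank the other players in roster by how often their name appears in page_text.

    Plain substring counting is a simplification: it ignores nicknames and
    can miscount names that are substrings of other names.
    """
    counts = {}
    for name in roster:
        if name == self_name:
            continue  # a player is not associated with himself
        mentions = page_text.count(name)
        if mentions > 0:
            counts[name] = mentions
    # sorted() is stable, so ties keep roster order.
    return sorted(counts, key=lambda name: counts[name], reverse=True)[:top_n]


if __name__ == "__main__":
    page = ("Messi was compared to Maradona. Ronaldinho mentored Messi. "
            "Maradona praised him. Cristiano Ronaldo is his rival.")
    roster = ["Messi", "Maradona", "Ronaldinho", "Cristiano Ronaldo", "Pele"]
    print(associated_players(page, roster, "Messi"))
```

Counting mentions rather than hyperlinks is a design choice; counting links to other players' pages would be an equally reasonable signal.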

For this project, all of my data comes from Wikipedia. I selected these pages based on my personal preferences: all 50 famous players are ones I respect and appreciate. After selecting the pages, I deliberately planned their order. Generally, I placed the current generation of players at the front of my data and those who have retired toward the end. In addition, I placed players who play or have played for the same team or in the same league next to each other. These are the two rules I used to order my data set.

After I input my data and ran the program, three interesting clusters were generated, grouped according to a fairly simple logic. My smallest cluster consists of only 3 players, who are grouped together because they are all Chinese players who did not play in top leagues overseas. My second cluster consists of 8 players; what they share is that all of them played in the Italian league, Serie A, and have now retired. My last cluster is significantly larger than the other two, with 39 players from different leagues and nations. From my observation, they are grouped together simply because they have played in neither the Italian league nor the Chinese league; with only three clusters available, they all end up in the same one.

After analyzing the clusters, I took a closer look at the individual pages of various players. From my observations, I found three interesting patterns. First, each year after a player’s debut, the total number of characters, sections, links and images on his page increases. Second, the first paragraph changes a lot from year to year. In the earliest version, the first paragraph usually covers where the player is from, his current team, and some details of his childhood. In later versions, the paragraph lists the player’s achievements, such as national league titles, Champions League titles, and numerous individual awards. In general, the first paragraph grows longer as more honors are added to it. The last conclusion is the most interesting: even though the “Top Words” column is different every year and the words vary a lot, with some words disappearing the following year, the top five words do not change very often. You can find these words in most of a player’s yearly pages.
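How the characters, sections, links and images were counted is not stated. If the yearly snapshots were available as raw wikitext, a rough version of these metrics could be computed as below. This is an assumption-laden sketch: the `page_stats` function is hypothetical, it treats `[[File:...]]` and `[[Image:...]]` targets as images and every other `[[...]]` target as a link, and it counts `==`-style headings as sections. Counts taken from rendered HTML would come out differently.

```python
import re


def page_stats(wikitext: str) -> dict[str, int]:
    """Rough size metrics for one revision of a page's raw wikitext."""
    # Capture the target of every [[target]] or [[target|label]].
    targets = re.findall(r"\[\[([^\]|]+)", wikitext)
    images = [t for t in targets if t.lower().startswith(("file:", "image:"))]
    # Headings such as "== Career ==" or "=== Barcelona ===" on their own line.
    sections = re.findall(r"^==+[^=].*?==+\s*$", wikitext, flags=re.MULTILINE)
    return {
        "characters": len(wikitext),
        "sections": len(sections),
        "links": len(targets) - len(images),
        "images": len(images),
    }


if __name__ == "__main__":
    text = ("Intro [[FC Barcelona|Barcelona]] and [[Argentina]].\n"
            "== Career ==\n"
            "[[File:Messi.jpg|thumb]]\n"
            "=== Barcelona ===\n"
            "text")
    print(page_stats(text))
```

Applying it to each yearly snapshot and comparing the results would show the growth trend described above.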

Let’s take Lionel Messi as an example. His first Wikipedia page dates from 2005, with only 10019 characters and 45 links. Now his page contains 1308110 characters and 1886 links. His first paragraph in 2005 was: “Lionel Messi started playing football at a very early age in his hometown's Newell's Old Boys. From the age of 11, he suffered from a hormone deficiency and as Lionel's parents were unable to pay for the treatment in Argentina, they decided to move to Barcelona, Spain. Shortly after arriving in Spain, Lionel tried his luck with a trial at FC Barcelona, despite being only 13 years of age. He excelled at the trial and rapidly found himself starting for the Barcelona B team, averaging more than a goal per game.” The first paragraph of the 2018 page, however, reads: “Lionel Messi is an Argentine professional footballer who plays as a forward and captains both Spanish club Barcelona and the Argentina national team. Often considered the best player in the world and regarded by many as one of the greatest players of all time,[6][7][8] Messi has won a record-tying five Ballon d'Or awards,[note 2] four of which he won consecutively, and a record five European Golden Shoes. He has spent his entire professional career with Barcelona, where he has won 33 trophies, including nine La Liga titles, four UEFA Champions League titles, and six Copas del Rey. Both a prolific goalscorer and a creative playmaker, Messi holds the records for most official goals scored in La Liga (392), a La Liga season (50), a club football season in Europe (73), a calendar year (91), El Clásico (26), as well as those for most assists in La Liga (153) and the Copa América (11). He has scored over 650 senior career goals for club and country.”
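Using only the figures quoted above, the growth of Messi’s page can be expressed as a simple ratio of the current value to the 2005 value. The `growth` helper is just for illustration.

```python
# Figures quoted above for Messi's 2005 page and his current page.
first = {"characters": 10019, "links": 45}
current = {"characters": 1308110, "links": 1886}


def growth(old: int, new: int) -> float:
    """How many times larger the new value is than the old one."""
    return new / old


for metric in first:
    print(f"{metric}: {growth(first[metric], current[metric]):.1f}x")
# characters: 130.6x
# links: 41.9x
```

In other words, the page now has roughly 130 times as many characters and about 42 times as many links as it did in 2005.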

Comparing these two paragraphs, we can clearly see that the latest page lists the astonishing titles he has won and has dropped the story of his childhood. As for the “Top Words” feature, three words appear on every year’s page except the 2005 page: “Argentina”, “Barcelona” and “Maradona”. These are the things that have stayed with Lionel Messi throughout his career.
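Finding the words that persist across years amounts to intersecting the yearly “Top Words” sets. Below is a minimal sketch, assuming each year’s list is already available. The `persistent_words` function and the sample lists in the usage example are hypothetical and made up for illustration; they are not the project’s real data.

```python
def persistent_words(top_words_by_year: dict[int, list[str]],
                     skip_first: bool = True) -> set[str]:
    """Words that appear in every year's Top Words list.

    With skip_first=True the earliest year (the debut page) is ignored,
    mirroring the observation that the 2005 page is the exception.
    """
    years = sorted(top_words_by_year)
    if skip_first:
        years = years[1:]
    if not years:
        return set()
    common = set(top_words_by_year[years[0]])
    for year in years[1:]:
        common &= set(top_words_by_year[year])  # keep only shared words
    return common


if __name__ == "__main__":
    # Hypothetical lists, for illustration only.
    sample = {
        2005: ["rosario", "newell", "hormone"],
        2006: ["argentina", "barcelona", "maradona", "ronaldinho"],
        2007: ["argentina", "barcelona", "maradona", "injury"],
    }
    print(sorted(persistent_words(sample)))
```

A looser variant could count how many years each word appears in and keep words present in “most” years, which is closer to the pattern described earlier for the top five words.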

