Chapter 2 Data
2.1 Source
Unfortunately, many large soccer data sets are privately owned by the companies that collect them, so acquiring data for research purposes is difficult and expensive. For the Soccer Data Challenge [@sobig_soccer_challenge], an initiative organized by SoBigData, Wyscout.com made publicly available an extensive collection of soccer match logs covering seven prominent men's soccer competitions.
The data is collected and provided by Wyscout, a leading company in the soccer industry that connects soccer professionals worldwide and lends analytical support to more than 50 soccer associations and over 1,000 professional clubs. Data collection is performed by trained video analysts who specialize in soccer and record the data using proprietary software. The data is made publicly available to allow for scientific research outside of the private sector.
2.2 Variables & Information Collected
The data set contains all spatio-temporal events (passes, shots, fouls, etc.) that occur during each match of seven major soccer competitions, together with information about each event's outcome, the players involved, and other characteristics. Of the seven competitions, five are European domestic leagues: the English Premier League, French Ligue 1, German Bundesliga, Spanish La Liga, and Italian Serie A. According to the UEFA country coefficient, which ranks the football associations of Europe, these are the most important domestic club competitions in Europe. The same rankings determine the number of clubs from each association that participate in the UEFA Champions League and the UEFA Europa League. In addition to the club data, which was gathered during the 2017/18 season, international match data from the 2016 UEFA European Football Championship and the 2018 FIFA World Cup is also included [@pappalardo_massucco_2019].
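
To make the shape of such event data concrete, the following is a minimal sketch of how a single spatio-temporal event might be represented and summarized per match. The field names (`match_id`, `player_id`, `event_type`, `seconds`, `x`, `y`, `successful`), the helper `count_event_types`, and the sample records are all hypothetical and chosen for illustration only; they are not the actual Wyscout schema and are not real match data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One spatio-temporal event (hypothetical fields, not the Wyscout schema)."""
    match_id: int     # match in which the event occurred
    player_id: int    # player who performed the event
    event_type: str   # e.g. "pass", "shot", "foul"
    seconds: float    # time of the event since the start of the period
    x: float          # pitch position (assumed scale, for illustration)
    y: float
    successful: bool  # outcome of the event


# Made-up sample records, for illustration only.
events = [
    Event(1, 10, "pass", 12.5, 40.0, 55.0, True),
    Event(1, 11, "shot", 14.1, 88.0, 48.0, False),
    Event(1, 10, "pass", 20.0, 35.0, 60.0, False),
    Event(2, 22, "foul", 5.0, 50.0, 50.0, True),
]


def count_event_types(events, match_id):
    """Count how often each event type occurs in a given match."""
    return Counter(e.event_type for e in events if e.match_id == match_id)


counts = count_event_types(events, match_id=1)
print(counts)
```

A flat record per event keeps the temporal (`seconds`), spatial (`x`, `y`), and outcome information together, so per-match or per-player summaries reduce to simple filters and counts.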