Chapter 1 Welcome

1.1 Introduction

Soccer is truly the world’s game. With an estimated 4 billion fans worldwide, professional soccer is more popular than baseball, American football, basketball, and hockey combined. In the U.S., many sports fans consider the Super Bowl the biggest and most important game of the year. After all, it attracts the most attention of any American sporting event, with an average viewership of just under 100 million. However, the NFL championship game pales in comparison to Europe’s biggest yearly soccer event, the Champions League Final, which draws roughly four times as many viewers, at 400 million. Additionally, according to FIFA, the 2018 World Cup final between France and Croatia reached a total of 1.1 billion viewers, and 4.5 billion viewers tuned in across the 64 games of the competition [@burton_2022]. These figures show that while soccer might not be the most popular sport in the U.S., it dominates the global market.

However, statistical analysis of professional soccer has lagged well behind that of the “Big 4” American sports (baseball, football, basketball, and hockey). Michael Lewis brought data analysis in professional sports into the mainstream when he published Moneyball: The Art of Winning an Unfair Game in 2003. The book detailed the analytical, evidence-based approach that the Oakland Athletics and their general manager, Billy Beane, took to assembling a competitive baseball team despite Oakland’s small budget. Since then, data analytics has taken the sports world by storm. Teams and fans alike have become accustomed to advanced analytics and have massive amounts of data available at a moment’s notice. All major American sports teams have integrated statistical analysis into their daily operations.

Professional soccer teams and coaches, by contrast, have been slower to accept analytics, largely because until recently soccer has been one of the most difficult sports to analyze from a statistical perspective. Fundamentally, the biggest difficulty lies in the nature of the game: the sport’s low-scoring, fluid style yields relatively little data describing the performance of players and teams [@pappalardo_2021].

Soccer analytics has been an area of interest for a long time, but without the technology to track where and when events happen, few breakthroughs were made. That changed recently, as individual clubs began hiring specialized companies to collect massive amounts of data about players’ every move. Video-tracking and GPS technology have introduced an enormous amount of data to the soccer world. The volume and complexity of this data provide an unprecedented opportunity for clubs to measure their performance throughout individual games.

1.2 Research Questions & Goals

As mentioned previously, one of the things that makes soccer difficult to analyze, and its outcomes difficult to predict, is that goals account for only a tiny fraction of the events that happen in a match (if they occur at all). Within this data set, the average match contains 1675 events (SD: 110), and only a couple of these will be goals given the low-scoring nature of the game. Passes, by contrast, are the most common event in a match, accounting for 50% of all events.
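As a minimal sketch of how summary figures like these could be computed, the snippet below counts events per match and the share of events that are passes. The toy event log, its `(match_id, event_type)` layout, and the function names are assumptions made purely for illustration; they are not the actual data set or the code used in this analysis.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical toy event log: one (match_id, event_type) pair per event.
# These values are made up for illustration only.
events = [
    (1, "pass"), (1, "pass"), (1, "duel"), (1, "shot"),
    (2, "pass"), (2, "foul"), (2, "pass"), (2, "pass"), (2, "duel"), (2, "pass"),
]


def events_per_match(events):
    """Return a Counter mapping each match_id to its number of events."""
    return Counter(match_id for match_id, _ in events)


def summarize(events):
    """Return (mean events per match, sample SD, share of events that are passes)."""
    counts = list(events_per_match(events).values())
    n_passes = sum(1 for _, kind in events if kind == "pass")
    return mean(counts), stdev(counts), n_passes / len(events)


avg, sd, pass_share = summarize(events)
print(f"Events per match: {avg:.1f} (SD: {sd:.2f}); pass share: {pass_share:.0%}")
```

Note that `stdev` computes the sample standard deviation, so it needs at least two matches; whether the SD reported above is a sample or population figure is not stated, so that choice is also an assumption.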

Passes are a fundamental way to move the ball around the field and create scoring opportunities. The way a team passes is often influenced by many factors, such as its tactics, the time of the game, its players, and more. Our interest lies in understanding the differences in how and where teams pass. Our main research questions are:

  1. How are passes distributed differently based on factors within a soccer match?

  2. Where on the field do teams find more success passing than other teams, specifically in the Premier League?

  3. How does the answer to question 2 vary across the top 5 European leagues?