Python data analysis example: Analyzing World Cup match statistics to find winning patterns

Soccer is one of the most popular sports in the world, and the FIFA World Cup in particular generates a lot of data. Analyzing that data can help you understand a team's performance and winning patterns. In this post, we'll work through a Python data analysis example that uses World Cup match data to compare team performance and winning patterns, covering everything from data loading to exploratory data analysis (EDA), visualization, and performance comparison.

The importance of analyzing World Cup match data

World Cup match data can contain a wealth of information, including individual player performance, team strategy, and match results. By analyzing it, you can see which teams have stronger offenses and defenses and what winning patterns particular teams show. In this data analysis example, we'll use World Cup match results to compare team performance and identify winning patterns.

Loading and exploring the data

First, let's load and explore the international soccer results dataset, which is also published on Kaggle. It includes FIFA World Cup matches and records the teams, score, and date of each match. The code below reads the CSV file from the dataset's GitHub repository.

Loading the data in Python

import pandas as pd

# Load the match data (international results, including World Cup matches)
url = "https://raw.githubusercontent.com/martj42/international_results/master/results.csv"
wc_data = pd.read_csv(url)

# Explore the data
print(wc_data.head())
print(wc_data.describe())
Python data analysis example - data exploration figure

The code above loads the match results into a DataFrame. The dataset covers international soccer matches played since 1872, not only World Cup games, and records the teams, score, date, and venue of each match.

Column descriptions
  • date: Match date
  • home_team: Home team name
  • away_team: Away team name
  • home_score: Goals scored by the home team
  • away_score: Goals scored by the away team
  • tournament: Competition name (FIFA World Cup, Friendly, etc.)
  • city: Host city
  • country: Host country
  • neutral: Whether the match was played at a neutral venue

This data provides useful information to analyze and compare the performance of each team.
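To see how these columns look once pandas reads them, here is a small sketch that parses a few made-up rows in the same CSV layout as results.csv (the team, city, and country values are placeholders, not real data). It assumes you want dates parsed while loading: passing parse_dates=['date'] to pd.read_csv does that, and pandas reads the TRUE/FALSE values in the neutral column as booleans.

```python
import io

import pandas as pd

# Hypothetical sample rows in the same layout as results.csv (not real match data)
SAMPLE_CSV = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
2000-01-01,Team A,Team B,2,1,FIFA World Cup,City X,Country X,FALSE
2000-01-05,Team C,Team A,0,0,FIFA World Cup,City Y,Country X,TRUE
2000-02-01,Team B,Team C,3,2,Friendly,City Z,Country Y,FALSE
"""

# parse_dates converts the 'date' column to a datetime dtype while reading
sample = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=['date'])

print(sample.dtypes)
# Count how many rows belong to each competition
print(sample['tournament'].value_counts())
```

With the real file, the same parse_dates argument would let you skip the separate pd.to_datetime step later on.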

Data preprocessing

Because the dataset covers all international matches, we'll first keep only FIFA World Cup matches and then clean up the data.

Filtering FIFA World Cup matches
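# Keep only FIFA World Cup matches (an exact match also excludes 'FIFA World Cup qualification')
wc_fifa = wc_data[wc_data['tournament'] == 'FIFA World Cup']

# Check for missing values
print(wc_fifa.isnull().sum())

# Drop rows without a recorded score (e.g. matches that have not been played yet)
wc_fifa = wc_fifa.dropna(subset=['home_score', 'away_score'])

# Select only the columns you need; .copy() avoids a SettingWithCopyWarning below
wc_fifa = wc_fifa[['date', 'home_team', 'away_team', 'home_score', 'away_score', 'city', 'country', 'neutral']].copy()

# Convert date column to datetime format
wc_fifa['date'] = pd.to_datetime(wc_fifa['date'])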

print(wc_fifa.head())

The code above keeps only FIFA World Cup matches and the columns needed for analysis, then converts the date column to datetime. We'll use this filtered data to analyze team performance and scoring patterns in World Cup matches.
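If you rerun the analysis often, the preprocessing steps can be wrapped in one reusable function. This is a minimal sketch rather than the post's own code: preprocess_world_cup is a hypothetical helper name, the sample rows are invented, and dropping rows with missing scores assumes such rows are matches without a recorded result (the public results file may list fixtures before they are played).

```python
import pandas as pd

COLUMNS = ['date', 'home_team', 'away_team', 'home_score', 'away_score',
           'city', 'country', 'neutral']


def preprocess_world_cup(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy containing only FIFA World Cup matches."""
    # Exact match on the competition name excludes e.g. qualifiers
    wc = df[df['tournament'] == 'FIFA World Cup']
    # Drop rows without a recorded score
    wc = wc.dropna(subset=['home_score', 'away_score'])
    # .copy() gives an independent frame, so the assignments below are safe
    wc = wc[COLUMNS].copy()
    wc['date'] = pd.to_datetime(wc['date'])
    # Score columns are read as floats when they contain missing values
    wc[['home_score', 'away_score']] = wc[['home_score', 'away_score']].astype(int)
    return wc.reset_index(drop=True)


# Hypothetical rows for illustration (not real results)
raw = pd.DataFrame({
    'date': ['2000-06-01', '2000-06-02', '2000-06-03', '2000-06-04'],
    'home_team': ['Team A', 'Team B', 'Team C', 'Team A'],
    'away_team': ['Team B', 'Team C', 'Team A', 'Team C'],
    'home_score': [1, 2, None, 0],
    'away_score': [0, 2, None, 3],
    'tournament': ['FIFA World Cup', 'FIFA World Cup', 'FIFA World Cup',
                   'FIFA World Cup qualification'],
    'city': ['X', 'Y', 'Z', 'W'],
    'country': ['P', 'P', 'P', 'Q'],
    'neutral': [False, True, True, False],
})

wc_sample = preprocess_world_cup(raw)
print(wc_sample)
```

Here the row without a score and the qualification match are both removed, leaving two World Cup matches with integer scores.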

Analyze match results

Now we'll analyze each team's match results to see which teams win the most and what their winning patterns are. To do this, we'll add a column that records the winner of each match.

Add a winner column

# Determine the winner of each match ('Draw' when the scores are level)
def determine_winner(row):
    if row['home_score'] > row['away_score']:
        return row['home_team']
    elif row['home_score'] < row['away_score']:
        return row['away_team']
    else:
        return 'Draw'

wc_fifa['winner'] = wc_fifa.apply(determine_winner, axis=1)

# Count wins per team (the 'Draw' entry is the number of drawn matches)
print(wc_fifa['winner'].value_counts())

The code above adds a winner column that records which team won each match, or 'Draw' when the scores were level. From it we can count how many matches each team has won, as well as how many matches ended in a draw.
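DataFrame.apply with axis=1 calls determine_winner once per row. That is fine for the roughly one thousand World Cup matches, but a vectorized version with numpy.select scales better to larger tables such as the full results file. The sketch below assumes that use case; add_winner_column is a hypothetical helper and the scores are made up.

```python
import numpy as np
import pandas as pd


def add_winner_column(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'winner' column ('Draw' for level scores)."""
    out = df.copy()
    conditions = [
        out['home_score'] > out['away_score'],  # home side won
        out['home_score'] < out['away_score'],  # away side won
    ]
    choices = [out['home_team'], out['away_team']]
    # np.select takes the first matching condition per row, else the default
    out['winner'] = np.select(conditions, choices, default='Draw')
    return out


# Hypothetical scores for illustration
matches = pd.DataFrame({
    'home_team': ['Team A', 'Team B', 'Team C'],
    'away_team': ['Team B', 'Team C', 'Team A'],
    'home_score': [2, 1, 0],
    'away_score': [0, 1, 3],
})

print(add_winner_column(matches)['winner'].tolist())
# ['Team A', 'Draw', 'Team A']
```

The function returns a copy instead of modifying its input, so it can be chained after a preprocessing step without side effects.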

Visualize your data

Now let's visualize the number of wins for each team to see which team has more wins in the World Cup.

Visualize the number of wins by team

import matplotlib.pyplot as plt

# Plot the number of wins for the top 10 teams (draws excluded, since 'Draw' is not a team)
top_teams = wc_fifa.loc[wc_fifa['winner'] != 'Draw', 'winner'].value_counts().head(10)
plt.figure(figsize=(10, 6))
top_teams.plot(kind='bar', color='skyblue')
plt.title('Number of wins for the top 10 teams in the FIFA World Cup')
plt.xlabel('Team name')
plt.ylabel('Number of wins')
plt.show()

Run the code above to see the number of wins for the 10 teams with the most wins in the FIFA World Cup. In this Python data analysis example, the chart makes it easy to see that teams such as Brazil and Germany lead the list. (If Korean text in your charts shows up as broken characters, see the post Troubleshooting Python Hangul Breaks: Fixing Hangul Text Issues in Visualizations.)

Python data analysis example - number of wins visualization
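Raw win counts favor teams that have played the most World Cup matches. One way to compare teams more fairly, sketched below under that assumption, is to divide each team's wins by its number of appearances. win_rates and its min_matches parameter are hypothetical names, the function expects the winner column added earlier, and the sample results are invented.

```python
import pandas as pd


def win_rates(df: pd.DataFrame, min_matches: int = 1) -> pd.DataFrame:
    """Matches played, wins, and win rate per team (expects a 'winner' column)."""
    # Every match counts as one appearance for each of the two teams
    appearances = pd.concat([df['home_team'], df['away_team']]).value_counts()
    # Draws are labelled 'Draw' and are not credited to any team
    wins = df.loc[df['winner'] != 'Draw', 'winner'].value_counts()
    table = pd.DataFrame({'played': appearances})
    table['wins'] = wins.reindex(table.index, fill_value=0)
    table['win_rate'] = table['wins'] / table['played']
    # Optionally ignore teams with too few matches
    table = table[table['played'] >= min_matches]
    return table.sort_values(['win_rate', 'played'], ascending=False)


# Hypothetical results for illustration
results = pd.DataFrame({
    'home_team': ['Team A', 'Team A', 'Team B', 'Team C'],
    'away_team': ['Team B', 'Team C', 'Team C', 'Team A'],
    'winner': ['Team A', 'Draw', 'Team B', 'Team A'],
})

print(win_rates(results))
```

On the real data you could call win_rates(wc_fifa, min_matches=20) to rank only teams with at least 20 World Cup matches; the threshold of 20 is an arbitrary choice.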

Compare scoring between teams

Let's analyze the scoring by team to see which teams scored the most goals and which teams have the strongest offense.

Calculate average goals per team
# Reshape so that each row is one team's goals in one match (home and away)
goals = pd.concat([
    wc_fifa[['home_team', 'home_score']].rename(columns={'home_team': 'team', 'home_score': 'goals'}),
    wc_fifa[['away_team', 'away_score']].rename(columns={'away_team': 'team', 'away_score': 'goals'}),
], ignore_index=True)

# Calculate the average goals scored per match by each team
avg_goals = goals.groupby('team')['goals'].mean().sort_values(ascending=False).head(10)

# Visualize the average goals scored for the top 10 teams
plt.figure(figsize=(10, 6))
avg_goals.plot(kind='bar', color='orange')
plt.title('Average goals scored per match by the top 10 teams in the FIFA World Cup')
plt.xlabel('Team name')
plt.ylabel('Average goals scored')
plt.show()

This code calculates each team's average goals scored per World Cup match, counting both home and away appearances, and plots the 10 highest averages so you can see which teams have been the most attacking. Be careful with teams that have played only a few World Cup matches: one or two high-scoring games can push such a team into the top 10, so a surprising name near the top usually reflects a small sample rather than a data error.

Python data analysis example - number of wins visualization
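Average goals are sensitive to sample size, and goals conceded say something about defense as well. The sketch below reshapes each match into one row per team, reports matches played next to average goals scored and conceded, and lets you drop teams below a minimum number of matches. scoring_table is a hypothetical helper and the sample scores are invented; note how the team with a single high-scoring match tops the unfiltered table.

```python
import pandas as pd


def scoring_table(df: pd.DataFrame, min_matches: int = 1) -> pd.DataFrame:
    """Per-team matches, average goals scored, and average goals conceded."""
    # Reshape so each row is one team's view of one match
    home = pd.DataFrame({'team': df['home_team'],
                         'scored': df['home_score'],
                         'conceded': df['away_score']})
    away = pd.DataFrame({'team': df['away_team'],
                         'scored': df['away_score'],
                         'conceded': df['home_score']})
    long = pd.concat([home, away], ignore_index=True)
    table = long.groupby('team').agg(
        matches=('scored', 'size'),
        avg_scored=('scored', 'mean'),
        avg_conceded=('conceded', 'mean'),
    )
    # Drop teams with too few matches for a stable average
    table = table[table['matches'] >= min_matches]
    return table.sort_values('avg_scored', ascending=False)


# Hypothetical scores for illustration
games = pd.DataFrame({
    'home_team': ['Team A', 'Team B', 'Team A', 'Team D'],
    'away_team': ['Team B', 'Team C', 'Team C', 'Team A'],
    'home_score': [3, 1, 2, 4],
    'away_score': [1, 1, 0, 0],
})

print(scoring_table(games))                 # Team D leads after one match
print(scoring_table(games, min_matches=2))  # Team D is filtered out
```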

Analyze winning patterns

Next, let's analyze the winning pattern of a specific team, for example to see whether a powerhouse like Brazil performed especially well in particular World Cup years.

Analyze a specific team's wins by year
# Count Brazil's match wins in each World Cup year
brazil = wc_fifa[wc_fifa['winner'] == 'Brazil']
brazil_wins = brazil.groupby(brazil['date'].dt.year).size()

# Visualize Brazil's number of wins by year
plt.figure(figsize=(10, 6))
brazil_wins.plot(kind='bar', color='green')
plt.title("Brazil's FIFA World Cup match wins by year")
plt.xlabel('Year')
plt.ylabel('Number of wins')
plt.show()

This code plots a specific team's match wins (here, Brazil's) by World Cup year, which shows in which tournaments Brazil was particularly strong.

Python data analysis example - Brazil's number of wins by year visualization
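Grouping the winning rows with groupby(...).size() only returns years in which Brazil won at least one match, so a tournament where the team won nothing silently disappears from the chart. The sketch below fills those years with zero and works for any team. wins_by_year is a hypothetical helper, the sample data is invented, and it expects the datetime date column and the winner column created earlier.

```python
import pandas as pd


def wins_by_year(df: pd.DataFrame, team: str) -> pd.Series:
    """Wins per tournament year for `team`, including years with zero wins."""
    # Years in which the team played at least one match (home or away)
    played = df[(df['home_team'] == team) | (df['away_team'] == team)]
    years = sorted(played['date'].dt.year.unique())
    won = df[df['winner'] == team]
    counts = won.groupby(won['date'].dt.year).size()
    # reindex adds the missing years with a count of 0
    return counts.reindex(years, fill_value=0)


# Hypothetical matches for illustration
history = pd.DataFrame({
    'date': pd.to_datetime(['1990-06-10', '1990-06-15', '1994-06-20', '1998-06-12']),
    'home_team': ['Team A', 'Team B', 'Team A', 'Team C'],
    'away_team': ['Team B', 'Team A', 'Team C', 'Team A'],
    'winner': ['Team A', 'Team A', 'Team C', 'Team A'],
})

print(wins_by_year(history, 'Team A'))  # 1994 appears with 0 wins
```

With the real data, wins_by_year(wc_fifa, 'Brazil').plot(kind='bar', color='green') would draw the same chart as above, with zero-win years included.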

Common mistakes in data analysis and how to fix them

Here are some common mistakes in this kind of analysis and how to fix them.

  1. Data filtering errors: When filtering for FIFA World Cup matches, be careful not to include other competitions, such as qualifiers (FIFA World Cup qualification) or friendlies. Check the exact competition name and filter on it.
  2. Sloppy date handling: Convert the date column to datetime before you analyze it; otherwise the .dt accessor used in the year-by-year analysis raises an error.
  3. Errors in win conditions: When determining the winning team, treat level scores as a draw, and don't count 'Draw' as if it were a team. Note that in this dataset the scores include extra time but not penalty shootouts, so a knockout match decided on penalties is recorded as a draw.
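The three mistakes above can be caught with a few automatic checks. This is a sketch with a hypothetical function name, check_world_cup_frame, run on invented sample rows; it returns a list of problems, which is empty when the frame looks consistent.

```python
import pandas as pd


def check_world_cup_frame(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in a preprocessed World Cup frame."""
    problems = []
    # Mistake 1: other competitions left in the data
    if 'tournament' in df.columns and (df['tournament'] != 'FIFA World Cup').any():
        problems.append('rows from other competitions are present')
    # Mistake 2: dates still stored as strings
    if not pd.api.types.is_datetime64_any_dtype(df['date']):
        problems.append("'date' is not a datetime column")
    if df[['home_score', 'away_score']].isna().any().any():
        problems.append('some matches have no recorded score')
    # Mistake 3: 'Draw' must label exactly the matches with level scores
    if 'winner' in df.columns:
        level = df['home_score'] == df['away_score']
        if ((df['winner'] == 'Draw') != level).any():
            problems.append("'Draw' labels do not match level scores")
    return problems


# Hypothetical frames for illustration
good = pd.DataFrame({
    'date': pd.to_datetime(['2000-06-01', '2000-06-02']),
    'home_score': [1, 2],
    'away_score': [0, 2],
    'winner': ['Team A', 'Draw'],
})
bad = good.assign(date=['2000-06-01', '2000-06-02'], winner=['Team A', 'Team B'])

print(check_world_cup_frame(good))  # []
print(check_world_cup_frame(bad))
```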

FAQs

Q1: Where can I download FIFA World Cup data?
A: The match data used in this post comes from the martj42/international_results repository on GitHub, where you can download it directly. The dataset contains the full results of international soccer matches, not only World Cup games.

Q2: How can I analyze winning patterns more accurately?
A: To further refine your win pattern analysis, you can consider additional variables. For example, you can analyze winning patterns based on a team's goal differential or the location of the game (home, away, or neutral).
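As one way to bring venue and goal difference into the analysis, the sketch below splits matches by the neutral column and reports the share of wins, draws, and losses for the team listed as home_team, together with the average goal difference from that team's side. At a neutral venue the home_team label is only nominal, so read that row with care. home_outcomes_by_venue is a hypothetical helper and the sample rows are invented.

```python
import numpy as np
import pandas as pd


def home_outcomes_by_venue(df: pd.DataFrame) -> pd.DataFrame:
    """Home-side win/draw/loss shares and mean goal difference, by venue type."""
    # Label each match by venue type
    venue = df['neutral'].map({True: 'neutral', False: 'non-neutral'}).rename('venue')
    # Goal difference from the listed home team's point of view
    diff = df['home_score'] - df['away_score']
    result = np.sign(diff).map({1: 'win', 0: 'draw', -1: 'loss'}).rename('home_result')
    # normalize='index' turns counts into shares within each venue type
    table = pd.crosstab(venue, result, normalize='index')
    table['avg_goal_diff'] = diff.groupby(venue).mean()
    return table


# Hypothetical matches for illustration
venue_sample = pd.DataFrame({
    'home_team': ['Team A', 'Team B', 'Team C', 'Team D'],
    'away_team': ['Team B', 'Team C', 'Team D', 'Team A'],
    'home_score': [2, 1, 0, 1],
    'away_score': [0, 1, 2, 0],
    'neutral': [False, False, True, True],
})

print(home_outcomes_by_venue(venue_sample))
```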

Q3: Can this method be used to analyze other sports besides soccer?
A: Yes. The same approach works for data from other sports and can be applied to many kinds of sports data analysis, such as comparing performance between teams or analyzing players.

Wrapping up

In this post, you learned how to analyze team performance and find winning patterns using FIFA World Cup match data as an example of Python data analysis. We analyzed the number of wins and goal scoring patterns for each team, and visualized the winning performance of a particular team over the years. We hope you got a sense of the overall flow of data analysis, from preprocessing the data, to visualizing it, to analyzing winning patterns.

Expand your analysis with a variety of sports datasets to uncover interesting insights!
