Football Dataset - With More Complicated SQL Analysis and Automation of The Beautiful Game

The previous dataset also seemed to have limited data columns to work with than what we could extract from a normal retail environment, so we are changing our dataset, now focusing on the beautiful game.

Ian Ng

4/25/20254 min read

white and gray Adidas soccerball on lawn grass
white and gray Adidas soccerball on lawn grass

The Data

Perhaps the beautiful game of football isn't just that random where the teams let fate decide who wins, we'll see it through data.

Let's inspect the data before I import this from Kaggle into the MySQL Workbench.

Match events have been reduced from 400000+ rows to 369 rows, since we do not need events in other matches to analyze 1 football match.

Quite rich in information with great details of events that happen during European football matches:

id_odsp: the id of each match

id_event: the occurence of crucial event during the match

sort_order: the order of the match event that occurred

time: the timing of the event when it occurred from the 1st to the 90th minute

text: detailed description of the event

event_type: the main event that happened, from 1 - 11

event_type2: the consequence worth of note resulting from the main event, not necessarily applicable for all main events

side: which side the player involved is on, 1 for home side(Dortmund), 2 for away side(Hamburg)

event_team: the team of that player involved in the event

opponent: always the opposite of the event_team, since only 2 teams are involved in a match

player_in and player_out: substitution of players, either due to strategy, fouling or injuries

shot_place: the condition of the shot placement

shot_outcome: the outcome of the attempted shot

is_goal: did the shot score a goal, 0 is no, 1 is yes

location: the location the event happened at, on the field

bodypart: which bodypart was involved in these shots, either right foot(1), left foot(2), head(3)

assist_method: how was the assist done when the player attempted shot at goal

situation: the resulting event starts from what kind of play on the field, free kick, set piece, etc..

fast_break: was the event a fast counter-attack situation? 0 for no, 1 for yes

Data seems fine without the need for clean and transform.

The teams:

Borussia Dortmund vs Hamburg SV

First of all, without looking at the exact score, we need to know what is the final outcome of this match:

SELECT is_goal, side, event_team

FROM football_match

WHERE id_odsp = 'UFot0hit/' AND is_goal <> 0

;

Seems that Dortmund has won this game 3 against Hamburg 1. Strong as per usual.

Now, how many fouls has each team made a.k.a how intense was the battle before full time in Bundesliga? I'll use a stored procedure to find out, so that I can do it for other teams next time. This dataset comes with a notepad attachment indicating what number represents what event.

3: foul, 4: yellow card, 5: second yellow card, 6: red card

For this part, I am going to count only 3 as the fouls, as fouls are more accurate than yellow or red cards, this is due to some tackles made on the pitch, don't earn any cards, just warnings from the referee, and fouls also contribute to yellow or red cards, so i would be counting double if i considered the cards.

DELIMITER $$

CREATE PROCEDURE num_of_fouls()

BEGIN

SELECT event_team, text, COUNT(event_type) OVER (PARTITION BY event_team) AS num_foul

FROM football_match

WHERE id_odsp = 'UFot0hit/' AND event_type = 3 ;

END $$

DELIMITER ;

CALL num_of_fouls();

It's pretty intense. 12 from Dortmund and 17 from Hamburg:

Now, which team is the most effective at attempting goals?

SELECT event_team, text, shot_place, shot_outcome, COUNT(shot_outcome) OVER(PARTITION BY event_team) AS num_of_attempts

FROM football_match

WHERE id_odsp = 'UFot0hit/' and shot_outcome IN (1,3,4)

;

Dortmund emerged the better of the 2, with a slight advantage in shot attempts of 9 shots to 6.

Now, which team is better at creating situational chances, for example, free kicks, set pieces, etc.?

1: open play, 2: set piece 3: corner 4: free kick

With the exclusion of open play, we will consider the rest as situational chances to score goals.

SELECT event_team, text, situation, time

FROM football_match

WHERE id_odsp = 'UFot0hit/' and situation IN (2,3,4)

ORDER BY time;

Seems like we hit gold here. Goals scored by Dortmund and Hamburg each came from corner kicks, which is also the only one Hamburg scored!

Which team tried hard to grab an advantage in the first half?

Here, we will try to get shots attempted which are on target and situational chances created to see which team tried their best in the first 45 minutes to win the game.

SELECT time, event_team, text, is_goal, shot_place

FROM football_match

WHERE id_odsp = 'UFot0hit/' AND (is_goal = 1 OR shot_place <> 'NA') AND time <= 45;

Well, it seems that Dortmund was the dominant party that tried to close the game in the first half, with 11 to 3 attempts:

Now what about the 2nd half?

SELECT time, event_team, text, is_goal, shot_place

FROM football_match

WHERE id_odsp = 'UFot0hit/' AND (is_goal = 1 OR shot_place <> 'NA') AND time > 45 AND time <= 90;

This time, both Hamburg and Dortmund did the hard work, equally matched attempts at 5 to 5:

Well now, we see that both teams fought hard towards the end of the match, and Dortmund, with their determination are better positioned to win this match against Hamburg.