Football Dataset - With More Complicated SQL Analysis and Automation of The Beautiful Game
The previous dataset also seemed to have limited data columns to work with than what we could extract from a normal retail environment, so we are changing our dataset, now focusing on the beautiful game.
Ian Ng
4/25/20254 min read
The Data
Perhaps the beautiful game of football isn't just that random where the teams let fate decide who wins, we'll see it through data.
Let's inspect the data before I import this from Kaggle into the MySQL Workbench.
Match events have been reduced from 400000+ rows to 369 rows, since we do not need events in other matches to analyze 1 football match.
Quite rich in information with great details of events that happen during European football matches:
id_odsp: the id of each match
id_event: the occurence of crucial event during the match
sort_order: the order of the match event that occurred
time: the timing of the event when it occurred from the 1st to the 90th minute
text: detailed description of the event
event_type: the main event that happened, from 1 - 11
event_type2: the consequence worth of note resulting from the main event, not necessarily applicable for all main events
side: which side the player involved is on, 1 for home side(Dortmund), 2 for away side(Hamburg)
event_team: the team of that player involved in the event
opponent: always the opposite of the event_team, since only 2 teams are involved in a match
player_in and player_out: substitution of players, either due to strategy, fouling or injuries
shot_place: the condition of the shot placement
shot_outcome: the outcome of the attempted shot
is_goal: did the shot score a goal, 0 is no, 1 is yes
location: the location the event happened at, on the field
bodypart: which bodypart was involved in these shots, either right foot(1), left foot(2), head(3)
assist_method: how was the assist done when the player attempted shot at goal
situation: the resulting event starts from what kind of play on the field, free kick, set piece, etc..
fast_break: was the event a fast counter-attack situation? 0 for no, 1 for yes
Data seems fine without the need for clean and transform.
The teams:
Borussia Dortmund vs Hamburg SV
First of all, without looking at the exact score, we need to know what is the final outcome of this match:
SELECT is_goal, side, event_team
FROM football_match
WHERE id_odsp = 'UFot0hit/' AND is_goal <> 0
;
Seems that Dortmund has won this game 3 against Hamburg 1. Strong as per usual.
Now, how many fouls has each team made a.k.a how intense was the battle before full time in Bundesliga? I'll use a stored procedure to find out, so that I can do it for other teams next time. This dataset comes with a notepad attachment indicating what number represents what event.
3: foul, 4: yellow card, 5: second yellow card, 6: red card
For this part, I am going to count only 3 as the fouls, as fouls are more accurate than yellow or red cards, this is due to some tackles made on the pitch, don't earn any cards, just warnings from the referee, and fouls also contribute to yellow or red cards, so i would be counting double if i considered the cards.
DELIMITER $$
CREATE PROCEDURE num_of_fouls()
BEGIN
SELECT event_team, text, COUNT(event_type) OVER (PARTITION BY event_team) AS num_foul
FROM football_match
WHERE id_odsp = 'UFot0hit/' AND event_type = 3 ;
END $$
DELIMITER ;
CALL num_of_fouls();
It's pretty intense. 12 from Dortmund and 17 from Hamburg:
Now, which team is the most effective at attempting goals?
SELECT event_team, text, shot_place, shot_outcome, COUNT(shot_outcome) OVER(PARTITION BY event_team) AS num_of_attempts
FROM football_match
WHERE id_odsp = 'UFot0hit/' and shot_outcome IN (1,3,4)
;
Dortmund emerged the better of the 2, with a slight advantage in shot attempts of 9 shots to 6.
Now, which team is better at creating situational chances, for example, free kicks, set pieces, etc.?
1: open play, 2: set piece 3: corner 4: free kick
With the exclusion of open play, we will consider the rest as situational chances to score goals.
SELECT event_team, text, situation, time
FROM football_match
WHERE id_odsp = 'UFot0hit/' and situation IN (2,3,4)
ORDER BY time;
Seems like we hit gold here. Goals scored by Dortmund and Hamburg each came from corner kicks, which is also the only one Hamburg scored!
Which team tried hard to grab an advantage in the first half?
Here, we will try to get shots attempted which are on target and situational chances created to see which team tried their best in the first 45 minutes to win the game.
SELECT time, event_team, text, is_goal, shot_place
FROM football_match
WHERE id_odsp = 'UFot0hit/' AND (is_goal = 1 OR shot_place <> 'NA') AND time <= 45;
Well, it seems that Dortmund was the dominant party that tried to close the game in the first half, with 11 to 3 attempts:
Now what about the 2nd half?
SELECT time, event_team, text, is_goal, shot_place
FROM football_match
WHERE id_odsp = 'UFot0hit/' AND (is_goal = 1 OR shot_place <> 'NA') AND time > 45 AND time <= 90;
This time, both Hamburg and Dortmund did the hard work, equally matched attempts at 5 to 5:
Well now, we see that both teams fought hard towards the end of the match, and Dortmund, with their determination are better positioned to win this match against Hamburg.











