Data Analysis with Pandas Part 2

More advanced pandas functions that also replicates what we do in MySQL

Ian Ng

5/26/20252 min read

gray concrete building during daytime
gray concrete building during daytime

Grouping and Aggregating with Pandas

In this article, I will be using pandas to handle some grouping functions, which is pretty essential if you're working on a database holding millions of data.

Group By

Now, similar to the group by function in MySQL, this groupby function in pandas will also group by labels defined in the dataframe.

Now, let's say I'd like to group by coffee. But take note, simply grouping by the labels will not output data directly, we'll need to pair it with an aggregate function:

Let's group by 'coffee_name' and get the average:

Now this is pretty redundant, as buyers would already now what each type of coffee costs, when they check out the vending machine menu. But this is how we do groupby and output it.

Giving these data a little more meaning by summing the sales up instead, for each type of coffee:

Note that 'money' is amount of cash paid for each purchase.

We could also check how much sales for each type of coffee we made to date, within this csv file with a count function:

Seems like Americano with Milk, Latte, Americano, Cappuccino is the all-time Top 4 favorites for consumers in this area.

Now if we need to see what percentage of customers pay by cash or card for each coffee, we could do multiple group bys:

This data shows different totals on cash and card purchases for each type of coffee. Pretty interesting.

Now, if i want to display a few aggregate function output in a dataframe, i can also use the agg function:

Output shows total cash paid for each payment method, for each type of coffee. All displayed with a single line of code.

And if you want a pretty generalized statistical summary of your dataframe on this coffee_name, there is the describe function:

There is also the min() and max() functions used in pandas, but for this data where all the prices are the same for each type of coffee, it won't matter what the min and max is.