Joining and Merging DataFrames with Pandas

Equally important part of pandas to enable us to merge tables with relevant data together.

Ian Ng

6/14/20253 min read

body of water beside rocks
body of water beside rocks

We Use Merge In Pandas, How We Used Joins in MySQL

For this part let me demonstrate how I use Merge and joining functions to join relevant data from different dataframes together.

First things first, as always, import the pandas library into Jupyter:

now, we need an additional dataset to perform joins, so here is another one:

Now, the first case, I will be performing a simple Merge without add-ons:

Now, once I have loaded my df1 and df2, I know that my df1 is the left table while my df2 is the right one. By using a simple Merge to join df2, I am simply utilizing the Inner Join function, where only data that matches on both tables are output.

I can test to see if this is the exact case, by placing a 'how' attribute inside this function:

The 'Inner' indicates inner join. The result of this is the exact output of that simple merge.

When I am joining on a specific column, the other columns not included, pandas will automatically rename those other columns that are identical yet not being joined:

Note that 'Skills' and 'Age' are from different tables, and they were all included in this join. FirstName_x is the column from the left dataframe and FirstName_y is the column from the right dataframe.

If we were to inner join based on all identical columns on both dataframes, using a list, we would get the exact same output as a normal inner join:

Moving on to outer joins. To perform a full outer join, we do this:

We can see from here, the function is joining everything from both tables, but for those data that the function couldn't find any values for, it will replace them as NaN, not a number.

Moving over to left and right joins. Now for example, if I wanted to join matching rows on the right and ALL values from the left dataframe, I would use the left join, and vice versa:

Right join:

Now, this is rarely used, but here is cross join. It joins everything from each table, and by 1 row in one table and every other row in the other table. Quite a stress on computing resources:

Finally we have the concatenate function. Concatenate stacks up the dataframes on each other by default, and joining everything vertically:

We could also see the indexing resets at 0 on the other table, so technically it's just stacking up the dataframes.

We could also do concatenation horizontally by specifying the axis:

And that is it from me on merging using Pandas.