Python: Simplest way to join two DataFrames by unique combinations?

Question

I have two DataFrames:

fuels = pd.DataFrame({'Fuel_Type':['Gasoline', 'Diesel', 'E85']})

years = pd.DataFrame()
years['Year_Model'] = range(2012, 2041)

My desired output is a single new DataFrame which combines these two dataframes as two columns, but for each value in 'years', have it repeated for every unique fuel type in 'fuels'.

In other words, there should be three repetitions for each distinct year, one for each type of fuel.

I can do this very simply in R with:

df <- merge(
         data.frame(years = c(2012:2040)),
         data.frame(fuels = c("Gasoline", "Diesel", "E85")),
         allow.cartesian = T)

I have looked at answers for similar questions such as:

Create all possible combinations of multiple columns in a Pandas DataFrame

Performant cartesian product (CROSS JOIN) with pandas

cartesian product in pandas

But, either I cannot seem to apply the answers' code to my own data, or the answers are too complex for me to understand (as I am very new to Python).

Is there a nice and 'easy to understand' way of doing this?

score 2 · Accepted Answer · answered Sep 24 '20 at 04:21

The second link you posted has a good solution, but it also has a lot of other stuff, so it might be hard to extract if you're new to python. You want:

df = fuels.assign(key=0).merge(years.assign(key=0), on = 'key').drop('key', 1)

This is kind of a slick one liner, because we're doing a few things at once. We're essentially adding a column of 0s to each dataframe, joining on that, and then getting rid of that column. Here is it broken down into steps:

fuels = fuels.assign(key=0) #add a 'key' column to fuels with all 0s for values
years = years.addign(key=0) #add a 'key' column to years with all 0s for values
df = fuels.merge(years, on = 'key') #sql-style join on the key column
df = df.drop('key', 1) #get rid of the key column in the final product

The merge method defaults to an inner join, so we don't need to specify since that's fine. We just have to tell it to join on the right column with on = 'key'. The 1 in the .drop('key', 1) is telling it to drop the column called key (the 1 axis), if we didn't specify (.drop('key')), or gave it a 0 (.drop('key', 0)), it would try to drop a row called key.

@Jacob has really explained it well. You could check my answer to see how to code it. — CK__, Sep 24 '20 at 04:30

score 1 · Answer 2 · answered Sep 24 '20 at 04:27

The below answer should help you:

import pandas as pd


fuels = pd.DataFrame({'Fuel_Type': ['Gasoline', 'Diesel', 'E85']})

years = pd.DataFrame()
years['Year_Model'] = range(2012, 2041)
fuels['key'] = 1
years['key'] = 1


print(pd.merge(fuels, years, on='key').drop("key", 1))

Output:

   Fuel_Type Year_Model
0   Gasoline       2012
1   Gasoline       2013
2   Gasoline       2014
3   Gasoline       2015
4   Gasoline       2016
..       ...        ...
82       E85       2036
83       E85       2037
84       E85       2038
85       E85       2039
86       E85       2040

[87 rows x 2 columns]

Python: Simplest way to join two DataFrames by unique combinations?

2 Answers2