
I couldn't find an answer for my specific DataFrame case. I would like to compute a Cartesian product (cross join) on a large dataset in Python. I found many related posts, such as "Performant cartesian product (CROSS JOIN) with pandas", but none of them are easy for me to apply, because my data has indexes and I can't easily slice the dataset into single columns and then merge.

enter image description here

In my data, the years (2021-2022) and the days (1D, 2D, 3D) are indexes.

My goal is a Cartesian product of these, with "new" indexes that I can't easily obtain at the moment. The new indexes are Years, Days and Names.

Solutions like `data3 = d1.merge(d2, how="cross")` didn't work: the year index was removed, and too many columns were created without the days being assigned as a column.
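For illustration, the behaviour described above can be reproduced with a small frame. The question does not show `d1` and `d2`, so this sketch assumes both are the same 2x3 frame, with values borrowed from the reconstruction in the answer; the real inputs may differ.

```python
import pandas as pd

# Assumption: d1 and d2 are not shown in the question, so one small frame
# (values taken from the answer's reconstruction) stands in for both.
d1 = pd.DataFrame({'1D': {2021: 'Bob', 2022: 'Georg'},
                   '2D': {2021: 'Alice', 2022: 'Elvira'},
                   '3D': {2021: 'Tom', 2022: 'Zoe'}})
d2 = d1.copy()

# Cross join: every row of d1 paired with every row of d2 (needs pandas >= 1.2).
data3 = d1.merge(d2, how="cross")

print(data3.index.tolist())    # fresh positional index; the years are gone
print(data3.columns.tolist())  # overlapping names receive _x / _y suffixes
```

A cross merge pairs whole rows, so it multiplies rows and duplicates columns instead of turning the day labels into a column.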

Michael W
  • It helps if you also add your example in code, so we can run it. It looks like you're trying to: `df.melt(".")` - assuming `.` is the name of the column holding the years. – jqurious Mar 30 '23 at 22:42
  • What are `d1` and `d2`? I see only one input dataframe. – Corralien Mar 31 '23 at 01:58

1 Answer


According to your image, you can do:

out = (df.rename_axis(index='Dates', columns='Days').stack()
         .rename('Names').reset_index())
print(out)

# Output
   Dates Days   Names
0   2021   1D     Bob
1   2021   2D   Alice
2   2021   3D     Tom
3   2022   1D   Georg
4   2022   2D  Elvira
5   2022   3D     Zoe

Minimal reproducible example:

import pandas as pd

data = {'1D': {2021: 'Bob', 2022: 'Georg'},
        '2D': {2021: 'Alice', 2022: 'Elvira'},
        '3D': {2021: 'Tom', 2022: 'Zoe'}}
df = pd.DataFrame(data)
print(df)

# Output
         1D      2D   3D
2021    Bob   Alice  Tom
2022  Georg  Elvira  Zoe
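
As a possible alternative, the same long format can be sketched with `melt`, along the lines of the comment under the question. This is a minimal sketch on the example data above; the names `out_melt`, `Dates`, `Days` and `Names` are chosen here to match the output shown earlier, and the final sort is only there because `melt` walks the frame column by column, whereas `stack` goes row by row.

```python
import pandas as pd

data = {'1D': {2021: 'Bob', 2022: 'Georg'},
        '2D': {2021: 'Alice', 2022: 'Elvira'},
        '3D': {2021: 'Tom', 2022: 'Zoe'}}
df = pd.DataFrame(data)

out_melt = (df.rename_axis(index='Dates')   # name the year index
              .reset_index()                # turn the years into a column
              .melt(id_vars='Dates', var_name='Days', value_name='Names')
              .sort_values(['Dates', 'Days'], ignore_index=True))
print(out_melt)
```

On this data it should yield the same six rows, in the same order, as the `stack` version.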
Corralien