0

I have two DataFrames, one containing the main data of interest, and the other a lookup containing columns I wish to append to the former. For example:

df1

    Name   Date_1       Date_2
0   John   2019-11-13   2019-12-28
1   Amy    2019-11-13   2019-12-28
2   Sarah  2019-11-14   2019-12-29
3   Dennis 2019-11-14   2019-12-29
4   Austin 2019-11-15   2019-12-30
5   Jenn   2019-11-08   2019-12-23
df2

    Var_1  Var_2
0   1x2    test_1
1   3x4    test_2   
2   5x6    test_3

For each row in df2, I would like to append Var_1 and Var_2 to df1 and populate all with the values from that row. This would be repeated for each unique row in df2, resulting in a concatenated dataframe like so:

df_final

    Name   Date_1       Date_2      Var_1  Var_2
0   John   2019-11-13   2019-12-28  1x2    test_1
1   Amy    2019-11-13   2019-12-28  1x2    test_1
2   Sarah  2019-11-14   2019-12-29  1x2    test_1
3   Dennis 2019-11-14   2019-12-29  1x2    test_1
4   Austin 2019-11-15   2019-12-30  1x2    test_1
5   Jenn   2019-11-08   2019-12-23  1x2    test_1
6   John   2019-11-13   2019-12-28  3x4    test_2
7   Amy    2019-11-13   2019-12-28  3x4    test_2
8   Sarah  2019-11-14   2019-12-29  3x4    test_2
9   Dennis 2019-11-14   2019-12-29  3x4    test_2
10  Austin 2019-11-15   2019-12-30  3x4    test_2
11  Jenn   2019-11-08   2019-12-23  3x4    test_2
12  John   2019-11-13   2019-12-28  5x6    test_3
13  Amy    2019-11-13   2019-12-28  5x6    test_3
14  Sarah  2019-11-14   2019-12-29  5x6    test_3
15  Dennis 2019-11-14   2019-12-29  5x6    test_3
16  Austin 2019-11-15   2019-12-30  5x6    test_3
17  Jenn   2019-11-08   2019-12-23  5x6    test_3

My initial solution is to iterate over each row in df2, appending columns Var_1 and Var_2 to df1 with the values of that row. I would then concatenate the resulting dataframes to create df_final.

While this solution works, the dataframes will eventually become much larger so I feel like a more efficient solution does exist.

Le Chase
  • 170
  • 9

1 Answers1

1

I would change your solution slightly.

  • Instead of iterating over the rows of df1 (larger dataframe), I would iterate over the rows of df2 (smaller dataframe.
  • And rather than thinking of the rows of df2, I would zip the individual columns in df2, and iterate the values in the columns.

It might be interesting to try both approaches and time the differences.

import pandas as pd

# step 1: create dataframe 1
df_1 = pd.DataFrame({
    'Name': ['John', 'Amy', 'Sarah'],
    'Date_1': ['2019-11-13', '2019-11-13', '2019-11-13'],
    'Date_2': ['2019-12-28', '2019-12-28', '2019-12-28', ]
})

print('df_1: ')
print(df_1)
print()

# step 2: create dataframe 2
df_2 = pd.DataFrame({
    'Var_1': ['1x2', '3x4', '5x6'],
    'Var_2': ['test_1', 'test_2', 'test_3']
})

print('df_2: ')
print(df_2)
print()

# step 3: create empty master dataframe to store results
df_new = pd.DataFrame()

# loop through the columns in df_2
for each_var1, each_var2 in zip(df_2['Var_1'], df_2['Var_2']):

    # create a copy of df_1
    temp_df = df_1.copy()

    # add 2 new columns to the dataframe with Var_1 and Var_2
    temp_df['Var_1'] = each_var1
    temp_df['Var_2'] = each_var2

    # concatenate the temp dataframe to master
    df_new = pd.concat([df_new, temp_df])

print('new master dataframe: ')
print(df_new)
print()
  • Interesting. This does work, however I found the solution here using `merge` to be much more succinct (https://stackoverflow.com/questions/13269890/cartesian-product-in-pandas). Turns out the name of what I was trying to accomplish was "cartesian product", which would have been helpful to know before I asked this question! – Le Chase Oct 24 '19 at 17:59