1

i am trying to merge two dataframes with an unequal number of rows and columns.

df1 looks like this:

code, item, size, 2020, 2021, 2022, 2023, 2024
1, i1, 50, 303, 343, 432, 855, 431
1, i2, 50, 763, 555, 988, 499, 691
1, i3, 50, 111, 935, 711, 621, 951
2, i1, 50, 998, 222, 533, 629, 853
2, i2, 50, 345, 345, 222, 908, 127
2, i3, 50, 777, 112, 419, 492, 290

and df2:

code, item, size, 2025, 2026
1, i1, 50, 303, 343
1, i4, 50, 763, 555
2, i3, 50, 435, 111
2, i5, 50, 777, 112
3, i1, 50, 653, 199

My goal is to have an output like:

code item size 2020  2021    2022    2023    2024    2025    2026
1    i1  50  303     343     432     855     431     303     343
1    i2  50  763     555     988     499     691     0       0
1    i3  50  111     935     711     621     951     0       0
1    i4  50  0       0       0       0       0       763     555
2    i1  50  998     222     533     629     853     0       0
2    i2  50  345     345     222     908     127     0       0
2    i3  50  777     112     419     492     290     435     111
2    i5  50  0       0       0       0       0       777     112
3    i1  50  0       0       0       0       0       653     199

Where the additional columns in df2 are added to df1, and the values in the "type" column contain all of the values from the "type" column of both df1 and df2. In areas where the "item" values exists in both df1 and df2, the value should be just added to row with the corresponding "item" value and added under the correct year column (like where df1 contains an i1 item and so does df2). When df2 has an "item" value that does not exist for the corresponding "code" value in df1, it should be added with the rest of the "code" values, like df2 containing an i4 "item" with a "code" of 1, an "item" which does not exist in df1.

Any help would be massively appreciated.

ldhwaddell
  • 11
  • 1
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Naveed Oct 20 '22 at 23:35

1 Answers1

1

You are looking for an outer merge

df1 = pd.DataFrame({'code': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}, ' item': {0: ' i1', 1: ' i2', 2: ' i3', 3: ' i1', 4: ' i2', 5: ' i3'}, ' size': {0: 50, 1: 50, 2: 50, 3: 50, 4: 50, 5: 50}, ' 2020': {0: 303, 1: 763, 2: 111, 3: 998, 4: 345, 5: 777}, ' 2021': {0: 343, 1: 555, 2: 935, 3: 222, 4: 345, 5: 112}, ' 2022': {0: 432, 1: 988, 2: 711, 3: 533, 4: 222, 5: 419}, ' 2023': {0: 855, 1: 499, 2: 621, 3: 629, 4: 908, 5: 492}, ' 2024': {0: 431, 1: 691, 2: 951, 3: 853, 4: 127, 5: 290}})
df2 = pd.DataFrame({'code': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3}, ' item': {0: ' i1', 1: ' i4', 2: ' i3', 3: ' i5', 4: ' i1'}, ' size': {0: 50, 1: 50, 2: 50, 3: 50, 4: 50}, ' 2025': {0: 303, 1: 763, 2: 435, 3: 777, 4: 653}, ' 2026': {0: 343, 1: 555, 2: 111, 3: 112, 4: 199}})

df1.merge(df2, "outer").sort_values("code")
bitflip
  • 3,436
  • 1
  • 3
  • 22