i am trying to merge two dataframes with an unequal number of rows and columns.
df1 looks like this:
code, item, size, 2020, 2021, 2022, 2023, 2024
1, i1, 50, 303, 343, 432, 855, 431
1, i2, 50, 763, 555, 988, 499, 691
1, i3, 50, 111, 935, 711, 621, 951
2, i1, 50, 998, 222, 533, 629, 853
2, i2, 50, 345, 345, 222, 908, 127
2, i3, 50, 777, 112, 419, 492, 290
and df2:
code, item, size, 2025, 2026
1, i1, 50, 303, 343
1, i4, 50, 763, 555
2, i3, 50, 435, 111
2, i5, 50, 777, 112
3, i1, 50, 653, 199
My goal is to have an output like:
code item size 2020 2021 2022 2023 2024 2025 2026
1 i1 50 303 343 432 855 431 303 343
1 i2 50 763 555 988 499 691 0 0
1 i3 50 111 935 711 621 951 0 0
1 i4 50 0 0 0 0 0 763 555
2 i1 50 998 222 533 629 853 0 0
2 i2 50 345 345 222 908 127 0 0
2 i3 50 777 112 419 492 290 435 111
2 i5 50 0 0 0 0 0 777 112
3 i1 50 0 0 0 0 0 653 199
Where the additional columns in df2 are added to df1, and the values in the "type" column contain all of the values from the "type" column of both df1 and df2. In areas where the "item" values exists in both df1 and df2, the value should be just added to row with the corresponding "item" value and added under the correct year column (like where df1 contains an i1 item and so does df2). When df2 has an "item" value that does not exist for the corresponding "code" value in df1, it should be added with the rest of the "code" values, like df2 containing an i4 "item" with a "code" of 1, an "item" which does not exist in df1.
Any help would be massively appreciated.