0

I need to add new columns to a dataframe by transposing rows of another dataframe.

Here is my DF1:

ID1  Value1  Value2
11   xxx      yyy
12   aaa      bbb

Here is DF2:

Index ID2   KEY     VALUE
1     123   ID1     11
2     123   FOLDER  folder1
3     123   FILE    file1
4     456   ID1     12
5     456   FOLDER  folder2
6     456   FILE    file2
7     456   FOLDER  folder3
8     456   FILE    file3

Note that ID2 456 has duplicate KEY's.

I want to ignore all duplicate keys and pick the values for last KEY's (with highest value of Index).

What I want:

  1. Look for the KEY = ID1 in DF2.
  2. Transpose all the remaining keys with identical ID2

Output:

ID1  Value1  Value2  FOLDER   FILE
11   xxx      yyy    folder1  file1
12   aaa      bbb    folder3  file3
ShitalSavekar
  • 379
  • 2
  • 4
  • 10

1 Answers1

2

You can first do a pivot, then a merge:

new_df = df1.merge(df2.drop_duplicates(keep='last', subset=['ID2', 'KEY']).pivot(index='ID2', columns='KEY', values='VALUE').astype({'ID2': int}))

Output:

>>> new_df
   ID1 Value1 Value2   FILE   FOLDER
0   11    xxx    yyy  file1  folder1
1   12    aaa    bbb  file3  folder3
  • Thank you @Richardec for the answer. But, I just noticed that my DF2 has duplicate KEY's. Meaning, I have 2 values of FOLDERNAME for ID2 = 456 in above example. Due to this, I'm getting `ValueError: Index contains duplicate entries, cannot reshape` error. How can I take last value of the KEY? – ShitalSavekar Mar 22 '22 at 17:28
  • 1
    Will you please provide a sample of such a case in the question? Just so that it's easier for me to understand. –  Mar 22 '22 at 17:31
  • @Shital done. Check now :) –  Mar 22 '22 at 17:40
  • The duplicate key error is gone but now I'm getting this error - `KeyError: "Only a column name can be used for the key in a dtype mappings argument. 'ID1' not found in columns."` – ShitalSavekar Mar 22 '22 at 17:58
  • Weird. Try now. –  Mar 22 '22 at 18:00
  • 1
    Sorry, My bad. There was problem in my dataframe. Your solution works perfectly. Accepting as right answer. – ShitalSavekar Mar 22 '22 at 18:30