0

I have the below data frame and am looking to pivot to the below desired format. I've been playing with the parameters and haven't been able to achieve my expected results due to duplicate values and the below error being thrown:

"ValueError: Index contains duplicate entries, cannot reshape"

import pandas as pd

dummy_data = {
'Field_Names': ['Field_1','Field_2','Field_3','Field_1','Field_2','Field_3'],
'Field_Values': ['Bob','STL','1234','Eric','DSM','5678'],
'Record_ID': ['abc123','abc123','abc123','abc123','abc123','abc123']
}
dummy_df = pd.DataFrame(dummy_data)
display(dummy_df)

Desired Output:

details = {'Field_1': ['Bob', 'Eric'],
           'Field_2': ['STL', 'DSM'],
           'Field_3': ['1234', '5678'],
           'Field_ID': ['abc123','abc123']}

desired_output = pd.DataFrame(details)

display(desired_output)

I have tried variations of the below as well as .pivot_table() after reading SO threads with no luck and am unsure what to try next as there are a myriad of solutions out there and I'm not sure which is best for my use case.

final_df = dummy_df.groupby('Field_Names')['Field_Values','Record_Id'].apply(list)
final_df = pd.DataFrame({i:final_df.loc[i] for i in final_df.index.to_list()})

The commented solution also does not work and throws our duplicate error:

dummy_df.pivot(index='Record_ID', columns='Field_Names', values='Field_Values').reset_index()
  • `dummy_df.pivot(index='Record_ID', columns='Field_Names', values='Field_Values').reset_index()` – mozway Dec 01 '22 at 14:13
  • @mozway this does not yield the exact desired output. I do not want the Record_ID field to be set as the index. – Carson Whitley Dec 01 '22 at 14:18
  • It is not, I used `reset_index` – mozway Dec 01 '22 at 14:21
  • Should the Record_ID have duplicates, you will receive a ValueError "ValueError: Index contains duplicate entries, cannot reshape". – Carson Whitley Dec 01 '22 at 14:23
  • Please read the link of the duplicate in detail, it answer most of these questions, if you still have questions afterwards you can edit your post – mozway Dec 01 '22 at 14:25
  • I don't know which method of the many within the SO linked is going to help in my use case. If you kindly can point me in the right direction, I'm happy to test myself. – Carson Whitley Dec 01 '22 at 14:28
  • @mozway the linked SO reference to a pivot on data with duplicates doesn't have an answer that is marked as correct. – Carson Whitley Dec 01 '22 at 14:55
  • You need to deduplicate, see #10 – mozway Dec 01 '22 at 14:57
  • @mozway it still doesn't appear to fully align as my dataframe has one more column than referenced on question 10. After manipulating, I can do the below: dummy_df.insert(0, 'count',dummy_df.groupby('Field_Names').cumcount()) dummy_final = dummy_df.pivot(index='count', columns=['Field_Names'], values=['Field_Values','Record_ID']) But the formatting comes out repeated. I'm unsure how to add my "Record_ID" column onto the dataframe. – Carson Whitley Dec 01 '22 at 15:09
  • @mozway if I follow that answer as is, the result looks good with the exception of it leaving the Record_ID field off completely. dummy_df.insert(0, 'count',dummy_df.groupby('Field_Names').cumcount()) dummy_final = dummy_df.pivot(index='count', columns=['Field_Names'], values=['Field_Values']) – Carson Whitley Dec 01 '22 at 15:12
  • you have to consider it as a secondary index – mozway Dec 01 '22 at 15:17
  • @mozway got it, thank you. To clarify, there isn't a way to produce the desired output listed in the original question without utilizing multi-Index and then breaking out the index to structure the dataframe to it's final output? – Carson Whitley Dec 01 '22 at 15:21

0 Answers0