This is a very basic question but I am stuck. I have a dataframe that looks something like this:
    key     file
0  1234  abc.pdf
1  1235  ghi.pdf
2  1234  def.pdf
3  1235  jkl.pdf
4  1235  lmn.pdf
There are a variable number of documents associated with each key. I would like to transform this to something like this:
    key    doc_1    doc_2    doc_3
0  1234  abc.pdf  def.pdf      NaN
1  1235  ghi.pdf  jkl.pdf  lmn.pdf
If I try to use df.pivot, I get a new column named for each document name, which is not what I want. I'm at a loss to figure out the pandas-appropriate way to do this. I have reviewed the "Reshaping and pivot tables" page of the pandas 1.1.0 documentation but haven't found an answer, probably because I don't yet 'get' pandas.
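To make the problem reproducible, here is a minimal sketch of what I mean. The dataframe is rebuilt by hand from the sample above, and the exact arguments to df.pivot are an assumption about the most obvious way to call it (key as the index, file as both the columns and the values):

```python
import pandas as pd

# Sample data rebuilt by hand from the table above
df = pd.DataFrame({
    "key": [1234, 1235, 1234, 1235, 1235],
    "file": ["abc.pdf", "ghi.pdf", "def.pdf", "jkl.pdf", "lmn.pdf"],
})

# Assumed call: one row per key, one column per distinct value of "file"
wide = df.pivot(index="key", columns="file", values="file")

# The column labels are the document names themselves, not doc_1, doc_2, ...
print(wide.columns.tolist())
print(wide)
```

With this call every distinct file name becomes its own column, so the result grows one column wider for each new document instead of filling a fixed set of doc_N columns.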
I can do it in raw Python, generating a dictionary that can be fed to pandas, but I'm positive this isn't the right way to go.
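import csv

data = []
temp_dict = {}
final_dict = {}

# Read every row of the CSV into a list of dicts
with open('output_records.csv') as f:
    csvreader = csv.DictReader(f)
    for row in csvreader:
        data.append(row)

# Group the file names by key, preserving input order
for row in data:
    if row['key'] not in temp_dict:
        temp_dict[row['key']] = [row['file']]
    else:
        temp_dict[row['key']].append(row['file'])

# Turn each list of files into {'doc_1': ..., 'doc_2': ...}
for item in temp_dict:
    value_dict = {}
    # start=1 so the labels match the doc_1, doc_2, ... columns in the desired output
    for counter, value in enumerate(temp_dict[item], start=1):
        key = 'doc_' + str(counter)
        value_dict[key] = value
    final_dict[item] = value_dict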
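For completeness, this is roughly how I would feed that dictionary to pandas. It is only a sketch: the final_dict below is a hand-written stand-in with the same nested shape as the one built above (keys are strings because they come from the CSV), and the doc_N labels follow the desired output.

```python
import pandas as pd

# Hand-written stand-in with the same nested shape as final_dict
final_dict = {
    "1234": {"doc_1": "abc.pdf", "doc_2": "def.pdf"},
    "1235": {"doc_1": "ghi.pdf", "doc_2": "jkl.pdf", "doc_3": "lmn.pdf"},
}

# orient="index" makes each outer key a row and each inner key a column;
# keys with fewer documents get NaN in the missing columns
wide = pd.DataFrame.from_dict(final_dict, orient="index")

# Move the outer keys out of the index into an ordinary "key" column
wide = wide.rename_axis("key").reset_index()
print(wide)
```

This does give the shape I'm after, but it bypasses pandas for all of the grouping work, which is why I suspect there is a more idiomatic route.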
Thank you in advance for any suggestions.
Ben