I have a pretty simple need that has come up in a couple other posts, but I'm not sure if a better way to approach it is with a groupby
or duplicated
method.
I have what I need below with duplicated
except the first duplicate is being flagged as FALSE
instead of TRUE
. I need all duplicates as TRUE.
My goal is to be able to concatenate data from two columns together when it's a duplicate, otherwise, leave the data as-is.
Sample Input:
ID File Name
1 Text.csv
2 TEXT.csv
3 unique.csv
4 unique2.csv
5 text.csv
Desired Output:
ID File Name LowerFileName Duplicate UniqueFileName
1 Text.csv text.csv TRUE 1Text.csv
2 TEXT.csv text.csv TRUE 2TEXT.csv
3 unique.csv unique.csv FALSE unique.csv
4 unique2.csv unique2.csv FALSE unique2.csv
5 text.csv text.csv TRUE 5text.csv
df_attachment = pd.read_csv("Attachment.csv")
df_attachment['LowerFileName'] = df_attachment['File Name'].str.lower()
df_attachment['Duplicate'] = df_attachment.duplicated('LowerFileName')
#This syntax is incorrect
df_attachment['UniqueFileName'] = np.where(df_attachment['Duplicate']=='TRUE', pd.concat(df_attachment['ID'],df_attachment['File Name']), df_attachment['File Name'))