
I am trying to filter data using pandas.

import pandas as pd

df_c = pd.read_csv(r'C:\Users\User\Documents\Research\output-mutations.csv')
df_c.drop_duplicates(subset=["FILENAME", "CHAIN", "MUTATION_CODE"],
                     keep=False, inplace=True)
df_c.to_csv(r'C:\Users\User\Documents\Research\output-mutations-concise.csv')

Right now, this is all I have. I am trying to remove duplicate rows based on some of my columns and, for each remaining group, print all matching values from my last column. I have an example of what I'd like to do, but I don't know where to start or which pandas commands to use. I tried `drop_duplicates`.
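For reference, a minimal sketch with made-up rows showing what `keep=False` does: it drops *every* copy of a duplicated row rather than keeping one, which may not be the intended behavior here.

```python
import pandas as pd

# Hypothetical toy data: the first two rows are exact duplicates
df = pd.DataFrame({
    "FILENAME": ["a.pdb", "a.pdb", "b.pdb"],
    "CHAIN": ["A", "A", "B"],
    "MUTATION_CODE": ["M1V", "M1V", "L3P"],
})

# keep=False removes all copies of any duplicated row,
# so only the unique b.pdb row survives
concise = df.drop_duplicates(subset=["FILENAME", "CHAIN", "MUTATION_CODE"],
                             keep=False)
print(concise)
```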

example

jack22321
1 Answer


You can use `groupby` and `apply` instead of `drop_duplicates`:

# Group by FILENAME and CHAIN, joining each group's mutation codes with ";"
df = df.groupby(by=["FILENAME", "CHAIN"], as_index=False).apply(lambda x: ";".join(x["MUTATION_CODE"]))
df.columns = ["FILENAME", "CHAIN", "MUTATION_CODE"]
print(df)
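As a sketch with made-up sample data (column names taken from the question), the same join can also be written with `agg`, which avoids renaming the columns afterwards:

```python
import pandas as pd

# Hypothetical sample data mimicking the question's columns
df = pd.DataFrame({
    "FILENAME": ["a.pdb", "a.pdb", "b.pdb"],
    "CHAIN": ["A", "A", "B"],
    "MUTATION_CODE": ["M1V", "K2R", "L3P"],
})

# Group by FILENAME and CHAIN, joining all mutation codes with ";";
# agg keeps the original column name, so no renaming is needed
out = df.groupby(["FILENAME", "CHAIN"], as_index=False).agg(
    {"MUTATION_CODE": ";".join}
)
print(out)
```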
Zalak Bhalani