
I am trying to filter data using pandas.

import pandas as pd

df_c = pd.read_csv(r'C:\Users\User\Documents\Research\output-mutations.csv')
df_c.drop_duplicates(subset=["FILENAME", "CHAIN", "MUTATION_CODE"],
                     keep=False, inplace=True)
df_c.to_csv(r'C:\Users\User\Documents\Research\output-mutations-concise.csv')

Right now, this is all I have. I am trying to remove duplicate rows based on some of my columns and, for each remaining group, print all matching values from my last column. I have an example of what I'd like to do, but I don't know where to start or which pandas commands to use. I tried `drop_duplicates`.
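For reference, a minimal sketch with made-up rows showing what `keep=False` does: it drops *every* copy of a duplicated row rather than keeping one, which may not be the intended behavior here.

```python
import pandas as pd

# Hypothetical toy data: the first two rows are exact duplicates
df = pd.DataFrame({
    "FILENAME": ["a.pdb", "a.pdb", "b.pdb"],
    "CHAIN": ["A", "A", "B"],
    "MUTATION_CODE": ["M1V", "M1V", "L3P"],
})

# keep=False removes all copies of any duplicated row,
# so only the unique b.pdb row survives
concise = df.drop_duplicates(subset=["FILENAME", "CHAIN", "MUTATION_CODE"],
                             keep=False)
print(concise)
```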

example

jack22321
1 Answer


You can use `groupby` and `apply` instead of `drop_duplicates`:

# Group by FILENAME and CHAIN, joining each group's mutation codes with ";"
df = df.groupby(by=["FILENAME", "CHAIN"], as_index=False).apply(lambda x: ";".join(x["MUTATION_CODE"]))
df.columns = ["FILENAME", "CHAIN", "MUTATION_CODE"]
print(df)
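As a sketch with made-up sample data (column names taken from the question), the same join can also be written with `agg`, which avoids renaming the columns afterwards:

```python
import pandas as pd

# Hypothetical sample data mimicking the question's columns
df = pd.DataFrame({
    "FILENAME": ["a.pdb", "a.pdb", "b.pdb"],
    "CHAIN": ["A", "A", "B"],
    "MUTATION_CODE": ["M1V", "K2R", "L3P"],
})

# Group by FILENAME and CHAIN, joining all mutation codes with ";";
# agg keeps the original column name, so no renaming is needed
out = df.groupby(["FILENAME", "CHAIN"], as_index=False).agg(
    {"MUTATION_CODE": ";".join}
)
print(out)
```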
Zalak Bhalani