I am trying to remove duplicate data from my dataframe (loaded from a CSV) and write a separate CSV showing the unique answers in each column. The problem is that my code has been running for a day (22 hours, to be exact). I'm open to other suggestions.
My data has about 20,000 rows, with headers (example). I have tried checking the unique values column by column with df[col].unique(), and that does not take nearly as long.
import pandas as pd

df = pd.read_csv('Surveydata.csv')
# Drop duplicates within each column independently
df_uni = df.apply(lambda col: col.drop_duplicates().reset_index(drop=True))
df_uni.to_csv('Surveydata_unique.csv', index=False)
What I expect is a dataframe with the same set of columns but no duplicates within each one (example). E.g. if df['Rmoisture'] contains some combination of Yes, No, and NaN, the corresponding column of another dataframe df_uni should contain only those three values.
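A possibly faster alternative to apply, since per-column unique() was quick, is to build one Series of unique values per column and align them side by side with pd.concat. This is a sketch using a small hypothetical dataframe in place of Surveydata.csv; shorter columns end up padded with NaN:

```python
import pandas as pd

# Hypothetical sample data standing in for Surveydata.csv
df = pd.DataFrame({
    'Rmoisture': ['Yes', 'No', 'Yes', None, 'No'],
    'Rating': [1, 2, 2, 3, 1],
})

# One Series of unique values per column (order of first appearance),
# concatenated side by side into a single dataframe.
df_uni = pd.concat(
    {col: pd.Series(df[col].unique()) for col in df.columns},
    axis=1,
)
df_uni.to_csv('Surveydata_unique.csv', index=False)
```

Since unique() works on the underlying array directly, this avoids the per-column drop_duplicates/reset_index overhead inside apply.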