
I'm trying to loop through a data frame created by pandas, looking for each value that only occurs once in the frame. My code so far is the following:

import pandas as pd
df = pd.read_csv('xyz.csv')
saved_column = df['S07'][df['Class'].isin(['GTD'])].round(decimals=1).value_counts()

How can I loop through this data frame, detect all values that occur only once, and ultimately delete them from the CSV file?

Thank you very much in advance for your help!

An example would be (input data in the CSV file):

In [2]: df
Out[2]:
  Class   S07
0   GTD  2.23
1   GTD  2.21
2   GTD  1.82
3   GTD  2.26

I want the code to delete the line with GTD - 1.82, since its rounded value (1.8) only occurs once within the dataset.
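The value_counts() result from the code above already contains what is needed, so no explicit loop is required. A minimal sketch, assuming the four example rows as a stand-in for xyz.csv: keep the rounded values whose count is 1, look up the rows holding them, and drop those rows.

```python
import pandas as pd

# Stand-in for xyz.csv: the four example rows shown above
df = pd.DataFrame({"Class": ["GTD"] * 4,
                   "S07": [2.23, 2.21, 1.82, 2.26]})

# Rounded S07 values of the GTD rows, as in the question's code
rounded = df.loc[df["Class"].isin(["GTD"]), "S07"].round(decimals=1)

# value_counts() maps each rounded value to its number of occurrences
counts = rounded.value_counts()

# Rounded values that occur exactly once
singletons = counts[counts == 1].index

# Labels of the rows holding one of those values, then drop them (no explicit loop)
rows_to_drop = rounded[rounded.isin(singletons)].index
cleaned = df.drop(index=rows_to_drop)
print(cleaned)
```

With these four rows the result keeps only 2.23 and 2.21: 1.82 rounds to 1.8, but 2.26 rounds to 2.3, and each of those rounded values occurs only once, so both rows are removed.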

Sample dataset: https://1drv.ms/u/s!AvuwPSn7axNcePUsJD8kMB1FnlE

Phil
    Can you post a sample data set and the desired data set? Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU - stand with Ukraine Feb 23 '17 at 21:41

1 Answer


You can use the duplicated() method:

In [86]: df
Out[86]:
  Class   S07
0   AAA  1.10
1   AAA  1.11
2   GTD  2.23
3   GTD  2.21
4   GTD  1.82
5   GTD  2.26

In [87]: x = df.loc[df.Class.isin(['GTD']), 'S07'].round(1).duplicated(keep=False)

In [88]: df.loc[df.index[x.index][x]]
Out[88]:
  Class   S07
2   GTD  2.23
3   GTD  2.21
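To show what duplicated(keep=False) returns, along with one plausible cause of the "Unalignable boolean Series key provided" error mentioned in the comments (the thread does not confirm it), the sketch below rebuilds the six example rows. The mask x carries only the GTD labels 2..5, so passing it straight to df.loc cannot be aligned with the frame's labels 0..5. The expression df.index[x.index][x] sidesteps this, but it treats the labels in x.index as positions, which only works with the default 0..n-1 index that read_csv creates. Converting the mask to labels with x[x].index, or reindexing it to the full index, does not rely on that.

```python
import pandas as pd

# Same six rows as in the session above
df = pd.DataFrame({"Class": ["AAA", "AAA", "GTD", "GTD", "GTD", "GTD"],
                   "S07": [1.10, 1.11, 2.23, 2.21, 1.82, 2.26]})

gtd_rounded = df.loc[df["Class"].isin(["GTD"]), "S07"].round(1)

# keep='first' (the default) flags only the later repeats;
# keep=False flags every member of a repeated group
first = gtd_rounded.duplicated()
x = gtd_rounded.duplicated(keep=False)

# x is indexed by the GTD labels only (2..5), so df.loc[x] cannot be aligned
# with df's labels 0..5 and pandas raises an "Unalignable boolean Series" error.

# Option 1: turn the mask into row labels first
by_labels = df.loc[x[x].index]

# Option 2: extend the mask to df's full index, treating missing rows as False
by_mask = df.loc[x.reindex(df.index, fill_value=False)]

print(by_labels)
```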

Now you can save results into a new CSV file:

df.loc[df.index[x.index][x]].to_csv('/path/to/file.csv', index=False, ...)
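The expression above returns only the GTD rows that have a duplicate, so the AAA rows are dropped as well. If the goal is to remove only the GTD singletons and keep every other row in the file, one possible sketch (assuming the same six rows as stand-in data) builds a mask over the whole frame:

```python
import pandas as pd

# Stand-in for the file's contents: the same six rows as in the answer
df = pd.DataFrame({"Class": ["AAA", "AAA", "GTD", "GTD", "GTD", "GTD"],
                   "S07": [1.10, 1.11, 2.23, 2.21, 1.82, 2.26]})

is_gtd = df["Class"].isin(["GTD"])
gtd_rounded = df.loc[is_gtd, "S07"].round(1)

# True for GTD rows whose rounded value repeats among the GTD rows;
# reindex() extends the mask to all rows, with False for the non-GTD ones
dup_in_gtd = gtd_rounded.duplicated(keep=False).reindex(df.index, fill_value=False)

# Keep every non-GTD row, plus the GTD rows whose rounded value is not unique
result = df[~is_gtd | dup_in_gtd]

# With no path, to_csv() returns the CSV text; pass a path to write a file instead
csv_text = result.to_csv(index=False)
print(csv_text)
```

Writing the result back over the original file is possible by passing its path to to_csv(), but writing to a new file first makes the change easier to check.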
MaxU - stand with Ukraine
  • Thanks for your quick reply. However, I get the following error message: Unalignable boolean Series key provided. Thank you for your patience... – Phil Feb 23 '17 at 22:06
  • @Phil, you are welcome. Please consider [accepting](http://meta.stackexchange.com/a/5235) an answer if you think it has answered your question – MaxU - stand with Ukraine Feb 23 '17 at 22:51
  • Is there any simple way to keep the line but delete just the value (1.82)? ... Ultimately the line should read GTD without the value. – Phil Feb 24 '17 at 19:41
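To keep the row and clear only the number, as the last comment asks, one sketch (assuming the six rows from the answer as stand-in data) sets S07 to NaN for the affected rows. By default to_csv() writes NaN as an empty field, so such a line reads GTD followed by an empty value.

```python
import numpy as np
import pandas as pd

# Stand-in for the file's contents: the same six rows as in the answer
df = pd.DataFrame({"Class": ["AAA", "AAA", "GTD", "GTD", "GTD", "GTD"],
                   "S07": [1.10, 1.11, 2.23, 2.21, 1.82, 2.26]})

gtd_rounded = df.loc[df["Class"].isin(["GTD"]), "S07"].round(1)

# Labels of the GTD rows whose rounded value occurs only once
singleton_labels = gtd_rounded[~gtd_rounded.duplicated(keep=False)].index

# Keep those rows but clear their S07 value; NaN is written as an empty field
df.loc[singleton_labels, "S07"] = np.nan

csv_text = df.to_csv(index=False)
print(csv_text)
```

In this sample both 1.82 and 2.26 are cleared, because 2.26 rounds to 2.3, which also occurs only once among the GTD rows.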