
This is an example of the data I have:

1, "dep, anxiety", 30 
2, "dep"         , 40
4, "stress"      , 30
7, "dep, fobia"  , 20

I want to use pandas to filter the rows containing "dep" and save them to a new CSV file. The output should be:

1, "dep, anxiety", 30
7, "dep, fobia"  , 20
2, "dep"         , 40

This is my code:

import pandas as pd
patients =  pd.read_csv("patients.csv", encoding='latin-1')

print(patients["dep" in patients["qual"]])

It raises the following error:

"return self._engine.get_loc(self._maybe_cast_indexer(key))"

I also do not know how to export the extracted data to a new CSV file.

Mary
  • Possible duplicate of [Pandas writing dataframe to CSV file](http://stackoverflow.com/questions/16923281/pandas-writing-dataframe-to-csv-file) – Merlin Aug 05 '16 at 21:32
  • try this: `patients[patients.qual.str.contains('dep')].to_csv('c:/temp/dep.csv', index=False)` – MaxU - stand with Ukraine Aug 05 '16 at 21:45
  • Not a duplicate. The OP wants to know how to extract the rows containing `'dep'` before writing to CSV. – Kartik Aug 05 '16 at 23:04
  • @MaxU, that's an answer ;). I think newer Pandas versions also support passing `lambda`s like this: `df[lambda row: "dep" in row["qual"]]`, can't try this out currently, though. – filmor Aug 06 '16 at 10:11
  • @filmor, yes, newer pandas versions do support indexing by callable, but i think `patients.qual.str.contains('dep')` should be faster on bigger data sets... – MaxU - stand with Ukraine Aug 06 '16 at 10:36
  • @MaxU, thank you so much! I have another question: what if I want to add another condition when extracting data from the CSV file? I also want to keep only the rows that have "30" in the last column. I tried this code: `patients[patients.dis.str.contains('dep') & patients.rank == '30' ]`, but it did not work. – Mary Aug 08 '16 at 03:04

1 Answer

You can do it this way:

In [213]: patients
Out[213]:
   ID           dis  rank
0   1  dep, anxiety    30
1   2           dep    40
2   4        stress    30
3   7    dep, fobia    20

In [214]: patients[(patients['dis'].str.contains('dep')) & (patients['rank'] == 30)]
Out[214]:
   ID           dis  rank
0   1  dep, anxiety    30
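
To also cover the export part of the question, here is a minimal sketch that filters on "dep" and writes the result to a new CSV file. It assumes the column names shown above (ID, dis, rank); the inline data stands in for patients.csv, and the output file name is hypothetical.

```python
import io
import os
import tempfile

import pandas as pd

# Stand-in for patients.csv; the header row (ID, dis, rank) is an assumption
# taken from the DataFrame shown above.
raw = io.StringIO(
    "ID,dis,rank\n"
    '1,"dep, anxiety",30\n'
    '2,"dep",40\n'
    '4,"stress",30\n'
    '7,"dep, fobia",20\n'
)
patients = pd.read_csv(raw)

# Boolean mask: True where the dis column contains the substring "dep".
# na=False treats missing values as "no match" instead of propagating NaN.
has_dep = patients["dis"].str.contains("dep", na=False)
dep_rows = patients[has_dep]

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "dep_patients.csv")

# index=False keeps the DataFrame's row index out of the file.
dep_rows.to_csv(out_path, index=False)

# Read the file back to confirm what was written.
written = pd.read_csv(out_path)
```

Note that str.contains does a substring match, so a value such as "independent" would also be selected; if that matters for your data, a stricter pattern may be needed.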

PS: rank is also the name of a pandas method, so you can't use the dot accessor (df.column_name) for that column: pandas resolves patients.rank to the NDFrame.rank method rather than to the column.

Demo:

Here we get a reference to the NDFrame.rank method:

In [215]: patients.rank
Out[215]:
<bound method NDFrame.rank of    ID           dis  rank
0   1  dep, anxiety    30
1   2           dep    40
2   4        stress    30
3   7    dep, fobia    20>

Here we select the rank column:

In [216]: patients['rank']
Out[216]:
0    30
1    40
2    30
3    20
Name: rank, dtype: int64
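
The same points can be checked in a small self-contained script. This is only a sketch: the DataFrame is rebuilt by hand from the output shown above. It also puts each comparison in parentheses, because in Python `&` binds more tightly than `==`, and it compares against the integer 30 because the rank column has dtype int64.

```python
import pandas as pd

# Rebuilt by hand from the Out[213] display above.
patients = pd.DataFrame(
    {
        "ID": [1, 2, 4, 7],
        "dis": ["dep, anxiety", "dep", "stress", "dep, fobia"],
        "rank": [30, 40, 30, 20],
    }
)

# Attribute access finds the DataFrame method first, not the column.
dot_access_is_method = callable(patients.rank)

# Bracket access always returns the column.
rank_col = patients["rank"]

# Each condition is parenthesised so that & combines two boolean Series.
mask = (patients["dis"].str.contains("dep")) & (rank_col == 30)
result = patients[mask]
```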
MaxU - stand with Ukraine