0

I feel like I am asking a very silly question that has been asked a thousand times but I cannot seem to find it anywhere. I might be using the wrong terminology.

Anyway, I have a pandas frame df. And I would like to use a part of this dataframe. More specifically I'd like to use it in a loop:

unique_values = df['my_column'].tolist()
unique_values = list(set(unique_values))

for value in unique_values:
    tempDf = df[df['my_column] == value]
    # Do stuff with tempDf

But this doesn't seem to work. Is there another way to 'filter' a dataframe by a column's value?

ayhan
Bram Vanroy
  • Damn elitists, down-voting because someone asks a basic question. At least comment to tell me what's wrong. – Bram Vanroy Aug 08 '16 at 18:06
  • in what way is it not working? – Paul H Aug 08 '16 at 18:08
  • @BramVanroy I didn't downvote you, but I was inclined to. The reason is that this question seems vague. We commonly ask for questions to adhere to the MCVE standard (http://stackoverflow.com/help/mcve). Sometimes the tone of how you've asked, coupled with the fact that there is no direct solution we can compare against and who knows what else others are thinking, can lead to a downvote. I'd stop to consider answering this question if it appeared straightforward to answer. In this case, you are requiring that the answerer do all the work to come up with an example for you. – piRSquared Aug 08 '16 at 18:19
  • @BramVanroy here are some answers to a question on how to ask a pandas question. http://stackoverflow.com/a/20159305/2336654; http://stackoverflow.com/a/38466059/2336654. These can give additional insight as to why people may have down voted this question. – piRSquared Aug 08 '16 at 18:22
  • @BramVanroy, I didn't downvote, but with a reputation of 8300, SO people would expect a cleaner question. – Merlin Aug 08 '16 at 18:39
  • I guess I could have provided a cleaner question in that I could show where things went south. I apologise for that. I did my best to provide much more information in [a follow-up question](http://stackoverflow.com/questions/38838764/merging-crosstabs-in-python). – Bram Vanroy Aug 08 '16 at 21:28

2 Answers

3

Use df.groupby instead:

for value, tempDf in df.groupby('my_column'):
    # Do stuff with tempDf

Your code does work once you add the missing closing quote in 'my_column, but it will be slower than using df.groupby.

Evaluating df['my_column'] == value in a loop forces Pandas to run through len(df) comparisons for each iteration of the loop. df.groupby partitions the DataFrame into groups with one pass through the DataFrame.
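
As a minimal sketch of that difference (the sample data and the 'score' column are made up for illustration; only 'my_column' comes from the question), both approaches below should produce the same sub-frames, but the loop re-scans the column once per unique value while groupby splits the frame once:

```python
import pandas as pd

# Hypothetical sample data; 'my_column' is the column name from the question.
df = pd.DataFrame({
    'my_column': ['a', 'b', 'a', 'c', 'b'],
    'score': [1, 2, 3, 4, 5],
})

# Boolean masking: one full comparison over the column per unique value.
by_mask = {}
for value in df['my_column'].unique():
    by_mask[value] = df[df['my_column'] == value]

# groupby: yields (value, sub-frame) pairs after partitioning the frame once.
by_group = {value: temp_df for value, temp_df in df.groupby('my_column')}

# Each sub-frame keeps the original row labels and order, so the two agree.
for value in by_mask:
    assert by_mask[value].equals(by_group[value])
```

One caveat worth keeping in mind: by default groupby skips rows whose key is NaN, so such rows will not show up in any group.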

unutbu
  • Sorry I didn't notice you mentioned the error in your answer and assumed it was just a typo in the question, not in the actual code. But that assumption may not be true so I rolled it back. – ayhan Aug 08 '16 at 18:11
  • It's okay either way. But it is the only error I see in the OP's code. – unutbu Aug 08 '16 at 18:12
0
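for value in unique_values:
    # where() keeps every row and fills the non-matching ones with NaN,
    # so drop the all-NaN rows to get an actual filter
    tempDf = df.where(df['column_name'] == value).dropna(how='all')
    # Do stuff with tempDf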

Additionally, you could use a query statement:

for value in unique_values:
    # Prefix a Python variable with @ to reference it inside query()
    tempDf = df.query('column_name == @value')
    # Do stuff with tempDf

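Or you could add a boolean helper column and query on it:

for value in unique_values:
    # 'matches' is a helper column introduced here for illustration
    tempDf = df.assign(matches=df['my_column'] == value)
    tempDf = tempDf.query('matches == True')
    # Do stuff with tempDf

Although the last one seems inefficient.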
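
To compare these selection styles side by side, here is a small sketch (the data and the 'score' column are invented for illustration). It shows that df.where on its own keeps the original shape, while boolean indexing and query actually drop the non-matching rows; note that query needs the @ prefix to see a local Python variable:

```python
import pandas as pd

# Hypothetical sample data, using the answer's 'column_name'.
df = pd.DataFrame({
    'column_name': ['a', 'b', 'a'],
    'score': [1, 2, 3],
})
value = 'a'

# where(): same number of rows; rows that fail the condition become NaN.
masked = df.where(df['column_name'] == value)

# Boolean indexing: only the matching rows survive.
filtered = df[df['column_name'] == value]

# query(): same result as boolean indexing; @value refers to the local variable.
queried = df.query('column_name == @value')

print(len(masked), len(filtered), len(queried))  # 3 2 2
```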

Kalimantan