I am trying to iteratively create subsets of a dataframe. A toy example:

In:

   A  B  participant  
0  1  3            1          
1  2  4            1         
2  5  8            2          
3  4  9            2
4  3  7            3

(The conditional statement thanks to the commenter below)

for p in df:
    subset = df[df['participant'] == p].loc[: , 'A']

The desired outcome is:

   A  participant  
0  1            1          
1  2            1

   A  participant  
0  5            2          
1  4            2   

etc.

But the for loop makes a subset by row, not by participant. How can I get one subset per participant?

Original attempt:

for p in df:
    p.pressure = df[(:, 'pressure') & (df['participant'] == p)]
MeC
  • Ok, thank you but it looks like now I'm looping over column titles instead of speakers now? – MeC Mar 12 '18 at 19:47
  • You _could_ change it to `for p in df['participant']` but I feel like this is an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). What is it that you're trying to do? There's likely an easier solution, maybe using `groupby()`. – pault Mar 12 '18 at 19:49
  • I want to know how to iteratively subset over a dataframe using the same criteria each time. In this example, the for loop should give subsets of all the pressure values for each participant. Current syntax gives duplicate subsets. – MeC Mar 12 '18 at 20:14
  • Can you give a concrete example with sample inputs and desired outputs? Try to provide a [mcve]. More on [how to create good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – pault Mar 12 '18 at 20:23
  • Question edited with toy dataframe above – MeC Mar 12 '18 at 20:36

1 Answer

Here is one way.

First get the unique values for participants:

participants = df['participant'].unique()
#array([1, 2, 3])

Now create a dataframe for each participant. In this example, I will store each DF in a dictionary, keyed by the participant number.

output_dfs = {p: df[df['participant'] == p] for p in participants}
for p in output_dfs:
    print("Participant = %s"%p)
    print(output_dfs[p])
    print("")

Which prints:

Participant = 1
   A  B  participant
0  1  3            1
1  2  4            1

Participant = 2
   A  B  participant
2  5  8            2
3  4  9            2

Participant = 3
   A  B  participant
4  3  7            3
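
The comments on the question mention that `groupby()` might be simpler. A minimal sketch of that alternative, assuming the toy dataframe from the question (rebuilt here so the snippet runs on its own) and assuming only columns `A` and `participant` are wanted, as in the desired output. The name `subsets` is just an illustrative choice:

```python
import pandas as pd

# Toy dataframe copied from the question.
df = pd.DataFrame({
    "A": [1, 2, 5, 4, 3],
    "B": [3, 4, 8, 9, 7],
    "participant": [1, 1, 2, 2, 3],
})

# Iterating over a groupby object yields (key, sub-dataframe) pairs,
# one pair per distinct value in the 'participant' column.
# Selecting ["A", "participant"] keeps only the columns shown in the
# question's desired output (an assumption about what is wanted).
subsets = {
    p: group[["A", "participant"]]
    for p, group in df.groupby("participant")
}

for p, subset in subsets.items():
    print("Participant = %s" % p)
    print(subset)
    print("")
```

This should give the same row split as the dictionary comprehension above, without computing the unique values first. Each subset keeps the original row index, so call `reset_index(drop=True)` on a subset if a fresh 0-based index is needed.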
pault