I have written a function that manually creates separate dataframes for each participant in the main dataframe. However, I'm trying to write it so that it's more automated as participants will be added to the dataframe in the future.

My original function:

def separate_participants(main_df):
    S001 = main_df[main_df['participant'] == 'S001']
    S001.name = "S001"
    S002 = main_df[main_df['participant'] == 'S002']
    S002.name = "S002"
    S003 = main_df[main_df['participant'] == 'S003']
    S003.name = "S003"
    S004 = main_df[main_df['participant'] == 'S004']
    S004.name = "S004"
    S005 = main_df[main_df['participant'] == 'S005']
    S005.name = "S005"
    S006 = main_df[main_df['participant'] == 'S006']
    S006.name = "S006"
    S007 = main_df[main_df['participant'] == 'S007']
    S007.name = "S007"

    participants = (S001, S002, S003, S004, S005, S006, S007)
    participant_names = (S001.name, S002.name, S003.name, S004.name, S005.name, S006.name, S007.name)

    return participants, participant_names

However, when I try and change this I get a KeyError for the name of the participant in the main_df. The code is as follows:

def separate_participants(main_df):
    participant_list = list(main_df.participant.unique())
    participants = []

    for participant in participant_list:
        name = participant
        temp_df = main_df[main_df[participant] == participant]
        name = temp_df

        participants.append(name)

    return participants

The error I get: KeyError: 'S001'

I can't figure out what I'm doing wrong, given that the lookup works in the old function but not in the new one. The participant strings in the dataframe and in the list have the same length (4), so there are no extra characters.

Any help/pointers would be greatly appreciated!

Iguananaut
ShrutiTurner
  • Your `DataFrame` has a column named `'participant'` but you're indexing it with the value of the variable `participant`, which is presumably not a column in your DataFrame. You probably wanted `main_df['participant']`. Most likely the `KeyError` came with a "traceback" leading back to the line `temp_df = main_df[main_df[participant] == participant]`, which suggests you should examine it closely. – Iguananaut Dec 11 '19 at 14:11
  • Such a stupid error on my part - thank you so much! That's only been 2 hours of my day. – ShrutiTurner Dec 11 '19 at 14:14
  • Don't use dot notation for assignment; always use bracket notation. See [this link](https://stackoverflow.com/a/35850628/9375102). – Umar.H Dec 11 '19 at 14:17
  • Learn to use pdb. If you ran `python -m pdb `, it would break on your `KeyError`. That would allow you to examine all the local variables, try re-running that line of code yourself, and you would probably quickly understand what was wrong. See [some tips on debugging small Python scripts](https://stackoverflow.com/a/52354973/982257) and [How to debug small programs](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/). Debugging is a skill to learn and won't happen overnight, but if you practice, a problem like this will take seconds to solve, not hours. – Iguananaut Dec 11 '19 at 14:19
  • This ```temp_df = main_df[main_df[participant] == participant]``` is the same as ```temp_df = main_df[True]``` and that is certainly not what you want. Try to replace it with ```temp_df = main_df[participant]```. – accdias Dec 11 '19 at 14:20
  • @accdias Sorry, but that isn't the case here. – Iguananaut Dec 11 '19 at 14:21
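
To illustrate the point made in the comments: the inner expression `main_df[participant]` is evaluated first, and because `participant` holds a value such as `'S001'`, pandas looks for a column with that name and raises the `KeyError` before any comparison takes place. The sketch below uses a small made-up frame (the `score` column and its values are assumptions for illustration; only the `participant` column name comes from the question).

```python
import pandas as pd

# Hypothetical stand-in for main_df; only the 'participant' column
# name is taken from the question, the rest is invented.
main_df = pd.DataFrame({
    "participant": ["S001", "S001", "S002"],
    "score": [1, 2, 3],
})

participant = "S001"

# main_df[participant] means main_df['S001']: a lookup of a column
# that does not exist, so pandas raises KeyError.
try:
    main_df[participant]
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)

# Indexing by the column *name* returns a Series; comparing it to the
# value gives a boolean mask that selects the matching rows.
mask = main_df["participant"] == participant
s001_rows = main_df[mask]
```

The only difference between the failing and the working lookup is the quoted column name inside the inner brackets.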

1 Answer

Thanks @Iguananaut for the answer:

Your DataFrame has a column named 'participant' but you're indexing it with the value of the variable participant which is presumably not a column in your DataFrame. You probably wanted main_df['participant']. Most likely the KeyError came with a "traceback" leading back to the line temp_df = main_df[main_df[participant] == participant] which suggests you should examine it closely.
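
For reference, a minimal sketch of the function with that fix applied, assuming the goal is still to return the per-participant dataframes together with their names, as the original hand-written version did. The `separate_participants_groupby` variant is not from the question; it is a suggested alternative that relies on `DataFrame.groupby` and returns a dict keyed by participant name.

```python
import pandas as pd


def separate_participants(main_df):
    """Split main_df into one dataframe per unique participant."""
    participants = []
    participant_names = []

    for participant in main_df["participant"].unique():
        # Index by the column name 'participant', then compare with the value.
        temp_df = main_df[main_df["participant"] == participant]
        participants.append(temp_df)
        participant_names.append(participant)

    return participants, participant_names


def separate_participants_groupby(main_df):
    """Alternative sketch: let groupby do the splitting."""
    return {name: group for name, group in main_df.groupby("participant")}


# Made-up example data; the 'score' column is an assumption for illustration.
example_df = pd.DataFrame({
    "participant": ["S001", "S002", "S001", "S002"],
    "score": [10, 20, 30, 40],
})

frames, names = separate_participants(example_df)
by_name = separate_participants_groupby(example_df)
```

Both versions pick up new participants automatically, since the list of names is read from the data rather than written out by hand.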

ShrutiTurner