Iterate through list of dataframes, performing calculations on certain columns of each dataframe, resulting in new dataframe of the results

Question

Newbie here. Just as the title says, I have a list of dataframes (each dataframe is a class of students). All dataframes have the same columns. I have stored certain column names in global lists, for example:

BINARY_CATEGORIES = ['Gender', 'SPED', '504', 'LAP']

These are yes/no or male/female categories, and I have already converted all of the data in these columns to 1s and 0s. There are several other columns that I want to ignore as I iterate.

I am trying to accept the list of classes (dataframes) into my function and perform calculations on each dataframe using only my BINARY_CATEGORIES list of columns. This is what I've got, but it isn't making it through all of the classes and/or all of the columns.

def bal_bin_cols(classes):
    
    i = 0
    c = 0
    for x in classes:
       total_binary = classes[c][BINARY_CATEGORIES[i]].sum()
       print(total_binary)
       i+=1  
       c+=1

Eventually I need a new dataframe holding all of the sums for each category and each class. print(total_binary) is just a placeholder/debugging aid. I don't yet have the code that will populate the dataframe from the results above, but I'd like the classes as the index and the category totals as the columns.

I know there's probably a vectorized way to do this, or enum, or groupby, but I will take a fix to my loop. I've been stuck forever. Please help.
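For reference, the loop above advances i and c together, so class 0 only ever sees column 0, class 1 only column 1, and so on (and it raises an IndexError once there are more classes than categories). A minimal sketch of a corrected loop follows, assuming every dataframe contains all of the BINARY_CATEGORIES columns; the sample classes and their values are made up for illustration.

```python
import pandas as pd

BINARY_CATEGORIES = ['Gender', 'SPED', '504', 'LAP']

def bal_bin_cols(classes):
    """Return one row of column sums per class, using only BINARY_CATEGORIES."""
    rows = []
    for df in classes:                  # outer loop: visit every class
        totals = {}
        for col in BINARY_CATEGORIES:   # inner loop: visit every binary column
            totals[col] = df[col].sum()
        rows.append(totals)
    # One row per class, one column per category
    return pd.DataFrame(rows, columns=BINARY_CATEGORIES)

# Hypothetical sample data (not from the question)
class_a = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Gender': [1, 0, 1],
                        'SPED': [0, 0, 1], '504': [0, 1, 0], 'LAP': [1, 1, 0]})
class_b = pd.DataFrame({'Name': ['D', 'E'], 'Gender': [0, 0],
                        'SPED': [1, 0], '504': [0, 0], 'LAP': [1, 0]})

result = bal_bin_cols([class_a, class_b])
print(result)
```

The nested loop is the key change: each class is paired with every category rather than with a single one.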

Given that the current answers seem to be duplicating work you've already done, please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing, as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker, May 29 '21 at 04:07

score 0 · Answer 1 · answered May 29 '21 at 03:33

Try something like:

Firstly create a dictionary:

d={
    'male':1,
    'female':0,
    'yes':1,
    'no':0
}

Finally use replace():

# Pass the dict directly for exact-match replacement; with regex=True the pattern 'male' also matches inside 'female'
df[BINARY_CATEGORIES] = df[BINARY_CATEGORIES].replace(d)
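To check the mapping on a small frame, something like the sketch below may help. It swaps in Series.map for an exact-match lookup instead of replace(), and it assumes the cells hold exactly the strings used as dictionary keys; the sample frame and names are invented for illustration.

```python
import pandas as pd

BINARY_CATEGORIES = ['Gender', 'SPED', '504', 'LAP']
d = {'male': 1, 'female': 0, 'yes': 1, 'no': 0}

# Hypothetical sample data
df = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cy'],
    'Gender': ['female', 'male', 'female'],
    'SPED': ['no', 'yes', 'no'],
    '504': ['no', 'no', 'yes'],
    'LAP': ['yes', 'no', 'no'],
})

# Map each binary column through the dictionary (exact match only);
# any value missing from d would become NaN, which makes typos easy to spot.
df[BINARY_CATEGORIES] = df[BINARY_CATEGORIES].apply(lambda col: col.map(d))
print(df)
```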

Anurag Dabas

This is an excellent way to transform the strings to binary data, but I've already done that (in a less elegant way admittedly). I'm looking to make calculations (sum the 1's for instance) using only my list of columns (BINARY_CATEGORIES) on a list of dataframes. Thanks for the response. – dbugg May 29 '21 at 03:51
`df[BINARY_CATEGORIES].sum(1)`? – Anurag Dabas May 29 '21 at 04:14
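
Note that sum(1) adds across the columns of each row (one total per student), whereas sum() with the default axis gives one total per column, which is what the question describes. A rough vectorized sketch under that reading is below; the function name summarize_classes, the names parameter, and the sample data are all hypothetical.

```python
import pandas as pd

BINARY_CATEGORIES = ['Gender', 'SPED', '504', 'LAP']

def summarize_classes(classes, names=None):
    """Build a dataframe with classes as the index and category sums as columns."""
    if names is None:
        names = range(len(classes))     # fall back to positional labels
    # One Series of per-column sums for each class, keyed by class label
    sums = {name: df[BINARY_CATEGORIES].sum() for name, df in zip(names, classes)}
    # Columns are classes at this point; transpose so classes become the index
    return pd.DataFrame(sums).T[BINARY_CATEGORIES]

# Hypothetical sample data
class_a = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Gender': [1, 0, 1],
                        'SPED': [0, 0, 1], '504': [0, 1, 0], 'LAP': [1, 1, 0]})
class_b = pd.DataFrame({'Name': ['D', 'E'], 'Gender': [0, 0],
                        'SPED': [1, 0], '504': [0, 0], 'LAP': [1, 0]})

summary = summarize_classes([class_a, class_b], names=['Class A', 'Class B'])
print(summary)
```

Selecting df[BINARY_CATEGORIES] before summing is what keeps the other columns out of the calculation.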
