
I have a Pandas dataframe with about 600 rows, with one column called "PAGE_NAME" that contains 8 unique string values. These are the 8 unique string values in this column:

my_list_of_strings = ['Demographics', 'SummaryMeasuresOfHealth', 'LeadingCausesOfDeath', 'MeasuresOfBirthAndDeath', 'RelativeHealthImportance', 'VunerablePopsAndEnvHealth', 'PreventiveServicesUse', 'RiskFactorsAndAccessToCare']

There are 6 other columns in this dataframe.

What I'd like to do is create 8 new dataframes, one for each of these strings, where each of the 8 new dataframes will include just the rows where a given string is in the "PAGE_NAME" column.

I would like to assign each of the 8 new dataframes a variable name that includes the string: something like Demographics_df, SummaryMeasuresOfHealth_df, etc.

I was able to write a function (below) that creates a list of the dataframes, but (1) I don't know how to extract the 8 separate dataframes and (2) I don't know how to give them names with the appropriate string as part of the variable name.

def make_pagename_dataframes(page_name_list):
    # Collect one filtered dataframe per page name, in list order.
    list_of_dfs = []
    for i in page_name_list:
        list_of_dfs.append(original_df.loc[original_df['PAGE_NAME'] == i])
    return list_of_dfs

list_of_new_dfs = make_pagename_dataframes(my_list_of_strings)
TJE

1 Answer


You can do this with groupby:

dict_of_dfs = {k: v for k, v in original_df.groupby('PAGE_NAME')}

Or build a list of them:

list_of_dfs = [v for k, v in original_df.groupby('PAGE_NAME')]

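To get your 8 dataframes as separate variables, unpack the groups. The starred name therest absorbs any extra groups, in case the column holds more than the 8 unique strings you expect. However, the unpacking raises a ValueError if there are fewer than 8 unique strings. Note that groupby sorts the group keys by default, so d1 through d8 follow the alphabetical order of PAGE_NAME, not the order of your list.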

d1, d2, d3, d4, d5, d6, d7, d8, *therest = (
    v for k, v in original_df.groupby('PAGE_NAME')
)
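
As a minimal sketch of the dictionary option above: the frame here is a made-up stand-in for original_df (only the PAGE_NAME column comes from the question; the VALUE column and its numbers are invented for illustration). Each sub-frame is then looked up by its page name.

```python
import pandas as pd

# Hypothetical stand-in for original_df; VALUE is an invented column.
original_df = pd.DataFrame({
    "PAGE_NAME": ["Demographics", "LeadingCausesOfDeath",
                  "Demographics", "PreventiveServicesUse"],
    "VALUE": [1, 2, 3, 4],
})

# One sub-frame per unique PAGE_NAME, keyed by that name.
dict_of_dfs = {name: group for name, group in original_df.groupby("PAGE_NAME")}

# Look a frame up by its page name instead of by position.
demographics_df = dict_of_dfs["Demographics"]
print(demographics_df)

# A single group can also be fetched without building the dict first.
same_df = original_df.groupby("PAGE_NAME").get_group("Demographics")
```

Looking frames up by key avoids depending on the order in which groupby yields the groups.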
piRSquared
  • Thanks for the input. Is it possible to extract the 8 separate dataframes from the dict_of_dfs, so that I have 8 distinct dataframe objects? – TJE Mar 23 '18 at 05:15
  • Last question: is there an easy way to assign the string names from my list of 8 to be the dataframe names? So instead of d1, d2, d3, d4, d5, d6, d7, d8, *therest = (v for k, v in original_df.groupby('PAGE_NAME')), it would be something like Demographics_df, SummaryMeasuresOfHealth_df, LeadingCausesOfDeath_df, MeasuresOfBirthAndDeath_df, RelativeHealthImportance_df, VunerablePopsAndEnvHealth_df, PreventiveServicesUse_df, RiskFactorsAndAccessToCare_df, *therest = (v for k, v in original_df.groupby('PAGE_NAME')) – TJE Mar 23 '18 at 05:42
  • This is commonly referred to as dynamically naming your variables, and it is generally considered bad practice. The best approach is the first option: store the dataframes in a dictionary and refer to them by their keys. In essence, the dictionary acts as a namespace and the keys act as the variable names. This is much cleaner and is considered best practice. See https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables – piRSquared Mar 23 '18 at 05:47
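
To make the dictionary-as-namespace idea concrete, here is a hedged sketch. The sample frame and its VALUE column are invented for illustration, and the use of types.SimpleNamespace is an optional extra that was not suggested in the thread; it only adds attribute-style access on top of the dictionary.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-in for original_df; VALUE is an invented column.
original_df = pd.DataFrame({
    "PAGE_NAME": ["Demographics", "SummaryMeasuresOfHealth", "Demographics"],
    "VALUE": [10, 20, 30],
})

# Keys carry the "_df" suffix asked for in the question.
dfs = {f"{name}_df": group for name, group in original_df.groupby("PAGE_NAME")}
print(dfs["Demographics_df"])

# Optional: attribute access, which works because each key is a valid identifier.
ns = SimpleNamespace(**dfs)
print(ns.SummaryMeasuresOfHealth_df)
```

Either form keeps the frames in one container, so new PAGE_NAME values need no code changes and no variables are created dynamically.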