I have separated my data into four dataframes (SE_df, SO_df, etc; each matching a specific pattern on one of the columns of the original data). I want to run the same process on each of the four.

I tried something like this (note: the actual process is longer, but it starts with value_counts):

lookup_table = {
    'SE': [SE_df, 'SE_df_counts'],
    'SO': [SO_df, 'SO_df_counts'],
    'OC': [OC_df, 'OC_df_counts'],
    'CW': [CW_df, 'CW_df_counts'] 
}

for site in ['SE', 'SO', 'OC', 'CW']:
    dframe_in = lookup_table[site][0]
    dframe_out = lookup_table[site][1]

    dframe_out = dframe_in.apply(pd.value_counts)
    # ... 

When the loop finishes, I want four new DataFrames: SE_df_counts, SO_df_counts, and so on.

Instead, I have one new DataFrame named dframe_out.

I initially tried to use

lookup_table = {
    'SE': [SE_df, SE_df_counts],
    ...
    }

but Python complained that a DataFrame named SE_df_counts didn't exist yet.

I tried forcing it to exist (which made the code more brittle, but it was worth a shot).

SE_df_counts = pd.DataFrame()
lookup_table = {
    'SE': [df_SE_mic, SE_df_counts],
}

for site in ['SE']:
    dframe_in = lookup_table[site][0]
    dframe_out = lookup_table[site][1]

    dframe_out = dframe_in.apply(pd.value_counts)

and I still ended up with a DataFrame named dframe_out (which really confused me).

Is there a way to pass the desired name of a dataframe as a variable (or a dictionary value)? I see many tempting recommendations here where people say "Use a dictionary", but the examples are always more complex than what I'm trying to do, and the ultimate answers often provide alternate ways around the question. (For example, this question was very close, but the chosen answer wasn't relevant to my use case.)

Vicki B
  • you can use `globals()['df_name'] = df` – luigigi Jan 24 '20 at 06:29
  • so for each of the four dfs you want their respective value counts in new dfs ? – YOLO Jan 24 '20 at 06:31
  • `lookup_table[site].append(dframe_out)` and then access using `lookup_table[site][2]` ? Although a bad data structure for doing this. – Vishnudev Krishnadas Jan 24 '20 at 06:33
  • YOLO >> so for each of the four dfs you want their respective value counts in new dfs? ----- yes; named by the site. – Vicki B Jan 24 '20 at 08:16
  • Vishnudev - That would be an interesting workaround. I had been thinking of stuffing an empty DataFrame into position[2] if there was no other way. ===>> "Although a bad data structure..." -- what would you recommend instead? – Vicki B Jan 24 '20 at 08:18

1 Answer

How about:

lookup_table = {
    'SE': [SE_df, None ],
    'SO': [SO_df, None ],
    'OC': [OC_df, None ],
    'CW': [CW_df, None ] 
}

for site in ['SE', 'SO', 'OC', 'CW']:
    dframe_in = lookup_table[site][0]

    lookup_table[site][1] = dframe_in.apply(pd.value_counts)
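
A slightly simpler variant of the same idea, as a rough sketch: keep the inputs in one dictionary and build the outputs into a second dictionary keyed by the same site code, so the "name" of each result is just its key. The tiny frames below are made-up stand-ins for SE_df, SO_df, etc., and the names `frames` and `counts` are only illustrative. It uses the `value_counts` method on each column rather than the top-level `pd.value_counts`, which I believe newer pandas releases flag as deprecated. The last few lines show why the original loop only ever produced `dframe_out`: assigning to a name rebinds that name and leaves the object it used to point at untouched.

```python
import pandas as pd

# Made-up stand-ins for the per-site DataFrames from the question.
frames = {
    "SE": pd.DataFrame({"a": [1, 1, 2], "b": [2, 2, 2]}),
    "SO": pd.DataFrame({"a": [3, 3, 3], "b": [1, 2, 3]}),
}

# One result per site, stored under the same key as its input.
counts = {
    site: df.apply(lambda col: col.value_counts())
    for site, df in frames.items()
}

# The result for a site is looked up by key instead of by variable name.
se_counts = counts["SE"]

# Why the original loop did not fill SE_df_counts:
placeholder = pd.DataFrame()   # plays the role of SE_df_counts
dframe_out = placeholder       # dframe_out now refers to the same object
dframe_out = se_counts         # rebinds dframe_out only; placeholder is unchanged
still_empty = placeholder.empty
```

If the rest of the process is longer than value_counts, it can go into a function that takes a DataFrame and returns the result, and the dictionary comprehension can call that function instead.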
RootTwo
  • Yeah, I've ended up doing something along these lines. ----- I get the Strong Impression from the replies here and elsewhere that the answer to the question "How do I pass the name of a variable using a {string, variable, dictionary value, ...}?" is "You can't do that in Python." :-( – Vicki B Jan 24 '20 at 08:20