I have many features that I need to vectorize and apply functions to down the road. Rather than manually copying the DataFrame, filtering it down, and then applying my various functions, I'd rather dynamically create new DataFrames based on the values contained in specified columns.
Take my code below, for example. I'd like to split on Column B and create three new DataFrames: df_A, df_B, and df_C.
I've scoured a few dozen posts, but these are the closest I can find: "Create new dataframe in pandas with dynamic names also add new column". I wasn't able to get this to work; it throws an error at this line:
dict_of_df[key_name] = copy.deepcopy(df)
TypeError: unhashable type: 'numpy.ndarray'
https://datascience.stackexchange.com/questions/29825/create-new-data-frames-from-existing-data-frame-based-on-unique-column-values However, I don't want to print lists; I want actual DataFrames.
Here's some different code I tried to put together, but it throws an error at the loop that uses the range function, and I'm not sure why:
import pandas as pd
data = {'Column A': [100, 200, 300, 400, 500],
        'Column B': ["A", "A", "B", "B", "C"]}
df = pd.DataFrame(data, columns=['Column A','Column B'])
df
for i in range(len(df['Column B'].unique())):
    for item in df['Column B'].unique():
        new_df[i] = df[df['Column B'] == item]
new_df
ValueError: Wrong number of items passed 2, placement implies 1
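For reference, here is a minimal sketch of what the loop above seems to be aiming for, assuming the results are collected in a plain dict instead of being assigned by integer position. The name new_dfs and the 'df_' key prefix are hypothetical and not part of my original code. A single loop over the unique values is enough, so the range loop is dropped.

```python
import pandas as pd

df = pd.DataFrame({'Column A': [100, 200, 300, 400, 500],
                   'Column B': ["A", "A", "B", "B", "C"]})

# new_dfs is a hypothetical container: one filtered DataFrame per unique value
new_dfs = {}
for item in df['Column B'].unique():
    # A boolean mask keeps only the rows whose Column B equals this value
    new_dfs['df_' + item] = df[df['Column B'] == item]

print(new_dfs['df_A'])
```

Each value in new_dfs is an actual DataFrame, so new_dfs['df_B'] can be passed to any function that expects one.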
EDIT: Per the link that @jezrael provided on this post (the duplicate), his solution there met my need:
for i, x in df.groupby('Column B'):
    globals()['dataframe' + i] = x
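If writing into globals() feels too implicit, the same groupby loop can probably fill a dict instead. This is only a sketch under that assumption; the name frames and the 'df_' prefix are hypothetical, chosen to match the df_A, df_B, df_C naming from the question.

```python
import pandas as pd

df = pd.DataFrame({'Column A': [100, 200, 300, 400, 500],
                   'Column B': ["A", "A", "B", "B", "C"]})

# Iterating over a groupby yields (group key, sub-DataFrame) pairs;
# frames is a hypothetical dict keyed by 'df_' + the group key.
frames = {'df_' + key: group for key, group in df.groupby('Column B')}

for name, frame in frames.items():
    print(name)
    print(frame)
```

The trade-off is that the DataFrames are reached as frames['df_A'] rather than as standalone variables, but the names stay in one place and can be looped over.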