I'm trying to create a function that returns a dynamically-named list of columns. Usually I can manually name the list, but I now have 100+ csv files to work with.
My goal:
- Function creates a list, and names it based on dataframe name
- Created list is callable outside of the function
I've done my research, and this answer from an earlier post came very close to helping me.
Here is what I've adapted:
def test1(dataframe):
    # Using globals() to get dataframe name
    df_name = [x for x in globals() if globals()[x] is dataframe][0]
    # Creating local dictionary to use exec function
    local_dict = {}
    # Trying to generate a name for the list, based on input dataframe name
    name = 'col_list_' + df_name
    exec(name + "=[]", globals(), local_dict)
    # So I can call this list outside the function
    name = local_dict[name]
    for feature in dataframe.columns:
        # Append feature/column if at least 90% of values are missing
        if dataframe[feature].isnull().mean() >= 0.9:
            name.append(feature)
    return name
To ensure the list name changes based on the DataFrame supplied to the function, I named the list using:
name = 'col_list_' + df_name
The problem comes when I try to make this list accessible outside the function:
name = local_dict[name]
I cannot find a way to assign a dynamic list name to the local dictionary, so I am forced to always call name outside the function to return the list. I want the list to be named based on the dataframe input (e.g. col_list_df1, col_list_df2, col_list_df99).
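For comparison, here is a fallback I have considered in case dynamic variable names are not practical: keeping the lists in an ordinary dictionary keyed by the name I want. This is only a minimal sketch. The helper name mostly_missing, the threshold parameter, and the two small example frames are my own placeholders, not part of my real data.

```python
import pandas as pd


def mostly_missing(dataframe, threshold=0.9):
    """Return the columns whose share of missing values is >= threshold."""
    return [
        col for col in dataframe.columns
        if dataframe[col].isnull().mean() >= threshold
    ]


# Hypothetical example frames standing in for the real CSV data
frames = {
    "df1": pd.DataFrame({"a": [None] * 10, "b": list(range(10))}),
    "df2": pd.DataFrame({"x": [1, 2, 3], "y": [None, None, None]}),
}

# One list per dataframe, stored under the name I would like it to have
col_lists = {
    "col_list_" + name: mostly_missing(df) for name, df in frames.items()
}

print(col_lists["col_list_df1"])
print(col_lists["col_list_df2"])
```

This gives me a lookup by name (col_lists["col_list_df1"]), but not the standalone variables I am after.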
This answer was very helpful, but it seems specific to variables.
global 'col_list_' + df_name
returns a syntax error.
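As far as I can tell, the global statement only accepts a literal identifier, not an expression, which would explain the syntax error. The closest thing I have found is assigning through the globals() dictionary, which does accept a computed key. A minimal sketch of that idea, assuming the dataframe name is already known as a string (make_named_list and its arguments are placeholders of mine):

```python
def make_named_list(df_name, columns):
    """Bind a new list to a module-level name built from df_name."""
    name = "col_list_" + df_name
    # globals() returns the namespace of the module where this function
    # is defined, so this creates (or overwrites) a module-level variable.
    globals()[name] = list(columns)
    return name


created = make_named_list("df1", ["a", "c"])

# The list is now reachable under the generated name
print(created, globals()[created])
```

I am not sure whether writing to globals() like this is considered good practice, or whether it would hold up across 100+ files.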
Any help would be greatly appreciated!