I wrote a loop that iterates over the CSV files in a folder, reads each one into a dictionary of dataframes, and names each dataframe after its file (e.g. file1.csv becomes file1_df). I do some work on the data and generate new rows, then subset part of each dataframe into a new dataframe (file1_df2). Later, I would like to reference these dataframes outside of the dictionary.
import os
import pandas as pd

df_dict = {}
for file in os.listdir(datadir):  # Loop over the files in that folder (only has CSV files)
    df_name = file[:-4] + '_df'  # Trim off .csv to name the dataframe
    df_dict[df_name] = pd.read_csv(os.path.join(datadir, file))
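To make the subsetting step concrete, here is a minimal sketch of roughly how the second set of dataframes (file1_df2 and so on) ends up in the same dictionary. The in-memory data, the column names `value` and `doubled`, and the filter condition are all illustrative assumptions standing in for the real CSV contents.

```python
import pandas as pd

# Stand-in for the dictionary built by the loop above (hypothetical data).
df_dict = {
    "file1_df": pd.DataFrame({"value": [1, 5, 10]}),
    "file2_df": pd.DataFrame({"value": [2, 4, 8]}),
}

# Iterate over a snapshot of the keys, because new entries are added inside the loop.
for name in list(df_dict):
    df = df_dict[name]
    df["doubled"] = df["value"] * 2  # hypothetical derived column
    df_dict[name + "2"] = df[df["value"] > 3]  # subset stored as e.g. "file1_df2"

print(sorted(df_dict))
```

With this layout, every dataframe, original or subset, is reachable only through a string key such as df_dict["file1_df2"].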
Is it possible to reference these dataframes by name, so that later I can just write file1_df2 instead of df_dict["file1_df2"]?
In essence I am asking the same question as here. That question doesn't appear to have been answered either, so I suspect this might not be possible, but I have yet to find an answer that explicitly says so.
I know this is possible in languages like SAS and Stata, but I have never figured out how to do it in Python. In those languages, you can plug your placeholder variable directly into the name of something.
/* In SAS */
%let param = test1;
libname path "C:\User\&param.";
proc sql;
    create table &param._df as
    select * from path.&param.;
quit;
/* In Stata */
foreach i in file1 file2 {
    import delimited "`i'.csv", clear
    save "`i'.dta", replace
}
etc. If this is not possible, I would like to know that with certainty. Thank you!