Python Pandas Read multiple SAS files from a list into separate dataframes

Question

I'm reading a bunch of SAS files like so:

demography = pd.read_sas("demography.sas7bdat", encoding = 'latin-1')
adverse_event_ds = pd.read_sas("adverse_event_ds.sas7bdat", encoding = 'latin-1')
rpt10344 = pd.read_sas("rpt10344.sas7bdat", encoding = 'latin-1')
vaccine_administration = pd.read_sas("vaccine_administration.sas7bdat", encoding = 'latin-1')
lab_tests_blood_chemistry_ds = pd.read_sas("lab_tests_blood_chemistry_ds.sas7bdat", encoding = 'latin-1')
lab_tests_hematology_ds = pd.read_sas("lab_tests_hematology_ds.sas7bdat", encoding = 'latin-1')
lab_tests_miscellaneous_ds = pd.read_sas("lab_tests_miscellaneous_ds.sas7bdat", encoding = 'latin-1')
vital_signs = pd.read_sas("vital_signs.sas7bdat", encoding = 'latin-1')

I want to be able to replace it with something like this:

datasets = ["demography", "adverse_event_ds", "rpt10344", "vaccine_administration", "lab_tests_blood_chemistry_ds", "lab_tests_hematology_ds", "lab_tests_miscellaneous_ds", "vital_signs"]

for dataset in datasets:
    dataset = pd.read_sas(dataset+".sas7bdat", encoding = 'latin-1')

But when I do something like: demography.info()

I get: NameError: name 'demography' is not defined

What's happening under the hood and how can I fix this?

tbdees · Accepted Answer · 2018-10-19T20:29:08.980


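The loop assigns to the name dataset on every iteration rather than creating new variables (e.g. demography, rpt10344, etc.). Each pass rebinds dataset from the string to the DataFrame that was just read, so a variable called demography never exists, hence the NameError.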
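A minimal sketch of that rebinding, using plain strings in place of SAS files so it runs anywhere (the two names and the `.upper()` call are only stand-ins for the real list and for `pd.read_sas`):

```python
names = ["demography", "vital_signs"]

for name in names:
    # This rebinds the loop variable `name` to a new object.
    # It does not create a variable called `demography` or `vital_signs`.
    name = name.upper()

# After the loop, `name` holds only the result of the last iteration,
# and the list of strings is unchanged.
print(name)
print(names)

# No variable named after a list element was ever defined.
print("demography" in globals())
```

The results of every iteration but the last are simply discarded, which is why some container is needed to hold on to them.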

I'd use a dictionary of datasets, keyed by name, as follows:

dsd = {}
for dataset in datasets:
    dsd[dataset] = pd.read_sas(dataset+".sas7bdat", encoding = 'latin-1')

Or, more Pythonically, with a dict comprehension:

dsd = { d : pd.read_sas(d + ".sas7bdat", encoding = 'latin-1') for d in datasets }
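As a rough sketch of how the dictionary might be wrapped up and used afterwards: the helper `load_datasets`, its `reader`, `folder` and `suffix` parameters, and the stand-in `fake_reader` are all hypothetical names introduced here so the example runs without any SAS files. In real use the reader would be something like `lambda p: pd.read_sas(p, encoding="latin-1")`.

```python
from pathlib import Path


def load_datasets(names, reader, folder=".", suffix=".sas7bdat"):
    """Return {name: reader(path)} for each dataset name (hypothetical helper)."""
    return {name: reader(Path(folder) / f"{name}{suffix}") for name in names}


def fake_reader(path):
    # Stand-in for pd.read_sas so this sketch needs no files on disk.
    return f"<data from {path.name}>"


dsd = load_datasets(["demography", "vital_signs"], fake_reader)

# Look up one dataset by name...
print(dsd["demography"])

# ...or loop over all of them, which separate variables would not allow.
for name, frame in dsd.items():
    print(name, "->", frame)
```

With real DataFrames, `dsd["demography"].info()` takes the place of the `demography.info()` call from the question.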

I'd strongly advise against assigning to individual variable names, for reasons explained here and here, but if you absolutely must, you can use:

for d in datasets:
    globals()[d] = pd.read_sas(d + ".sas7bdat", encoding = 'latin-1')

edited Oct 19 '18 at 20:29

answered Oct 19 '18 at 14:54


Thanks! This did help, but how can I have the dataframes assigned directly to the name, rather than having to do something like `dsd["demography"]`? – TheCuriouslyCodingFoxah Oct 19 '18 at 16:33

It's better practice to have the dataframes in a container like a dictionary: https://stackoverflow.com/a/6365889/5666087 – jkr Oct 19 '18 at 20:33
Thanks for that and the links! This really helped explain why it's a bad idea! Thanks a lot! – TheCuriouslyCodingFoxah Oct 20 '18 at 11:53
no prob! please consider accepting my answer if it worked for you. – tbdees Oct 20 '18 at 14:08
@tbdees - I hit a +1 on your answer, does that count as accepting or am I doing it wrong? – TheCuriouslyCodingFoxah Oct 24 '18 at 14:57
@TheCuriouslyCodingFoxah if you click the checkmark it should change color - more information on that here: https://stackoverflow.com/help/someone-answers :) – tbdees Oct 24 '18 at 22:19
