I have scores of SAS datasets that I want to export to pandas DataFrames. The saspy module has an sd2df method for this purpose. The issue I am having is described by this SO post, which has links explaining why strings cannot be substituted and used as variable names when executing code.
I'm defining the mk_df function to call the sd2df method and then using a dictionary to pass the key/value pairs.
import os
import glob
from pathlib import Path
import saspy
import pandas as pd
p = Path('/home/trb/sasdata/export_2_df')
sas_datasets = []
df_names = []
pya_tables = []
sep = '.'
for i in p.rglob('*.sas7bdat'):
    sas_datasets.append(i.name.split(sep,1)[0])
    df_names.append('df_' + i.name.split(sep,1)[0])
sd_2_df_dict = dict(zip(sas_datasets,df_names))
sas = saspy.SASsession(results='HTML')
Returning:
Using SAS Config named: default
SAS Connection established. Subprocess id is 27752
Code continues...
# tell sas where to find the dataset
sas_code='''
libname out_df "~/sasdata/export_2_df/";
'''
libref = sas.submit(sas_code)
# define the mk_df function
def mk_df(sas_name, df_name):
    df_name = sas.sd2df(table = sas_name, libref = 'out_df', method='CSV')
    return df_name
# call the mk_df function
for key, value in sd_2_df_dict.items():
    print(key, value)
    mk_df(key, value)
Returns:
cars df_cars
failure df_failure
airline df_airline
prdsale df_prdsale
retail df_retail
stocks df_stocks
However, none of the dataframes are created.
print(df_cars)
NameError Traceback (most recent call last)
<ipython-input-18-aa21e263bad6> in <module>()
----> 1 print(df_cars)
NameError: name 'df_cars' is not defined
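The NameError is consistent with how Python binds names: assigning to df_name inside mk_df only rebinds the function's local parameter, so the string 'df_cars' that was passed in is discarded and never becomes a variable. Here is a minimal sketch of that behaviour without a SAS session; fake_load is a hypothetical stand-in for sas.sd2df and returns a plain dict rather than a DataFrame.

```python
def fake_load(table):
    # Hypothetical stand-in for sas.sd2df; returns a plain dict, not a DataFrame.
    return {"table": table}


def mk_df(sas_name, df_name):
    # Rebinds the local parameter only; the string it held is thrown away.
    df_name = fake_load(sas_name)
    return df_name


# The loaded object survives only if the caller captures the return value.
result = mk_df("cars", "df_cars")

# No global variable called df_cars was created by the call.
created_global = "df_cars" in globals()

print(result)
print(created_global)
```

In the loop above, the return value of mk_df(key, value) is not assigned to anything, which would explain why nothing is left afterwards.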
I verified that the mk_df function works:
mk_df('stocks', 'df_stocks')
  Stock        Date   Open   High    Low  Close     Volume  AdjClose
0   IBM  2005-12-01  89.15  89.92  81.56  82.20  5976252.0     81.37
1   IBM  2005-11-01  81.85  89.94  80.64  88.90  5556471.0     88.01
2   IBM  2005-10-03  80.22  84.60  78.70  81.88  7019666.0     80.86
3   IBM  2005-09-01  80.16  82.11  76.93  80.22  5772280.0     79.22
4   IBM  2005-08-01  83.00  84.20  79.87  80.62  4801386.0     79.62
Printing the key and value returns strings:
print(key, value)
stocks df_stocks
How do I iterate the call to the mk_df function? Or is there a different approach I should consider?
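One different approach, sketched here under assumptions rather than tested against saspy, is to stop trying to create one variable per dataset and instead collect the results in a single dictionary keyed by the desired name. In the sketch, load_table is a hypothetical stand-in for the sas.sd2df call used above, and the sas_datasets list is hard-coded instead of being built from the .sas7bdat file names.

```python
def load_table(table):
    # Hypothetical stand-in for:
    #   sas.sd2df(table=table, libref='out_df', method='CSV')
    return {"table": table, "rows": []}


# In the question this list is built from the .sas7bdat file names.
sas_datasets = ["cars", "stocks"]

# One dictionary holds every loaded object, keyed by the name it should have.
frames = {"df_" + name: load_table(name) for name in sas_datasets}

print(sorted(frames))
print(frames["df_stocks"]["table"])
```

Each result is then reached as frames['df_stocks'] instead of a bare df_stocks variable, and the whole collection can be looped over with frames.items().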
@Python R SAS, that is a helpful observation. So I changed the mk_df function to include more information and to attempt to explicitly name the output DataFrame.
def mk_df(sas_name, out_df):
    out_df = sas.sd2df(table = sas_name, libref = 'out_df', method='CSV')
    out_df.df_name = out_df
    name = [x for x in globals() if globals()[x] is out_df]
    print("Dataframe Name is: ", name, "Type: ", type(out_df))
    return out_df
The call to the function is now:
j = 0
for key, value in sd_2_df_dict.items():
    mk_df(key, value).name=df_names[j]
    j += 1
Returns:
/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
This is separate from the ipykernel package so we can avoid doing imports until
Dataframe Name is: [] Type: <class 'pandas.core.frame.DataFrame'>
Dataframe Name is: [] Type: <class 'pandas.core.frame.DataFrame'>
Dataframe Name is: [] Type: <class 'pandas.core.frame.DataFrame'>
Dataframe Name is: [] Type: <class 'pandas.core.frame.DataFrame'>
Dataframe Name is: [] Type: <class 'pandas.core.frame.DataFrame'>
Dataframe Name is: [] Type: <class 'pandas.core.frame.DataFrame'>
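The empty lists are what I would expect if, at the moment of the lookup, the new object is bound only to the local name out_df inside the function: globals() then holds no reference to it. The UserWarning, which points at line 3 of the cell, appears to come from the out_df.df_name = out_df attribute assignment. Here is a minimal sketch of the lookup behaviour, with a plain object() standing in for the DataFrame; names_in and mk_obj are hypothetical helpers for illustration only.

```python
def names_in(namespace, obj):
    # Every name in the namespace that is bound to exactly this object.
    return [name for name, value in list(namespace.items()) if value is obj]


def mk_obj():
    out = object()  # stand-in for the DataFrame returned by sas.sd2df
    # Only the local name `out` refers to the object at this point.
    found_inside = names_in(globals(), out)
    return out, found_inside


obj, found_inside = mk_obj()

# A name exists only once the caller binds the object somewhere explicitly.
registry = {"df_cars": obj}
found_after = names_in(registry, obj)

print(found_inside)
print(found_after)
```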