
I have over 20 SAS (sas7bdat) files, all with the same columns, that I want to read in Python. I need an iterative process to read all the files and rbind them into one big df. This is what I have so far, but it throws an error saying there are no objects to concatenate.

import pyreadstat
import glob
import os
import pandas as pd

path = r'C:\Users\myfolder'  # or unix / linux / mac path
all_files = glob.glob(os.path.join(path , "/*.sas7bdat"))

li = []

for filename in all_files:
    reader = pyreadstat.read_file_in_chunks(pyreadstat.read_sas7bdat, filename, chunksize=10000, usecols=cols)
    for df, meta in reader:
        li.append(df)
    frame = pd.concat(li, axis=0)

I found this answer on reading in CSV files helpful: Import multiple CSV files into pandas and concatenate into one DataFrame

AAA
  • What is wrong with the posted code? Error? Undesired result? Look into `pandas.concat` to row bind a list of DataFrames. Maybe `pandas.concat(df for df, meta in reader)`? (a sketch of this appears after these comments) – Parfait Jan 22 '23 at 04:15
  • Maybe initialize an empty list to hold all dataframes, append each dataframe to that list inside the loop over all your files, and then pass that list to `pd.concat()`. How huge are your files? – AlexK Jan 22 '23 at 08:26
  • @AlexK Tried it, it won't work. All files put together are around 25 GB. – AAA Jan 23 '23 at 19:41
  • Do you have that much RAM? Can't test your code, but have you tried to debug? Are you able to read each individual file? What does `li` contain before you pass it to `pd.concat`? Also, your last line should be outside the outer loop. – AlexK Jan 23 '23 at 22:01
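
Parfait's comment above suggests passing a generator expression straight to `pandas.concat`; extended across all the files, a minimal sketch (assuming `all_files` and `cols` are defined as in the question) looks like this:

import itertools

import pandas as pd
import pyreadstat

# chain the chunk generators of every file into one stream of DataFrames
chunks = itertools.chain.from_iterable(
    (df for df, meta in pyreadstat.read_file_in_chunks(
        pyreadstat.read_sas7bdat, filename, chunksize=10000, usecols=cols))
    for filename in all_files
)
frame = pd.concat(chunks, axis=0, ignore_index=True)

`pandas.concat` accepts any iterable of DataFrames, so the intermediate list is not strictly required, although peak memory at the final concatenation is about the same either way.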

1 Answer


So if one has SAS data files that are too big and plans to append all of them into one df, then:

# reading in chunks (chunksize) avoids the RAM from crashing...
li = []
for filename in all_files:
    # cols: the list of columns to read (as in the question)
    reader = pyreadstat.read_file_in_chunks(
        pyreadstat.read_sas7bdat, filename, chunksize=10000, usecols=cols
    )
    for df, meta in reader:
        li.append(df)
# concatenate once, outside the loop over the files
frame = pd.concat(li, axis=0, ignore_index=True)
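
For completeness, here is a minimal self-contained sketch of the whole pipeline; the folder path and the `cols` list are placeholders to replace with your own. Note the glob pattern has no leading slash: with "/*.sas7bdat", `os.path.join` drops the folder part, which is the most likely reason `all_files` came back empty and `pd.concat` raised "No objects to concatenate" in the original code.

import glob
import os

import pandas as pd
import pyreadstat

path = r'C:\Users\myfolder'   # folder holding the .sas7bdat files
cols = ['col_a', 'col_b']     # placeholder: the columns you actually need

# no leading slash in the pattern, so os.path.join keeps the folder part
all_files = glob.glob(os.path.join(path, '*.sas7bdat'))

li = []
for filename in all_files:
    # read each file in chunks of 10,000 rows, keeping only the needed columns
    reader = pyreadstat.read_file_in_chunks(
        pyreadstat.read_sas7bdat, filename, chunksize=10000, usecols=cols
    )
    for df, meta in reader:
        li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

Keep in mind that all chunks still accumulate in `li`, so the final `pd.concat` has to hold the full result (roughly 25 GB here) in RAM; chunked reading by itself does not cap peak memory.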
AAA