2

I have a list of csv file pathnames in a list, and I am trying to save them as dataframes. How can I do it?

import pandas as pd
import os
import glob

# use glob to get all the csv files
# in the folder
path = "/Users/azmath/Library/CloudStorage/OneDrive-Personal/Projects/LESA/2022 HY/All"
csv_files = glob.glob(os.path.join(path, "*.xlsx"))
  
# loop over the list of csv files
for f in csv_files:
    
    # read the csv file
    df = pd.read_excel(f)  
    display(df)
    print()

The issue is that it only prints. but I dont know how to save. I would like to save all the data frames as variables, preferably as their file names.

buhtz
  • 10,774
  • 18
  • 76
  • 149
azmath
  • 97
  • 6
  • 3
    Take a look at [`.to_excel`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html) and [`.to_csv`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html) methods of a DataFrame – Mortz Jul 19 '22 at 11:12
  • You just wanna open and save it the way it is? Or have to do some alterations? Could you provide us with more info? – Hiago Reis Jul 19 '22 at 11:13

2 Answers2

2

By “save” I think you mean store dataframes in variables. I would use a dictionary for this instead of separate variables.

import os


data = {}

for f in csv_files:
    name = os.path.basename(f)

    # read the csv file
    data[name] = pd.read_excel(f)  
    display(data[name])
    print()

Now all your dataframes are stored in the data dictionary where you can iterate them (easily handle all of them together if needed). Their key in the dictionary is the basename (filename) of the input file.

Recall that dictionaries remember insertion order, so the order the files were inserted is also preserved. I'd probably recommend sorting the input files before parsing - this way you get a reproducible script and sequence of operations!

creanion
  • 2,319
  • 2
  • 13
  • 17
  • Thank you very much. It saved as dictionary. But I wanted to save as individual dataframes. Is there anyway to do it? – azmath Jul 19 '22 at 11:24
  • And the problem is it doesn't save all the tabs in the excel file – azmath Jul 19 '22 at 11:46
  • saving all sheets is a separate question. The individual dataframes are in the dictionary. Can you be more specific about what's wanted? Individual variable names for them? – creanion Jul 20 '22 at 09:50
2

try this:

a = [pd.read_excel(file) for file in csv_files]

Then a will be a list of all your dataframe. If you want a dictionary instead of list:

a = {file: pd.read_csv(file) for file in csv_files}
bpfrd
  • 945
  • 3
  • 11
  • Thanks, but it doesn't save all the sheets (tabs) in the excel file – azmath Jul 19 '22 at 11:48
  • 1
    if you have multiple sheets, you should follow the instructions in here: https://stackoverflow.com/questions/26521266/using-pandas-to-pd-read-excel-for-multiple-worksheets-of-the-same-workbook – bpfrd Jul 19 '22 at 11:50