Is there a way to create two dataFrames of two types of xlsx docs from my cwd

Question

I want to perform a analysis on 24 excel doc's 12 containing one side of the story and the other 12 the other side. I managed to load them into python but when i try to get them in two seperate datafranes python combines them back to one.

This is for a windows server using Python3.7

import pandas as pd
import os

path = os.getcwd()

files = os.listdir(path)

files

files_car = [f for f in files if f.startswith("CAR")]

files_car

for f in files_car:

    data1 = pd.read_excel(f)

    car = car.append(data1)

path = os.getcwd()

files2 = os.listdir(path)

files2

files_ean = [f for f in files2 if f.startswith("ELEK")]

files_ean

ean = pd.DataFrame()

for x in files_ean:

    data2 = pd.read_excel(f)

    ean = ean.append(data2)

i expected that files_car would contain the 12 files tht start with "CAR" and that files_ean the 12 files that start with "ELEK"

are you trying to get two lists of DataFrames? your `files_car` and `files_ean` are lists of file paths. it sounds like you want to iterate over these two lists and create two new lists of DataFrames created from reading in the files located at the respective paths. is that correct? — dan_g, Jun 19 '19 at 16:57
Yes, so i'm trying to iterate over the list and to get an output in 2 separate dataframe one with the files of files_car and one with the files of files_ean. — cryptoguyx2, Jun 19 '19 at 17:18
Possible duplicate of [Import multiple csv files into pandas and concatenate into one DataFrame](https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe) — dan_g, Jun 19 '19 at 17:38

score 0 · Accepted Answer · answered Jun 19 '19 at 17:35

This should do what you want, based on your comment:

import pandas as pd
import os

path = os.getcwd()
files = os.listdir(path)

car_dfs = [pd.read_excel(f) for f in files if f.startswith("CAR")]
ean_dfs = [pd.read_excel(f) for f in files if f.startswith("ELEK")]

car_df = pd.concat(files_car)
ean_df = pd.concat(files_ean)

Couple of points:

you don't need to recreate files if you're going to run the same command (i.e. os.listdir(path)) unless path were to change
you should not append to a DataFrame in a loop like you're doing, since this creates a copy every time you call append. It's better to create a list of DataFrames and then concatenate that list in to one big DataFrame.

You could shorten this even more by just doing:

import pandas as pd
import os

path = os.getcwd()
files = os.listdir(path)

car_df = pd.concat([pd.read_excel(f) for f in files if f.startswith("CAR")])
ean_df = pd.concat([pd.read_excel(f) for f in files if f.startswith("ELEK")])

unless you have some need for the individual file DataFrames

Got it, this helped a lot! Thanks for the support, been wrestling with this for hours — cryptoguyx2, Jun 19 '19 at 18:19
No problem - I had a very similar [question](https://stackoverflow.com/questions/25210819/speeding-up-data-import-function-pandas-and-appending-to-dataframe) when I first started working with `pandas`. FYI, if this answers your question you're encouraged to mark your question as answered. — dan_g, Jun 19 '19 at 20:35

Is there a way to create two dataFrames of two types of xlsx docs from my cwd

1 Answers1