
I have

file_2000.dta, file_2001.dta, file_2002.dta and so on.

I also have

file1_2000.dta, file1_2001.dta, file1_2002.dta and so on.

I want to iterate over the file year.

Let (year) = 2000, 2001, 2002, etc.

import file_(year) using pandas. 
import file1_(year) using pandas. 

file_(year)['name'] = file_(year).index
file1_(year)['name'] = file1_(year).index2

merged = pd.merge(file_(year), file1_(year), on='name') 

write/export merged_(year).dta
2 Answers


As far as I know, there is no 'Let' keyword in Python. To iterate over multiple files in a directory you can simply use a for loop with the os module, like the following:

import os

directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    # note: the files have a .dta extension, not .dat
    if filename.startswith("file_200") and filename.endswith(".dta"):
        pass  # do something with the matching file
    else:
        continue

Another approach is to use a regex to tell Python which file names to match during the iteration. The pattern could be: pattern = r"file_20\d+"
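A minimal sketch of the regex approach, using a hard-coded list of file names for illustration (in practice the names would come from os.listdir()); the pattern above is extended to also require the .dta extension:

```python
import re

# Hypothetical file names; in practice use os.listdir(directory).
filenames = [
    "file_2000.dta", "file_2001.dta",
    "file1_2000.dta", "notes.txt",
]

# re.match anchors at the start of the string, so "file1_..." is rejected.
pattern = re.compile(r"file_20\d+\.dta$")

matches = [name for name in filenames if pattern.match(name)]
print(matches)
```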


It seems to me that, given your .dta extensions, you need to use the read_stata function to read the files in a loop, build a list of the separate dataframes so you can work with them individually, and then concatenate all of them into one.

Something like:

import pandas as pd

list_of_files = ['file_2000.dta', 'file_2001.dta', 'file_2002.dta']  # full paths here...

frames = []

for f in list_of_files:
    df = pd.read_stata(f)
    frames.append(df)

consolidated_df = pd.concat(frames, axis=0, ignore_index=True)
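Since the question actually asks to merge the two files for each year (rather than concatenate everything), the per-year loop might look like the sketch below. The sample dataframes, column names, and the temporary directory are only for illustration; in practice the .dta files already exist on disk:

```python
import os
import tempfile

import pandas as pd

# Create small sample .dta files so the sketch is self-contained.
workdir = tempfile.mkdtemp()
years = [2000, 2001]

for year in years:
    df_a = pd.DataFrame({"name": ["a", "b"], "x": [1, 2]})
    df_b = pd.DataFrame({"name": ["a", "b"], "y": [3, 4]})
    df_a.to_stata(os.path.join(workdir, f"file_{year}.dta"), write_index=False)
    df_b.to_stata(os.path.join(workdir, f"file1_{year}.dta"), write_index=False)

# The actual loop: read both files for each year, merge them on
# the shared 'name' column, and export merged_<year>.dta.
for year in years:
    left = pd.read_stata(os.path.join(workdir, f"file_{year}.dta"))
    right = pd.read_stata(os.path.join(workdir, f"file1_{year}.dta"))
    merged = pd.merge(left, right, on="name")
    merged.to_stata(os.path.join(workdir, f"merged_{year}.dta"), write_index=False)
```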

These questions might be relevant to your case:

How to Read multiple files in Python for Pandas separate dataframes

Pandas read_stata() with large .dta files
