
I have

file_2000.dta, file_2001.dta, file_2002.dta and so on.

I also have

file1_2000.dta, file1_2001.dta, file1_2002.dta and so on.

I want to iterate over the file year.

Let (year) = 2000, 2001, 2002, etc.

import file_(year) using pandas. 
import file1_(year) using pandas. 

file_(year)['name'] = file_(year).index
file1_(year)['name'] = file1_(year).index2

merged = pd.merge(file_(year), file1_(year), on='name') 

write/export merged_(year).dta
2 Answers


As far as I know, there is no 'Let' keyword in Python. To iterate over multiple files in a directory you can simply use a for loop with the os module, like the following:

import os

directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    # note: the files have a .dta extension, not .dat
    if filename.startswith("file_200") and filename.endswith(".dta"):
        pass  # do something with the matching file
    else:
        continue

Another approach is to use a regex to tell Python which file names to match during the iteration. The pattern could be: pattern = r"file_20\d+"
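A minimal sketch of the regex approach, using a hard-coded list of file names for illustration (in practice the names would come from os.listdir()); the pattern above is extended to also require the .dta extension:

```python
import re

# Hypothetical file names; in practice use os.listdir(directory).
filenames = [
    "file_2000.dta", "file_2001.dta",
    "file1_2000.dta", "notes.txt",
]

# re.match anchors at the start of the string, so "file1_..." is rejected.
pattern = re.compile(r"file_20\d+\.dta$")

matches = [name for name in filenames if pattern.match(name)]
print(matches)
```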


It seems to me that, given your .dta extensions, you need to use the read_stata function to read the files in a loop, build a list of the separate dataframes so you can work with them individually, and then concatenate all of them into one.

Something like:

import pandas as pd

list_of_files = ['file_2000.dta', 'file_2001.dta', 'file_2002.dta']  # full paths here...

frames = []

for f in list_of_files:
    df = pd.read_stata(f)
    frames.append(df)

consolidated_df = pd.concat(frames, axis=0, ignore_index=True)
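Since the question actually asks to merge the two files for each year (rather than concatenate everything), the per-year loop might look like the sketch below. The sample dataframes, column names, and the temporary directory are only for illustration; in practice the .dta files already exist on disk:

```python
import os
import tempfile

import pandas as pd

# Create small sample .dta files so the sketch is self-contained.
workdir = tempfile.mkdtemp()
years = [2000, 2001]

for year in years:
    df_a = pd.DataFrame({"name": ["a", "b"], "x": [1, 2]})
    df_b = pd.DataFrame({"name": ["a", "b"], "y": [3, 4]})
    df_a.to_stata(os.path.join(workdir, f"file_{year}.dta"), write_index=False)
    df_b.to_stata(os.path.join(workdir, f"file1_{year}.dta"), write_index=False)

# The actual loop: read both files for each year, merge them on
# the shared 'name' column, and export merged_<year>.dta.
for year in years:
    left = pd.read_stata(os.path.join(workdir, f"file_{year}.dta"))
    right = pd.read_stata(os.path.join(workdir, f"file1_{year}.dta"))
    merged = pd.merge(left, right, on="name")
    merged.to_stata(os.path.join(workdir, f"merged_{year}.dta"), write_index=False)
```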

These questions might be relevant to your case:

How to Read multiple files in Python for Pandas separate dataframes

Pandas read_stata() with large .dta files
