3

I have three CSV files for each filename, one in each of three different folders. Say there are 20 filenames, so 20 × 3 = 60 CSV files in total across the three folders.

Folder A: 1001.CSV, 1002.CSV, 1003.CSV, ...
Folder B: 1001.CSV, 1002.CSV, 1003.CSV, ...
Folder C: 1001.csv, 1002.csv, 1003.csv, ...

I want to get a single combined CSV file for each of 1001, 1002, 1003, 1004, ..., so 20 CSV files in total.

How can I do this? Since the files are in different folders, glob is not working (or I don't know how to use it for this).

Pavel Smirnov
mashedpoteto

2 Answers

1

I made the following assumptions:

  • all the subfolders will be rooted at some known directory "parentdir"
  • each subfolder contains only relevant csv files
  • the csv files do not contain any header/footer lines
  • each record in the csv files is separated by a newline
  • all of the records in each file are relevant

This should produce a "concat.csv" file in each subfolder containing the contents of all the other files in that same folder. I used a snippet of code from another Stack Overflow answer for the actual concatenation.

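import os
import fileinput

rootdir = 'C:\\Users\\myname\\Desktop\\parentdir'
for child in os.listdir(rootdir):
    folder = os.path.join(rootdir, child)
    if not os.path.isdir(folder):
        continue  # skip stray files in the root directory
    # list the inputs before creating the output; ignore output from an earlier run
    filenames = [os.path.join(folder, f) for f in sorted(os.listdir(folder))
                 if f.lower().endswith('.csv') and f != 'concat.csv']
    if not filenames:
        continue  # fileinput would read from stdin if given an empty list
    with open(os.path.join(folder, 'concat.csv'), 'w') as fout, fileinput.input(filenames) as fin:
        for line in fin:
            # lines already carry their newline; only add one if a file lacks a trailing newline
            fout.write(line if line.endswith('\n') else line + '\n')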
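
The question asks for one combined file per file name across folders, rather than one per folder. A minimal sketch of that variant, under the same assumptions as above (no header lines, one record per line), might look like the following. The function name `combine_by_name` and the separate output folder are hypothetical choices for illustration, and names are matched case-insensitively only because the question shows both `.CSV` and `.csv`.

```python
import os
from collections import defaultdict


def combine_by_name(rootdir, outdir):
    """Concatenate same-named CSV files found in the subfolders of rootdir.

    Hypothetical helper: writes one combined file per name into outdir
    and returns the list of files written.
    """
    groups = defaultdict(list)
    for folder in sorted(os.listdir(rootdir)):
        folder_path = os.path.join(rootdir, folder)
        # Skip stray files, and skip outdir itself if it lives inside rootdir
        if not os.path.isdir(folder_path):
            continue
        if os.path.abspath(folder_path) == os.path.abspath(outdir):
            continue
        for name in sorted(os.listdir(folder_path)):
            if name.lower().endswith('.csv'):
                # Lower-case key so 1001.CSV and 1001.csv land in one group
                groups[name.lower()].append(os.path.join(folder_path, name))

    os.makedirs(outdir, exist_ok=True)
    written = []
    for name, paths in sorted(groups.items()):
        target = os.path.join(outdir, name)
        with open(target, 'w', newline='') as fout:
            for path in paths:
                with open(path, newline='') as fin:
                    for line in fin:
                        # Ensure the last record of each file ends with a newline
                        fout.write(line if line.endswith('\n') else line + '\n')
        written.append(target)
    return written
```

Calling `combine_by_name(rootdir, os.path.join(rootdir, 'combined'))` would then leave `1001.csv`, `1002.csv`, and so on in a `combined` folder, each holding the rows from folders A, B and C in that order. If the files do have header rows, this line-by-line approach would repeat them, so a CSV-aware reader would be the safer choice in that case.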
0
import os
import shutil
import glob
import pandas as pd

path = '/mypath/'

# rename files
count = 1

for root, dirs, files in os.walk(path):
    for i in files:
        if i == 'whatever.csv':
            os.rename(os.path.join(root, i), os.path.join(root, "whatever" + str(count) + ".csv"))
            count += 1

# delete unwanted files
for dirname, dirs, files in os.walk(path):
    for file in files:
        if file.startswith('dontwant'):
            os.remove(os.path.join(dirname, file))

# copy files to the destination directory (here the top-level path itself)
# note: files with the same name in different subfolders overwrite each other
for root, dirs, files in os.walk(path):
    if os.path.abspath(root) == os.path.abspath(path):
        continue  # already in the destination; copying a file onto itself raises SameFileError
    for file in files:
        if file.endswith('.csv'):
            shutil.copy2(os.path.join(root, file), path)

# combine files
os.chdir(path)
# skip the output of a previous run so it is not merged into itself
all_filenames = [f for f in glob.glob('*.csv') if f != 'combined_csv.csv']
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.to_csv('combined_csv.csv', index=False, encoding='utf-8-sig')
David A