
I have several CSV files in a folder.

For example:

Master_data_0112207518.csv       3MB
Master_data_0112272018.csv       2MB
Master_data_0112232018.csv       7MB
Master_data_Loop_0110452018.csv  5MB
Master_data_Loop_0110222018.csv  7MB
Master_data_Loop_0110372018.csv  6MB

I need to write Python code that groups the files by the common beginning of their names and merges each group into a single CSV file.

Expected output:

Total Number of different files : 2

    ['Master_data.csv', 'Master_data_Loop.csv']

After merging:

Master_data.csv       12MB
Master_data_Loop.csv  18MB
PV8
bhavesh

1 Answer

OK, so your first step is to get the list of filenames and group them by pattern. If all your file names end with digits followed by the .csv extension, you can use:

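    import glob
    import re
    from itertools import groupby

    def prefix(filename):
        # Remove the trailing "_<digits>.csv", e.g. 'Master_data_Loop_0110452018.csv' -> 'Master_data_Loop'
        return re.sub(r'_?\d+\.csv$', '', filename)

    # groupby only groups adjacent items, so sort by the same key first
    all_csv = sorted(glob.glob('*.csv'), key=lambda f: (prefix(f), f))

    split_csv = [list(group) for key, group in groupby(all_csv, key=prefix)]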

The regular expression strips the trailing digits and the .csv extension from each filename, leaving only the part that all files of a group have in common. That common part is used as the key for the groupby function from itertools. groupby only groups adjacent items, so the list is sorted first. split_csv should then look like:

    [['Master_data_0112207518.csv', 'Master_data_0112232018.csv', 'Master_data_0112272018.csv'],
     ['Master_data_Loop_0110222018.csv', 'Master_data_Loop_0110372018.csv', 'Master_data_Loop_0110452018.csv']]
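The groups can also produce the summary the question asks for: the number of groups and the merged file names. The following is a minimal sketch, not a definitive implementation. It uses the same trailing-digits regex on a hard-coded list of the example names in place of glob.glob('*.csv'). It also assumes each merged file is named <prefix>.csv as in the question, whereas the merge code in this answer adds a _combined suffix:

```python
import re
from itertools import groupby

def prefix(filename):
    # 'Master_data_Loop_0110452018.csv' -> 'Master_data_Loop'
    return re.sub(r'_?\d+\.csv$', '', filename)

# Hard-coded stand-in for glob.glob('*.csv') so the example runs anywhere
all_csv = [
    'Master_data_0112207518.csv',
    'Master_data_0112272018.csv',
    'Master_data_0112232018.csv',
    'Master_data_Loop_0110452018.csv',
    'Master_data_Loop_0110222018.csv',
    'Master_data_Loop_0110372018.csv',
]

# groupby only merges adjacent items, so sort by (prefix, name) first
all_csv.sort(key=lambda f: (prefix(f), f))
split_csv = [list(group) for _, group in groupby(all_csv, key=prefix)]

# Assumed naming scheme from the question: one '<prefix>.csv' per group
output_names = [prefix(group[0]) + '.csv' for group in split_csv]
print('Total Number of different files :', len(split_csv))
print(output_names)
```

This prints "Total Number of different files : 2" followed by ['Master_data.csv', 'Master_data_Loop.csv'].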

After that, there are two ways to merge the CSV files. The easiest uses the pandas library:

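    import pandas

    for group in split_csv:
        # Read every file of the group and stack the rows (columns are aligned by name)
        combined_csv = pandas.concat([pandas.read_csv(f) for f in group], ignore_index=True)
        # e.g. 'Master_data_0112207518.csv' -> 'Master_data_combined.csv'
        out_name = '{}_combined.csv'.format(re.sub(r'_?\d+\.csv$', '', group[0]))
        combined_csv.to_csv(out_name, index=False)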

The other approach is less flexible, because it assumes all files have the same columns in the same order. However, it doesn't require any third-party library:

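    for group in split_csv:
        out_name = '{}_combined.csv'.format(re.sub(r'_?\d+\.csv$', '', group[0]))
        # 'w' rather than 'a', so re-running the script doesn't append duplicate rows
        with open(out_name, 'w') as ficout:
            for i, ficin in enumerate(group):
                with open(ficin) as f:
                    header = next(f, '')  # the first line of every file is its header
                    if i == 0:
                        ficout.write(header)  # keep only the first file's header
                    for line in f:
                        # guard against input files that don't end with a newline
                        ficout.write(line if line.endswith('\n') else line + '\n')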
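If the files may list their columns in a different order, the standard library's csv module can still merge them by matching columns on the header name. The following is a minimal sketch under that assumption. merge_csv_group is a hypothetical helper name, and the demo writes two tiny made-up files to a temporary directory:

```python
import csv
import os
import tempfile

def merge_csv_group(paths, out_path):
    """Hypothetical helper: merge CSV files whose columns may be in a different order.

    Columns are matched by header name; a column missing from a file is
    left empty for that file's rows.
    """
    # First pass: collect every column name, in first-seen order
    fieldnames = []
    for path in paths:
        with open(path, newline='') as f:
            for name in next(csv.reader(f), []):
                if name not in fieldnames:
                    fieldnames.append(name)

    # Second pass: copy the rows under a single header
    with open(out_path, 'w', newline='') as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
        writer.writeheader()
        for path in paths:
            with open(path, newline='') as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)

# Demo with two invented files whose columns are swapped
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, 'Master_data_0112207518.csv')
    b = os.path.join(tmp, 'Master_data_0112232018.csv')
    with open(a, 'w', newline='') as f:
        f.write('id,value\n1,x\n')
    with open(b, 'w', newline='') as f:
        f.write('value,id\ny,2\n')
    out = os.path.join(tmp, 'Master_data_combined.csv')
    merge_csv_group([a, b], out)
    with open(out, newline='') as f:
        merged = f.read()

print(merged)
```

In the loop over split_csv, you would call merge_csv_group(group, out_name) for each group. This reads every file twice, which is the price of not knowing all the column names up front.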