
I have thousands of files in a directory, named with the pattern YYYYMMDDHHMM (year, month, day, hour, minute):

  • 201801010000.txt
  • 201801010001.txt
  • 201801010002.txt

I only need hourly granularity, so for every hour of every day I need to merge the 60 per-minute files into one. I don't know how to match on the filename to select the 60 files I want. This is what I wrote:

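import datetime
import os

def concat_files(path):
    file_list = os.listdir(path)
    # Problem: open() expects a str path, not a datetime, and the current
    # time is not the date of the files being merged
    with open(datetime.datetime.now(), "w") as outfile:
        for filename in sorted(file_list):
            with open(os.path.join(path, filename), "r") as infile:
                outfile.write(infile.read())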

How do I name the output file so it keeps the date? I'm using datetime.now(), but that gives the current time instead of the date and hour of the files being merged. Also, my code merges all files into one; instead, every group of 60 files (one hour) should go into its own file.

Mik
  • Possible duplicate of [Merge CSV Files in Python with Different file names](https://stackoverflow.com/questions/20684640/merge-csv-files-in-python-with-different-file-names) – Watty62 May 16 '18 at 12:10
  • If the filename is already in the format YYYYMMDDHHMM, can't you just remove the last two characters before the `.txt` extension? – ChatterOne May 16 '18 at 12:11
  • IMO a combination of `groupby` and `datetime.strptime` will solve this easily. Can you elaborate on input and output? – Reut Sharabani May 16 '18 at 12:11
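Following the `groupby` + `datetime.strptime` suggestion, here is a minimal sketch. It assumes every `.txt` file in the source directory is named `YYYYMMDDHHMM.txt`; `hour_key` and `merge_by_hour` are hypothetical names. It writes into a separate output directory so that a rerun does not pick up the hourly files as inputs.

```python
import os
from datetime import datetime
from itertools import groupby


def hour_key(filename):
    """Parse 'YYYYMMDDHHMM.txt' and truncate the timestamp to the hour."""
    stamp = datetime.strptime(os.path.splitext(filename)[0], '%Y%m%d%H%M')
    return stamp.replace(minute=0)


def merge_by_hour(src_dir, dst_dir):
    """Concatenate each hour's per-minute files into one 'YYYYMMDDHH.txt'."""
    # groupby only groups *adjacent* items, so sort first; the fixed-width
    # timestamps make lexical order the same as chronological order
    names = sorted(n for n in os.listdir(src_dir) if n.endswith('.txt'))
    for hour, group in groupby(names, key=hour_key):
        out_path = os.path.join(dst_dir, hour.strftime('%Y%m%d%H') + '.txt')
        with open(out_path, 'w') as outfile:
            for name in group:
                with open(os.path.join(src_dir, name)) as infile:
                    outfile.write(infile.read())
```

Opening each hourly file with `'w'` means running the script twice overwrites the output instead of duplicating it.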

3 Answers


You can use `glob` to get just the files you want: it takes a pattern and returns only the paths that match it. In the last line below, it finds only the files that begin with '2018010100', followed by exactly two more characters, and end with '.txt'.

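import os
from glob import glob

def concat_files(dir_path, file_pattern):
    # Only the files matching the pattern, e.g. the 60 minutes of one hour
    file_list = glob(os.path.join(dir_path, file_pattern))
    # Name the output after the pattern minus the wildcards,
    # e.g. '2018010100??.txt' -> '2018010100.txt'
    out_name = file_pattern.replace('?', '')
    with open(out_name, "w") as outfile:
        for filename in sorted(file_list):
            with open(filename, "r") as infile:
                outfile.write(infile.read())

concat_files('C:/path/to/directory', '2018010100??.txt')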
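A glob `?` matches exactly one character (the same rules as Python's `fnmatch` module), so `'2018010100??.txt'` selects one hour. To cover every hour rather than a single one, a hypothetical helper `hour_patterns` can derive one pattern per hour from a directory listing, assuming all names look like `YYYYMMDDHHMM.txt`; each pattern could then be passed to `concat_files` in turn.

```python
from fnmatch import fnmatchcase


def hour_patterns(file_names):
    """Hypothetical helper: one glob pattern per hour present in file_names.

    Assumes every name is 'YYYYMMDDHHMM.txt' (12 digits + '.txt').
    """
    # The first 10 characters are YYYYMMDDHH; '??' stands in for the minutes
    return sorted({name[:10] + '??.txt' for name in file_names})


names = ['201801010000.txt', '201801010059.txt', '201801010100.txt']
print(hour_patterns(names))  # ['2018010100??.txt', '2018010101??.txt']

# '??' matches any two characters, so only files from hour 00 match here
print([n for n in names if fnmatchcase(n, '2018010100??.txt')])
# ['201801010000.txt', '201801010059.txt']
```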
James

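You weren't far off; you just need to invert your logic. Instead of opening one output file up front, open the output inside the loop, derive its name from the input filename by dropping the minutes, and use append mode so each hour's 60 files end up in the same file: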

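import os

path = 'C:/path/to/directory'  # directory holding the per-minute files
file_list = os.listdir(path)
for filename in sorted(file_list):
    # '201801010000.txt' -> '2018010100.txt' (drop the minutes, keep the hour)
    out_filename = filename[:-6] + '.txt'
    # Append mode: each minute file is added to its hour file in sorted order
    with open(out_filename, 'a') as outfile:
        with open(os.path.join(path, filename), 'r') as infile:
            outfile.write(infile.read())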
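The slice `filename[:-6]` is what maps each minute file to its hour file: the last six characters are the two minute digits plus `.txt`. A quick check, with `hour_output_name` as a hypothetical wrapper around that expression:

```python
def hour_output_name(filename):
    # Drop the last 6 characters (2 minute digits + '.txt'), then re-add '.txt'
    return filename[:-6] + '.txt'


print(hour_output_name('201801010000.txt'))  # 2018010100.txt
print(hour_output_name('201801010059.txt'))  # 2018010100.txt
print(hour_output_name('201801010100.txt'))  # 2018010101.txt
```

Because the hour files are opened with `'a'`, running the loop a second time appends every minute file again, so delete the previous hour files before rerunning.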
ChatterOne

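Try this one. It first builds the set of hour prefixes by cutting the last six characters (the minutes plus '.txt') off each filename, then appends every file that starts with a given prefix, in sorted order, to '<prefix>.txt'.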

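import os

path = 'C:/path/to/directory'  # directory holding the per-minute files
file_list = os.listdir(path)
# Hour prefixes: '201801010000.txt' -> '2018010100'
for prefix in {name[:-6] for name in file_list}:
    if not prefix:  # skip names too short to contain a timestamp
        continue
    with open(prefix + '.txt', 'a') as outfile:
        for name in sorted(s for s in file_list if s.startswith(prefix)):
            with open(os.path.join(path, name), 'r') as infile:
                outfile.write(infile.read())
            # os.remove(os.path.join(path, name))  # optional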
  • Welcome to Stack Overflow! While it's great to answer questions and we welcome it, it is also necessary to explain what your code does as a solution. Add the relevant explanation to your answer. [From Review](https://stackoverflow.com/review/first-posts/19750124) – Abhi May 16 '18 at 22:16