
I'm relatively new to Python and I could really use some input from you guys.

I have a script running which stores files in the following format:

201309030700__81.28.236.2.txt
201308240115__80.247.17.26.txt
201308102356__84.246.88.20.txt
201309030700__92.243.23.21.txt
201308030150__203.143.64.11.txt

Each file has some lines which I want to count, and I then want to store that total. For example, as I go through these files, whenever files share the same date (the first part of the file name) I want to store their counts together in the same output file, in the following format.

201309030700__81.28.236.2.txt has 10 lines
201309030700__92.243.23.21.txt has 8 lines

That is, create a file named after the date 20130903 (the last 4 digits are the time, which I don't want): 20130903.txt, containing the two line counts, 10 and 8.
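In outline, the idea is: take the first 8 characters of each file name as the date, count the lines in each file, and collect the counts per date. A minimal sketch of the grouping step (the file names and line counts here are hypothetical stand-ins for the real files):

```python
from collections import defaultdict

# Hypothetical (filename, line_count) pairs standing in for the real files.
entries = [
    ('201309030700__81.28.236.2.txt', 10),
    ('201309030700__92.243.23.21.txt', 8),
    ('201308240115__80.247.17.26.txt', 5),
]

# Group the line counts under the 8-character date prefix of each name.
by_date = defaultdict(list)
for name, nlines in entries:
    by_date[name[:8]].append(nlines)

for date, counts in sorted(by_date.items()):
    print(date, counts)
```

Each `by_date` entry then maps straight to one output file per date.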

I have the following code but I'm not getting anywhere, please help.

import os, os.path

p = './results_1/'
np = './new/'

def main():
    run(p)                         # walk the results directory itself

def writeFile(fdate, flines):
    # append "date<TAB>count" to ./new/<date>.txt
    fo = os.path.join(np, fdate + '.txt')
    with open(fo, 'a') as f:
        f.write('%s\t%s\n' % (fdate, flines))

def run(path):
    for root, dirs, files in os.walk(path):
        for cfile in files:
            stripFN = os.path.splitext(cfile)[0]
            fileDate = stripFN.split('_')[0]    # e.g. 201309030700
            fileIP = stripFN.split('_')[-1]
            # count the lines of the file itself, skipping the first line
            with open(os.path.join(root, cfile)) as fh:
                hp = len(fh.readlines()[1:])
            writeFile(fileDate[0:8], hp)

main()

I tried to play around with this script:

from datetime import datetime

if not os.path.exists(os.path.join(p, y)):
    os.mkdir(os.path.join(p, y))
    np = '%s%s' % (datetime.now().strftime(FORMAT), path)
if os.path.exists(os.path.join(p, m)):
    os.chdir(os.path.join(p, month, d))
    np = '%s%s' % (datetime.now().strftime(FORMAT), path)

Where FORMAT is meant to produce the following value

20130903

But I can't seem to get this to work.
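For what it's worth, the strftime pattern that yields a value like 20130903 is '%Y%m%d'; a tiny sketch (the fixed date below is just for illustration):

```python
from datetime import datetime

# '%Y%m%d' is the strftime pattern that produces strings like '20130903'.
FORMAT = '%Y%m%d'
stamp = datetime(2013, 9, 3).strftime(FORMAT)
print(stamp)
```

Note, though, that the date you want comes from the file name, not from `datetime.now()`, so slicing the name is probably the more direct route.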

EDIT: I have modified the code as follows and it kind of does what I wanted, but I'm probably doing redundant work, and I still haven't taken into account that I'm processing a huge number of files, so this may not be the most efficient way. Please have a look.

import os, os.path


p = './results_1/'
np = './new/'
fd = os.listdir(p)


def writeFile(fdate, flines):
    fo = np + fdate + '_v4.txt'
    with open(fo, 'a') as f:
        f.write('%s\n' % flines)

for f in fd:
    pathN = os.path.join(p, f)
    fileN = os.path.basename(pathN)
    stripFN = os.path.splitext(fileN)[0]
    fileDate = stripFN.split('_')[0]
    fdate = fileDate[0:8]          # YYYYMMDD, drop the time part
    with open(pathN, 'r') as files:
        lnum = len(files.readlines())
    writeFile(fdate, lnum)

At the moment it writes one line per input file, holding that file's line count. HOWEVER, I have since sorted this out. I would appreciate some input, thank you very much.

EDIT 2: Now I'm getting the output of each file with date as file name. The files now appear as:

20130813.txt
20130819.txt
20130825.txt

Each file now looks like:

15
17
18
21
14
18
14
13
17
11
11
18
15
15
12
17
9
10
12
17
14
17
13

And it goes on for a further 200+ lines in each file. Ideally, knowing how many times each value occurs, sorted with the smallest number first, would be the best desired outcome.

I have tried something like:

import sys
from collections import Counter

p = '.txt'
d = []
with open(p, 'r') as f:
    for x in f:
        x = int(x)
        d.append(x)
    d.sort()
    o = Counter(d)
    print o

Does this make sense?

EDIT 3:

I have the following script which counts the unique values for me, but I'm still unable to sort the output by the value itself.

import os
from collections import Counter

p = './newR'
fd = os.listdir(p)

for f in fd:
    pathN = os.path.join(p, f)
    with open(pathN, 'r') as infile:
        fileN = os.path.basename(pathN)
        stripFN = os.path.splitext(fileN)[0]
        fileDate = stripFN.split('_')[0]
        counts = Counter(l.strip() for l in infile)
        for line, count in counts.most_common():
            print line, count

This has the following results:

14 291
15 254
12 232
13 226
17 212
16 145
18 127
11 102
10 87
19 64
21 33
20 24
22 15
9 15
23 9
30 6
60 3
55 3
25 3

The output should look like:

9 15
10 87
11 102
12 232
13 226
14 291
etc

What is the most efficient way of doing this?
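One way to get that ordering (a sketch, with hypothetical values standing in for one of the date files): sort the Counter items numerically by key instead of using `most_common()`:

```python
from collections import Counter

# Hypothetical per-line values as read from one of the date files.
values = ['14', '15', '14', '9', '15', '14']
counts = Counter(values)

# most_common() orders by frequency; sorting the items numerically by key
# puts the smallest value first instead.
ordered = sorted(counts.items(), key=lambda kv: int(kv[0]))
for line, count in ordered:
    print(line, count)
```

The `int()` in the sort key matters: the keys are strings, and a plain string sort would put '10' before '9'.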

Chino
  • I'm confused about your output. Should output contain only line number totals? Or will each output file store multiple line numbers? Very confused. – xikkub Sep 05 '13 at 17:40
  • I have many files, some with the same date and others different. Ideally I would really like something where all the files with the same date are stored in the same file. So the file for a given date will look like: 10 12 16 21. Something like that, where the line counts are sorted with the smallest number first. Is this possible or too much processing, given I have many of these files to go through? Thank you for your time. – Chino Sep 05 '13 at 17:58
  • Where is `starLL` defined? You also need to indent `files.close()` so that it's in the for-loop. BTW repeatedly opening and closing your output files should probably be avoided here. My example first reads *all* the relevant data into a data structure before it opens a single file for writing. When it finally does generate output, it opens each output file exactly once. Also, make sure you `close()` the file handle in `writeFile`!! – xikkub Sep 06 '13 at 00:03
  • Sorry, starLL is from a def I removed which did some checking for me. I'm actually now getting what I was hoping for from the script I have edited into my initial question. I'm now attempting a different script which will read the files, count each occurrence and store it in a different file. – Chino Sep 06 '13 at 14:32
  • I still don't understand what you're trying to accomplish. Did you fix your existing code as advised? Look at the example I provided in my answer to understand the logic behind it. – xikkub Sep 06 '13 at 14:56
  • Sorry for the confusion, yes your script was very helpful and the one I have edited above also achieved similar results, though probably with less efficiency than yours. What I'm trying to do now is read the output of these files, count the occurrence of each number, sort it, then have an output such as 1 - 5, 2 - 8, 3 - 0, etc – Chino Sep 06 '13 at 14:58
  • You still need to fix the indentation for `files.close()` and properly close the other file handles. `"count the occurance of the numbers and count"` is also very vague. – xikkub Sep 06 '13 at 15:12
  • For example, read the lines of the file: I would like to find how many times '15' was duplicated, how many times '17' was duplicated, etc. So ideally output like: > 1 - 3 > 2 - 10 > 3 - 5 etc, sorted – Chino Sep 06 '13 at 15:22

2 Answers


Dictionaries are very accommodating for tasks like this. You will have to modify the example below if you intend to recursively process input files at different directory depths. Also keep in mind that you can treat Python strings as lists, which allows you to slice them (this can cut down on messy regex).

D = {}
fnames = os.listdir("txt/")
for fname in fnames:
    print(fname)
    date = fname[0:8] # this extracts the first 8 characters, aka: date
    if date not in D:
        D[date] = []
    file = open("txt/" + fname, 'r')
    numlines = len(file.readlines())
    file.close()
    D[date].append(fname + " has " + str(numlines) + " lines")

for k in D:
    datelist = D[k]
    f = open('output/' + k + '.txt', 'w')
    for m in datelist:
        f.write(m + '\n')
    f.close()
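The `if date not in D` check can also be folded away with `collections.defaultdict`; here's the same grouping logic as a self-contained sketch (it builds a throwaway directory with two sample files so it runs anywhere):

```python
import os
import tempfile
from collections import defaultdict

# Create a throwaway directory with two sample files so the sketch runs.
tmp = tempfile.mkdtemp()
for name, lines in [('201309030700__81.28.236.2.txt', 10),
                    ('201309030700__92.243.23.21.txt', 8)]:
    with open(os.path.join(tmp, name), 'w') as fh:
        fh.write('x\n' * lines)

# Same grouping as above; defaultdict removes the membership test,
# and the with-statements close the file handles automatically.
D = defaultdict(list)
for fname in os.listdir(tmp):
    date = fname[0:8]
    with open(os.path.join(tmp, fname)) as fh:
        numlines = len(fh.readlines())
    D[date].append(fname + " has " + str(numlines) + " lines")

print(sorted(D['20130903']))
```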
xikkub
  • I have many files, some with the same date and others different. Ideally I would really like something where all the files with the same date are stored in a single file. So the file for a given date will look like: 10'\n' 12'\n' 16'\n' 21'\n'. Something like that, where the line counts are sorted with the smallest number first. Is this possible or too much processing, given I have many of these files to go through? Thank you for your time. – Chino Sep 05 '13 at 18:03
  • Replace my append statement with `D[date].append(numlines)`. Also add `datelist.sort()` before you write anything to each output file, and format the `write()` line as desired. – xikkub Sep 05 '13 at 18:09
  • For some reason I'm getting the following error: Traceback (most recent call last): File "read_6.py", line 37, in file = open(fname, 'r') IOError: [Errno 2] No such file or directory: '201308130341__212.68.0.10.txt' – Chino Sep 05 '13 at 18:30
  • You may have to add more to 'fname'. Find the location of the files and make sure you `open()` the right directory (e.g. `open('c:\\txt_dir\\' + fname, 'r')`). – xikkub Sep 05 '13 at 19:04

The following code has achieved what I asked for in my initial question.

import os, os.path

p = './new/results/v4/TRACE_v4_results_ASN_mh60'
fd = os.listdir(p)

def writeFile(fdate, flines):
    fo = './new/newR/' + fdate + '_v4.txt'
    with open(fo, 'a') as f:
        f.write('%s\n' % flines)

for pfiles in fd:
    pathN = os.path.join(p, pfiles)
    fileN = os.path.basename(pathN)
    stripFN = os.path.splitext(fileN)[0]
    fileDate = stripFN.split('_')[0]
    fdate = fileDate[0:8]                      # YYYYMMDD, drop the time
    with open(pathN, 'r') as files:
        numlines = len(files.readlines()[1:])  # skip the first line
    writeFile(fdate, numlines)

It produced the following results:

20130813.txt
20130819.txt
20130825.txt

My sincere apologies if I have not followed the rules.

Chino