I'm relatively new to Python and I could really use some input from you guys.
I have a script running which stores files in the following format:
201309030700__81.28.236.2.txt
201308240115__80.247.17.26.txt
201308102356__84.246.88.20.txt
201309030700__92.243.23.21.txt
201308030150__203.143.64.11.txt
Each file has some lines of code, and I want to count the total number of lines in each file and then store that count. For example, suppose I go through these files and two of them share the same date (the first part of the file name):
201309030700__81.28.236.2.txt has 10 lines
201309030700__92.243.23.21.txt has 8 lines
Then I want to create a file named after the date 20130903 (the last 4 digits are the time, which I don't want), i.e. create the file 20130903.txt containing two lines: 10 and 8.
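So from a name like 201309030700__81.28.236.2.txt, the part I care about for naming the output is only the first 8 characters of the stamp. This is roughly how I picture pulling the date out (just a sketch, assuming every file name follows the pattern above):

import os

name = '201309030700__81.28.236.2.txt'
stem = os.path.splitext(name)[0]   # '201309030700__81.28.236.2'
stamp = stem.split('__')[0]        # '201309030700'
date = stamp[:8]                   # '20130903' (drops the HHMM time part)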
I have the following code, but I'm not getting anywhere. Please help.
import os, os.path

asline = []
ipasline = []

def main():
    p = './results_1/'
    np = './new/'
    fd = os.listdir(p)
    run(fd)

def writeFile(fd, flines):
    fo = np+fd+'.txt'
    with open(fo, 'a') as f:
        r = '%s\t %s\n' % (fd, flines)
        f.write(r)

def run(path):
    for root, dirs, files in os.walk(path):
        for cfile in files:
            stripFN = os.path.splitext(cfile)[0]
            fileDate = stripFN.split('_')[0]
            fileIP = stripFN.split('_')[-1]
            if cfile.startswith(fileDate):
                hp = 0
                for currentFile in files.readlines()[1:]:
                    hp += 1
                writeFile(fdate, hp)
I tried to play around with this script:
if not os.path.exists(os.path.join(p, y)):
    os.mkdir(os.path.join(p, y))
    np = '%s%s' % (datetime.now().strftime(FORMAT), path)
if os.path.exists(os.path.join(p, m)):
    os.chdir(os.path.join(p, month, d))
    np = '%s%s' % (datetime.now().strftime(FORMAT), path)
Where FORMAT has the following value
20130903
But I can't seem to get this to work.
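If I understand strftime correctly, FORMAT should be the pattern '%Y%m%d' rather than the literal date, so that strftime produces the 20130903 string for me. Something along these lines is what I'm picturing (a sketch; ./new/ is the output directory from my script above):

from datetime import datetime
import os

FORMAT = '%Y%m%d'                        # a pattern, not the literal '20130903'
today = datetime.now().strftime(FORMAT)  # e.g. '20130903'

out_dir = os.path.join('./new/', today)  # e.g. ./new/20130903
if not os.path.exists(out_dir):
    os.mkdir(out_dir)                    # create the dated folder if it is missing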
EDIT: I have modified the code as follows and it kind of does what I wanted, but I'm probably doing redundant things, and I still haven't taken into account that I'm processing a huge number of files, so this may not be the most efficient way. Please have a look.
import re, os, os.path

p = './results_1/'
np = './new/'
fd = os.listdir(p)
star = "*"

def writeFile(fd, flines):
    fo = './new/'+fd+'_v4.txt'
    with open(fo, 'a') as f:
        r = '%s\n' % (flines)
        f.write(r)

for f in fd:
    pathN = os.path.join(p, f)
    files = open(pathN, 'r')
    fileN = os.path.basename(pathN)
    stripFN = os.path.splitext(fileN)[0]
    fileDate = stripFN.split('_')[0]
    fdate = fileDate[0:8]
    lnum = len(files.readlines())
    writeFile(fdate, lnum)
    files.close()
At the moment it writes a new line to the date file for each file's line count. HOWEVER, I have sorted this part now. I would appreciate some input, thank you very much.
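One thing I'm unsure about is the len(files.readlines()) call: with a huge number of files, some of them possibly large, reading everything into a list just to count it seems wasteful. Would a generator like this be better? (A sketch; count_lines is just my own helper name.)

def count_lines(path):
    # Count lines without holding the whole file in memory.
    with open(path, 'r') as f:
        return sum(1 for _ in f)

Then I would call lnum = count_lines(pathN) in the loop instead of opening the file and calling readlines().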
EDIT 2: Now I'm getting the output for each date in a file named after that date. The files now appear as:
20130813.txt
20130819.txt
20130825.txt
Each file now looks like:
15
17
18
21
14
18
14
13
17
11
11
18
15
15
12
17
9
10
12
17
14
17
13
And it goes on for a further 200+ lines in each file. Ideally, counting how many times each number occurs, sorted with the smallest number first, would be the desired outcome.
I have tried something like:
import sys
from collections import Counter

p = '.txt'
d = []

with open(p, 'r') as f:
    for x in f:
        x = int(x)
        d.append(x)

d.sort()
o = Counter(d)
print o
Does this make sense?
EDIT 3:
I have the following script which counts the unique values for me, but I'm still unable to sort the output by the value itself (smallest first).
import os
from collections import Counter

p = './newR'
fd = os.listdir(p)

for f in fd:
    pathN = os.path.join(p, f)
    with open(pathN, 'r') as infile:
        fileN = os.path.basename(pathN)
        stripFN = os.path.splitext(fileN)[0]
        fileDate = stripFN.split('_')[0]
        counts = Counter(l.strip() for l in infile)
        for line, count in counts.most_common():
            print line, count
This gives the following results:
14 291
15 254
12 232
13 226
17 212
16 145
18 127
11 102
10 87
19 64
21 33
20 24
22 15
9 15
23 9
30 6
60 3
55 3
25 3
The output should look like:
9 15
10 87
11 102
12 232
13 226
14 291
etc
What is the most efficient way of doing this?
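From what I can tell, most_common() orders by count, so I think I need to sort the items by the value itself instead, converting to int so that 9 doesn't come after 10. Something like this is what I have in mind (a sketch of EDIT 3 with the printing changed; it assumes every line in the ./newR files is a number):

import os
from collections import Counter

p = './newR'
for f in os.listdir(p):
    pathN = os.path.join(p, f)
    with open(pathN, 'r') as infile:
        counts = Counter(l.strip() for l in infile)
    # Sort by the numeric value of the line itself, not by its frequency,
    # so the smallest number comes first.
    for value, count in sorted(counts.items(), key=lambda kv: int(kv[0])):
        print('%s %s' % (value, count))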