This regex is supposed to match filenames in exactly this format:
201308 - (82608) - MAC 2233-007-Methods of Calculus - Lastname, Lee.txt
The only exceptions are the instructor name (the part between the last hyphen and the .txt) and the course name right before it: both can be a variable number of letters (in the sample they also contain spaces, and the instructor name a comma). Everything else has a fixed number of characters in exactly this layout: either integers separated by exactly these spaces and hyphens, or the three-letter course prefix in all caps.
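For reference, here is one way the described format could be spelled out piece by piece. This is a minimal Python 3 sketch, not a tested answer: it assumes the course name contains only ASCII letters and spaces and the instructor name only letters, spaces and a comma, and the group names (code, prefix, number, section, course, instructor) are my own labels.

```python
import re

def make_filename_regex(term):
    """Compile a pattern for names such as
    '201308 - (82608) - MAC 2233-007-Methods of Calculus - Lastname, Lee.txt'.

    Assumes the course name uses only ASCII letters and spaces, and the
    instructor name only letters, spaces and a comma. Note [A-Za-z] rather
    than [A-z]: the latter also matches punctuation such as ^ and _ that
    sits between Z and a in ASCII.
    """
    return re.compile(
        re.escape(term)                              # fixed term, e.g. 201308
        + r" - \((?P<code>\d{5})\)"                  # " - (82608)"
        + r" - (?P<prefix>[A-Z]{3})"                 # " - MAC"
        + r" (?P<number>\d{4})-(?P<section>\d{3})"   # " 2233-007"
        + r"-(?P<course>[A-Za-z ]+)"                 # "-Methods of Calculus"
        + r" - (?P<instructor>[A-Za-z ,]+)"          # " - Lastname, Lee"
        + r"\.txt$"                                  # literal ".txt" at the very end
    )

pattern = make_filename_regex("201308")
m = pattern.search("201308 - (82608) - MAC 2233-007-Methods of Calculus - Lastname, Lee.txt")
print(m.group("course"), "|", m.group("instructor"))
# Methods of Calculus | Lastname, Lee
```

Building the pattern from short raw-string pieces keeps each fixed-width segment next to a comment showing the text it is meant to consume, which makes it easier to test one piece at a time.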
In practice, my regex finds nothing at all. Before I escaped the parentheses it was catching some files, but now it catches none. I'm using re.search instead of re.match because the regex is obviously not finished and I'm testing it piece by piece.
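The choice between the two matters for these test files in particular: the renamed copies start with "@ ", so re.match, which only tries to match at the beginning of the string, rejects them even when the rest of the name is right, while re.search scans the whole string. A small Python 3 sketch using one of the listed filenames (the fragment pattern is just an example):

```python
import re

# One of the renamed test files from the question.
name = "@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt"
pattern = r"201308 - \(\d{5}\)"

# re.match only tries position 0, so the "@ " prefixes make it fail.
print(re.match(pattern, name))            # None
# re.search scans forward and finds the fragment mid-string.
print(re.search(pattern, name).group())   # 201308 - (82608)
# re.fullmatch (Python 3.4+) requires the whole string to match, which is
# what "exactly this format" means once the pattern is complete.
print(re.fullmatch(pattern, name))        # None
```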
import re, os, sys, shutil

def readDir(path1):
    return [ f for f in os.listdir(path1) if os.path.isfile(os.path.join(path1,f)) ]

def files(dir1,term,path1):
    match2 = []; stillWrong = []#; term = str(term)
    for f in dir1:
        result = re.search(term + "\s\b\s\(\d{5}\)\s\b\s\w{3}\s\d{4}\b\d{3}[a-z\A-Z]+\s\b\s[A-z\a-z]+\b\s[A-Z\a-z]+ .txt",f)
        if result: match2.append(f)
        else: stillWrong.append(f)
        #print "split --- ",os.path.split(f)
        ##else: os.rename(path1+'\\'+f, path1+'\\'+'@ '+f); stillWrong.append(f)
        print "f ---- ",f
    return match2, stillWrong

term = "201308"; src = "testdir1"; dest = "testdir2"
print files(readDir(dest),term,dest)
This produces the following (obviously wrong) output:
>>>
f ---- @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
f ---- @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
f ---- @ @ @ @ @ @ 201308 abc 123.txt
f ---- @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
f ---- @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
f ---- @ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
f ---- @ @ @ @ @ @ @ @ @ 201308 abc 123.txt
f ---- @ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
f ---- @ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
f ---- @ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
f ---- @ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt
f ---- @ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt
([], ['@ @ @ @ @ @ 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt', '@ @ @ @ @ @ 201308 abc 123.txt', '@ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt', '@ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt', '@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt', '@ @ @ @ @ @ @ @ @ 201308 abc 123.txt', '@ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt', '@ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt', '@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt', '@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt'])
>>>
As you can see, there's nothing in the match2 list. (If you're interested, the second list holds the non-matching filenames; the first list is the one that should hold the matches.) I'm teaching myself Python and regex, and it's not going well. I've read these questions (and several regex tutorials), but they didn't seem to help in this case:
Escaping regex string in Python
Regex to escape the parentheses
How to implement \p{L} in python regex
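Escaping happens at two separate levels in the pattern above. The pattern is an ordinary string literal, so Python itself turns "\b" into a backspace character and "\a" into a bell character before the re module ever sees them; a raw string (r"...") keeps the backslashes. Separately, a genuine \b is a zero-width word boundary, so it cannot consume the " - " separators, while \( and \) are what match literal parentheses. A minimal Python 3 sketch of both levels, using a filename from the test list:

```python
import re

name = "201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt"

# Level 1: Python string literals. In an ordinary string, "\b" is already
# the backspace character (\x08) before re sees it; "\a" is likewise a bell.
print(repr("\b"), repr(r"\b"))                    # '\x08' '\\b'

# Even a genuine \b (word boundary) is zero-width and consumes nothing,
# so it cannot stand in for the " - " separators.
print(re.search(r"\s\b\s", " - "))                # None
print(repr(re.search(r"\s-\s", " - ").group()))   # ' - '

# Level 2: regex metacharacters. Unescaped, ( and ) only form a group,
# so this finds the first five digits anywhere: the start of "201308".
print(re.search(r"(\d{5})", name).group())        # 20130
# Escaped inside a raw string, the parentheses must appear literally.
print(re.search(r"\(\d{5}\)", name).group())      # (82608)

# re.escape() escapes literal text for you, which is handy when splicing
# a variable such as term into a pattern.
term = "201308"
print(re.search(re.escape(term) + r" - \(\d{5}\)", name).group())  # 201308 - (82608)
```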
All of the @ characters come from the os.rename call that you see commented out, but the regex didn't work before I commented that out, either. I'm sure any entry-level programmer could finish this off in a few minutes, but if a pro happens on this question and can spare a minute, that's great too.
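The commented-out rename builds paths by appending '\\' by hand, which only works on Windows, and because it runs on every non-matching file each time, names that were already marked get another "@ " on every test run, which is consistent with the growing prefixes in the output above. A hedged Python 3 sketch of the same rename using os.path.join, with a guard against re-marking (the function name mark_unmatched and its marker parameter are hypothetical):

```python
import os
import tempfile

def mark_unmatched(path1, f, marker="@ "):
    """Prefix a non-matching file's name with marker, as the commented-out
    os.rename line does, and return the new path. Files that already carry
    the marker are left alone so repeated runs don't stack prefixes."""
    if f.startswith(marker):
        return os.path.join(path1, f)
    old = os.path.join(path1, f)
    new = os.path.join(path1, marker + f)
    os.rename(old, new)
    return new

# Demo in a throwaway directory.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "123 abc.txt"), "w").close()
first = mark_unmatched(demo_dir, "123 abc.txt")
second = mark_unmatched(demo_dir, os.path.basename(first))
print(sorted(os.listdir(demo_dir)))  # ['@ 123 abc.txt']
```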
EDIT: List of filenames used for testing (the production list is obviously much longer):
201308-(12345) - Abc 2233-007-course Name - last, first.txt
201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
@ @ @ @ @ @ 201308 abc 123.txt
@ @ @ @ @ @ 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
@ @ @ @ @ @ @ @ @ 201308-(12345) - Abc 2233-007-course Name - last, first.txt
@ @ @ @ @ @ @ @ @ 201308 abc 123.txt
@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
@ @ @ @ @ @ @ @ @ @ 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ @ @ @ @ 45-12 - xyz - mno - 123-pqr-tuv-456.txt
@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 123 abc - a-1 - b-2.txt
@ @ @ @ @ @ @ @ @ @ @ xxxxx xxxxx xxxxx 45-12 - xyz - mno - 123-pqr-tuv-456.txt
45-12 - xyz - mno - 123-pqr-tuv-456.txt
123 abc - a-1 - b-2.txt
201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt
201308 abc 123.txt
201308-(12345) - Abc 2233-007-course Name - last, first.txt
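To check a candidate pattern against this list, a Python 3 sketch along the lines of files() can take a list of names instead of a directory. The helper name split_by_format is hypothetical, and the pattern assumes the course and instructor names contain only ASCII letters, spaces and a comma:

```python
import re

def split_by_format(names, term):
    """Hypothetical helper mirroring files() in the question:
    returns (matches, still_wrong)."""
    pattern = re.compile(
        re.escape(term)
        + r" - \(\d{5}\) - [A-Z]{3} \d{4}-\d{3}-[A-Za-z ]+ - [A-Za-z ,]+\.txt$"
    )
    matches, still_wrong = [], []
    for f in names:
        # search() also accepts the "@ "-prefixed copies; switch to
        # pattern.fullmatch(f) to require the exact format from the start.
        if pattern.search(f):
            matches.append(f)
        else:
            still_wrong.append(f)
    return matches, still_wrong

# A subset of the test filenames listed above.
names = [
    "201308-(12345) - Abc 2233-007-course Name - last, first.txt",
    "201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
    "@ @ @ @ @ @ 201308 abc 123.txt",
    "@ @ @ @ @ @ @ @ @ 201308 - (82608) - MAC 2233-007-Methods of Calculus - Klingler, Lee.txt",
    "45-12 - xyz - mno - 123-pqr-tuv-456.txt",
    "123 abc - a-1 - b-2.txt",
]
matches, still_wrong = split_by_format(names, "201308")
for f in matches:
    print("match ----", f)
```

With re.search, only the correctly formatted name and its "@ "-prefixed copy match. "201308-(12345) - Abc 2233-007-course Name - last, first.txt" is rejected because it has no spaces around the first hyphen and its prefix is not all caps.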