Yesterday, I learned from Kirk Strauser that strptime() is much slower than other solutions: see this file
So my advice is to use another approach. For example:
import re
ss = '''February 27, 1820
a line
April 3, 1885'''
regx = re.compile('(January|February|March|'
'April|May|June|'
'July|August|September|'
'October|November|December)'
' '
'(\d|[012]\d|3[01])'
',(?= \d{4})')
print regx.findall(ss)
print
print regx.sub('\\2 \\1',ss)
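For readers on Python 3, the same substitution could be sketched as below. This is only a minimal sketch, not part of the original answer: the names `MONTHS`, `DATE_RX` and `swap_month_day` are hypothetical, the pattern is written with raw strings, and `print` is a function.

```python
import re

# All twelve month names, joined into one alternation.
MONTHS = ('January|February|March|April|May|June|'
          'July|August|September|October|November|December')

# Month, space, day (1-31, optional leading zero), comma,
# then a lookahead for " YYYY" so the year itself is not consumed.
DATE_RX = re.compile(r'(' + MONTHS + r') (\d|[012]\d|3[01]),(?= \d{4})')


def swap_month_day(text):
    """Rewrite 'Month D, YYYY' as 'D Month YYYY' (hypothetical helper)."""
    return DATE_RX.sub(r'\2 \1', text)


ss = '''February 27, 1820
a line
April 3, 1885'''

print(DATE_RX.findall(ss))
print(swap_month_day(ss))
```

Because the year is only checked by the lookahead, it stays in place and the substitution just swaps the month and day and drops the comma.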
Edit 1
The program can be made faster by calling regx.sub(repl,ss), where repl() is a function that does not extract the month and day as group(1) and group(2), but instead splits the whole match and slices off the trailing comma:
import re
from time import clock
ss = '''February 27, 1820
a line
April 3, 1885'''
regx = re.compile('(January|February|March|'
'April|May|June|'
'July|August|September|'
'October|November|December)'
' '
'(\d|[012]\d|3[01])'
',(?= \d{4})')
print regx.findall(ss)
print
te = clock()
for i in xrange(10000):
    x = regx.sub('\\2 \\1',ss)
print clock()-te
print x
print
regx = re.compile('(?:January|February|March|'
'April|May|June|'
'July|August|September|'
'October|November|December)'
' '
'(?:\d|[012]\d|3[01]),'
'(?= \d{4})')
def repl(mat):
    sp = mat.group().split()
    return sp[1][0:-1] + ' ' + sp[0]
te = clock()
for i in xrange(10000):
    y = regx.sub(repl,ss)
print clock()-te
print y
Result:
[('February', '27'), ('April', '3')]
2.52965614345
27 February 1820
a line
3 April 1885
0.378833622709
27 February 1820
a line
3 April 1885
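`time.clock()` was removed in Python 3.8, so on a current interpreter the comparison above could be reproduced with `timeit`. The sketch below is a port under assumptions, not the original benchmark: the helper names `by_groups` and `by_split` are hypothetical, and no timings are quoted here because they depend on the machine and the Python version.

```python
import re
import timeit

ss = '''February 27, 1820
a line
April 3, 1885'''

MONTHS = ('January|February|March|April|May|June|'
          'July|August|September|October|November|December')

# Variant 1: capturing groups, backreference template.
rx_groups = re.compile(r'(' + MONTHS + r') (\d|[012]\d|3[01]),(?= \d{4})')

# Variant 2: non-capturing groups, callback that splits the whole match.
rx_plain = re.compile(r'(?:' + MONTHS + r') (?:\d|[012]\d|3[01]),(?= \d{4})')


def repl(mat):
    month, day = mat.group().split()   # e.g. ['February', '27,']
    return day[:-1] + ' ' + month      # drop the trailing comma


def by_groups():
    return rx_groups.sub(r'\2 \1', ss)


def by_split():
    return rx_plain.sub(repl, ss)


# Both variants must produce the same text before timing them.
assert by_groups() == by_split()

n = 10000  # same iteration count for both variants
print('groups:', timeit.timeit(by_groups, number=n))
print('split: ', timeit.timeit(by_split, number=n))
```

Using the same iteration count for both variants keeps the two numbers directly comparable.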
PS: I also knew that strftime and strptime only cover a limited span of time (nothing before 1900), which is why I immediately chose to treat the problem with a regex. People find regexes too heavy and intimidating to resort to them, but I don't understand this trend: as soon as you master regexes even a little, you can do plenty of things with efficiency and speed. Hurrah for the regex tool.