Try the standard library:
import datetime
parser = lambda t: datetime.datetime.strptime(str(t), "%Y%m%d")
However, I don't really know if this is much faster than pandas.
Since your format is so simple, what about slicing the fields out by position?
def parse(t):
    string_ = str(t)
    # fixed-width layout: YYYYMMDD
    return datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
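If you want to check the speed yourself rather than take my word for it, here's a rough timing sketch using `timeit`. The function names and the sample value are made up for illustration, and the numbers will depend on your machine and Python version, so treat it as a starting point only.

```python
import datetime
import timeit


def parse_strptime(t):
    # Format-string parsing; strptime returns a datetime, so take .date()
    return datetime.datetime.strptime(str(t), "%Y%m%d").date()


def parse_slices(t):
    # Fixed-width slicing of a YYYYMMDD value
    s = str(t)
    return datetime.date(int(s[:4]), int(s[4:6]), int(s[6:]))


sample = 20140317  # hypothetical input value

# Sanity check: both approaches should give the same date
assert parse_strptime(sample) == parse_slices(sample)

for fn in (parse_strptime, parse_slices):
    seconds = timeit.timeit(lambda: fn(sample), number=20_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 20,000 calls")
```

Run it on a value that looks like your real data before deciding which one to plug into your CSV reader.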
EDIT: You say you need to handle invalid data.
def parse(t):
    string_ = str(t)
    try:
        return datetime.date(int(string_[:4]), int(string_[4:6]), int(string_[6:]))
    except ValueError:  # raised by int() on non-digits and by date() on out-of-range fields
        return default_datetime  # you should define that somewhere else
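To make the fallback concrete, here's a small self-contained sketch. `DEFAULT_DATE`, `parse_or_default`, and the sample values are my own placeholders, not anything from your code. I've also added a length check, which is an extra assumption on my part: plain slicing would happily accept a 7-character value like "2014031" and read it as the 1st of March.

```python
import datetime

# Assumption: a sentinel for values that cannot be parsed; pick what suits your data
DEFAULT_DATE = datetime.date(1900, 1, 1)


def parse_or_default(t, default=DEFAULT_DATE):
    s = str(t)
    # Extra guard (not in the original idea): require exactly 8 characters
    if len(s) != 8:
        return default
    try:
        return datetime.date(int(s[:4]), int(s[4:6]), int(s[6:]))
    except ValueError:
        # int() failed on non-digits, or date() rejected month/day
        return default


values = [20140317, "20141301", "2014031", "abc", None]
print([parse_or_default(v) for v in values])
```

Only the first value is a valid date here; the rest should all come back as the sentinel.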
All in all, I'm a bit conflicted about the validity of your problem:
- you need to be fast, but still you get your data from a CSV
- you need to be fast, but still need to deal with invalid data
That's somewhat contradictory. My personal approach would be to assume that your "huge" CSV only needs to be converted into a better-performing format once. Then either you shouldn't care about the speed of that conversion (because it only happens once), or you should get whatever produces the CSV to give you better data; there are many formats that don't rely on string parsing.
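As a rough illustration of the "convert once" idea, here's a sketch using only the standard library. Everything in it is an assumption on my part: the function names, the paths you'd pass in, that the date sits in the first column, and that the file has no header row. I've used `pickle` purely as a stand-in for "some binary format"; if you're working in pandas, a columnar format such as Parquet or HDF5 would be the more usual choice.

```python
import csv
import datetime
import pickle


def convert_once(csv_path, out_path, date_column=0):
    """Parse the CSV a single time and store rows with real date objects."""
    rows = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            s = row[date_column]
            # Same fixed-width YYYYMMDD slicing as above
            row[date_column] = datetime.date(int(s[:4]), int(s[4:6]), int(s[6:]))
            rows.append(row)
    with open(out_path, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_converted(out_path):
    """Later runs load the binary file and skip string parsing entirely."""
    with open(out_path, "rb") as f:
        return pickle.load(f)
```

After the one-off `convert_once(...)` call, every later run only needs `load_converted(...)`, so the date-parsing cost stops mattering.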