In the documentation for csvkit, one of its principles is that "When modifying input data, conform to good standards. Floats should end with '.0', even if they are round, dates and times should be in ISO8601 format, etc."

Is there an existing way to easily change a date in mm/dd/yyyy format to yyyy-mm-dd using csvkit, or will I need to manually correct the file first?

NumenorForLife

1 Answer

I am not sure what you need answered regarding the ISO8601 format.

As for the date format conversion, you could use a regular expression on the date string to convert it to the proper format.

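import re

def change_to_yyyymmdd(str_mmddyyyy):
    """Convert an mm/dd/yyyy date string to yyyy-mm-dd."""
    try:
        # Capture month, day and year; findall returns a list of tuples.
        m, d, y = re.findall(r"^(\d{2})/(\d{2})/(\d{4})", str_mmddyyyy)[0]
    except IndexError:
        # An empty list means the string did not match the pattern.
        raise ValueError("Date not in mm/dd/yyyy format.")
    return "-".join([y, m, d])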
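
For a file with several million rows, the same conversion can be applied while streaming through the CSV one row at a time. The following is only a sketch using Python's standard `csv` and `datetime` modules rather than csvkit itself. The column name `date` and the helper names `to_iso_date` and `convert_column` are assumptions made for illustration. It uses `datetime.strptime` in place of the regular expression, which also rejects impossible dates such as 13/45/2015.

```python
import csv
import io
from datetime import datetime


def to_iso_date(us_date):
    """Convert mm/dd/yyyy to yyyy-mm-dd; raises ValueError on invalid input."""
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y-%m-%d")


def convert_column(src, dst, column="date"):
    """Copy CSV rows from src to dst, rewriting one date column row by row.

    `column` is a hypothetical column name; adjust it to match the real file.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[column] = to_iso_date(row[column])
        writer.writerow(row)


# Small in-memory demonstration. With real files, pass objects opened
# with newline="" as the csv module documentation recommends.
sample = io.StringIO("id,date\n1,04/30/2015\n2,12/16/2010\n")
result = io.StringIO()
convert_column(sample, result)
print(result.getvalue())
```

Because only one row is held in memory at a time, the file size should not matter much; the output is written to a new file rather than modifying the input in place.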
Jeff Mandell
  • I was hoping to learn that someone had a solution to something I suspect is a larger problem. Did you mean re.findall("...).. (without the r)? I am hesitant to have to rewrite several million rows, but it seems like it's the only answer – NumenorForLife Apr 30 '15 at 04:36
  • The r in front of the string declares that the string is a raw string. If you do not like the regular expression or the try/except, you can reformat the date by unpacking the values: `m,d,y = str_mmddyyyy.split("/")` – Jeff Mandell Apr 30 '15 at 05:01
  • How would you compare your solution to using dateutil, as done here? http://stackoverflow.com/questions/4460698/python-convert-date-to-iso-8601 – NumenorForLife Apr 30 '15 at 05:02
  • My solution manually formats mm/dd/yyyy to yyyy-mm-dd. The solution on that link formats Thu, 16 Dec 2010 12:14:05 +0000 to 2010-12-16T12:14:05+00:00 using the dateutil parser. – Jeff Mandell Apr 30 '15 at 05:05
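
The two points raised in the comments, the raw-string prefix and the split-based alternative, can be sketched as follows. The function name `split_to_iso` is hypothetical, and the zero-padding with `zfill` is an added assumption that goes beyond the one-liner in the comment. The function does not check that the parts are digits.

```python
# A raw string keeps backslashes literal, so these two spellings of the
# pattern are the same string.
print(r"\d{2}" == "\\d{2}")


def split_to_iso(str_mmddyyyy):
    """Reorder mm/dd/yyyy into yyyy-mm-dd by splitting on "/"."""
    # Unpacking raises ValueError if there are not exactly three parts.
    m, d, y = str_mmddyyyy.split("/")
    # zfill pads single-digit months and days, e.g. "4" becomes "04".
    return "-".join([y, m.zfill(2), d.zfill(2)])


print(split_to_iso("04/30/2015"))
print(split_to_iso("4/3/2015"))
```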