1

I have a SQL table in this format, which I would like to read and convert into a pandas timeseries.

y (year)  w (week)   d (some data)
2009      1          10
2009      2          15
...

What is a good way to do this?

I am aware of read_sql()'s parse_dates argument, and alternatively, setting the index manually using DatetimeIndex. I cannot understand how to do this with week data. I have tried the following. Thanks.

# gives NaT for year & week:
df = pd.read_sql("SELECT y, w, d FROM t",
                 db, parse_dates={"y":"%Y", "w":"%U"})

# gives wrong dates for yw - e.g. all 2009-01-01:
df = pd.read_sql("SELECT CONCAT(y,'/',w) as yw, d FROM t",
                 db, parse_dates={"yw": "%Y/%U"})

# throws DateParseError exception:
df = pd.read_sql("SELECT CONCAT(y,'W',w) as yw, d FROM t",
                 db)
df.index = pd.DatetimeIndex(df.yw)
pythonhunter
  • 1,047
  • 3
  • 10
  • 23
Hadi
  • 203
  • 3
  • 9
  • Perhaps use SQL to [convert the year and week into a date](http://stackoverflow.com/q/7078730/190597): `SELECT DATE_ADD(MAKEDATE(y, 1), INTERVAL w WEEK), d FROM t`. – unutbu Aug 23 '14 at 12:07
  • @unutbu, thank you for the suggestion, which should work. I am however assuming there is a more elegant, Pythonic way to do this. – Hadi Aug 23 '14 at 12:18

1 Answers1

0

Really, I think the most elegant way is to perform the conversion in SQL:

sql = "SELECT DATE_ADD(MAKEDATE(y, 1), INTERVAL w WEEK) as date, d FROM test.t"
df = pd.read_sql(sql, engine)
print(df)

yields

         date   d
0  2009-01-08  10
1  2009-01-15  15

To do the equivalent in Python requires more boiler-plate since AFAIK Pandas does not provide any out-of-the-box facilities for converting years and week numbers into dates. You could of course use loops and the datetime module to convert the numbers into datetime.datetime objects one-by-one.

A faster way would be to use NumPy's datetime64 and timedelta64 dtypes to do the date arithmetic as NumPy arrays:

sql = "SELECT y, w, d FROM t"
df = pd.read_sql(sql, engine)

date = (df['y'].astype('<i8')-1970).view('<M8[Y]')
delta = (df['w'].astype('<i8')*7).view('<m8[D]')
df['date'] = date+delta
df = df[['date', 'd']]
print(df)

yields

        date   d
0 2009-01-08  10
1 2009-01-15  15

Edit: Building off of Hadi's answer in the comment, it's also possible to compute the date in SQL using MySQL's STR_TO_DATE function:

sql = "SELECT STR_TO_DATE(CONCAT(y,':',w,':1'), '%x:%v:%w') as date, d FROM test.t"    
df = pd.read_sql(sql, engine)

which yields

         date   d
0  2008-12-29  10
1  2009-01-05  15

Note that according the the MySQL docs, %v, means

Week (01..53), where Monday is the first day of the week; WEEK() mode 3; used with %x

and later on the same page, mode 3 means

Mode    First day of week   Range   Week 1 is the first week …
3       Monday              1-53    with 4 or more days this year

For mode values with a meaning of “with 4 or more days this year,” weeks are numbered according to ISO 8601:1988.

So if you want the week number to be consistent with ISO 8601:1988, then with STR_TO_DATE you should use %v (or %u), not %V (or %U).

Note that my first answer above does not interpret the week as an ISO 8601 week number; it merely computes 7 days from Jan 1 for each week.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • 1
    I figured out this also works: `pd.read_sql("SELECT CONCAT(y,':',w,':1') AS dt, d FROM t", db, parse_dates = {"dt": "%Y:%U:%w"})` . This way the parsing happens on the Python side. It still does CONCAT in SQL, so I'm not sure if it's better or not than what you suggest. – Hadi Aug 23 '14 at 17:39
  • Oh, that's clever! Note that your method gives a different result than mine -- since `%Y:%U:%w` is parsed to mean the `nth` week of the year beginning on Monday, while my method computes `n*7` days from Jan 1. – unutbu Aug 23 '14 at 18:33
  • I'm accepting your answer for now, but perhaps a better answer is alter on found :) – Hadi Aug 26 '14 at 14:01
  • Hadi, it's okay to post and accept your solution -- yours is consistent with the ISO-8601 definition of week number, while mine is not. – unutbu Aug 26 '14 at 16:11
  • 1
    Correction: To parse the week number as a ISO-8601:1988 week number in Python, you may need to use `%V` rather than `%U`. To parse the same in MySQL, you'd need to use `%v`. – unutbu Aug 26 '14 at 17:14