I found that we can create date-time columns in a Pandas DataFrame by doing this:
>>> import pandas
>>> dt1 = pandas.DatetimeIndex(["2016-03-04 15:01:49",
...                             "2016-03-05 23:54:22",
...                             "2016-04-03 21:22:08",
...                             "2016-04-03 21:22:08",
...                             "2016-03-05 23:54:22"])
>>> df1 = pandas.DataFrame([["firefly", 37],
...                         ["wood", 47],
...                         ["snowflake", 12],
...                         ["waterfall", 67],
...                         ["wind", 208]],
...                        columns=["what", "count"])
>>> df1['when_last'] = dt1
>>> df1
        what  count           when_last
0    firefly     37 2016-03-04 15:01:49
1       wood     47 2016-03-05 23:54:22
2  snowflake     12 2016-04-03 21:22:08
3  waterfall     67 2016-04-03 21:22:08
4       wind    208 2016-03-05 23:54:22
This is my question:
Is this a legal construct? Part of my confusion is this: is DatetimeIndex supposed to be able to accommodate duplicate and unordered dates when it is not used as the index of the DataFrame?
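To make the question concrete, here is a minimal sketch of the checks I have in mind, not an authoritative answer. It reuses the dates from the example above; the variable names (idx, has_duplicates, is_sorted, column_kind) are mine and purely illustrative. It asks the DatetimeIndex whether it is unique and sorted, then looks at what kind of column results from the assignment.

```python
import pandas as pd

# Same dates as in the example above: some are duplicated and they are not in order.
idx = pd.DatetimeIndex(["2016-03-04 15:01:49",
                        "2016-03-05 23:54:22",
                        "2016-04-03 21:22:08",
                        "2016-04-03 21:22:08",
                        "2016-03-05 23:54:22"])

# Ask the index itself about duplicates and ordering.
has_duplicates = not idx.is_unique
is_sorted = idx.is_monotonic_increasing

# Assign it as an ordinary column; the row index stays the default 0..4.
df = pd.DataFrame({"what": ["firefly", "wood", "snowflake", "waterfall", "wind"]})
df["when_last"] = idx

# dtype.kind is "M" for a datetime64 column.
column_kind = df["when_last"].dtype.kind
print(has_duplicates, is_sorted, column_kind)
```

If the construct is legal, I would expect the column to come out as an ordinary datetime64 column, with the duplicates and the ordering left untouched.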
This is the use case that prompted the experiment above: I have a table that I want to process with Pandas. It has many (but not too many) fields, about 40 of them, and contains tens of thousands of records or more. The original format of the table is CSV text. The processing will basically be SQL-like analytics (filter, join, sort, etc.), for which Pandas has decent capabilities. Three or four of the fields are date-time fields, stored as UNIX timestamps in the CSV file. None of them is suitable as the index of the DataFrame rows: they are dates of several events belonging to a record, and they can contain duplicates, since events can be stamped with exactly the same date-time value.
Several Stack Overflow users have suggested that parsing date-times directly in read_csv with the date_parser argument works quite poorly (and performance is perhaps also mediocre) if the dates are parsed one by one, like this one. Given that the raw columns contain plain UNIX timestamps, we should be able to get high performance. The other problem is that to_datetime does not seem to let me ascribe a timezone of my choice to the UNIX timestamps. The example above has no timezone, but I want to include one in my real case.
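What I am hoping for is something like the sketch below. It is only an illustration under stated assumptions: the CSV content, the column names, and the target zone "America/New_York" are made up for the example, and the timestamps are assumed to be whole seconds since the epoch. It reads the column as plain integers, converts the whole column in one call with to_datetime(unit="s", utc=True), and then moves it to the target zone with Series.dt.tz_convert.

```python
import io
import pandas as pd

# Hypothetical CSV; when_last holds UNIX timestamps in seconds (an assumption).
csv_text = (
    "what,count,when_last\n"
    "firefly,37,1457103709\n"
    "wood,47,1457222062\n"
)

# Read the timestamps as plain integers; no per-row date parsing in read_csv.
df = pd.read_csv(io.StringIO(csv_text))

# Convert the whole column at once. utc=True makes the result timezone-aware (UTC).
when_utc = pd.to_datetime(df["when_last"], unit="s", utc=True)

# Express the same instants in another zone (illustrative choice of zone).
df["when_last"] = when_utc.dt.tz_convert("America/New_York")
print(df)
```

The same two steps could be repeated for each of the three or four timestamp columns. Whether this is the recommended way, and how it performs on tens of thousands of rows, is part of what I am asking.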