I am using pandas to import data dfST = read_csv( ... , parse_dates={'timestamp':[date]})
In my csv, date is in the format YYY/MM/DD, which is all I need - there is no time. I have several data sets that I need to compare for membership. When I convert theses 'timestamp' to a string, sometimes I get something like this:
'1977-07-31T00:00:00.000000000Z'
which I understand is a datetime including milliseconds and a timezone. Is there any way to suppress the addition of the extraneous time on import? If not, I need to exclude it somehow.
dfST.timestamp[1]
Out[138]: Timestamp('1977-07-31 00:00:00')
I have tried formatting it, which seemed to work until I called the formatted values:
dfSTdate=pd.to_datetime(dfST.timestamp, format="%Y-%m-%d")
dfSTdate.head()
Out[123]:
0 1977-07-31
1 1977-07-31
Name: timestamp, dtype: datetime64[ns]
But no... when I test the value of this I also get the time:
dfSTdate[1]
Out[124]: Timestamp('1977-07-31 00:00:00')
When I convert this to an array, the time is included with the millisecond and the timezone, which really messes my comparisons up.
test97=np.array(dfSTdate)
test97[1]
Out[136]: numpy.datetime64('1977-07-30T20:00:00.000000000-0400')
How can I get rid of the time?!?
Ultimately I wish to compare membership among data sets using numpy.in1d
with date as a string ('YYYY-MM-DD') as one part of the comparison