I originally posted a question about plotting time series with different datetime sampling, stored in many different dataframes, in the same plot.
I got help understanding I needed to convert my time-column (‘ts’) to datetime. I struggled with this, still getting messed up plots. Turns out my conversion to datetime isn’t working, and this is a known thing, as stated here.
As far as I understand it, a dataframe can’t store datetime in a column (why??); it converts it back to pandas._libs.tslibs.timestamps.Timestamp.
I need to figure out the best workaround for this to be able to plot large datasets.
In the post above, it is stated that a dataframe index can store the datetime format, but when I set my column as the index and try to loop through, I get a KeyError.
In[]: df.index.name
Out[]: 'ts'
but when I try:
for column in df.columns[1:]:
    df['ts'] = pd.to_datetime(df['ts'])
I get KeyError: 'ts'
Am I doing something wrong here? Does anyone know if datetime is stored correctly in the index?
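A minimal sketch of what seems to be happening, assuming 'ts' was moved into the index with set_index: once it is the index, 'ts' is no longer a column, so df['ts'] raises KeyError, and the conversion has to be applied to df.index instead. The two-row frame below is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in: 'ts' holds ISO strings, as read from the avro files.
df = pd.DataFrame({
    "ts": ["2019-10-18T08:13:26.702000", "2019-10-18T08:13:26.765000"],
    "value": [14, 10],
})
df = df.set_index("ts")

# 'ts' is now the index name, not a column, so a column lookup fails.
try:
    df["ts"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Convert the index itself rather than looking it up as a column.
df.index = pd.to_datetime(df.index)

print(lookup_failed)
print(type(df.index))
```

After this, df.index is a DatetimeIndex, which is how pandas holds datetimes in an index.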
However, I would still like to ask about the best work-around for this issue.
My bottom line is wanting to plot several dataframes correctly in the same plot. I have a lot of large datasets, and when trying out things, I am using two simplified dataframes, see below:
print(df1)
ts value
0 2019-10-18 08:13:26.702 14
1 2019-10-18 08:13:26.765 10
2 2019-10-18 08:13:26.790 5
3 2019-10-18 08:13:26.889 6
4 2019-10-18 08:13:26.901 8
5 2019-10-18 08:13:27.083 33
6 2019-10-18 08:13:27.098 21
7 2019-10-18 08:13:27.101 11
8 2019-10-18 08:13:27.129 22
9 2019-10-18 08:13:27.159 29
10 2019-10-18 08:13:27.188 7
11 2019-10-18 08:13:27.212 20
12 2019-10-18 08:13:27.228 24
13 2019-10-18 08:13:27.246 30
14 2019-10-18 08:13:27.395 34
15 2019-10-18 08:23:26.375 40
16 2019-10-18 08:23:26.527 49
17 2019-10-18 08:23:26.725 48
print(df2)
ts value
0 2019-10-18 08:23:26.375 27
1 2019-10-18 08:23:26.427 17
2 2019-10-18 08:23:26.437 4
3 2019-10-18 08:23:26.444 2
4 2019-10-18 08:23:26.527 39
5 2019-10-18 08:23:26.575 25
6 2019-10-18 08:23:26.662 6
7 2019-10-18 08:23:26.676 14
8 2019-10-18 08:23:26.718 11
9 2019-10-18 08:23:26.725 13
What is the best way to achieve the result I am looking for?
I have tried converting the ‘ts’ column to both an array and a list, but nothing seems to bring me closer to a final working result for plotting the datasets together. Converting to datetime in an array gives me numpy.datetime64; converting to datetime in a list gives me pandas._libs.tslibs.timestamps.Timestamp.
Any help is highly appreciated as this is really driving me crazy.
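One possible approach, sketched under the assumption that matplotlib is available and that each dataframe has a 'ts' and a 'value' column: convert 'ts' with pd.to_datetime and draw every dataframe on the same Axes, so the differently sampled series share one time axis. The frames below contain only a few rows copied from df1 and df2; the labels and the "Agg" backend are choices made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# A few rows taken from df1 and df2 above, with 'ts' still stored as strings.
df1 = pd.DataFrame({
    "ts": ["2019-10-18 08:13:26.702", "2019-10-18 08:13:27.395",
           "2019-10-18 08:23:26.375", "2019-10-18 08:23:26.725"],
    "value": [14, 34, 40, 48],
})
df2 = pd.DataFrame({
    "ts": ["2019-10-18 08:23:26.375", "2019-10-18 08:23:26.527",
           "2019-10-18 08:23:26.725"],
    "value": [27, 39, 13],
})

fig, ax = plt.subplots()
for label, frame in {"df1": df1, "df2": df2}.items():
    ts = pd.to_datetime(frame["ts"])  # parse strings into a datetime column
    ax.plot(ts, frame["value"], marker="o", label=label)

ax.set_xlabel("ts")
ax.set_ylabel("value")
ax.legend()
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
```

Calling fig.savefig(...) or, with an interactive backend, plt.show() would then display both series against the same time axis, each at its own sampling.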
If needed, my original 'ts' values read from avro files are of type:
'2019-10-18T08:13:27.098000'
Running:
df['ts'] = pd.to_datetime(df['ts'])
returns
'2019-10-18 08:13:27.098' (pandas._libs.tslibs.timestamps.Timestamp)
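For what it is worth, a short check suggests the conversion above is actually working: the column gets a datetime64 dtype, and the Timestamp returned for a single element is a subclass of Python's datetime.datetime. The one-row frame below is a hypothetical stand-in using the string format quoted above.

```python
import datetime
import pandas as pd

raw = "2019-10-18T08:13:27.098000"  # format of the original 'ts' values
df = pd.DataFrame({"ts": [raw], "value": [21]})
df["ts"] = pd.to_datetime(df["ts"])

element = df["ts"].iloc[0]  # a pandas Timestamp
is_datetime_column = pd.api.types.is_datetime64_any_dtype(df["ts"])
is_datetime_scalar = isinstance(element, datetime.datetime)
plain = element.to_pydatetime()  # plain datetime.datetime, if one is needed

print(df["ts"].dtype)
print(is_datetime_column, is_datetime_scalar)
print(plain)
```

So seeing Timestamp when inspecting one element does not by itself mean the column failed to convert.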
EDIT 1
Further information about my steps. This is my df after reading the avro files:
This is my df after my first attempt to convert the format to datetime, which returns Timestamp:
This is what my df looks like after setting 'ts' as the index:
I then try to convert the Timestamp to datetime while it is in the index, and I get a KeyError: