I have many signals that are logged on change. I retrieve the data by reading Avro files into a list of lists, from which I create a DataFrame.

I use ‘groupby’ to get data for different signals and would like to plot the different signals in the same plot. The different signals have a different number of entries, and this is causing me great trouble. I have created a simplified example to work with when trying to solve this.

print(df1)
                             ts  value
0  2019-10-18T08:13:26.790000      6
1  2019-10-18T08:13:26.889000      7
2  2019-10-18T08:13:26.901000     10
3  2019-10-18T08:13:27.098000      1
4  2019-10-18T08:13:27.188000      8
5  2019-10-18T08:23:26.527000     13
6  2019-10-18T08:23:26.725000     12

print(df2)
                           ts  value
0  2019-10-18T08:23:26.375000   12.0
1  2019-10-18T08:23:26.527000    7.0
2  2019-10-18T08:23:26.575000    8.0
3  2019-10-18T08:23:26.725000    6.0

I go:

import matplotlib.pyplot as plt

ax = plt.gca()
# Draw both frames on the same axes
df1.plot(ax=ax, x='ts', y='value', c='xkcd:burgundy', legend=True)
df2.plot(ax=ax, x='ts', y='value', c='xkcd:baby blue', legend=True)
plt.gcf().autofmt_xdate()

Result:

[Image: resulting plot]

I don't know if my issue is with the datetime object (original data is from azure), or if the issue is that the number of entries is not the same.

As seen, entries with equal timestamps are not plotted at the same x position. It seems the plot is drawn entry by entry. I am also wondering why the datetime on the x-axis is not shown for the last entries of df1.
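
One possible explanation, assuming `ts` holds ISO 8601 strings rather than datetimes (the `T` separator in the printout hints at this): pandas then appears to plot such a column against row position and to use the strings only as tick labels, which would match the entry-by-entry look. A minimal sketch of converting first and then drawing both frames on one matplotlib axes. The data is copied from the example above; the labels and the direct use of `ax.plot` instead of `DataFrame.plot` are my own choices, not something the original code does:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the printed example frames
df1 = pd.DataFrame({
    "ts": ["2019-10-18T08:13:26.790000", "2019-10-18T08:13:26.889000",
           "2019-10-18T08:13:26.901000", "2019-10-18T08:13:27.098000",
           "2019-10-18T08:13:27.188000", "2019-10-18T08:23:26.527000",
           "2019-10-18T08:23:26.725000"],
    "value": [6, 7, 10, 1, 8, 13, 12],
})
df2 = pd.DataFrame({
    "ts": ["2019-10-18T08:23:26.375000", "2019-10-18T08:23:26.527000",
           "2019-10-18T08:23:26.575000", "2019-10-18T08:23:26.725000"],
    "value": [12.0, 7.0, 8.0, 6.0],
})

# Turn the string column into real datetimes in both frames
for df in (df1, df2):
    df["ts"] = pd.to_datetime(df["ts"])

# Plot against the datetime values, so equal timestamps share an x position
fig, ax = plt.subplots()
ax.plot(df1["ts"], df1["value"], color="xkcd:burgundy", label="df1")
ax.plot(df2["ts"], df2["value"], color="xkcd:baby blue", label="df2")
ax.legend()
fig.autofmt_xdate()
```

With real datetimes on the x-axis, the number of entries per signal should no longer matter, because each point is placed by its timestamp rather than by its row number.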

I then try something different:

ax = plt.gca()
ax2 = ax.twiny()
df1.plot(ax=ax, x='ts', y='value', c='xkcd:burgundy')
df2.plot(ax=ax2, x='ts', y='value', c='xkcd:baby blue', secondary_y=True)
df2.plot(ax=ax, x='ts', y='value', c='xkcd:mustard')
plt.gcf().autofmt_xdate()
plt.show()

Result:

[Image: resulting plot]

Here I have tried to plot df2 on a secondary axis, and on the same axis to see what happens. Maybe I don’t understand what I am actually asking for here, but the result is not what I want.

I tried twinx() as I thought this was more logical, but had no luck with that either. twiny() brings me closer, sort of (and I don't know why). But I want all my signals to share the same x-axis and to be plotted correctly in relation to each other, no matter how many entries they have. What should I do?

In my real task I need to plot several signals on the same y-axis and some signals on a secondary y-axis, because the signal values are on different scales. So a solution that works for both cases is very welcome.
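
A minimal sketch of what that layout could look like, assuming the timestamps are already real datetimes. The signal names and values are hypothetical and only there for illustration. `twinx()` gives a second y-axis that shares the x-axis, and `drawstyle="steps-post"` is an optional choice that may suit on-change logging, since a value holds until the next entry:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical signals with different numbers of entries
signal_a = pd.DataFrame({
    "ts": pd.to_datetime(["2019-10-18T08:13:26.790", "2019-10-18T08:13:27.098",
                          "2019-10-18T08:23:26.527"]),
    "value": [6, 1, 13],
})
signal_b = pd.DataFrame({
    "ts": pd.to_datetime(["2019-10-18T08:23:26.375", "2019-10-18T08:23:26.725"]),
    "value": [12.0, 6.0],
})
signal_c = pd.DataFrame({  # values on a much larger scale
    "ts": pd.to_datetime(["2019-10-18T08:13:26.889", "2019-10-18T08:18:00.000",
                          "2019-10-18T08:23:26.575", "2019-10-18T08:23:26.725"]),
    "value": [1500.0, 1720.0, 1650.0, 1800.0],
})

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()  # same x-axis, independent y-axis

# Signals on a comparable scale go on the left axis
ax_left.plot(signal_a["ts"], signal_a["value"], drawstyle="steps-post",
             color="xkcd:burgundy", label="signal_a")
ax_left.plot(signal_b["ts"], signal_b["value"], drawstyle="steps-post",
             color="xkcd:baby blue", label="signal_b")
# The large-scale signal goes on the right axis
ax_right.plot(signal_c["ts"], signal_c["value"], drawstyle="steps-post",
              color="xkcd:mustard", label="signal_c (right)")

# Collect the entries of both axes into one legend
handles_l, labels_l = ax_left.get_legend_handles_labels()
handles_r, labels_r = ax_right.get_legend_handles_labels()
ax_left.legend(handles_l + handles_r, labels_l + labels_r)
fig.autofmt_xdate()
```

Note that `twinx()` shares x and adds a y-axis, while `twiny()` does the opposite (shares y and adds an x-axis), so `twinx()` is the one that fits a common time axis.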

I need to see my signals in the same plot to see the interaction, and then I need a common x-axis to be correct. What can I do?

Is there some overall smarter way to do this?

EDIT 1 - after comments from Parfait

I read data from Avro files; my original df looks like this:

[Image: original DataFrame]

After just separating the signals with groupby, I ran into trouble when I started plotting. That is my original topic. I therefore created some simple dataframes with some of the ‘ts’ data and corresponding simple values, so that I could check manually whether the signals are represented correctly when plotted in the same plot.

After comments from Parfait, I loop through and try to change ‘ts’ to datetime:

[Image: conversion loop and its output]

It does not return datetime, but timestamp. I started reading about this, and it seems others are having the same problem. Datetime is altered to timestamp in a df column.
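
For what it is worth, seeing `Timestamp` here does not by itself mean the conversion failed. pandas stores a converted column with a `datetime64` dtype and hands back single elements as `pandas.Timestamp`, which is a subclass of `datetime.datetime`. A small sketch to check both the column dtype and the element type, using two timestamps from the example data:

```python
import datetime

import pandas as pd

df = pd.DataFrame({"ts": ["2019-10-18T08:13:26.790000",
                          "2019-10-18T08:23:26.527000"]})
df["ts"] = pd.to_datetime(df["ts"])

# Dtype of the whole column: a datetime64 dtype after conversion
print(df.dtypes)

# Type of a single element: pandas Timestamp
first = df.at[0, "ts"]
print(type(first))

# Timestamp is itself a datetime.datetime subclass
print(isinstance(first, datetime.datetime))
```

So checking `df.dtypes` is probably a more telling test than inspecting a single value with `.at`.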

I created some simple dfs to help me find a solution without working with too big datasets:

[Image: simplified DataFrames]

I then tried to convert once more, but Timestamp is still the output. And as seen below, the plot is wrong.

[Image: conversion output and resulting plot]

This is driving me crazy. After reading a lot, I found this post, and it seems this is a known problem. I then posted a new question asking for a workaround, hoping that using 'ts' as the index would solve the problem and keep the datetime format.

But Parfait, you state this is working for you. Is my problem clear to you now? What is your solution?

Thanks for all help! Anything bringing me closer to solving this is very helpful.

Miss.Pepper
  • Your `ts` columns are not `datetime` types as seen with the `T`. Try converting with [`pandas.to_datetime`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html) before plotting. – Parfait Sep 25 '20 at 15:02
  • Thanks! I can't seem to make it all the way to the top. I try: pd.to_datetime(df['ts'], format='%Y-%m-%d %H:%M:%S', errors='coerce') but this gives me wrong plot as I have 100ms resolution. Tried 'ms', 'MS', 'ns','NS' in addition, but I either get error, or just NaT in my column. How to get correct interpretation of my format? – Miss.Pepper Oct 09 '20 at 19:09
  • You need to account for decimal seconds but passing no `format` should cleanly convert date/time with posted data: `pd.to_datetime(df['ts'])`. Otherwise to account for [microseconds](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) and the `T`: `pd.to_datetime(df['ts'], format='%Y-%m-%dT%H:%M:%S.%f')`. – Parfait Oct 09 '20 at 19:45
  • Hm, now I get my 'ts' column like 2019-10-18 08:13:26.702, that looks better, but df1.plot(x='ts', y='value') still returns wrong plot. Now, x-axis have values like '18 08:14, 18 08:15 ...., 18 08:23 So, seems it's not getting the format correctly? Thanks for all help! :) – Miss.Pepper Oct 16 '20 at 08:14
  • Above comment is the same for both pd.to_datetime(df['ts']) and pd.to_datetime(df['ts'], format='%Y-%m-%dT%H:%M:%S.%f', errors='coerce') – Miss.Pepper Oct 16 '20 at 08:18
  • I now found that pd.to_datetime() still returns pandas._libs.tslibs.timestamps.Timestamp, and it seems from other posts that converting a column in a dataframe from timestamp does not work? I cannot make it work. Any tips? – Miss.Pepper Oct 16 '20 at 09:01
  • I am confused about your actual problem. Not clear about *wrong plot*. Feel free to [edit](https://stackoverflow.com/posts/64065968/edit) your post. Check `df.dtypes` to see if `ts` is not `datetime64[ns]` after conversion? Posted data works without issue on my end using `pd.to_datetime`. Maybe your fuller data has issues? – Parfait Oct 16 '20 at 17:04
  • Thank you for looking into this, I now added an edit at the bottom of my original question. I hope this can help you see what my issue is. If this works on your end I would really appreciate help to figure out why my code isn’t working the same way. – Miss.Pepper Oct 16 '20 at 19:07
  • did you consider merging the individual dfs before plotting? That might give you a consistent x-axis. – FObersteiner Oct 16 '20 at 19:46
  • You are showing the type at one value using `.at` not entire series. As commented, see all column types with: `df1.dtypes`. Also, your loop is redundant and you do not use the loop variable, `column`. Therefore, I see no problem except date formatting of x-axis. – Parfait Oct 16 '20 at 21:14
  • maybe merging could help, but right now, plotting a single df does not work having 'ts' as datetime64[ns]. (As seen in the picture in the bottom of the post) I don't understand why, or how to solve this. It does not matter if 'ts' is index or a column, the plot is still wrong. – Miss.Pepper Oct 21 '20 at 08:12
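
The merging idea from FObersteiner's comment could be sketched roughly as follows. This is only one possible reading of it, not a confirmed fix: an outer join on `ts` puts every signal on one shared time index, and a forward fill (an assumption that fits on-change logging, where a value holds until the next entry) closes the gaps. The suffixes `_1` and `_2` are arbitrary names chosen here:

```python
import pandas as pd

# Data copied from the example frames in the question
df1 = pd.DataFrame({
    "ts": pd.to_datetime([
        "2019-10-18T08:13:26.790000", "2019-10-18T08:13:26.889000",
        "2019-10-18T08:13:26.901000", "2019-10-18T08:13:27.098000",
        "2019-10-18T08:13:27.188000", "2019-10-18T08:23:26.527000",
        "2019-10-18T08:23:26.725000"]),
    "value": [6, 7, 10, 1, 8, 13, 12],
})
df2 = pd.DataFrame({
    "ts": pd.to_datetime([
        "2019-10-18T08:23:26.375000", "2019-10-18T08:23:26.527000",
        "2019-10-18T08:23:26.575000", "2019-10-18T08:23:26.725000"]),
    "value": [12.0, 7.0, 8.0, 6.0],
})

# Outer join keeps every timestamp that occurs in either signal
merged = (
    pd.merge(df1, df2, on="ts", how="outer", suffixes=("_1", "_2"))
    .sort_values("ts")
    .set_index("ts")
)

# Carry the last known value forward; rows before a signal's first entry stay NaN
filled = merged.ffill()
print(filled)
```

The resulting frame has one datetime index for all signals, so its columns can be plotted against a single x-axis.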

0 Answers