I have data in the format below. It has three columns: website name (y-axis), access_date (x-axis), and user count (the data points). I want to visualize this data over time, incrementally from left to right, for each website name. The data covers 2 years, i.e. 730 days.
What is the best way to visualize this data to see the trend? I have just started with matplotlib, and most of the examples I see are two-dimensional, like this.
website access_date count
YouTube 20210912 1492554
Pluto 20211021 63024
Prime 20210927 493621
Spectrum 20220213 472823
CBS 20210619 100250
HBO 20200419 166974
discovery 20200919 3765
Prime 20220128 6
Netflix 20200215 4422443
Netflix 20200523 5565209
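A minimal sketch of one possible reshaping, assuming the sample rows above are loaded into a pandas DataFrame and that "incremental" means a running total per website (that reading is an assumption). The names `raw`, `wide` and `running` are illustrative only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question.
raw = """website access_date count
YouTube 20210912 1492554
Pluto 20211021 63024
Prime 20210927 493621
Spectrum 20220213 472823
CBS 20210619 100250
HBO 20200419 166974
discovery 20200919 3765
Prime 20220128 6
Netflix 20200215 4422443
Netflix 20200523 5565209
"""

# Read the date column as text first, then parse it explicitly as YYYYMMDD.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype={"access_date": str})
df["access_date"] = pd.to_datetime(df["access_date"], format="%Y%m%d")

# One row per date, one column per website; missing combinations become 0.
wide = df.pivot_table(index="access_date",
                      columns="website",
                      values="count",
                      aggfunc="sum",
                      fill_value=0).sort_index()

# Assumption: "incremental" is read as a running total over time.
running = wide.cumsum()

# One line per website, dates on the x-axis.
ax = running.plot(figsize=(10, 5))
ax.set_xlabel("access_date")
ax.set_ylabel("cumulative user count")
# plt.show() would display the figure in an interactive session.
```

Plotting `wide` instead of `running` would show the raw daily counts rather than the running total.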
Update: While trying the approach suggested by @jezrael, I get the error below.
import pandas as pd

df.show()  # df is a Spark DataFrame
df_pd = df.toPandas()
print(df_pd.columns.tolist())
df_pd['date'] = pd.to_datetime(df_pd['date'], format='%Y%m%d')
print(df_pd)
df_pd = df_pd.pivot_table(index='date',
                          columns='website',
                          values='count',
                          aggfunc='sum',
                          fill_value=0).cumsum()
Output:
+-----------------+-----------+----------+
|website |date |count |
+-----------------+-----------+----------+
| Netflix| 20200827| 5343644|
| YouTube| 20200205| 1284673|
| Netflix| 20201219| 6344211|
| Disney| 20210512| 959738|
| YouTube| 20200829| 1629708|
| VUDU| 20200614| 102937|
['website', 'date', 'count']
website date count
0 Netflix 2020-08-27 5343644
1 YouTube 2020-02-05 1284673
2 Netflix 2020-12-19 6344211
3 Disney 2021-05-12 959738
4 YouTube 2020-08-29 1629708
5 VUDU 2020-06-14 102937
ValueError: view limit minimum 0.0 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units
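Since the message points at non-datetime values on a date axis, a quick check of the pivoted frame before plotting may help narrow things down. The sketch below is only a diagnostic under assumptions: `describe_plot_input` is a hypothetical helper, the rows are copied from the printed output above, and an empty frame is included as a second thing to rule out even though the question does not confirm it as the cause.

```python
import pandas as pd


def describe_plot_input(frame):
    """Hypothetical helper: summarize properties that matter for a date axis."""
    return {
        "rows": len(frame),
        "index_is_datetime": pd.api.types.is_datetime64_any_dtype(frame.index),
        "index_is_sorted": frame.index.is_monotonic_increasing,
    }


# Rows copied from the printed output in the question.
df_pd = pd.DataFrame({
    "website": ["Netflix", "YouTube", "Netflix", "Disney", "YouTube", "VUDU"],
    "date": ["20200827", "20200205", "20201219", "20210512", "20200829", "20200614"],
    "count": [5343644, 1284673, 6344211, 959738, 1629708, 102937],
})
df_pd["date"] = pd.to_datetime(df_pd["date"], format="%Y%m%d")

pivoted = df_pd.pivot_table(index="date",
                            columns="website",
                            values="count",
                            aggfunc="sum",
                            fill_value=0).cumsum()

# If "rows" is 0 or "index_is_datetime" is False, the date axis has nothing valid to scale to.
info = describe_plot_input(pivoted)
print(info)
```

If the check passes for the frame that is actually being plotted, the problem more likely lies in the plotting call itself, which is not shown in the question.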