I have data in the format below. It has three columns: website name (y-axis), access_date (x-axis), and user count (data points). I want to visualize this data with respect to date (time) incrementally, from left to right, for each website name. The data covers 2 years, i.e. 730 days.

What is the best way to visualize this data to see the trend? I have just started with matplotlib, and most of the examples I see are two-dimensional.

website     access_date  count
YouTube     20210912     1492554
Pluto       20211021     63024
Prime       20210927     493621
Spectrum    20220213     472823
CBS         20210619     100250
HBO         20200419     166974
discovery   20200919     3765
Prime       20220128     6
Netflix     20200215     4422443
Netflix     20200523     5565209

Update: While trying the approach provided by @jezrael below, I get the following error.

df.show()
df_pd = df.toPandas()
print(df_pd.columns.tolist())
df_pd['date'] = pd.to_datetime(df_pd['date'], format='%Y%m%d')
print(df_pd)
df_pd = df_pd.pivot_table(index='date',
                          columns='website',
                          values='count',
                          aggfunc='sum',
                          fill_value=0).cumsum()

Output:


+-----------------+-----------+----------+
|website          |date       |count     |
+-----------------+-----------+----------+
|          Netflix|   20200827|   5343644|
|          YouTube|   20200205|   1284673|
|          Netflix|   20201219|   6344211|
|           Disney|   20210512|    959738|
|          YouTube|   20200829|   1629708|
|             VUDU|   20200614|    102937|

['website', 'date', 'count']
                website   date          count
0               Netflix  2020-08-27     5343644
1               YouTube  2020-02-05     1284673
2               Netflix  2020-12-19     6344211
3                Disney  2021-05-12      959738
4               YouTube  2020-08-29     1629708
5                  VUDU  2020-06-14      102937

ValueError: view limit minimum 0.0 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units

Explorer

1 Answer

I think you can use DataFrame.pivot_table followed by a cumulative sum with DataFrame.cumsum, if "incremental" means adding the sums from previous dates:

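import pandas as pd

# Parse the YYYYMMDD values into datetimes
df['access_date'] = pd.to_datetime(df['access_date'], format='%Y%m%d')

# One column per website, indexed by date, then a running total per website
df = df.pivot_table(index='access_date',
                    columns='website',
                    values='count',
                    aggfunc='sum',
                    fill_value=0).cumsum()

df.plot()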

If you only need to show the raw values, remove cumsum from the solution above.
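
For reference, here is a self-contained sketch of the pivot, cumulative sum, and plot steps, using the sample rows from the question. The variable names `raw` and `wide`, the `Agg` backend, and the output file name are assumptions made for illustration; they are not part of the original post.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (whitespace-separated).
raw = """website access_date count
YouTube 20210912 1492554
Pluto 20211021 63024
Prime 20210927 493621
Spectrum 20220213 472823
CBS 20210619 100250
HBO 20200419 166974
discovery 20200919 3765
Prime 20220128 6
Netflix 20200215 4422443
Netflix 20200523 5565209
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype={"access_date": str})

# Parse the YYYYMMDD strings into real datetimes.
df["access_date"] = pd.to_datetime(df["access_date"], format="%Y%m%d")

# One row per date, one column per website, then a running total per website.
wide = df.pivot_table(index="access_date",
                      columns="website",
                      values="count",
                      aggfunc="sum",
                      fill_value=0).cumsum()

# One line per website, dates on the x-axis.
ax = wide.plot(figsize=(10, 5))
ax.set_xlabel("access_date")
ax.set_ylabel("cumulative user count")
plt.tight_layout()
plt.savefig("cumulative_counts.png")  # hypothetical output file name
```

With very different magnitudes per website (Netflix in the millions, discovery in the thousands), a logarithmic y-axis via `wide.plot(logy=True)` may make the smaller series easier to read, although zero values cannot be drawn on a log scale.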

jezrael
  • I am getting the error below; I have updated the post with my code too: `ValueError: view limit minimum 0.0 is less than 1 and is an invalid Matplotlib date value. This often happens if you pass a non-datetime value to an axis that has datetime units` – Explorer Apr 01 '22 at 19:07
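
The thread does not establish the cause of this error, so the following is only a diagnostic sketch, not a confirmed fix. It checks the conditions the error message hints at (an empty frame, a non-datetime index, non-numeric value columns) before anything is plotted. The helper `check_plottable` is hypothetical, and the sample rows are taken from the output shown in the question.

```python
import pandas as pd


def check_plottable(frame: pd.DataFrame) -> None:
    """Raise a descriptive error if `frame` is unlikely to plot on a date axis.

    Hypothetical helper for illustration; not part of pandas or matplotlib.
    """
    if frame.empty:
        raise ValueError("frame is empty; nothing to plot")
    if not isinstance(frame.index, pd.DatetimeIndex):
        raise TypeError(
            f"index is {type(frame.index).__name__}, expected DatetimeIndex"
        )
    non_numeric = [c for c in frame.columns
                   if not pd.api.types.is_numeric_dtype(frame[c])]
    if non_numeric:
        raise TypeError(f"non-numeric columns: {non_numeric}")


# Rows taken from the question's output; 'date' is stored as an integer here.
df_pd = pd.DataFrame({
    "website": ["Netflix", "YouTube", "Netflix", "Disney", "YouTube", "VUDU"],
    "date": [20200827, 20200205, 20201219, 20210512, 20200829, 20200614],
    "count": [5343644, 1284673, 6344211, 959738, 1629708, 102937],
})

# astype(str) keeps the parse unambiguous whether the column arrived
# from Spark as integers or as strings.
df_pd["date"] = pd.to_datetime(df_pd["date"].astype(str), format="%Y%m%d")
df_pd["count"] = pd.to_numeric(df_pd["count"])

wide = df_pd.pivot_table(index="date",
                         columns="website",
                         values="count",
                         aggfunc="sum",
                         fill_value=0).cumsum()

check_plottable(wide)  # passes silently when the frame looks plottable
```

If the check passes and the error persists, printing `wide.index.dtype` and `wide.dtypes` immediately before the plotting call, and confirming that the pivoted frame (rather than the original Spark or pandas frame) is the one being plotted, may help narrow it down further.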