4

I would like to visualize CSV data as shown below, by a timeseries representation, using python's pandas module (see links below).

Sample data of df1:

             TIMESTAMP  eventid
0  2017-03-20 02:38:24        1
1  2017-03-21 05:59:41        1
2  2017-03-23 12:59:58        1
3  2017-03-24 01:00:07        1
4  2017-03-27 03:00:13        1

The 'eventid' column always contains the value of 1, and I am trying to show the sum of events for each day in the dataset. Is

pandas.Series.cumsum() 

the correct function to use for this purpose?

script so far:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df1 = pd.read_csv('timestamp01.csv')
print df1.columns # u'TIMESTAMP', u'eventid'

# I: ts = pd.Series(df1['eventid'], index=df1['TIMESTAMP']) 
# O: Blank plot

# I: ts = pd.Series(df1['eventid'], index=pd.date_range(df1['TIMESTAMP'], periods=1000)) 
# O: TypeError: Cannot convert input ... Name: TIMESTAMP, dtype: object] of type <class 'pandas.core.series.Series'> to Timestamp

# working test example:
# I: ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
# O: See first link below (first plot).

ts = ts.cumsum()
ts.plot()
plt.show()

Links I have tried to follow:

http://pandas.pydata.org/pandas-docs/stable/visualization.html

Aggregating timeseries from sensors

(above example has different values, as opposed to my 'eventid' data)

d3: timeseries from data

Any help is much appreciated.

Community
  • 1
  • 1
Gustav Rasmussen
  • 3,720
  • 4
  • 23
  • 53

2 Answers2

3

It seems you need convert TIMESTAMP column to datetime by parameter parse_dates in read_csv:

import pandas as pd
from pandas.compat import StringIO

temp=u"""TIMESTAMP,eventid
2017-03-20 02:38:24,1
2017-03-20 05:38:24,1
2017-03-21 05:59:41,1
2017-03-23 12:59:58,1
2017-03-24 01:00:07,1
2017-03-27 03:00:13,1"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp),  parse_dates=True, index_col='TIMESTAMP')
print (df)
                     eventid
TIMESTAMP                   
2017-03-20 02:38:24        1
2017-03-20 05:38:24        1
2017-03-21 05:59:41        1
2017-03-23 12:59:58        1
2017-03-24 01:00:07        1
2017-03-27 03:00:13        1

print (df.index)
DatetimeIndex(['2017-03-20 02:38:24', '2017-03-20 05:38:24',
               '2017-03-21 05:59:41', '2017-03-23 12:59:58',
               '2017-03-24 01:00:07', '2017-03-27 03:00:13'],
              dtype='datetime64[ns]', name='TIMESTAMP', freq=None)

Then use resample by days and get counts by size function. Last Series.plot:

print (df.resample('D').size())
TIMESTAMP
2017-03-20    2
2017-03-21    1
2017-03-22    0
2017-03-23    1
2017-03-24    1
2017-03-25    0
2017-03-26    0
2017-03-27    1
Freq: D, dtype: int64

df.resample('D').size().plot()

If need change format of tickers:

import matplotlib.ticker as ticker

ax = df.resample('D').size().plot()
ax.xaxis.set_major_formatter(ticker.FixedFormatter(df.index.strftime('%Y-%m-%d')))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you for the answer, it is a big help. Everything worked well for the sample data: https://www.dropbox.com/s/bcm47bot2n7civo/timeseries_test.png?dl=0 However for the real csv-data: https://www.dropbox.com/s/b1y8n862350vvuv/timeseries_from_csv.png?dl=0 y-values are close to zero, and finally blows up near the right end of x-axis.. Only thing changed in code is StringIO(temp) into 'timestamp01.csv' .. Any idea what might be causing this strange plot? – Gustav Rasmussen May 02 '17 at 11:11
  • Maybe need parameter `rot` `ax=df.set_index('TIMESTAMP').resample('D').size().plot(rot=90)` – jezrael May 02 '17 at 11:13
  • I added the rot=90, but plot remains the same.. Any ideas? – Gustav Rasmussen May 02 '17 at 11:21
  • values of axis `x` are not changed? Maybe last need add `plt.show()` and to `imports` - `import matplotlib.pyplot as plt` – jezrael May 02 '17 at 11:22
  • Doesn't change plot.. Could it have something to do with the changing of index to timestamps? I'm now trying: df = pd.read_csv('timestamp01.csv', parse_dates=True, index_col='TIMESTAMP') import matplotlib.ticker as ticker ax=df.set_index('TIMESTAMP').resample('D').size().plot(rot=9‌​0) ax.xaxis.set_major_formatter(ticker.FixedFormatter(df.index.strftime('%Y-%m-%d'))) ax.set_ylim([-1, 1]) plt.show() – Gustav Rasmussen May 02 '17 at 11:31
  • Hmmm, I think i dont understand it. Problem is with last value of `axis x` ? Do you need change it? – jezrael May 02 '17 at 11:33
  • I'm also a bit lost.. Maybe I should try to just apply the plotting to the first couple of rows from the df? – Gustav Rasmussen May 02 '17 at 11:34
  • Yes, try `df.head(100).resample('D').size().plot()` - plotting first 100 rows. – jezrael May 02 '17 at 11:35
  • And if need change last and first value of `axis x` `ax.set_xlim([pd.to_datetime('2017-03-10'), pd.to_datetime('2017-03-25')])`, but in my opinon better is dont do it and let matplotlib manage it (because if data are changed then ispossible some rows are no visualized) – jezrael May 02 '17 at 11:41
  • First using the head(100), nothing changed.. But adding your idea of set_xlim, the pattern is more clear: https://www.dropbox.com/s/qvdyk2wmmxhmxsd/timeseries_xrange.png?dl=0 – Gustav Rasmussen May 02 '17 at 11:47
  • Update: I set the x-lim with erroneous dates at the previous step, after correcting the x-range into ax.set_xlim([pd.to_datetime('2017-03-22'), pd.to_datetime('2017-04-29')]), the following figure resulted: https://www.dropbox.com/s/k2ihej92dqovmp0/timeseries_xrange.png?dl=0 (my last issue is to fix the x-labels, which looks strange; the same date is listed 3 times.. Any ideas how to solve this?) – Gustav Rasmussen May 02 '17 at 12:58
  • @GustavRasmussen - I think the best is created new question, because `mathplotlib` special things I dont know :( But I think some mathplotlib guru help you. – jezrael May 02 '17 at 13:29
  • Ok I will.. Thank you for taking the time, you have been a great help. – Gustav Rasmussen May 02 '17 at 13:57
  • Glad can help you, nice day! – jezrael May 02 '17 at 13:57
  • Thanks, have a great day :) – Gustav Rasmussen May 02 '17 at 14:05
2

Another way to plot is to use groupby and count occurrences:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('timestamp01.csv', parse_dates=[0], index_col=[0]) # set timestamp as index
ts = df.groupby(df.index.date).count() # count occurrences
ax = ts.plot() # plot
plt.setp(ax.xaxis.get_majorticklabels(), rotation=10) # format x axis
plt.show()

enter image description here

Serenity
  • 35,289
  • 20
  • 120
  • 115