I am new to Python and Pandas. I wrote a little code to download 1-minute data from Google Finance. After using the following command:

new = pd.read_csv(string, skiprows=7, names = ("d", "o", "h", "l", "c", "v") )

I obtain a DataFrame such as the following:

          d        o        h        l        c       v
0 a1453905960  95.4500  95.4500  95.0900  95.0980  433810
1 a1453906020  95.0500  95.4700  94.9500  95.4500  934980
2 a1453906080  94.9400  95.1000  94.8700  95.0900  791657
3 a1453906140  94.8990  95.0300  94.7000  94.9620  763531
4 a1453906200  94.9300  95.0300  94.8200  94.8918  501298

where the first column is a Unix timestamp prefixed with the letter "a".

Next, I convert the Unix timestamp into a regular datetime string with the following line

new['d']=new['d'].apply(lambda x:datetime.fromtimestamp(int(x[1:])).strftime('%Y-%m-%d %H:%M:%S'))

Now my d column contains strings with dates. If I use the following lines

new.index = new["d"]
del new["d"]

I just replace the old index with a new index made of strings containing datetimes. If I plot the c column with the following command

new["c"].plot()

I obtain a nice plot. [image: "nice"]

If instead I convert the index of my dataframe to datetime object with the following command

new.index = pd.to_datetime(new.index)

and then I try

new["c"].plot()

I obtain the following plot: [image: "bad plot"]

Why? What am I misunderstanding?

Thank you in advance.

Charlie
  • In the first chart, if you had a one month gap in your data, you wouldn't even notice it because the data is graphed sequentially (time is discontinuous on the X-axis). In the second chart, the time gaps are clear (time is continuous on the X-axis). You may want to specify `drawstyle = 'steps'` in your plot function. – Alexander Feb 17 '16 at 19:42
  • Unfortunately adding that simple option does not work – Charlie Feb 18 '16 at 10:50
  • @Alexander Yes, the problem is the one you pointed out. I have a datetime index that makes time continuous on the x axis when I plot. So how can I plot a time series with some gaps? Gaps are common in financial time series because there is no trading activity on Saturday and Sunday. – Charlie Feb 18 '16 at 11:19

1 Answer

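The first index is built from the string column d (the values are strings because of strftime); the second is a DatetimeIndex.

Maybe my datetime import is different, but datetime.fromtimestamp doesn't work for me, so I use datetime.date.fromtimestamp below. Note that it keeps only the date, which is why every row shows 00:00:00.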

new['d']= new['d'].apply(lambda x: datetime.date.fromtimestamp(int(x[1:]))
                                                            .strftime('%Y-%m-%d %H:%M:%S'))
print new
                     d       o      h      l        c       v
0  2016-01-27 00:00:00  95.450  95.45  95.09  95.0980  433810
1  2016-01-27 00:00:00  95.050  95.47  94.95  95.4500  934980
2  2016-01-27 00:00:00  94.940  95.10  94.87  95.0900  791657
3  2016-01-27 00:00:00  94.899  95.03  94.70  94.9620  763531
4  2016-01-27 00:00:00  94.930  95.03  94.82  94.8918  501298

print new.dtypes
d     object
o    float64
h    float64
l    float64
c    float64
v      int64
dtype: object

print type(new.loc[0, 'd'])
<type 'str'>

new.index = new["d"]
del new["d"]

print new.index
Index([u'2016-01-27 00:00:00', u'2016-01-27 00:00:00', u'2016-01-27 00:00:00',
       u'2016-01-27 00:00:00', u'2016-01-27 00:00:00'],
      dtype='object', name=u'd')

new.index = pd.to_datetime(new.index)
print new.index
DatetimeIndex(['2016-01-27', '2016-01-27', '2016-01-27', '2016-01-27',
               '2016-01-27'],
              dtype='datetime64[ns]', name=u'd', freq=None)

Alternatively, you can create column d directly with to_datetime:

new['d'] = pd.to_datetime(new['d'].str[1:].astype(int), unit='s')

Or, if you need strings, add strftime:

new['d'] = pd.to_datetime(new['d'].str[1:].astype(int), unit='s').dt.strftime('%Y-%m-%d %H:%M:%S')
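As a rough, self-contained illustration of the to_datetime approach, here is a minimal sketch using three rows copied from the question (only the d and c columns are kept, for brevity). It assumes the values are seconds since the epoch; with unit='s' pandas interprets them as UTC, whereas datetime.fromtimestamp in the question uses the local timezone, so the hours may differ from the original output.

```python
import pandas as pd

# Three sample rows from the question; the "a" prefix precedes the Unix timestamp.
new = pd.DataFrame({
    "d": ["a1453905960", "a1453906020", "a1453906080"],
    "c": [95.0980, 95.4500, 95.0900],
})

# Strip the leading "a", cast to integer, and read the values as seconds since the epoch (UTC).
new["d"] = pd.to_datetime(new["d"].str[1:].astype("int64"), unit="s")

# Use the converted column as the index; this yields a DatetimeIndex, not strings.
new = new.set_index("d")

print(type(new.index).__name__)
print(new.index[0])
```

If strings are needed instead, `new.index.strftime('%Y-%m-%d %H:%M:%S')` can be applied afterwards, which gives back a plain string index like the first one in the question.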
jezrael
  • I cannot completely understand your answer. But I think the problem is the one I explained in the comments below my question. – Charlie Feb 18 '16 at 11:20
  • I am only pointing out the difference between the `indexes`. One is `string` and the other is a `datetimeindex`. That is the reason why the graphs look different. – jezrael Feb 18 '16 at 11:25
  • And I think your first graph is the correct way to plot datetimes with gaps - convert `datetime` to `string` [see](http://stackoverflow.com/questions/35085830/python-pandas-plot-time-series-with-gap) - or maybe you can [check](https://groups.google.com/forum/#!msg/pydata/x46E9Gpac68/oO2w2TiYR4w) – jezrael Feb 18 '16 at 11:31
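
For readers who want to keep a DatetimeIndex and still hide the non-trading gaps, one possible approach (a sketch, not necessarily what the linked threads recommend) is to plot against the row position and label the ticks with formatted datetimes. The data, the `step` value, and the output filename below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical closing prices: two sessions separated by an overnight gap.
idx = pd.to_datetime([
    "2016-01-27 14:46", "2016-01-27 14:47", "2016-01-27 14:48",
    "2016-01-28 14:30", "2016-01-28 14:31", "2016-01-28 14:32",
])
close = pd.Series([95.098, 95.45, 95.09, 94.962, 94.8918, 95.01], index=idx)

fig, ax = plt.subplots()

# Plot against 0..n-1 so every observation is evenly spaced and gaps collapse.
positions = list(range(len(close)))
ax.plot(positions, close.values)

# Label every `step`-th position with the datetime it corresponds to.
step = 2
ticks = positions[::step]
labels = [close.index[i].strftime("%m-%d %H:%M") for i in ticks]
ax.set_xticks(ticks)
ax.set_xticklabels(labels, rotation=45)

fig.tight_layout()
fig.savefig("close_no_gaps.png")  # hypothetical output file
```

This is essentially what the string index does implicitly: the x-axis becomes positional, so a weekend takes up no more room than one minute. The trade-off is that distances on the x-axis no longer represent elapsed time.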