
This is hard to explain without showing what is going on. When I extract the index from a dataframe, the last value appears to be missing.

I am using a pandas dataframe for starters.

My first data frame is

daily_stock_values                
             SPY    AAPL
2011-01-05  123.83  332.57
2011-01-06  123.59  332.30
2011-01-07  123.35  334.68
2011-01-10  123.19  340.99
2011-01-11  123.63  340.18
2011-01-12  124.74  342.95
2011-01-13  124.54  344.20
2011-01-14  125.44  346.99
2011-01-18  125.65  339.19
2011-01-19  124.42  337.39
2011-01-20  124.26  331.26

That is what I get when I run print(daily_stock_values).

My next step is to select only the SPY values. In this instance it makes no difference, but my code is:

daily_spy = daily_stock_values['SPY']
print(daily_spy)

The result is

    daily_spy  
2011-01-05    123.83
2011-01-06    123.59
2011-01-07    123.35
2011-01-10    123.19
2011-01-11    123.63
2011-01-12    124.74
2011-01-13    124.54
2011-01-14    125.44
2011-01-18    125.65
2011-01-19    124.42
2011-01-20    124.26

My next step is to extract just the dates from daily_spy, but I cannot get the last date. Whenever I extract the index values, which are the dates, I get everything but the last one. I have tried two methods to get the dates out.

d = [i for i in daily_spy.index.values]
print("d ", d)

[numpy.datetime64('2011-01-04T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-05T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-06T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-09T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-10T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-11T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-12T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-13T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-17T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-18T19:00:00.000000000-0500'), 
numpy.datetime64('2011-01-19T19:00:00.000000000-0500')]

I am not concerned with the formatting here as much as the fact that 2011-01-20 is not in this list.

I also tried a simple for loop, and it doesn't show the last date either. Any ideas why?

Chris Jones
  • Your datetimes look like they have some timezone information. You'll also notice that your first index value is offset as well, so I don't think you are missing any values, just that the datetimes have a timezone – EdChum Feb 18 '16 at 22:09
  • I honestly had not noticed that. But why would it randomly go back a day? – Chris Jones Feb 18 '16 at 22:19
  • It's not random. The offset shows the values are displayed in UTC-5 (presumably New York), so 19:00 at UTC-5 is 00:00 UTC on the next day – EdChum Feb 18 '16 at 22:21
  • OK, but daily_stock_values, which is the parent so to speak, only shows a date, so why does it change? – Chris Jones Feb 18 '16 at 22:23
  • Probably because it's pretty-printing for your convenience: displaying the full value would be a little too much to take in, so it's converting to UTC so that it makes sense. This also explains why it's not displaying the time component: 19:00 at UTC-5 becomes 00:00 UTC, and by default times won't display if they're 00:00 – EdChum Feb 18 '16 at 22:26
  • Is there any way I can make it ignore timezones then? I literally just want it as a date, which is how it's coming in from the CSV. – Chris Jones Feb 18 '16 at 22:35
  • You should be able to do `df.index = df.index.tz_localize(tz=None)` to remove it – EdChum Feb 18 '16 at 22:36
  • So I tried putting that at the beginning, on the first daily_stock_values, and it didn't seem to pass all the way through. I honestly have no idea how this even happens, considering the original file is reflected correctly by daily_stock_values – Chris Jones Feb 18 '16 at 22:51
  • @Chris Do you want to reformulate your question? If you think you are having issues with the way your file is read, post the relevant information. – Stop harming Monica Feb 18 '16 at 22:54
  • I could at this point. – Chris Jones Feb 18 '16 at 22:54
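
The comments above suggest that no date is missing and that the list is only displayed with a UTC-5 offset. As far as I know, NumPy releases before 1.11 rendered datetime64 values in the machine's local time zone, which would explain the output in the question. A minimal sketch of that effect with current pandas, assuming the index holds midnight timestamps and the machine was set to New York time (the two sample dates and the variable names are illustrative only):

```python
import pandas as pd

# Two of the dates from the question, stored as timezone-naive midnights.
idx = pd.to_datetime(["2011-01-19", "2011-01-20"])

# Interpret the stored values as UTC and view them in New York time.
# Midnight UTC is 19:00 on the previous day at UTC-5.
shifted = idx.tz_localize("UTC").tz_convert("America/New_York")
print(shifted)

# The number of entries is unchanged; only the displayed wall time moved.
print(len(idx), len(shifted))

# tz_convert(None) converts back to UTC and drops the zone, restoring the
# original dates. tz_localize(None) would instead keep the 19:00 wall time.
restored = shifted.tz_convert(None)
print(restored)
```

If the last date looks absent, comparing the length of the list with the length of the index is a quick check before looking at the formatting.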

1 Answer


I ended up finding how to solve my issue through the question "Convert numpy.datetime64 to string object in python".

Basically I turned

d = [i for i in daily_spy.index.values]

into

d = [pd.to_datetime(str(i)) for i in daily_spy.index.get_values()]

And then stripped the information I didn't need from the date string. Thanks for getting me down the right path!!
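
Note that Index.get_values() was deprecated and later removed from pandas (in 1.0, as far as I know), so the line above may not run on current versions. A possible alternative, offered as a sketch rather than as the approach used in this answer, is to format the DatetimeIndex directly and skip the round trip through strings. The small daily_spy series here is rebuilt from two rows of the question purely for illustration:

```python
import pandas as pd

# Stand-in for the series in the question (last two rows only).
daily_spy = pd.Series(
    [124.42, 124.26],
    index=pd.to_datetime(["2011-01-19", "2011-01-20"]),
    name="SPY",
)

# Date strings with no time or timezone component.
date_strings = daily_spy.index.strftime("%Y-%m-%d").tolist()
print(date_strings)  # ['2011-01-19', '2011-01-20']

# Plain datetime.date objects, if strings are not what is needed.
plain_dates = daily_spy.index.date.tolist()
print(plain_dates)
```

Both forms keep every entry of the index, including the last date.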

Chris Jones