
I'm doing some math on indices that I have saved in a CSV file, and I'm getting behavior from .loc that I can only describe as strange. When I read this CSV file into a DataFrame using pandas, I see the following:

[1]: import pandas as pd

[2]: df = pd.read_csv(csv_path, parse_dates=True, index_col="Date")

[3]: df = df.apply(pd.to_numeric, errors='coerce') # shouldn't matter

[4]: df.head(5)

Date          idx1        idx2         idx3         idx4       idx5
2019-03-22    106.1069    106.6425     106.520      106.45     105.870 ...
2019-03-21    106.6994    107.1746     106.975      106.87     106.145 ...
2019-03-20    106.4900    107.0894     106.875      106.84     106.095 ...
2019-03-19    106.4661    106.9107     106.820      106.71     106.100 ...
2019-03-18    106.5319    107.0137     106.760      106.75     106.100 ...
[5 rows x 53 columns]

When I print the index and index.values I also see the following:

[5]: print df.index

DatetimeIndex(['2019-03-22', '2019-03-21', '2019-03-20', '2019-03-19',
           '2019-03-18', '2019-03-15', '2019-03-14', '2019-03-13',
           '2019-03-12', '2019-03-11',
           ...
           '2013-02-07', '2013-02-06', '2013-02-05', '2013-02-04',
           '2013-02-01', '2013-01-31', '2013-01-30', '2013-01-29',
           '2013-01-28', '2013-01-25'],
          dtype='datetime64[ns]', name=u'Date', length=1539, freq=None)

[6]: print df.index.values

['2019-03-22T00:00:00.000000000' '2019-03-21T00:00:00.000000000'
 '2019-03-20T00:00:00.000000000' ... '2013-01-29T00:00:00.000000000'
 '2013-01-28T00:00:00.000000000' '2013-01-25T00:00:00.000000000']

Now here's where it gets weird. If I run the following:

[7]: df.loc["2019-03-21"]

Date          idx1        idx2         idx3         idx4       idx5
2019-03-21    106.6994    107.1746     106.975      106.87     106.145
[1 rows x 53 columns]

I get what I expect, which is the row corresponding to that date. However, when I run exactly the same thing with a different date:

[8]: print df.loc["2019-03-22"]
KeyError: 'the label [2019-03-22] is not in the [index]' 

I get a KeyError saying this label is not in the index. I have checked the actual CSV file to confirm that the date is there, and I have tried .loc with various other dates; all of them succeed except for 2019-03-22.

Can anyone give me a hint as to what might be going on here? I cannot for the life of me figure it out.

In response to the question from Edeki Okoh below:

print df.index.get_loc("2019-03-22")
[0]

print df.index.get_loc("2019-03-21")
[1]

df.iloc[0]
Out[17]: 
idx1                   106.107
idx2                   106.642
idx3                   106.52
idx4                   106.45
idx5                   105.87

Name: 2019-03-22 00:00:00, dtype: object
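
For anyone trying to reproduce the checks above, here is a minimal sketch on a small stand-in frame. The three-row data and the `index_report` name are assumptions made for illustration, not the real 53-column CSV; the idea is only to confirm the index dtype, uniqueness, and sort order alongside `get_loc`.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data: a descending daily index,
# mirroring the first rows shown in the question.
dates = pd.to_datetime(["2019-03-22", "2019-03-21", "2019-03-20"])
df = pd.DataFrame(
    {"idx1": [106.1069, 106.6994, 106.4900]},
    index=pd.DatetimeIndex(dates, name="Date"),
)

# Position of the problematic date, looked up with an explicit Timestamp.
first_pos = df.index.get_loc(pd.Timestamp("2019-03-22"))

# Basic facts about the index that are worth checking on the real frame.
index_report = {
    "dtype": str(df.index.dtype),
    "is_unique": df.index.is_unique,
    "is_monotonic_decreasing": df.index.is_monotonic_decreasing,
    "first_label": df.index[0],
}
print(first_pos, index_report)
```

Running the same attribute checks on the real frame would show whether its index differs from this stand-in in any of these respects.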
weskpga
  • Can you run df.index.get_loc('2019-03-22') and df.index.get_loc('2019-03-21') and tell me what values get returned? Also, can you try df.iloc[0] and tell me which row gets returned? I want to check whether the first row is actually in the dataframe or whether it gets read differently. – Edeki Okoh Mar 25 '19 at 16:54
  • Updated my question. The results still confirm that `2019-03-22` is in the dataframe's index, which is why this is so confusing. – weskpga Mar 25 '19 at 17:29
  • Have you tried df.loc[datetime(2019, 3, 22)]? – Frenchy Mar 25 '19 at 17:49
  • @Frenchy yes, and that works (that's the workaround I have created for now). But that said, it still shouldn't be the case that using strings would work for every date except for this one, which is what I'm trying to get to the bottom of. – weskpga Mar 25 '19 at 18:02
  • Try passing [infer_datetime_format](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) into read_csv. It seems that, for some reason, the first column was not recognized as datetime when the CSV was read. That's why you can slice it once you convert it to datetime. – Edeki Okoh Mar 25 '19 at 18:32
  • @EdekiOkoh tried and it still doesn't work for `2019-03-22` and works for every other date – weskpga Mar 25 '19 at 18:36
  • Does this work: df.loc[df['Date']=='2019-03-22']? Or whatever Date is, I can't tell if it has white space or not – Edeki Okoh Mar 25 '19 at 18:43
  • `df.loc[df.index == "2019-03-22"]` works while `df.loc["2019-03-22"]` still doesn't. This is actually so strange. – weskpga Mar 25 '19 at 18:56
  • I think I found something [similar](https://stackoverflow.com/questions/36871188/how-to-access-pandas-dataframe-datetime-index-using-strings), specifically bob_monsen's answer. – Edeki Okoh Mar 25 '19 at 19:49
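
The two workarounds mentioned in the comments (an explicit datetime label, and a boolean mask on the index) can be sketched as follows. This is a minimal illustration on made-up stand-in data; the frame contents and variable names are assumptions, and it does not explain why the plain string lookup fails for this one date.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data (values copied from the question's head()).
dates = pd.to_datetime(["2019-03-22", "2019-03-21", "2019-03-20"])
df = pd.DataFrame(
    {
        "idx1": [106.1069, 106.6994, 106.4900],
        "idx2": [106.6425, 107.1746, 107.0894],
    },
    index=pd.DatetimeIndex(dates, name="Date"),
)

target = pd.Timestamp("2019-03-22")

# Workaround 1: pass an explicit Timestamp instead of a string.
# On a unique index this returns the matching row as a Series.
row_by_timestamp = df.loc[target]

# Workaround 2: build a boolean mask by comparing the index to the date.
# This returns a DataFrame containing every matching row.
rows_by_mask = df.loc[df.index == target]

print(row_by_timestamp)
print(rows_by_mask)
```

Both forms avoid string parsing inside .loc, which is the one step that differs from the failing call.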

0 Answers