I have a DataFrame of sales whose index is a DatetimeIndex:

import numpy as np
import pandas as pd

rng = pd.date_range('2015-01-03', periods=500)
df = pd.DataFrame({'historical_sales': np.random.choice([100, 200, 300], size=500)},
                  index=rng)
print(df)

We also have a list of special dates, some_dates:

some_dates = ['3/15/2017', '6/14/2017', ...]

When I try to subset the DataFrame by some_dates:

print(df.loc[some_dates])

I get a KeyError: "None of [[dates]] are in the [index]". Is this because I am indexing with a list of strings instead of datetimes?

As a workaround, to subset the data-frame, this worked:

container = []
for i in some_dates:
    container.append(df.loc[i])

dfNew = pd.DataFrame(container)

But I would like to further understand the reason for the error, and whether the workaround is a 'bad convention'.

Fed

1 Answer

I think you need to convert the strings to datetimes, so that you select by a list of datetimes:

some_dates = pd.to_datetime(['3/15/2016', '3/14/2016'])
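
As a minimal sketch of that conversion applied to the sample data from the question (the fixed seed is my own addition, only there to make the run reproducible; both dates fall inside the 500-day range):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seeded only so the sketch is reproducible
rng = pd.date_range('2015-01-03', periods=500)
df = pd.DataFrame({'historical_sales': np.random.choice([100, 200, 300], size=500)},
                  index=rng)

# Convert the strings so the labels have the same type as df.index
some_dates = pd.to_datetime(['3/15/2016', '3/14/2016'])

# Label-based selection; rows come back in the order requested
subset = df.loc[some_dates]
print(subset)
```

This only works as written when every requested date exists in the index.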

A more general approach is to take the intersection of the DatetimeIndex and some_dates, which keeps only the dates that are actually present in the index:

some_dates = pd.to_datetime(['3/15/2016', '6/14/2016'])

idx = df.index.intersection(some_dates)
print(df.loc[idx])
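
To illustrate why the intersection is more general, here is a small sketch on the same sample data. The index ends on 2016-05-16, so '6/14/2016' is not in it and is silently dropped. The boolean-mask variant with isin is an alternative I am adding for comparison; the seed is again only an assumption for reproducibility.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seeded only so the sketch is reproducible
rng = pd.date_range('2015-01-03', periods=500)
df = pd.DataFrame({'historical_sales': np.random.choice([100, 200, 300], size=500)},
                  index=rng)

some_dates = pd.to_datetime(['3/15/2016', '6/14/2016'])

# Keep only the requested dates that really exist in the index
idx = df.index.intersection(some_dates)
by_intersection = df.loc[idx]

# Alternative: boolean mask marking the rows whose label is in some_dates
by_mask = df[df.index.isin(some_dates)]

print(by_intersection)
print(by_mask)
```

Both variants return the single row for 2016-03-15 here; neither raises when a requested date is missing.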

You wrote: "But I would like to further understand the reason for the error, and whether the workaround is a 'bad convention'."

In my opinion, the main reason is that in pandas it is best to avoid loops whenever another solution exists, and very often a vectorized one does. Loops are also obviously slower.
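
As a rough sketch of that point, the loop from the question and a single loc call select the same rows on the sample data. The variable names looped and vectorized are mine, and the seed is only an assumption for reproducibility; no timings are claimed here.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: seeded only so the sketch is reproducible
rng = pd.date_range('2015-01-03', periods=500)
df = pd.DataFrame({'historical_sales': np.random.choice([100, 200, 300], size=500)},
                  index=rng)

some_dates = pd.to_datetime(['3/15/2016', '3/14/2016'])

# Loop version (the workaround from the question): one lookup per date
container = []
for ts in some_dates:
    container.append(df.loc[ts])  # each lookup returns a Series named after the date
looped = pd.DataFrame(container)

# Vectorized version: one label-based lookup for all dates
vectorized = df.loc[some_dates]

# Same rows, same values
print(looped['historical_sales'].tolist() == vectorized['historical_sales'].tolist())
```

The single loc call expresses the same selection in one line and leaves the work to pandas instead of a Python-level loop.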

You can also check this.

jezrael