Suppose my data looks like:

import pandas as pd

data = {'Date': ['2019-07-06', '2019-08-04', '2019-07-05', '2019-08-06'], 'Attending Cost': [1, 1, 1, 1]}
data_2 = pd.DataFrame.from_dict(data)

I want to select all the rows between 2019-08-04 and 2019-08-06, inclusive. More generally, my data is arranged by month, and I want to select all the rows for one particular month. However, the data contains some outliers: rows whose date is not from that month but which sit between rows that are. I want to include these outliers in my selection as well. Note also that within a month the dates are not ordered. How can I achieve this?

jottbe
archer

3 Answers

1

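Try this one. It slices from the first row dated on or after 2019-08-04 through the last row dated on or before 2019-08-06, and it assumes the default integer index: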

data_2[min(data_2.index[data_2["Date"]>="2019-08-04"]):max(data_2.index[data_2["Date"]<="2019-08-06"])+1]
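
A position-based variant of the same idea, as a minimal sketch. It assumes the rows of the wanted range sit roughly together in the frame, and `select_span` is a hypothetical helper name, not a pandas function. It finds the first and last rows whose own date falls inside the range and returns everything between them, so a stray row in the middle is kept, and it does not rely on the index being 0..n-1.

```python
import numpy as np
import pandas as pd

data = {'Date': ['2019-07-06', '2019-08-04', '2019-07-05', '2019-08-06'],
        'Attending Cost': [1, 1, 1, 1]}
data_2 = pd.DataFrame.from_dict(data)


def select_span(df, start, end, col='Date'):
    """Hypothetical helper: rows from the first to the last in-range date, inclusive."""
    # Zero-padded 'YYYY-MM-DD' strings compare in chronological order.
    in_range = ((df[col] >= start) & (df[col] <= end)).to_numpy()
    positions = np.flatnonzero(in_range)  # row positions of the in-range dates
    if positions.size == 0:
        return df.iloc[0:0]  # nothing in range: return an empty frame
    # iloc slices exclude the end position, so add 1 to keep the last match.
    return df.iloc[positions[0]:positions[-1] + 1]


span = select_span(data_2, '2019-08-04', '2019-08-06')
print(span)
```

On the sample data this keeps the rows at positions 1 through 3, including the 2019-07-05 row that sits between the two August dates.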
Grzegorz Skibinski
1

This should give you the sum including the outliers:

dates = data_2.Date
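# Both dates must be present in the data. The + 1 is needed because iloc
# slices exclude the end position and the range should be inclusive.
data_2['Attending Cost'].iloc[dates[dates == '2019-08-04'].index[0]:dates[dates == '2019-08-06'].index[0] + 1].sum()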
Egal
  • Thank you. What if I do not know how my dates are ordered, meaning I do not know which date the data starts from and which date it ends at? – archer Sep 01 '19 at 19:32
  • I'm not sure what you mean by that. If the data is not ordered, what is the meaning of outliers? – Egal Sep 01 '19 at 19:35
0

The easiest way is:

indexer= (data_2['Date'] >= '2019-08-04') & (data_2['Date'] <= '2019-08-06')
data_2[indexer]

This returns:

Out[504]: 
         Date  Attending Cost
1  2019-08-04               1
3  2019-08-06               1
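
The same mask can also be written with `Series.between`, which includes both endpoints by default. A minimal sketch, assuming the dates are stored as zero-padded `YYYY-MM-DD` strings so that string order matches date order:

```python
import pandas as pd

data_2 = pd.DataFrame({
    'Date': ['2019-07-06', '2019-08-04', '2019-07-05', '2019-08-06'],
    'Attending Cost': [1, 1, 1, 1],
})

# between() is inclusive on both ends by default.
mask = data_2['Date'].between('2019-08-04', '2019-08-06')
selected = data_2[mask]
print(selected)
```

Like the comparison above, this keeps only the rows whose own date lies in the range, so the 2019-07-05 row sitting between them is left out.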

Edit:

I think I got it. The logic:

indexer= (data_2['Date'] >= start_date_string) & (data_2['Date'] <= end_date_string)
data_2[indexer]

doesn't require that the two strings actually appear in your data, so if you want to query all records from, say, August, you can just do:

indexer= (data_2['Date'] >= '2019-08-01') & (data_2['Date'] <= '2019-08-31')
data_2[indexer]

This works even if your first record for August is dated 2019-08-09 and your last is dated 2019-08-27, because it compares the date values directly instead of looking up rows by index.
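
If the `Date` column is parsed into real datetimes, a whole month can be selected without spelling out its last day. This is only a sketch under that assumption; `month_mask` and `august` are illustrative variable names.

```python
import pandas as pd

data_2 = pd.DataFrame({
    'Date': ['2019-07-06', '2019-08-04', '2019-07-05', '2019-08-06'],
    'Attending Cost': [1, 1, 1, 1],
})

# Parse the strings once; the original column is left untouched.
dates = pd.to_datetime(data_2['Date'], format='%Y-%m-%d')

# True for every row whose date falls in August 2019.
month_mask = (dates.dt.year == 2019) & (dates.dt.month == 8)
august = data_2[month_mask]
print(august)
```

As with the string comparison, this selects by value, so rows from other months are excluded even when they sit between August rows.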

jottbe