
I have a DataFrame, df, with multiple columns. One of those columns is named Date, and its items are datetime objects.

Given a date, datequery, I want to return the row that is closest to that date.

Going off of this previous answer, I thought of using the following code:

result = min(df.iterrows(), key=lambda x: abs(x['Date'] - datequery))

The way I mentally walk through the code is thusly:

  1. I give the min function an iterable that iterates over each row, since I want the output of the function to be the row.
  2. min then evaluates the key function for each item and outputs the item from the iterable with the lowest key. So it would output the row (item) corresponding to the lowest value given by the key function.

Instead of doing this, I'm greeted by the following error:

In [21]: min(df.iterrows(), key=lambda x: abs(x['Date'] - datequery))

--------------------------------------------------------------------------- 
TypeError                                 Traceback (most recent call last)

c:/Random/Path/script.py in <module>()
----> 1 min(df.iterrows(), key=lambda x: abs(x['Date'] - datequery))

c:/Random/Path/script.py in <lambda>(x)
----> 1 min(df.iterrows(), key=lambda x: abs(x['Date'] - datequery))

TypeError: tuple indices must be integers or slices, not str

I'm well aware that tuple indices can't be strings. So my question is threefold:

  1. What tuple is it talking about? Because there are no tuples being used (unless it's something inside datetime)
  2. Where is this string coming from? Again, the only string in the workspace is 'Date', but that should only be for accessing the 'Date' column, not as an index for a tuple.
  3. How do I do this?
James Wright

1 Answer


So, to answer my three questions:

  1. What tuple is it talking about? Because there are no tuples being used (unless it's something inside datetime)

The tuple is the output of iterrows. I forgot that iterrows outputs (index, row) tuple pairs, not just the row itself.
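To see this directly, here is a minimal sketch that inspects the first item yielded by iterrows. The sample DataFrame and its Value column are made up for illustration and are not from my actual data:

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-10"]),
    "Value": [1, 2],
})

# iterrows() yields (index, row) pairs, not bare rows.
first = next(df.iterrows())
print(type(first))  # a tuple

index, row = first
print(type(row))    # the row itself is a pandas Series

# Indexing the tuple with a string reproduces the error from the question.
try:
    first["Date"]
except TypeError as exc:
    print(exc)
```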

  2. Where is this string coming from? Again, the only string in the workspace is 'Date', but that should only be for accessing the 'Date' column, not as an index for a tuple.

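The string is 'Date'. Inside the lambda, x is the (index, row) tuple rather than the row, so x['Date'] tries to index that tuple with a string.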

  3. How do I do this?

min(df.iterrows(), key=lambda x: abs(x[1]['Date'] - datequery))

The [1] accesses the row part of the (index, row) tuple that iterrows() yields. That was the only thing I was missing. Note that min still returns the whole (index, row) tuple, so the row itself is the second element of the result.
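A small runnable sketch of the fix, using made-up sample data (the Value column and the specific dates are assumptions for illustration). It also shows a vectorized alternative with abs() and idxmin(), which is not what I originally used but should give the same row without iterating in Python:

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-10", "2020-02-01"]),
    "Value": [10, 20, 30],
})
datequery = pd.Timestamp("2020-01-12")

# The fix from above: x is an (index, row) tuple, so x[1] is the row.
# min returns the whole tuple, which is unpacked here.
index, row = min(df.iterrows(), key=lambda x: abs(x[1]["Date"] - datequery))
print(index)
print(row)

# Vectorized alternative: absolute time difference per row,
# then the index label of the smallest difference.
closest_label = (df["Date"] - datequery).abs().idxmin()
closest_row = df.loc[closest_label]
print(closest_row)
```

Both approaches pick the row dated 2020-01-10 here, since it is two days from the query date.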

James Wright