1

I am trying to do some data validation in Excel but have a difficulty in making vlookup work recursively. Here is a small sample of data

+---------------+-----------------+
| Duration(sec) | Start time(sec) |
+---------------+-----------------+
| 1.428571429   | 96.57142857     |
| 1.888888889   | 95              |
| 1.888888889   | 96.22222222     |
| 2.428571429   | 95.71428571     |
| 2.75          | 96              |
| 2.8           | 95.3            |
| 2.846153846   | 94.30769231     |
| 2.857142857   | 97.42857143     |
| 3             | 94.8            |
| 3             | 97              |
| 3             | 99              |
| 3.111111111   | 95.66666667     |
| 3.2           | 95.5            |
| 3.333333333   | 96.22222222     |
| 3.416666667   | 80.33333333     |
| 3.416666667   | 94.16666667     |
| 3.5           | 94.1            |
| 3.615384615   | 78.92307692     |
+---------------+-----------------+

First column is the duration of an event and the second the starting point. I need to find the average starting point of the e.g. 5 items with smallest duration.

I started like this: {=AVERAGE(SMALL(IF(ISNUMBER(B:B),B:B,""),ROW(INDIRECT("1:5"))))} which gives me the average of 5 smallest durations. So far so good. It works as it should.

Then tried to mix the above with vlookup so that in the first step it will return the 5 smallest durations and then will lookup for the corresponding starting points and then calculate the average of them. I tried: {=AVERAGE(VLOOKUP(SMALL(IF(ISNUMBER(B:B),B:B,""),ROW(INDIRECT("1:5"))),B:C,2,FALSE))} but with no luck. VLOOKUP seems to return only one value. So I can not use it like this. Any ideas?

P.S. I use ISNUMBER because I have NaN values in the data.

maus
  • 75
  • 1
  • 6
  • Aahhh, any tips to format table look? – maus Jun 16 '14 at 11:09
  • From the posted sample, it appears that your data is already sorted by **duration.** If this is true, then the 5 smallest durations are already on the top. If this is true, then just average the top 5 items in column **B** – Gary's Student Jun 16 '14 at 12:02
  • Yes that's true but it doesn't help me a lot as I have multiple columns like this. Think of the one posted as referring to Year 2014. I have a huge Excel with these kind of columns side by side like: Dur2010,Start2010,Dur2011,Start2011 etc. If I sort for example with Dur2010 then OK I can take the first 5 elements of Start2010. But what about the rest?Everything else will be screwed up.. So I need to find a more sofisticated way to do this.. – maus Jun 16 '14 at 12:18
  • I have had no luck with `VLOOKUP` either (it seems to not like accepting arrays within `AVERAGE`) I was successful in using the pure `LOOKUP` function, however it will use the larger Start Time on duplicates (i.e. _both_ `1.888888889`'s will result in `96.22222222`) so the average is slightly off. If you think that will be acceptable I can post that as an answer but I didn't feel it was up to par so I am only suggesting it in a comment. – chancea Jun 16 '14 at 15:41

1 Answers1

1

Try this:

{=SUMPRODUCT(IF(DurNoErrors<=SMALL(DurNoErrors,5),TRUE,FALSE)*Start_Time)/5}

IMPORTANT: Note that this is an array formula so it must be entered with Ctl+Shift+Enter. The formula is also valid as a non-array forumula, however it will not give you the result you expect if entered normally! So just make sure you enter it as an array formula.

DurNoErrors is a named range set to:

=IFERROR(Duration," ")

Duration is a named range set to your duration data.

Start_Time is another named range set to your start time data.

This array formula works by:

  1. Finding the 5th smallest Duration value
  2. Using IF() to make an array of TRUE/FALSE values created from the Duration array based on whether or not each value is greater than (FALSE) or less than/equal to (TRUE) the 5th smallest value (the resulting array will contain 5 TRUEs, the rest of them will be FALSE)
  3. Multiplying that TRUE/FALSE array times the Start_Time array, so all Start_Times with Duration greater than the 5th smallest are now zeroed out
  4. Adding that array using SUMPRODUCT(), and dividing the result by 5

Using IFERROR() gets rid of any errors in the arrays.

Note: for your named ranges, you should not use, for example, A:A. Instead use something like A1:A100 because in general, using A:A in formulas will significantly slow down your worksheet. If you prefer, you can of course enter the address of your data in the formula above directly instead of using a named range (example: A1:A18 instead of Duration). However, named ranges are a good way to make your formulas more understandable when you come back a year later and are trying to figure out what the heck it is you were doing.

Community
  • 1
  • 1
Rick
  • 43,029
  • 15
  • 76
  • 119
  • It does the trick unless there is a missing or a NaN value. I need to modify this a little to avoid taking into account missing values. – maus Jun 17 '14 at 06:53
  • It seems that AND doesn't like arrays so I used this form: `{=SUMPRODUCT(IF((Duration<=SMALL(Duration,10))*(ISNUMBER(Duration)),1*Start_Time,0))/10}` – maus Jun 17 '14 at 12:06
  • That's what I was going to suggest - haven't had time to fiddle with it this morning. And now it's noon! Feel free to edit my answer if you have that privilege. I'll do it later otherwise. – Rick Jun 17 '14 at 16:15