1

I have a Google Sheet that is populated automatically via Zapier integration. For each new row added, I need to evaluate a given cell (Shipper Name) to find last instance of Shipper Name in prior rows, and if so, return Row# for the last entry.

Example Data Sheet

I am trying to create a formula that simply looks at name in new row and returns the number of the most recent row with that name.

Formula needs to run as an Array formula so that the data auto populates with each new row added to the Sheet.

I have tried to use this formula, but when refactored as Array formula, it doesn't populate new values for new rows, it just repeats the first value for all rows.

From Row J:

=sumproduct(max(row(A$1:A3)*(F4=F$1:F3)))

I need this formula refactored to be an Array formula that auto populates all the cells below it.

I have tried this version, but it doesn't work:

=ArrayFormula(IF(ISBLANK($A2:$A),"",sumproduct(max(row(A$1:A3)*($F4:$F=F$1:F3))))
  • Would you be willing to consider an Apps Script approach to your issue? @PaulWeinstein – ale13 Oct 20 '20 at 08:43
  • @ale13 - YES! I have also gone down that path, but because I don't know Javascript, I didn't get very far. I do think for a developer, this is a simple task, using a loop script to simply find the row of the next cell above. I'd be grateful for a custom function that does this as well. – Paul Weinstein Oct 20 '20 at 13:20
  • Questions should be self contained. While external links are welcome, consider adding a ascii table like ([this](https://stackoverflow.com/q/63837444) or [this](https://stackoverflow.com/q/64182673/)) or screenshots or csv text(like [this](https://stackoverflow.com/a/64186520/)) to show your data structure. – TheMaster Oct 20 '20 at 16:19
  • @PaulWeinstein It seems like you wanted those row numbers so you could use them in your other formulas. Would you be interested in a much simpler solution for the other columns's info that doesn't need to use the "last" row number that you're asking about? – MattKing Oct 21 '20 at 17:41

2 Answers2

4

A script (custom function maybe?) would be better.

Solution 1

Below is a formula you can place into the header (put in in J1, remove everything below).

It works much faster than the second solution and has no N² size restriction. Also it works with empty shippers (& "♥" is for those empty ones): as long as A:A column has some value it will not be ignored.

={
  "Row of Last Entry";
  ARRAYFORMULA(
    IF(
      A2:A = "",
        "",
        VLOOKUP(
            ROW(F2:F)
          + VLOOKUP(
              F2:F & "♥", 
              {
                UNIQUE(F2:F & "♥"),
                  SEQUENCE(ROWS(UNIQUE(F2:F)))
                * POWER(10, INT(LOG10(ROWS(F:F))) + 1)
              },
              2,
              0
            ),
          SORT(
            {
                ROW(F2:F) + 1
              + VLOOKUP(
                  F2:F & "♥", 
                  {
                    UNIQUE(F2:F & "♥"),
                      SEQUENCE(ROWS(UNIQUE(F2:F)))
                    * POWER(10, INT(LOG10(ROWS(F:F))) + 1)
                  },
                  2,
                  0
                ),
              ROW(F2:F);
              {
                  SEQUENCE(ROWS(UNIQUE(F2:F)))
                * POWER(10, INT(LOG10(ROWS(F:F))) + 1),
                SEQUENCE(ROWS(UNIQUE(F2:F)), 1, 0, 0)
              }
            },
            1,
            1
          ),
          2,
          1
        )
    )
  )
}

enter image description here

Details on how it works

  • For every row we use VLOOKUP to search for a special number in a sorted virtual range to get the row number of the previous entry matching current.
  • A special number for a row is constructed like this: we get a sequential number for the current entry among unique entries and append to it current row number.
  • The right part (row number) of the resulting special numbers must be aligned between them. If the entry has sequential number 13 and the row number is 1234 and there are 100500 rows, then the number must be 13001234. 001234 is the aligned right part.
  • Alignment is done by multiplying a sequential number by 10 to the power of (log10(total number of rows) + 1), gives us 13000000 (from the example above). This approach is used to avoid using LEN and TEXT - working with numbers is faster then working with strings.
  • Virtual range has almost the same special numbers in the first column and original row numbers in the second.
  • Almost the same special numbers: they just increased by 1, so VLOOKUP will stop at most one step before the number corresponding to the current string.
  • Also virtual range has some special rows (added at the bottom before sorting) which have all 0's as the right part of their special numbers (1st column) and 0 for the row number (2nd column). That is done so VLOOKUP will find it for the first occurrence of the entry.
  • Virtual range is sorted, so we could use is_sorted parameter of the outer VLOOKUP set to 1: that will result in the last match that is less or equal to the number being looked for.
  • & "♥" are appended to the entries, so that empty entries also will be found by VLOOKUP.

Solution 2 - slow and has restrictions

But for some small enough number of rows this formula works (put in in J1, remove everything below):

={
  "Row of Last Entry";
  ARRAYFORMULA(
    REGEXEXTRACT(
      TRANSPOSE(QUERY(TRANSPOSE(
        IF(
            (FILTER(ROW(F2:F), F2:F <> "") > TRANSPOSE(FILTER(ROW(F2:F), F2:F <> "")))
          * (FILTER(F2:F, F2:F <> "") = TRANSPOSE(FILTER(F2:F, F2:F <> ""))),
            TRANSPOSE(FILTER(ROW(F2:F), F2:F <> "")),
            ""
         )
      ), "", ROWS(FILTER(F2:F, F2:F <> "")))),
      "(\d*)\s*$"
    )
  )
}

But there is a problem. The virtual range inside of the formula is of size N², where N is the number of rows. For current 1253 rows it works. But there is a limit after which it will throw an error of a range being too large.

That is the reason to use FILTER(...) and not just F2:F.

enter image description here

kishkin
  • 5,152
  • 1
  • 26
  • 40
  • this works perfectly on the dataset so far. Just one point of clarification, your solution using FILTER(...) and not just F2:F will prevent the error from a range that is too large? – Paul Weinstein Oct 20 '20 at 14:01
  • It helps now, as there are about 17000+ rows with data (works fine) and 30000+ total (does not work). If the number of rows with data will get larger it will eventually break. – kishkin Oct 20 '20 at 14:30
  • Can we refine the formula so that it looks back a maximum number of rows? 10000 rows look back would be plenty for this application. – Paul Weinstein Oct 20 '20 at 16:45
  • @PaulWeinstein not sure about refining, but I came up with another solution. It does not have those restrictions like the previous one has. – kishkin Oct 21 '20 at 10:21
  • Brief explanation of the code(hows and whys) will improve the answer quality. – TheMaster Oct 21 '20 at 10:31
  • @TheMaster won't be brief and easily comprehensible at the same time, but I'll try – kishkin Oct 21 '20 at 10:50
  • 2
    @TheMaster added the details. – kishkin Oct 21 '20 at 12:24
  • @kishkin Curious if you think my new solution is targeting the correct data. – MattKing Oct 21 '20 at 18:37
  • @MattKing hey Matt! *However, it only does this if the time is less than 200 minutes. IF it is more than 200 minutes* - do you work in the next building to the OP? :) Tying to comprehend now. – kishkin Oct 22 '20 at 12:48
  • @MattKing got some basic stuff from your solution, that to my shame I somehow was not aware of. Thanks for that! `'Sample Data'!H:H` uses the same mechanics as my solution does (was it something known to the public? I was so glad to invent it myself :), never seen it before). You do not convert names to numbers, but I was not sure I could use it like that as it quite ambiguous how lookup is done with `is_sorted` set to true. Anyway, what you did make a lot of sense, a nice example of why one would need to find previous equal entry among others. – kishkin Oct 22 '20 at 13:34
  • @MattKing And yes, really looks like XY problem, but the question is still self-sustained. – kishkin Oct 22 '20 at 13:36
  • @kishkin yes, that technique is well known to the public. But it's great you figured it out on your own! – MattKing Oct 22 '20 at 13:55
2

Here is a significantly simpler way to get at the information you're interested in. (I think.) I'm mostly guessing about what you want because your question wasn't really about what you want, but rather about how to get something that you think would help you get what you want. This is an example of an XY problem. I attempted to guess based on experience at what you're really after.

This editable sheet contains just 3 formulas. 2 on the raw data sheet and one in a new tab called "analysis."

The first formula on the Raw data tab extracts a properly formatted timestamp using a combination of MMULT and SPLIT functions and looks like this:

=ARRAYFORMulA({"Good Timestamp";IF(A2:A="",,MMULT(N(IFERROR(SPLIT(A2:A,"T"))),{1;1}))})

The second formula finds the amount of time since the previous timestamp for that Shipper. and subtracts it from the current timestamp thereby giving you the time between timestamps. However, it only does this if the time is less than 200 minutes. IF it is more than 200 minutes, it assumes that was a different shift for that shipper. It looks like this and uses a combination of LOOKUP() and SUBSTITUTE() to make sure it's pulling the correct timestamps. Obviously, you can find and change the 200 value to something more appropriate if it makes sense.

=ARRAYFORMULA({"Minutes/Order";IF(A2:A="",,IF(IFERROR((G2:G-1*SUBSTITUTE(LOOKUP(F2:F&G2:G-0.00001,SORT(F2:F&G2:G)),F2:F,""))*24*60)>200,,IFERROR((G2:G-1*SUBSTITUTE(LOOKUP(F2:F&G2:G-0.00001,SORT(F2:F&G2:G)),F2:F,""))*(24*60))))})

The third formula, on the tab called analysis uses query to show the average minutes per order and the number of orders per hour that each shipper is processing. It looks like this:

=QUERY({'Sample Data'!F:I},"Select Col1,AVG(Col3),COUNT(Col3)/(SUM(Col3)/60) where Col3 is not null group by Col1 label COUNT(Col3)/(SUM(Col3)/60)'Orders/ hour',AVG(Col3)'Minutes/ Order'")

Hopefully I've guessed correctly at your real goals. Always do your best to explain what they are rather than asking for only a small portion that you think will help you get to the answer. You can end up overcomplicating your process without realizing it.

MattKing
  • 7,373
  • 8
  • 13
  • You definitely guessed correctly. Thank you for the solution. This works exactly as required. And thank you for the coaching on how to ask a better question. I will keep that in mind for future questions. – Paul Weinstein Oct 28 '20 at 14:10