I have 10,000 rows of data in Excel and the first column is the time of a data recording (hh:mm:ss). I want to filter/extract (in the front end) only the rows where the readings are at least six consecutive seconds - these are signal strength recordings and I only consider readings of at least six consecutive seconds to be valid for my purposes. I would only want to extract the rows in green in the attached image. Separate ask - what if I wanted segments of only six to eight seconds? Thanks!
-
How do you determine which are 6 seconds? is the entire database highlighted in green? – Lakshan Costa Jan 18 '23 at 00:18
-
Hi, I am trying to automate identifying the sequences that are six or more consecutive seconds. So instead of me going through 10,000 rows of data and manually identifying them (the ones I shaded in green), I would like to automate it through a formula. – Gregg Rosenstein Jan 18 '23 at 00:44
-
1@GreggRosenstein If no one comes up with a formula, this can be done in Power Query, available in Windows Excel 2010+ and Excel 365. – Ron Rosenfeld Jan 18 '23 at 02:25
-
How do you want to see the result? A helper column to indicate the consecutive records or do you want to extract the resulting records to a separate range? – EEM Jan 18 '23 at 12:32
-
@GreggRosenstein please next time provide your input in Table Markdown format, it helps us to reproduce the scenario. You can use the following tool: [Markdown Table Generator](https://www.tablesgenerator.com/markdown_tables). Thanks! – David Leal Jan 19 '23 at 19:25
1 Answers
With O365 (assuming no excel version constraints per tags used in the question) you can try the following in cell D1
:
=LET(rng, A1:B32, ref, SCAN("", INDEX(rng,,1), LAMBDA(ac,a, IF(ac="", 1,
IF((a - TIME(0,0,1)) = OFFSET(a,-1,0), 0, 1)))), size, ROWS(ref),
seq, SEQUENCE(size), start, FILTER(seq, ref=1),
end, VSTACK(DROP(start-1, 1), size), DROP(REDUCE("", start, LAMBDA(ac,s,
LET(e, XLOOKUP(s, start, end), f, FILTER(rng, (seq >= s) * (seq <= e)),
IF(ROWS(f) > 5, VSTACK(ac, f), ac)))),1))
Note: The same result can be achieved to avoid using the volatile function OFFSET
, not using SCAN
, and instead calculating the ref
value using comparison as follow, where A
represents the first column of rng
(we keep the same logic as previous formula, but it can be simplified changing the logic to check 0
instead 1
to remove 1-N()
):
=LET(rng,A1:B32,A,INDEX(rng,,1), ref,VSTACK(1,1-N(DROP(A,1)
=(DROP(A,-1)+TIME(0,0,1)))), size, ROWS(ref), seq, SEQUENCE(size),
start, FILTER(seq, ref=1), end, VSTACK(DROP(start-1, 1), size),
DROP(REDUCE("", start, LAMBDA(ac,s, LET(e, XLOOKUP(s, start, end),
f, FILTER(rng, (seq >= s) * (seq <= e)),
IF(ROWS(f) > 5, VSTACK(ac, f), ac)))),1))
Then adjust the input range (rng
) to your real case.
Explanation
Using LET
for easy reading and composition. The name ref
identifies with 1
, the start of each group of values with consecutive seconds of the first column from rng
. It has the same number of rows as the input data.
We use DROP/REDUCE/VSTACK
pattern to generate iteratively the output with the data that satisfies the conditions the group has more than 5 consecutive elements. Check the following answer to this question: how to transform a table in Excel from vertical to horizontal but with different length
Via REDUCE
we iterate over all start group positions (start
). For each start group position (s
), finds the corresponding end group position (e
) via XLOOKUP
. Filter the range (rng
) for rows (seq
) between start (s
) and end (e
) rows via FILTER
. Append the filter result (f
) only if the number of rows is bigger than 5
via the VSTACK
function.
Conditional Format
The formula provided is too large to be used for a conditional format formula (255
maximum number of characters). A possible workaround could be to use a helper column (you can hide it). It returns TRUE
if the row belongs to a valid group, otherwise returns FALSE
. If no valid group were found it returns FALSE
too (we use this trick: NOT(SEQUENCE(size,,1,0)=1)
to generate a constant column with FALSE
values). Then you can highlight the column A
values that correspond with TRUE
values in the helper column.
=LET(A, TOCOL(A:A,1), ref, VSTACK(1,N(DROP(A,1) = (DROP(A,-1)+TIME(0,0,1)))),
size, ROWS(ref), seq, SEQUENCE(size), start, FILTER(seq, ref=0),
end, VSTACK(DROP(start-1, 1), size), gr, FILTER(HSTACK(start, end),
(end-start +1) > 5, -1), sgr, INDEX(gr,,1), egr, INDEX(gr,,2),
IF(@gr=-1, NOT(SEQUENCE(size,,1,0)=1), MAP(seq, LAMBDA(x,
LET(overlaps, SUM((sgr <= x) * (egr >= x)), IF(overlaps = 1, TRUE, FALSE))))))
as input, we use the entire column and filter by nonblank values via the TOCOL
function (named A
) using the second input argument of this function. In case of more than one overlapping interval, it returns FALSE
too, just for testing purposes, it indicates some error calculating start
and end
names because per design it should never happen.
Tip: The previous formula can be used for the original purpose, using its output as a condition for a FILTER
function to select only the input range where the value is TRUE
. It is a matter of taste which route you want to go. For example, as follow:
=LET(rng,A1:B32,A,INDEX(rng,,1), ref, VSTACK(1,N(DROP(A,1)
= (DROP(A,-1)+TIME(0,0,1)))), size, ROWS(ref), seq, SEQUENCE(size),
start, FILTER(seq, ref=0), end, VSTACK(DROP(start-1, 1), size),
gr, FILTER(HSTACK(start, end), (end-start +1) > 5, -1),
sgr, INDEX(gr,,1), egr, INDEX(gr,,2),
incl, IF(@gr=-1, NOT(SEQUENCE(size,,1,0)=1), MAP(seq, LAMBDA(x,
LET(overlaps, SUM((sgr <= x) * (egr >= x)), IF(overlaps = 1, TRUE, FALSE))))),
FILTER(rng, incl=TRUE, "No Group found"))

- 6,373
- 4
- 29
- 56
-
1Thank you @David Leal! This is exactly what I was looking for. Is there any way to bring over conditionally formatted cells with their shade? – Gregg Rosenstein Jan 18 '23 at 23:08
-
@GreggRosenstein I was trying to create a formula for a conditional formatting, the problem is that the formula you can enter there needs to have a maximum of 255 characters, and using the logic of my response adapting it for a conditional formatting it is longer than that – David Leal Jan 19 '23 at 03:29
-
got it. My original table has some cells in blue - they are conditionally formatted as the solution set when I have a signal. So your export is incredible, taking 10,000 rows of data and returning in my case the 300 rows I am looking at. But then I still have to manually examine the 300 rows and compare them to the original since I no longer have the blue formatting. – Gregg Rosenstein Jan 19 '23 at 18:01
-
The only solution I can think of is to use a helper function that identifies the valid groups returning `TRUE` otherwise `FALSE` and then to hide the column. Updated my answer with this approach. I don't see any other way of doing it without a helper column. I hope it helps. You can keep this column also for testing purpose, once you are sured that the formula works you can remove it. – David Leal Jan 19 '23 at 19:01
-
Thanks again @David Leal. One quick question and one comment. How would I code it if I wanted results between 6 and 8 seconds (a result of 6 to 8 rows)? And my comment - I took my table with conditional formatting, filtered it for the possible rows to use the formula on, and then put in VB code to remove the conditional formatting but keep the current color (and that works). But the incredible solution you provided still doesn't keep the cell color when exporting. Thoughts? The table markdown format won't help because it won't show coloring, so I'm editing my original post pic. Thanks! – Gregg Rosenstein Jan 20 '23 at 01:14
-
For adding a more specific condition like you mentioned replace the portion: `(end-start +1) > 5` by this `((end-start +1) > 5) * ((end-start +1) < 9)` It works on my sample file, difficult to figure out in your real situation. Share a link of your file editable, to see if I can take a look. – David Leal Jan 20 '23 at 01:34