-2

Excel 2016

In Excel, I'm trying to count all same jobs that have repeated last reasons.

However I only need the last reason and the count of that reason if repeated.

If the reason changes then count stops.

date job reason
31/10/2017 barr skip
30/10/2017 sheep run
29/10/2017 pig  run
28/10/2017 barr skip
27/10/2017 barr skip
26/10/2017 sheep run
25/10/2017 pig  skip
24/10/2017 barr run
23/10/2017 car  skip
22/10/2017 cow  run
21/10/2017 car  run
20/10/2017 cow  skip
20/10/2017 barr skip

so for example table barr has a skip on 31st, 28th and 27th. on the 24th has a job run as no need to count further. so the result would be.

Job Last Reason EMV
barr Skip 3
sheep run 2
pig run 1
car run 1
cow skip 1

Also please note that this data is around 800k in length. so need to be efficient with no access to any database of any form.

I've tried with countifs with no success.

Cœur
  • 37,241
  • 25
  • 195
  • 267
  • Possible duplicate of https://stackoverflow.com/questions/16823468/excel-duplicate-values-pairs-in-multiple-column – TidyDev Oct 31 '17 at 09:23
  • Hi many thanks for your comment.. i'm not looking for duplicate pairs. as per your link. I only need to count last occurrences only. – Jason Sykes Oct 31 '17 at 09:41
  • So you need to count the pairs and subtract 1? – TidyDev Oct 31 '17 at 09:46
  • So for example, job "barr", could already have 2000+ occurrences. However I only need a count last occurrences of the same reason. so in example above "barr" shows 5 times, last reason is "skip" and is repeated 3 times. so need that result only. – Jason Sykes Oct 31 '17 at 09:56
  • Is there a fixed amount of names in column 2, or is it at random? – Luuklag Oct 31 '17 at 10:12
  • You're still counting duplicate values, but just narrow your search between the last and most recent occurrence. Stack Overflow isn't a code writing service so you should be breaking it down into smaller problems and asking for help when you get stuck. https://stackoverflow.com/help/mcve – TidyDev Oct 31 '17 at 10:13
  • @tidyDev, from this OP its pretty clear what he want's as desired results are included. So no need to break it down further IMHO. – Luuklag Oct 31 '17 at 10:15
  • HI @Luuklag jobs have around 20000 approx. Reasons 10 and are all set. – Jason Sykes Oct 31 '17 at 10:19
  • Well, your problem is most likely to find out a list of the 20k jobs excluding duplicates. – Luuklag Oct 31 '17 at 10:29

1 Answers1

0

First I will assume that you have some way of creating a list of 20K jobs. There are plenty of methods out there - maybe a pivot table with that amount of data if you don't need it to be completely dynamic.

In my opinion the best approach to getting the count of last reason is not to use a brute-force array formula method because it will be too slow. It's best to use an old-fashioned data processing approach - but then I would say that because I am an old-fashioned data processing person.

Step 1

Sort the data by job. Excel sort is very fast and it's stable so you get this

date        job   reason
31/10/2017  barr  skip
28/10/2017  barr  skip
27/10/2017  barr  skip
24/10/2017  barr  run
20/10/2017  barr  skip
23/10/2017  car   skip
21/10/2017  car   run
22/10/2017  cow   run
20/10/2017  cow   skip
29/10/2017  pig   run
25/10/2017  pig   skip
30/10/2017  sheep run
26/10/2017  sheep run

Note that the reasons are still in order within each job. Assume this is in a sheet called jobs.

Step 2

Then you can use some quick and simple formulas to get the results you want.

Start of job in B2

=MATCH(A2,jobs!B:B,0)

First reason in C2

=INDEX(jobs!C:C,B2)

Count of reasons per job in D2

=COUNTIF(jobs!B:B,A2)

Index of reason which is different from the first reason in E2

=IFERROR(MATCH(TRUE,(INDEX(jobs!C:C,B2):INDEX(jobs!C:C,B2+D2-1)<>C2),0),D2+1)

(this is an array formula but only has to search the space for the current job say about 40 reasons on average according to your figures)

Number of repeats of first reason in F2

=E2-1

enter image description here

Please note that I am calling the first reason in the list the first reason but it's the last reason chronologically because your list is in reverse chronological order.

Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37