-4

I need help in Excel of finding finding the Correct Output based on Amount of four conditional statements of:

  1. max value of Timestamp;
  2. if there are duplicate max dates, use the last row of the same Timestamp because the data does not extend upto seconds;
  3. per month of the Timestamp;
  4. per (4) unique ID

and leave the other rows blank. The Correct Output column is the expected output.

Here's the sample data:

╔════╦═════════════════════════════╦═════════╦════════════════╦══╦═════════════╦════════════╦═════════╗
║ ID ║ Timestamp (yyyy-mm-dd H:mm) ║ Amount  ║ Correct Output ║  ║ Date Format ║  Max Date  ║  Test   ║
╠════╬═════════════════════════════╬═════════╬════════════════╬══╬═════════════╬════════════╬═════════╣
║ A  ║ 2013-04-15 20:42            ║ NULL    ║ NULL           ║  ║ 2013-04-15  ║ 2013-04-15 ║ NULL    ║
║ A  ║ 2013-05-23 23:09            ║ 300000  ║ 300000         ║  ║ 2013-05-23  ║ 2013-05-23 ║ 300000  ║
║ A  ║ 2013-09-15 23:36            ║ 300000  ║ 300000         ║  ║ 2013-09-15  ║ 2013-09-15 ║ 300000  ║
║ A  ║ 2013-12-19 5:58             ║ 300000  ║                ║  ║ 2013-12-19  ║ 2013-12-19 ║ 0       ║
║ A  ║ 2013-12-19 6:18             ║ 0       ║                ║  ║ 2013-12-19  ║ 2013-12-19 ║ 0       ║
║ A  ║ 2013-12-19 6:18             ║ 0       ║ 0              ║  ║ 2013-12-19  ║ 2013-12-19 ║ 0       ║
║ B  ║ 2013-05-23 23:00            ║ NULL    ║                ║  ║ 2013-05-23  ║ 2013-05-23 ║ 0       ║
║ B  ║ 2013-05-23 23:04            ║ 0       ║ 0              ║  ║ 2013-05-23  ║ 2013-05-23 ║ 0       ║
║ B  ║ 2013-05-30 3:48             ║ 2339000 ║                ║  ║ 2013-05-30  ║ 2013-05-30 ║ 1169500 ║
║ B  ║ 2013-05-30 3:54             ║ 1169500 ║ 1169500        ║  ║ 2013-05-30  ║ 2013-05-30 ║ 1169500 ║
║ B  ║ 2013-06-02 11:46            ║ 2339000 ║                ║  ║ 2013-06-02  ║ 2013-06-02 ║ 2339000 ║
║ B  ║ 2013-06-27 8:40             ║ 2339000 ║                ║  ║ 2013-06-27  ║ 2013-06-27 ║ 2339000 ║
║ B  ║ 2013-06-27 13:59            ║ 2339000 ║                ║  ║ 2013-06-27  ║ 2013-06-27 ║ 2339000 ║
║ B  ║ 2013-06-27 13:59            ║ 2339000 ║ 2339000        ║  ║ 2013-06-27  ║ 2013-06-27 ║ 2339000 ║
║ C  ║ 2015-02-09 12:43            ║ NULL    ║                ║  ║ 2015-02-09  ║ 2015-02-09 ║ NULL    ║
║ C  ║ 2015-02-09 12:43            ║ NULL    ║ NULL           ║  ║ 2015-02-09  ║ 2015-02-09 ║ NULL    ║
║ D  ║ 2013-12-02 0:05             ║ NULL    ║                ║  ║ 2013-12-02  ║ 2013-12-02 ║ 37000   ║
║ D  ║ 2013-12-02 0:37             ║ 37000   ║ 37000          ║  ║ 2013-12-02  ║ 2013-12-02 ║ 37000   ║
║ D  ║ 2014-07-16 7:36             ║ 0       ║                ║  ║ 2014-07-16  ║ 2014-07-16 ║ 0       ║
║ D  ║ 2014-07-16 7:36             ║ 0       ║ 0              ║  ║ 2014-07-16  ║ 2014-07-16 ║ 0       ║
║ E  ║ 2013-06-27 8:35             ║ NULL    ║ NULL           ║  ║ 2013-06-27  ║ 2013-06-27 ║ NULL    ║
╚════╩═════════════════════════════╩═════════╩════════════════╩══╩═════════════╩════════════╩═════════╝

References:

  • Column A - ID
  • Column B - Timestamp
  • Column F - Date Format
  • Column G - Max Date
  • Column H - Test

Here are the steps I tried:

  1. I created the Date Format using date
  2. Created the Max Date with =MAX(IF($F$2:$F$22=F2,IF($A$2:$A$22=A2,$B$2:$B$22))) (in array)
  3. Created Test with =INDEX(C2:C22,MATCH(G2,B2:B22,0)) (in array)
FoxyReign
  • 57
  • 1
  • 8
  • 1
    What have you tried so far? Where did it fail, and how did it operate differently than expected? – Grade 'Eh' Bacon Jul 21 '15 at 17:35
  • 1
    Also provide your data in an easy-to-use format, see: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610 and show us your expected result. – mts Jul 21 '15 at 17:42
  • Hello mts, The output is the Final Value Column – FoxyReign Jul 21 '15 at 17:52
  • Grade Eh Bacon, In excel, I created separate columns to make the Date format into YYYY-MM-DD. Then, in another column, this is my formula: =MAX(IF($F$2:$F$22=F2,IF($A$2:$A$22=A2,$C$2:$C$22))) Column F - new Date format of YYYY-MM-DD Column A - ID Column C - Amount So, Im stuck with duplicate timestamp. – FoxyReign Jul 21 '15 at 17:54
  • @FoxyReign that's not incredibly clear to me what you're doing. Please re-write your question with an edit to include all information, including what you've tried, and what the results of your attempts are (where do problems occur, and what are those problems)? – Grade 'Eh' Bacon Jul 21 '15 at 18:04
  • @Grade'Eh'Bacon I apologize if the question is very unclear. I just edited and included what I did which is very wrong. – FoxyReign Jul 21 '15 at 18:26
  • Again **what is happening that is not as expected**? What is the error that shows up? What value did you expect? – Grade 'Eh' Bacon Jul 21 '15 at 18:27
  • @Grade'Eh'Bacon there's no error but the output is wrong. The Test column is my wrong output. – FoxyReign Jul 21 '15 at 18:30
  • I'm failing to understand something about your problem. Why is it that you expect your "correct" output to be what it is? Give an explicit example of a row where your test output is different from your "correct" output, and explain why you would expect the "correct" output to be there. It is not clear what the purpose of the table is. Are you... trying to show the "Amount" for any rows where there is... a unique ID, and the date = the max date? – Grade 'Eh' Bacon Jul 21 '15 at 18:38
  • @Grade'Eh'Bacon this is based on manual effort of what's supposed to be the correct output; id consider this as ground truth to check if my formula is correct. Yes, you're correct - show the Amount based on unique ID and max timestamp AND if there's duplicate in max timestamp, it should use the last row based on preceding conditions. – FoxyReign Jul 21 '15 at 18:45
  • Please explain this to me as if I were a idiot, because I still can't understand what you mean here: "if there's duplicate in max timestamp, it should use the last row based on preceding conditions." – Grade 'Eh' Bacon Jul 21 '15 at 18:47
  • @Grade'Eh'Bacon for example, ID A with timestamp of 2013-12-19 6:18, there are two records. It just happened that the amount of these duplicates are the same. I assume that these timestamps happen in between seconds but the data I only received are upto minutes only. If this happens, where the amount are different, by using `MAX`, it should get the latest timestamp, but without the seconds, I am only left with the sorted data, which was pre-sorted in ascending order when I received it. – FoxyReign Jul 21 '15 at 19:11
  • On the other hand, I am asking if I can get the timestamp with the seconds that will eliminate the duplicate max timestamp. – FoxyReign Jul 21 '15 at 19:12
  • Why is the "Correct Output" for the line B||2013-06-02 11:46||2339000 blank? – XOR LX Jul 21 '15 at 21:40
  • @XORLX yes. from 2013-06-02 11:46 to 2013-06-27 13:59 should be blank and the 2nd 2013-06-27 13:59 should display the value in the Amount – FoxyReign Jul 23 '15 at 06:41
  • @FoxyReign But can you explain WHY the "Correct Output" for the line B||2013-06-02 11:46||2339000 should be blank? It doesn't seem consistent with the rest of your expected results. – XOR LX Jul 23 '15 at 07:04
  • @XORLX the condition looks at the max value of the timestamp on month level not in hour or minute level. – FoxyReign Jul 23 '15 at 07:08
  • But why then do BOTH of the lines B||2013-05-23 23:04||0 and B||2013-05-30 03:54||1169500 have non-blank expected results? – XOR LX Jul 23 '15 at 08:15
  • @XORLX my mistake. B | 2013-05-23 23:04 should be blank, too. It should just be B | 2013-05-30 3:54 – FoxyReign Jul 24 '15 at 08:47

1 Answers1

0

Assuming cell D2 for the first "Correct Output":

=IF(SUMPRODUCT(0+(MONTH(G$2:G$22)=MONTH(G2)),0+(A$2:A$22=A2))=SUMPRODUCT(0+(MONTH(G$2:G2)=MONTH(G2)),0+(A$2:A2=A2)),LOOKUP(1,0/((MONTH(G$2:G$22)=MONTH(G2))*(A$2:A$22=A2)),C$2:C$22),"")

Copy down as required.

Note that you can amend the upper row being referenced (22 here) to meet your requirements, though you should be careful not to make it too arbitrarily large (and certainly don't reference entire columns!), since, for each additional cell referenced, extra calculation will be required.

And that applies whether those additional cells are technically beyond the last-used cells in those ranges or not. With some functions, e.g. COUNTIF(S)/SUMIF(S), you can get away with referencing entire columns with no detriment to performance, though not with functions which operate over arrays, such as AGGREGATE, SUMPRODUCT, and any construction requiring CSE.

Even better, makes your ranges dynamic, such that they automatically adjust as your data expands/contracts.

Regards

XOR LX
  • 7,632
  • 1
  • 16
  • 15