12

Consider this nested array of dates and names:

var fDates = [
    ['2015-02-03', 'name1'],
    ['2015-02-04', 'nameg'],
    ['2015-02-04', 'name5'],
    ['2015-02-05', 'nameh'],
    ['1929-03-12', 'name4'],
    ['2023-07-01', 'name7'],
    ['2015-02-07', 'name0'],
    ['2015-02-08', 'nameh'],
    ['2015-02-15', 'namex'],
    ['2015-02-09', 'namew'],
    ['1980-12-23', 'name2'],
    ['2015-02-12', 'namen'],
    ['2015-02-13', 'named'],
]

How can I identify those dates that are out of sequence. I don't care if dates repeat, or skip, I just need the ones out of order. Ie, I should get back:

results = [
    ['1929-03-12', 'name4'],
    ['2023-07-01', 'name7'],
    ['2015-02-15', 'namex'],
    ['1980-12-23', 'name2'],
]

('Namex' is less obvious, but it's not in the general order of the list.)

This appears to be a variation on the Longest Increase Subsequence (LIS) problem, with the caveat that there may be repeated dates in the sequence but shouldn't ever step backward.

Use case: I have sorted and dated records and need to find the ones where the dates are "suspicious" -- perhaps input error -- to flag for checking.


NB1: I am using straight Javascript and NOT a framework. (I am in node, but am looking for a package-free solution so I can understand what's going on...)

whackamadoodle3000
  • 6,684
  • 4
  • 27
  • 44
Trees4theForest
  • 1,267
  • 2
  • 18
  • 48
  • Attempts are _always_ worth sharing. You might be on the right path, or on a totally wrong path, but your attempt will be instructive and help inform the way we present an answer. Don't be shy! :) – msanford Jul 25 '17 at 23:10
  • 6
    What's the actual situation here that makes you need to do this? – Ry- Jul 25 '17 at 23:11
  • @Ryan Good point. In my use-case, it is highly unlikely (approaching 0) that there will be two sub-sequences of equal lengths. – Trees4theForest Jul 25 '17 at 23:16
  • Can you update your post with what you have tried for your problem as an programmatic approach? – Mr. Black Jul 25 '17 at 23:22
  • What if there is an element say `['2015-02-03', 'namex'],` after `['2015-02-04', 'name5'],`. Will you include it or exclude it? Cant your logic get simplified like remove dates out of certain range of years or something? – void Feb 23 '18 at 08:16
  • I created [fiddle](https://jsfiddle.net/ali_soltani/ad8b7uhu/13/) for finding Longest Increase Subsequence by helping [this](https://gist.github.com/wheresrhys/4497653) but I don't know you are looking for like this or not? – Ali Soltani Feb 25 '18 at 05:10
  • 3
    Seems odd that people upvote all three answers that fail for this input: `[['2015-01-01'],['2014-01-01'],['2015-01-02'],['2014-01-02'],['2015-01-03'],['2014-01-03'],['2014-01-04'],['2015-01-04'],['2014-01-05'],['2014-01-06'],['2014-01-07'],['2014-01-08'],['2014-01-09'],['2014-01-10'],['2014-01-11']]` – גלעד ברקן Feb 26 '18 at 01:03
  • 2
    This is an anomaly problem, you are finding anomalies from the ordered pattern, I would love to use logistic regression or a decision tree to find the solution but the question states that there are no framework to be used, I have a class but its not super simple. E.g. the decision tree from the question pattern solutions are (if not year 2015 its an anomaly/suspicious or (if not year 2015 and day >= 15 it is an anomaly/suspicious)) – PauAI Feb 26 '18 at 13:25
  • @void Exclude. Dates can increase or repeat in sequence, but never decrease. – Trees4theForest Feb 26 '18 at 16:34
  • @PauAI Great tip for re-framing the question... I'm particularly interested in the LIS approach, but additionally parameterizing the expected values can add robustness. – Trees4theForest Feb 26 '18 at 18:38
  • 1
    @Trees4theForest Without order, chaos ensues. In order to find things out of sequence you must first have the sequence. A simple bubble sort by year, inner bubble sort by month, inner bubble by day. This link will give you some reference to a possible solution array. http://www.stoimen.com/blog/2010/07/09/friday-algorithms-javascript-bubble-sort/. Adding more - I would convert date to epoch and check epoch string sequence if you dislike bubble sorting by year,month,day. –  Feb 28 '18 at 20:13
  • 1
    Clone and sort your original array based on the time-field and then compare the location of each in your sorted array to your original. When the first one "out of order" is found, remove it from the original array, mark it and repeat all from the beginning (now with the first odd entry removed). You can save the location of the odd entry into 3rd array so that you know which ones in the original were "failing". (This location should, of course, be based on the original array.) – Stacking For Heap Feb 28 '18 at 21:56
  • Seems odd that people upvote answers which fail for this input: `[['1975-01-01'], ['2015-02-03'], ['2015-02-04'], ['2015-02-04'], ['2015-02-05'], ['1929-03-12'], ['2023-07-01'], ['2015-02-07'], ['2015-02-08']]`. Actually all the answers fail. A range of valid dates could be set to filter out "too-fars" before looking for LIS. But problem of several valid sequences is not solved(able). E. g. `[ ['2015-02-01'], ['2015-02-02'], ['2015-02-05'], ['2015-02-06'], ['1929-03-03'], ['2023-07-04']]` – Kosh Mar 01 '18 at 00:05
  • 1
    @KoshVery, which result do you expect with the first input (you may have a look to the result of `test2` in my answer.) – Nina Scholz Mar 01 '18 at 09:38
  • @NinaScholz, I expect `['1975-01-01']` appeared in `results`. – Kosh Mar 01 '18 at 16:16
  • @KoshVery, but why? this value is perfectly in sequence. – Nina Scholz Mar 01 '18 at 16:23
  • @NinaScholz, because it's "suspicious". Unfortunately OP did not provide a strict definition of "suspicious" and misleaded people with LIS approach. You solved the LIS problem well but the **real** problem (anomaly problem) is not solved and cannot be solved without a "suspicious" definition. So I'd not recommend any of the answers here to prod. – Kosh Mar 01 '18 at 16:41

7 Answers7

5

Here's an adaptation of Rosetta Code LIS to take a custom getElement and compare functions. We can refine the comparison and element-get functions based on your specific needs.

function f(arr, getElement, compare){
  function findIndex(input){
    var len = input.length;
    var maxSeqEndingHere = new Array(len).fill(1)
    for(var i=0; i<len; i++)
      for(var j=i-1;j>=0;j--)
        if(compare(getElement(input, i), getElement(input, j)) && maxSeqEndingHere[j] >= maxSeqEndingHere[i])
          maxSeqEndingHere[i] = maxSeqEndingHere[j]+1;
    return maxSeqEndingHere;
  }

  function findSequence(input, result){
    var maxValue = Math.max.apply(null, result);
    var maxIndex = result.indexOf(Math.max.apply(Math, result));
    var output = new Set();
    output.add(maxIndex);
    for(var i = maxIndex ; i >= 0; i--){
      if(maxValue==0)break;
      if(compare(getElement(input, maxIndex), getElement(input, i))  && result[i] == maxValue-1){
        output.add(i);
        maxValue--;
      }
    }

    return output;
  }

  var result = findIndex(arr);
  var final = findSequence(arr, result)
  return arr.filter((e, i) => !final.has(i));
}

var fDates = [
    ['2015-02-03', 'name1'],
    ['2015-02-04', 'nameg'],
    ['2015-02-04', 'name5'],
    ['2015-02-05', 'nameh'],
    ['1929-03-12', 'name4'],
    ['2023-07-01', 'name7'],
    ['2015-02-07', 'name0'],
    ['2015-02-08', 'nameh'],
    ['2015-02-15', 'namex'],
    ['2015-02-09', 'namew'],
    ['1980-12-23', 'name2'],
    ['2015-02-12', 'namen'],
    ['2015-02-13', 'named'],
];

console.log(f(fDates, (arr, i) => arr[i][0], (a,b) => a >= b));
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
5

This solution tries to get all valid sequences and returns the longes sequences for filtering the parts out.

It works by iterating the given array and checks if the values could build a sequence. If a value is given, which part result has a valid predecessor, the array is appended with this value. If not a backtracking is made and a sequence is searched with a valid predecessor.

act.   array
value   7  3  4  4  5  1 23  7   comment
-----  ------------------------  ---------------------------
   7    7                        add array with single value

   3    7                        keep
           3                     add array with single value

   4    7                        keep
           3  4                  add value to array

   4    7                        keep
           3  4  4               add value to array

   5    7                        keep
           3  4  4  5            add value to array

   1    7                        keep
           3  4  4  5            keep
                       1         add array with single value

  23    7                23      add value to array
           3  4  4  5    23      add value to array
                       1 23      add value to array

   7    7                23      keep
        7                    7   fork above, filter for smaller or equal and add value
           3  4  4  5    23      keep
           3  4  4  5        7   fork above, filter for smaller or equal and add value
                       1 23      keep
                       1     7   fork above, filter for smaller or equal and add value

function longestSequences(array, getValue = v => v) {
    return array
        .reduce(function (sub, value) {
            var single = true;

            sub.forEach(function (s) {
                var temp;

                if (getValue(s[s.length - 1]) <= getValue(value)) {
                    s.push(value);
                    single = false;
                    return;
                }

                // backtracking
                temp = s.reduceRight(function (r, v) {
                    if (getValue(v) <= getValue(r[0])) {
                        r.unshift(v);
                        single = false;
                    }
                    return r;
                }, [value]);

                if (temp.length !== 1 && !sub.some(s => s.length === temp.length && s.every((v, i) => getValue(v) === getValue(temp[i])))) {
                    sub.push(temp);
                }
            });

            if (single) {
                sub.push([value]);
            }
            return sub;
        }, [])
        .reduce(function (r, a) {
            if (!r || r[0].length < a.length) {
                return [a];
            }
            if (r[0].length === a.length) {
                r.push(a);
            }
            return r;
        }, undefined);
}

function notInSequence(array, getValue = v => v) {
    var longest = longestSequences(array, getValue);
    return array.filter((i => a => a !== longest[0][i] || !++i)(0));
}

var array = [7, 3, 4, 4, 5, 1, 23, 7, 8, 15, 9, 2, 12, 13],
    fDates = [['2015-02-03', 'name1'], ['2015-02-04', 'nameg'], ['2015-02-04', 'name5'], ['2015-02-05', 'nameh'], ['1929-03-12', 'name4'], ['2023-07-01', 'name7'], ['2015-02-07', 'name0'], ['2015-02-08', 'nameh'], ['2015-02-15', 'namex'], ['2015-02-09', 'namew'], ['1980-12-23', 'name2'], ['2015-02-12', 'namen'], ['2015-02-13', 'named']],
    usuallyFailingButNotHere = [['2015-01-01'], ['2014-01-01'], ['2015-01-02'], ['2014-01-02'], ['2015-01-03'], ['2014-01-03'], ['2014-01-04'], ['2015-01-04'], ['2014-01-05'], ['2014-01-06'], ['2014-01-07'], ['2014-01-08'], ['2014-01-09'], ['2014-01-10'], ['2014-01-11']],
    test2 = [['1975-01-01'], ['2015-02-03'], ['2015-02-04'], ['2015-02-04'], ['2015-02-05'], ['1929-03-12'], ['2023-07-01'], ['2015-02-07'], ['2015-02-08']];

console.log(longestSequences(array));
console.log(notInSequence(array));

console.log(notInSequence(fDates, a => a[0]));

console.log(longestSequences(usuallyFailingButNotHere, a => a[0]));
console.log(notInSequence(usuallyFailingButNotHere, a => a[0]));

console.log(longestSequences(test2, a => a[0]));
console.log(notInSequence(test2, a => a[0]));
.as-console-wrapper { max-height: 100% !important; top: 0; }
Nina Scholz
  • 376,160
  • 25
  • 347
  • 392
  • 1
    How do we know which one of these sequences are correct? Your code indicates that the longer sequence is the one out of place. Mine (Rosetta's) indicates the shorter sequence is out of place: `[['2015-01-01'],['2014-01-01'],['2015-01-02'],['2014-01-02'],['2015-01-03'],['2014-01-03'],['2014-01-04'],['2015-01-04'],['2014-01-05'],['2014-01-06'],['2014-01-07']]` – גלעד ברקן Feb 26 '18 at 00:57
  • @גלעדברקן, i changed the algorithm to a more reliable one. your mentioned array works now, as it looks like. – Nina Scholz Feb 26 '18 at 16:17
  • (I wonder if LIS is what OP wants, though, ha - maybe there are other constraints) – גלעד ברקן Feb 26 '18 at 16:26
  • @גלעדברקן LIS seemed to be the best way to define the problem -- but assumes that majority of the data is correct (big assumption) and thus the LIS would represent the body of correct data. If there is equal amounts of bad data as good (increasing) data then this could be problematic -- but the chance of that is small. – Trees4theForest Feb 26 '18 at 16:39
  • @Trees4theForest if there are other determinants of "good" data, please let us know. – גלעד ברקן Feb 26 '18 at 16:43
  • maybe if it does start with good data ... my old solution would work ... – Nina Scholz Feb 26 '18 at 16:46
  • @NinaScholz I have encountered cases where first data point is "bad" but still in the LIS: ([1925-01-23],[2007-01-02],[2007-01-03],[2007-01-03]) but I have other checks elsewhere that catch that. Thus focusing on the LIS seems sufficient. – Trees4theForest Feb 26 '18 at 16:50
  • 1
    you could use an anchor date (lower border) at the beginning of the array, to filter that value out. – Nina Scholz Feb 26 '18 at 16:52
3

This solution uses the function reduce and keeps the previously accepted date to make the necessary comparisons.

var fDates = [['2015-02-03', 'name1'], ['2015-02-04', 'nameg'], ['2015-02-04', 'name5'], ['2015-02-05', 'nameh'], ['1929-03-12', 'name4'], ['2023-07-01', 'name7'], ['2015-02-07', 'name0'], ['2015-02-08', 'nameh'], ['2015-02-15', 'namex'], ['2015-02-09', 'namew'], ['1980-12-23', 'name2'], ['2015-02-12', 'namen'], ['2015-02-13', 'named']],
results = fDates.reduce((acc, c, i, arr) => {
  /*
   * This function finds a potential valid sequence.
   * Basically, will check if any next valid sequence is
   * ahead from the passed controlDate.
   */
  function sequenceAhead(controlDate) {
    for (var j = i + 1; j < arr.length; j++) {
      let [dt] = arr[j];
      //The controlDate is invalid because at least a forward date is in conflict with its sequence.
      if (dt > acc.previous && dt < controlDate) return true; 
    }
    
    //The controlDate is valid because forward dates don't conflict with its sequence.
    return false; 
  }
  
  let [date] = c; //Current date in this iteration.
  if (i > 0) { // If this is not the first iteration
    if (date === acc.previous) return acc; // Same as previous date are skipped.
    // If the current date is lesser than previous then is out of sequence.
    // Or if there is at least valid sequence ahead.
    if (date < acc.previous || sequenceAhead(date)) acc.results.push(c); 
    else acc.previous = date; // Else, this current date is in sequence.
  } 
  else acc.previous = date; // Else, set the first date.

  return acc;
}, { 'results': [] }).results;

console.log(results);
.as-console-wrapper { max-height: 100% !important; top: 0; }
Ele
  • 33,468
  • 7
  • 37
  • 75
  • How do we know which one of these sequences are correct? Your code indicates that the longer sequence is the one out of place. Mine (Rosetta's) indicates the shorter sequence is out of place: `[['2015-01-01'],['2014-01-01'],['2015-01-02'],['2014-01-02'],['2015-01-03'],['2014-01-03'],['2014-01-04'],['2015-01-04'],['2014-01-05'],['2014-01-06'],['2014-01-07']]` – גלעד ברקן Feb 26 '18 at 00:59
2

All of previous answers focus on JavaScript and maybe they won't work correctly. So I decided to add new answer that focused on Algorithm.

As @Trees4theForest mentioned in his question and comments, he is looking for a solution for Longest Increase Subsequence and out of order dates are dates that aren't in Longest Increase Subsequence (LIS) set.

I used this method like below. In algorithm's point of view, it's true.

function longestIncreasingSequence(arr, strict) {

    var index = 0,
        indexWalker,
        longestIncreasingSequence,
        i,
        il,
        j;

    // start by putting a reference to the first entry of the array in the sequence
    indexWalker = [index];

    // Then walk through the array using the following methodolgy to find the index of the final term in the longestIncreasing and
    // a sequence (which may need altering later) which probably, roughly increases towards it - http://en.wikipedia.org/wiki/Longest_increasing_subsequence#Efficient_algorithms
    for (i = 1, il = arr.length; i < il; i++) {

        if (arr[i] < arr[indexWalker[index]]) {

            // if the value is smaller than the last value referenced in the walker put it in place of the first item larger than it in the walker
            for (j = 0; j <= index; j++) {

                // As well as being smaller than the stored value we must either 
                // - be checking against the first entry
                // - not be in strict mode, so equality is ok
                // - be larger than the previous entry
                if (arr[i] < arr[indexWalker[j]] && (!strict || !j || arr[i] > arr[indexWalker[j - 1]])) {
                    indexWalker[j] = i;
                    break;
                }
            }

            // If the value is greater than [or equal when not in strict mode) as the last in the walker append to the walker
        } else if (arr[i] > arr[indexWalker[index]] || (arr[i] === arr[indexWalker[index]] && !strict)) {
            indexWalker[++index] = i;
        }

    }

    // Create an empty array to store the sequence and write the final term in the sequence to it
    longestIncreasingSequence = new Array(index + 1);
    longestIncreasingSequence[index] = arr[indexWalker[index]];


    // Work backwards through the provisional indexes stored in indexWalker checking for consistency
    for (i = index - 1; i >= 0; i--) {

        // If the index stored is smaller than the last one it's valid to use its corresponding value in the sequence... so we do  
        if (indexWalker[i] < indexWalker[i + 1]) {
            longestIncreasingSequence[i] = arr[indexWalker[i]];

            // Otherwise we need to work backwards from the last entry in the sequence and find a value smaller than the last entry 
            // but bigger than the value at i (this must be possible because of the way we constructed the indexWalker array)
        } else {
            for (j = indexWalker[i + 1] - 1; j >= 0; j--) {
                if ((strict && arr[j] > arr[indexWalker[i]] && arr[j] < arr[indexWalker[i + 1]]) ||
                    (!strict && arr[j] >= arr[indexWalker[i]] && arr[j] <= arr[indexWalker[i + 1]])) {
                    longestIncreasingSequence[i] = arr[j];
                    indexWalker[i] = j;
                    break;
                }
            }
        }
    }

    return longestIncreasingSequence;
}

With method above, we can find dates that is out of order like below:

// Finding Longest Increase Subsequence (LIS) set
var _longestIncreasingSequence = longestIncreasingSequence(fDates.map(([date]) => date)); 

// Out of order dates
var result = fDates.filter(([date]) => !_longestIncreasingSequence.includes(date)); 

Online demo(jsFiddle)

Ali Soltani
  • 9,589
  • 5
  • 30
  • 55
  • "All of previous answers focus on JavaScript and maybe they won't work correctly." Not correct - my answer (and now also Nina's) uses LIS – גלעד ברקן Feb 26 '18 at 16:03
2

here is a simple self- explanatory solution. hope it will help you.

const findOutOfSequenceDates = items => {
    items = items.map(d => d);

    const sequence = [], outOfsequence = [];
    sequence.push(items.shift());

    const last = ind => sequence[sequence.length - ind][0];

    items.forEach(item => {
        const current = new Date(item[0]);

        if (current >= new Date(last(1))) {
            sequence.push(item);
        } else if (current >= new Date(last(2))) {
            outOfsequence.push(sequence.pop());
            sequence.push(item);
        } else {
            outOfsequence.push(item);
        }
    });

    return outOfsequence;
};

var fDates = [
    ['2015-02-03', 'name1'],
    ['2015-02-04', 'nameg'],
    ['2015-02-04', 'name5'],
    ['2015-02-05', 'nameh'],
    ['1929-03-12', 'name4'],
    ['2023-07-01', 'name7'],
    ['2015-02-07', 'name0'],
    ['2015-02-08', 'nameh'],
    ['2015-02-15', 'namex'],
    ['2015-02-09', 'namew'],
    ['1980-12-23', 'name2'],
    ['2015-02-12', 'namen'],
    ['2015-02-13', 'named'],
];
console.log(findOutOfSequenceDates(fDates));
Sufian Saory
  • 862
  • 9
  • 14
1

Use the Javascript Date type. Compare with those objects. Very simplistically,

date1 = new Date(fDates[i, 0])
date2 = new Date(fDates[i+1, 0])
if (date2 < date1) {    // or whatever comparison you want ...
    // flag / print / alert the date
}

To clarify, This merely finds items out of sequence. You can do that with strings, as Jaromanda X pointed out. However, you use the phrase "way out of line"; whatever this means for you, Date should give you the ability to determine and test for it. For instance, is '2023-07-01' unacceptable because it's 8 years away, or simply because it's out of order with the 2015 dates? You might want some comparison to a simpler time span, such as one month, where your comparison will looks something like

if (date2-date1 > one_month)
Prune
  • 76,765
  • 14
  • 60
  • 81
1

Summary of your question If I have understood your question correctly, you are trying to identify array entries that do not follow a chronological order based on the time/date property value.

Solution Convert the date string / time into a UNIX time stamp (number of seconds lapsed since 01/jan/1970 at 00:00:00)

Using a loop, we can store the value against a previous reading per itenary, if the value is negative, this would indicate an error in the date lapse, if the value is positive, it would indicate the order is valid

When negative, we can create an array to denote the position of the reference array and its values allowing you to go back to the original array and review the data.

Example Code

var arrData = [
    {date: '2015-02-03', value:'name1'},
    {date: '2015-02-04', value:'nameg'},
    {date: '2015-02-04', value:'name5'},
    {date: '2015-02-05', value:'nameh'},
    {date: '1929-03-12', value:'name4'},
    {date: '2023-07-01', value:'name7'},
    {date: '2015-02-07', value:'name0'},
    {date: '2015-02-08', value:'nameh'},
    {date: '2015-02-15', value:'namex'},
    {date: '2015-02-09', value:'namew'},
    {date: '1980-12-23', value:'name2'},
    {date: '2015-02-12', value:'namen'},
    {date: '2015-02-13', value:'named'}
];

var arrSeqErrors = [];
function funTestDates(){
  var intLastValue = 0, intUnixDate =0;
  for (x = 0; x <= arrData.length-1; x++){
    intUnixDate = Date.parse(arrData[x].date)/1000;
    var intResult = intUnixDate - intLastValue;
    if (intResult < 0){
      console.log("initeneration: " + x + " is out of sequence");
      arrSeqErrors.push (arrData[x]);
    }
    intLastValue = intResult;
  }
  console.log("Items out of sequence are:");
  console.log(arrSeqErrors);
}

funTestDates();
DataCure
  • 425
  • 2
  • 15
  • You will notice I have given the array labels such as Date and Value in order for this short bit of code to work, I feel this can easily be inserted into your results if need be – DataCure Feb 26 '18 at 17:50
  • Simple approach - but maybe too simplistic? Since it only considers the next number's relation to the previous it incorrectly passes 2023-07-01 and flags 2015-02-07. – Trees4theForest Feb 26 '18 at 18:36
  • Are you trying to find a different sequence and if so what type of sequence. In terms of the values are stored, you can add a -1 on the X when you push the value to the arrSeqErrors array – DataCure Feb 26 '18 at 18:40