Below is my JavaScript array of objects.

const objArray = [
    {
      file: 'file_1',
      start_time: '2021-08-12 14:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-12 14:00:00',
      status: 'completed'
    },
    {
      file: 'file_3',
      start_time: '2021-08-14 15:00:00',
      status: 'pending'
    },
    {
      file: 'file_1',
      start_time: '2021-08-14 03:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-14 03:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-11-11 11:11:00',
      status: 'pending'
    }
];

From the above array, I need to filter the objects based on the start_time field. Objects with the same start time should be grouped into a sub-array, and a sub-array can't contain two objects with the same file name. For example, compare objects 1 and 2 with objects 4 and 5: each pair has its own start time, but both pairs contain the same file names. Therefore I need only one of the two pairs, namely 1 and 2, which has the earliest timestamp. So the final output array should be as below:

[
  [
    {
      file: 'file_1',
      start_time: '2021-08-12 14:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-12 14:00:00',
      status: 'completed'
    }
  ],
  [
    {
      file: 'file_3',
      start_time: '2021-08-14 15:00:00',
      status: 'pending'
    }
  ],
  [
    {
      file: 'file_2',
      start_time: '2021-11-11 11:11:00',
      status: 'pending'
    }
  ]
]

I tried to implement it by looping through every object in the initial array. But what is the quickest way to achieve this?

ahkam
  • 3
    There is no way that doesn’t loop through every object in the initial array. Let’s see your implementation. – James Aug 13 '21 at 16:20
  • Does this answer your question? [Most efficient method to groupby on an array of objects](https://stackoverflow.com/questions/14446511/most-efficient-method-to-groupby-on-an-array-of-objects) – James Aug 13 '21 at 16:22
  • Not really. I need to group them as sub arrays based on the start time – ahkam Aug 13 '21 at 16:26
  • The output you seem to want is the result of two unrelated operations: (1) filter the most ancient item associated with each file (2) group the remaining items by date. But your description is so confusing the expected result has to be guessed from the sample output. Besides, it's not really about grouping objects so the wording of your question is misleading. I vote for closing it. – kuroi neko Aug 14 '21 at 09:46

2 Answers

2

Using XSLT 3 as provided by Saxon-JS 2 (https://www.saxonica.com/saxon-js/index.xml) you can group JSON data:

const objArray = [
    {
      file: 'file_1',
      start_time: '2021-08-12 14:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-12 14:00:00',
      status: 'completed'
    },
    {
      file: 'file_3',
      start_time: '2021-08-14 15:00:00',
      status: 'pending'
    },
    {
      file: 'file_1',
      start_time: '2021-08-14 03:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-14 03:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-11-11 11:11:00',
      status: 'pending'
    }
];

const xslt = `<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:output method="json" indent="yes"/>

  <xsl:template match="." name="xsl:initial-template">
    <xsl:variable name="groups" as="array(*)*">
      <xsl:for-each-group select="?*" group-by="?start_time">
          <xsl:sequence select="array { current-group() }"/>
      </xsl:for-each-group>           
    </xsl:variable>
    <xsl:variable name="filtered-groups" as="array(*)*">
      <xsl:for-each-group select="$groups" composite="yes" group-by="sort(?*?file)">
        <xsl:sort select="?1?start_time"/>
        <xsl:sequence select="."/>
      </xsl:for-each-group>
    </xsl:variable>
    <xsl:sequence select="array { $filtered-groups }"/>
  </xsl:template>
  
</xsl:stylesheet>`;

const resultArray = SaxonJS.XPath.evaluate(`transform(
  map {
    'stylesheet-text' : $xslt,
    'initial-match-selection' : $json,
    'delivery-format' : 'raw'
  }
)?output`, [], { params : { xslt : xslt, json : [objArray] } });

console.log(resultArray);
<script src="https://martin-honnen.github.io/xslt3fiddle/js/SaxonJS2.js"></script>
Martin Honnen
  • 1
    This is in no way an answer to the question. I see no "XSLT" tag in it. It's about as relevant as suggesting using PHP to let the server do the job. – kuroi neko Aug 14 '21 at 09:50
  • 1
    @kuroineko, Saxon-JS 2 is a JavaScript library that runs both client and server-side. No idea why you compare it to PHP. If I had used JQuery without a tag mentioning it, would you consider any answer using that library to also be "no" answer? – Martin Honnen Aug 14 '21 at 11:05
  • Appreciate your answer Martin. Is there a way to solve this with ES6 or simple js code? – ahkam Aug 14 '21 at 13:10
  • 1
    I can't see how dumping a truckload of code in a totally different language meant to handle XML, even though wrapped in a JavaScript interface, can be of any help to an obvious beginner asking for a rather trivial algorithm that can be coded in half a dozen lines of circa 1995 vanilla JavaScript. – kuroi neko Aug 14 '21 at 13:47
  • 1
    The actual number of lines is fairly irrelevant. It was just a manner of speaking. The point is, this is very doable in a reasonable amount of vanilla JavaScript, and I can hardly picture anyone learning a new language and including a 500K library just to filter a couple of objects. Besides, the main issue with this question is the algorithm itself, which seems kind of fishy to me. I took a moment to figure what it was supposed to achieve and couldn't grasp any logic behind it, so I finally decided to ask a few questions to the OP about it. – kuroi neko Aug 14 '21 at 19:47
2

Anyone looking for a way to do an equivalent of SQL GroupBy in JavaScript will not find an answer here.

This question is about a very specific algorithm that performs two steps:

  1. group the records by date in sub arrays
  2. consider two sub arrays equivalent if their records are related to the exact same set of files.
    Filter equivalent sub arrays to retain only the one with the oldest date

So let's do that with some vanilla JavaScript:

function do_some_filtering (records) {

    // create sets containing events grouped by date and index them by date
    let sets = {};
    for (let record of records) {
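        // dates are converted to milliseconds since the Epoch for comparison
        // (note: 'YYYY-MM-DD HH:MM:SS' is not the ISO format the spec guarantees,
        // so Date.parse may behave differently across engines)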
        let date = Date.parse(record.start_time);
        if (!sets[date]) sets[date] = [record]; else sets[date].push(record);
    }
    
    // filter "unique" sets based on the list of files present in each set
    let unique_date = {};
    for (let date in sets) {
        let signature = sets[date]       // this will concatenate all file names
                       .map(x => x.file) // to create a signature shared by all
                       .sort()           // groups that hold the same set of files
                       .join(":");
        // if multiple sets have the same signature, keep the one with the lowest date
        // (object keys are strings, so convert to numbers before comparing)
        if (!(signature in unique_date) || Number(unique_date[signature]) > Number(date)) {
            unique_date[signature] = date;
        }
    }
    
    // collect "unique" sets
    let result = [];
    for (let signature in unique_date) result.push(sets[unique_date[signature]]);   
    return result;
}

const objArray = [
    {
      file: 'file_1',
      start_time: '2021-08-12 14:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-12 14:00:00',
      status: 'completed'
    },
    {
      file: 'file_3',
      start_time: '2021-08-14 15:00:00',
      status: 'pending'
    },
    {
      file: 'file_1',
      start_time: '2021-08-14 03:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-08-14 03:00:00',
      status: 'pending'
    },
    {
      file: 'file_2',
      start_time: '2021-11-11 11:11:00',
      status: 'pending'
    }
]

let result = do_some_filtering (objArray);
console.log(JSON.stringify(result, null, "    "));

The code relies heavily on the capability of objects to act as associative arrays (ancestors of a modern Map, with some limitations), a feature that seems to have fallen into disuse with the advent of Immutability and Functional Programming, but still proves to be quite useful at times.
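For comparison, here is a rough sketch of the same two steps written with `Map` instead of plain objects. The function name `groupEarliestByFileSet` is made up for illustration. The sketch also assumes the timestamps are always zero-padded 'YYYY-MM-DD HH:MM:SS' strings, in which case plain string comparison matches chronological order and no date parsing is needed.

```javascript
// Hypothetical helper (not part of the code above): groups records by
// start_time, then keeps, for each distinct set of file names, only the
// group with the earliest start_time.
function groupEarliestByFileSet(records) {
    // Step 1: group records by their start_time string.
    const groups = new Map();
    for (const record of records) {
        if (!groups.has(record.start_time)) groups.set(record.start_time, []);
        groups.get(record.start_time).push(record);
    }

    // Step 2: for each file-name signature, remember the earliest start_time.
    // Assumption: timestamps are zero-padded 'YYYY-MM-DD HH:MM:SS' strings,
    // so comparing them as strings gives chronological order.
    const earliest = new Map();
    for (const [startTime, group] of groups) {
        // JSON.stringify of the sorted names avoids separator clashes.
        const signature = JSON.stringify(group.map(r => r.file).sort());
        if (!earliest.has(signature) || startTime < earliest.get(signature)) {
            earliest.set(signature, startTime);
        }
    }

    // Step 3: collect the surviving groups, ordered by start_time.
    return [...earliest.values()].sort().map(startTime => groups.get(startTime));
}
```

Calling `groupEarliestByFileSet(objArray)` on the sample data should give the same three groups as the function above. A `Map` keeps keys in insertion order and does not coerce them, which sidesteps the string-versus-number key question that plain objects raise.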

Speaking of immutability, the records are not duplicated, i.e. mutating one in the input will be reflected in the output and vice versa. Since the records themselves are left untouched, this will still achieve the kind of pseudo-immutability JavaScript can offer without dedicated libraries. If you absolutely want a duplication, let me know and I'll update the code.
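If a duplication is wanted, one possible approach is to copy each record after the filtering is done. This is only a sketch: `copyGroups` is a made-up name, and a shallow copy is assumed to be enough because every field of a record in the sample data is a string.

```javascript
// Hypothetical helper: returns a copy of the grouped result whose records
// are new objects, so later mutations do not leak between input and output.
function copyGroups(groups) {
    // The spread makes a shallow copy of each record; that is sufficient
    // here only because all fields (file, start_time, status) are strings.
    return groups.map(group => group.map(record => ({ ...record })));
}

const original = [[{ file: 'file_1', start_time: '2021-08-12 14:00:00', status: 'pending' }]];
const copy = copyGroups(original);
copy[0][0].status = 'completed';
console.log(original[0][0].status); // prints "pending"
```

Records with nested objects would need a deep copy instead.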

I'm not sure what you mean by the "quickest way" to do it. If you want to compare sets based on the files they contain, you'll have to pay the price of comparing two lists, which, as far as I know, is at least O(N log N) if you use a sort, or O(N²) if you do a pairwise compare. But unless you plan on using this on hundreds of files, the number of groups containing more than a few files should be fairly small and you should hardly feel the difference.
The rest is O(N), and I very much doubt you can achieve anything without looping over all your records at least once.
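As a side note on the comparison cost, a hash-based check is one way to compare two groups directly without sorting. The sketch below uses a made-up helper name, `sameFileSet`, and assumes file names are unique within a group, as the question states. Its expected cost is roughly linear in the group size, though it still has to be called once per pair of groups.

```javascript
// Hypothetical helper: true when both groups refer to exactly the same files.
// Assumes file names are unique within a group, as the question states.
function sameFileSet(groupA, groupB) {
    // Groups of different sizes cannot hold the same set of files.
    if (groupA.length !== groupB.length) return false;
    const filesInA = new Set(groupA.map(record => record.file));
    // Equal sizes plus unique names: every file of B found in A means equality.
    return groupB.every(record => filesInA.has(record.file));
}
```

For the small groups in the sample data, the sorted signature used above is likely just as fast and simpler to index by.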

If by "quick" you mean "quick to write", I guess 15 lines of vanilla JavaScript should fit the bill?

kuroi neko
  • Sorry if the question is not clear. These objects are files requested by the user. Records with the same start_time represent a single request made by the user, so they should be grouped as sub-arrays. And if there are multiple sub-arrays with the same start_name, I want to render only one sub-array, the one with the oldest start_time. But there can be two sub-arrays where the file names of one are a subset of the other's; they will be considered two different requests. I hope you get it now. – ahkam Aug 15 '21 at 10:53
  • Let me know if your solution needs to be updated based on my comment above. – ahkam Aug 15 '21 at 10:59
  • Give me an update so I can approve this answer. – ahkam Aug 15 '21 at 11:49