7

TL;DR; To make it simple, how can I filter multiple properties of a parent child array which could be several tree level deep. This is for an Open Source datagrid lib used by several hundred users.

So I have an array of parent/children references, the children can also have children themselves and so on, there's no limit on the tree level depth. Also, I need to be able to filter from not just the property that has the tree structure but on any propertie(s), that are columns in a grid, of that array.

For example, I have this array, which is representing a file explorer list

const myFiles = [
    {id: 11, file: "Music", parentId: null },
    {id: 12, file: "mp3", parentId: 11 },
    {id: 14, file: "pop", parentId: 12 },
    {id: 15, file: "theme.mp3", dateModified: "2015-03-01", size: 85, parentId: 14, },
    {id: 16, file: "rock", parentId: 12 },
    {id: 17, file: "soft.mp3", dateModified: "2015-05-13", size: 98, parentId: 16, },
    {id: 18, file: "else.txt", dateModified: "2015-03-03", size: 90, parentId: null, },
    {id: 21, file: "Documents", parentId: null, },
    {id: 2, file: "txt", parentId: 21 },
    {id: 3, file: "todo.txt", dateModified: "2015-05-12", size: 0.7, parentId: 2, },
    {id: 4, file: "pdf", parentId: 21 },
    {id: 22, file: "map2.pdf", dateModified: "2015-05-21", size: 2.9, parentId: 4 },
    {id: 5, file: "map.pdf", dateModified: "2015-05-21", size: 3.1, parentId: 4, },
    {id: 6, file: "internet-bill.pdf", dateModified: "2015-05-12", size: 1.4, parentId: 4, },
    {id: 7, file: "xls", parentId: 21 },
    {id: 8, file: "compilation.xls", dateModified: "2014-10-02", size: 2.3, parentId: 7, },
    {id: 9, file: "misc", parentId: 21 },
    {id: 10, file: "something.txt", dateModified: "2015-02-26", size: 0.4, parentId: 9, },
]

The array looks flat but in reality, it's a tree view structure that is represented in a datagrid as shown below.

What I found that partially works is to loop through the entire array and add the full list of files that each item can have including itself, for example, if Documents has a child PDF which has itself has a child Map.pdf, then the tree mapping can be represented by ["Documents", "PDF", "map.pdf"] and we store that on the parent object, then on the next child we store ["PDF", "map.pdf"] and finally on the last child we store ["map.pdf"] like so

    {id: 21, file: "Documents", parentId: null, treeMap: ["Documents", "PDF", "map.pdf"] }
    {id: 4, file: "pdf", parentId: 21, treeMap: ["PDF", "map.pdf"] }
    {id: 5, file: "map.pdf", dateModified: "2015-05-21", size: 3.1, parentId: 4, treeMap: ["map.pdf"] }

and this is the method allowing me to do that

export function modifyDatasetToAddTreeMapping(items: any[], treeViewColumn: Column, dataView: any) {
  for (let i = 0; i < items.length; i++) {
    items[i]['treeMap'] = [items[i][treeViewColumn.id]];
    let item = items[i];

    if (item['parentId'] !== null) {
      let parent = dataView.getItemById(item['parentId']);

      while (parent) {
        parent['treeMap'] = dedupePrimitiveArray(parent['treeMap'].concat(item['treeMap']));
        item = parent;
        parent = dataView.getItemById(item['parentId']);
      }
    }
  }
}

export function dedupePrimitiveArray(inputArray: Array<number | string>): Array<number | string> {
  const seen = {};
  const out = [];
  const len = inputArray.length;
  let j = 0;
  for (let i = 0; i < len; i++) {
    const item = inputArray[i];
    if (seen[item] !== 1) {
      seen[item] = 1;
      out[j++] = item;
    }
  }
  return out;
}

Then the datagrid lib uses the Filter method which I can use this way, where columnFilters is an object containing 1 or more filter, for example const columnFilters = { file: 'map', size: '>3' }

The datagrid is a lib (SlickGrid) and it uses the filter method like so dataView.setFilter(treeFilter);

function treeFilter(dataView: any, item: any) {
    const columnFilters = { file: this.searchString.toLowerCase(), size: 2 };
    let filterCount = 0;

    if (item[parentPropName] !== null) {
      let parent = dataView.getItemById(item['parentId']);
      while (parent) {
        if (parent.__collapsed) {
          return false;
        }
        parent = dataView.getItemById(parent['parentId']);
      }
    }

    for (const columnId in columnFilters) {
      if (columnId !== undefined && columnFilters[columnId] !== '') {
        filterCount++;

        if (item.treeMap === undefined || !item.treeMap.find((itm: string) => itm.endsWith(columnFilters[columnId]))) {
          return false;
        }
      }
    }
    return true;
  }

With the call of modifyDatasetToAddTreeMapping() it works ok if I want to filter on the File column, but if I add more column filters, it doesn't work as intended. For example if you take a look at the 2nd print screen, you see that I entered "map" and that will display the "Documents > PDF > map.pdf" and that is great but if add a file size lower than 3Mb it shouldn't display "map.pdf" and because that file is not shown and "Documents > PDF" don't contain the word "map" then nothing should display, so as you can see the filter is not behaving as it should.

So with the current implementation, I have 2 problems 1. not behaving correctly when a tree node is not displayed, the parent of it shouldn't be displayed 2. having to call modifyDatasetToAddTreeMapping() is an extra call that might not be needed 3. it also modifies the source array, I could deep clone the array but that would be yet another expense in performance

This might be doable by recursion, after converting to a hierarchical structure (tree), but I can't figure out the best algorithm to do this if it's with recursion, isn't it expensive to always drill down the tree to find items?

Lastly, the intention is to use this with SlickGrid which might have 10k or even 50k rows so it has to be fast. You can see this SlickGrid demo but their implementation of the filtering is not correct, also I found the method add the mapping in this other SO Answer

NOTE: I would also like to point out that a resolution to this issue will benefit possibly a few hundred (or thousand) users since it is to be implement in Angular-Slickgrid and Aurelia-Slickgrid which are both Open Source lib and used by at least 300+ users.

enter image description here

Filtering with the word "map" should not return anything here since none of the nodes/children have that text. enter image description here

EDIT

The best code would be to plug whatever code does the job into a regular JS filter, so that means the end solution would be a method myFilter that would be a filter callback method. The reason I'm stuck with this is because I use an external lib SlickGrid and I have to use what that lib has available as public methods exposed.

function myFilter(item, args) {
  const columnFilters = args.columnFilters;

  // iterate through each items of the dataset
  // return true/false on each item
}

// to be used as a drop in
dataView.setFilterArgs({ columnFilters: this._columnFilters });
dataView.setFilter(myFilter.bind(this));

If I have const columnFilters = { file: "map", size: "<3.2" }; , the expected result of the array would be 4 lines

// result
[
  {id: 21, file: "Documents", parentId: null },
  {id: 4, file: "pdf", parentId: 21, },
  {id: 22, file: "map2.pdf", dateModified: "2015-05-21", size: 2.9, parentId: 4 },
  {id: 5, file: "map.pdf", dateModified: "2015-05-21", size: 3.1, parentId: 4, }
]

If I have const columnFilters = { file: "map", size: "<3" }; , the expected result of the array would be 3 lines

// result
[
  {id: 21, file: "Documents", parentId: null },
  {id: 4, file: "pdf", parentId: 21, },
  {id: 22, file: "map2.pdf", dateModified: "2015-05-21", size: 2.9, parentId: 4 },
]

and finally if I have const columnFilters = { file: "map", size: ">3" }; then the expected result would be an empty array because none of the file have that char and file size conditions.

EDIT 2

From @AlexL's answer, it's starting to work. Just a couple of tweaking, but it looks very promising already enter image description here

EDIT 3

Thanks to Alex awesome work, his answer helped me merge this into my Open Source lib. I now have 2 live demos with Parent/Child ref (flat dataset) and with a Hierarchical Dataset (tree dataset). I wish I could upvote more than once :)

ghiscoding
  • 12,308
  • 6
  • 69
  • 112
  • What if you store your data in a complex entity with a tree structure, but passing to the presenter (the grid itself) to render a flat version of your core entity? You could apply your filter functions to the complex entity (which only modifies the flat object; binding by ID, for example) or directly to the flatten, if needed. – VRoxa Apr 05 '20 at 02:19
  • yes the datagrid lib requires a flat dataset, I can pass from hierarchical to flat if I need to (and I have the methods to do it) but still I can't figure out the best algorithm to do the filter, that is mainly what I'm looking for in this SO... and of course, good performance since the database might be big, it's a public lib and lot of users come out with all kind of use case much bigger than mine, in my use case it could go around 5k rows – ghiscoding Apr 05 '20 at 03:14

1 Answers1

5

I have a way to do it. It should be fairly performant but we might also want to swap out map and reduce etc. for good old for-loops to optimize speed further (I have seen various blogs and articles comparing speed of forEach, map etc. to for-loop and for-loops seem to win)

Here's a demo (also here: https://codepen.io/Alexander9111/pen/abvojzN):

const myFiles = [
  { id: 11, file: "Music", parentId: null },
  { id: 12, file: "mp3", parentId: 11 },
  { id: 14, file: "pop", parentId: 12 },
  { id: 15, file: "theme.mp3", dateModified: "2015-03-01", size: 85,  parentId: 14 },
  { id: 16, file: "rock", parentId: 12 },
  { id: 17, file: "soft.mp3", dateModified: "2015-05-13", size: 98, parentId: 16 },
  { id: 18, file: "else.txt", dateModified: "2015-03-03", size: 90, parentId: null },
  { id: 21, file: "Documents", parentId: null },
  { id: 2, file: "txt", parentId: 21 },
  { id: 3, file: "todo.txt", dateModified: "2015-05-12", size: 0.7, parentId: 2 },
  { id: 4, file: "pdf", parentId: 21 },
  { id: 22, file: "map2.pdf", dateModified: "2015-05-21", size: 2.9, parentId: 4 },
  { id: 5, file: "map.pdf", dateModified: "2015-05-21", size: 3.1, parentId: 4 },
  { id: 6, file: "internet-bill.pdf", dateModified: "2015-05-12", size: 1.4, parentId: 4 },
  { id: 7, file: "xls", parentId: 21 },
  { id: 8, file: "compilation.xls", dateModified: "2014-10-02", size: 2.3, parentId: 7 },
  { id: 9, file: "misc", parentId: 21 },
  { id: 10,  file: "something.txt", dateModified: "2015-02-26", size: 0.4,  parentId: 9 }
];

//example how to use the "<3" string - better way than using eval():
const columnFilters = { file: "map", size: "<3.2" }; //, size: "<3" 
const isSizeValid = Function("return " + myFiles[11].size + "<3")();
//console.log(isSizeValid);

const myObj = myFiles.reduce((aggObj, child) => {
  aggObj[child.id] = child;
  //the filtered data is used again as each subsequent letter is typed
  //we need to delete the ._used property, otherwise the logic below
  //in the while loop (which checks for parents) doesn't work:
  delete aggObj[child.id]._used;
  return aggObj;
}, {});

function filterMyFiles(myArray, columnFilters){
  const filteredChildren = myArray.filter(a => {
    for (let key in columnFilters){
      //console.log(key)      
      if (a.hasOwnProperty(key)){
        const strContains =  String(a[key]).includes(columnFilters[key]);
        const re = /(?:(?:^|[-+<>=_*/])(?:\s*-?\d+(\.\d+)?(?:[eE][+-<>=]?\d+)?\s*))+$/;
        const comparison = re.test(columnFilters[key]) && Function("return " + a[key] + columnFilters[key])();
        if (strContains || comparison){
          //don't return true as need to check other keys in columnFilters
        }else{
          //console.log('false', a)
          return false;
        }
      } else{
        return false;
      }           
    }
    //console.log('true', a)
    return true;
  })
  return filteredChildren;
}

const initFiltered = filterMyFiles(myFiles, columnFilters);

const finalWithParents = initFiltered.map(child => {
  const childWithParents = [child];
  let parent = myObj[child.parentId];
  while (parent){
    //console.log('parent', parent)
    parent._used || childWithParents.unshift(parent)
    myObj[parent.id]._used = true;
    parent = myObj[parent.parentId] || false;    
  }
  return childWithParents;
}).flat();

console.log(finalWithParents)
.as-console-wrapper { max-height: 100% !important; top: 0; }

Basically set up an object to later use for finding all parents.

Then do one filter (i.e. one iteration of the array) and filter for those that match the conditions in the columnFilters object.

Then map (i.e. one iteration) over this filtered array and find each parents using the object created at the start (so nested iterations up to N depth).

flatten the array with .flat() (assumed one final iteration) and then we are done.

Any questions let me know.

Update - For-loop approach plus trying to reduce iterations over array

Cut out a couple of iterations :) (https://codepen.io/Alexander9111/pen/MWagdVz):

const myFiles = [
  { id: 11, file: "Music", parentId: null },
  { id: 12, file: "mp3", parentId: 11 },
  { id: 14, file: "pop", parentId: 12 },
  { id: 15, file: "theme.mp3", dateModified: "2015-03-01", size: 85,  parentId: 14 },
  { id: 16, file: "rock", parentId: 12 },
  { id: 17, file: "soft.mp3", dateModified: "2015-05-13", size: 98, parentId: 16 },
  { id: 18, file: "else.txt", dateModified: "2015-03-03", size: 90, parentId: null },
  { id: 21, file: "Documents", parentId: null },
  { id: 2, file: "txt", parentId: 21 },
  { id: 3, file: "todo.txt", dateModified: "2015-05-12", size: 0.7, parentId: 2 },
  { id: 4, file: "pdf", parentId: 21 },
  { id: 22, file: "map2.pdf", dateModified: "2015-05-21", size: 2.9, parentId: 4 },
  { id: 5, file: "map.pdf", dateModified: "2015-05-21", size: 3.1, parentId: 4 },
  { id: 6, file: "internet-bill.pdf", dateModified: "2015-05-12", size: 1.4, parentId: 4 },
  { id: 7, file: "xls", parentId: 21 },
  { id: 8, file: "compilation.xls", dateModified: "2014-10-02", size: 2.3, parentId: 7 },
  { id: 9, file: "misc", parentId: 21 },
  { id: 10,  file: "something.txt", dateModified: "2015-02-26", size: 0.4,  parentId: 9 }
];

const columnFilters = { file: "map", size: "<3.2" };
console.log(customLocalFilter(myFiles, columnFilters));

function customLocalFilter(array, filters){  
  const myObj = {};
  for (let i = 0; i < myFiles.length; i++) {
    myObj[myFiles[i].id] = myFiles[i];
    //the filtered data is used again as each subsequent letter is typed
    //we need to delete the ._used property, otherwise the logic below
    //in the while loop (which checks for parents) doesn't work:
    delete myObj[myFiles[i].id]._used;
  }

  const filteredChildrenAndParents = [];
  for (let i = 0; i < myFiles.length; i++) {
    const a = myFiles[i];
    let matchFilter = true;
    for (let key in columnFilters) {
      if (a.hasOwnProperty(key)) {
        const strContains = String(a[key]).includes(columnFilters[key]);
        const re = /(?:(?:^|[-+<>!=_*/])(?:\s*-?\d+(\.\d+)?(?:[eE][+-<>!=]?\d+)?\s*))+$/;
        const comparison =
          re.test(columnFilters[key]) &&
          Function("return " + a[key] + columnFilters[key])();
        if (strContains || comparison) {
          //don't return true as need to check other keys in columnFilters
        } else {
          matchFilter = false;
          continue;
        }
      } else {
        matchFilter = false;
        continue;
      }
    }
    if (matchFilter) {
      const len = filteredChildrenAndParents.length;
      filteredChildrenAndParents.splice(len, 0, a);
      let parent = myObj[a.parentId] || false;
      while (parent) {
        //only add parent if not already added:
        parent._used || filteredChildrenAndParents.splice(len, 0, parent);
        //mark each parent as used so not used again:
        myObj[parent.id]._used = true;
        //try to find parent of the current parent, if exists:
        parent = myObj[parent.parentId] || false;
      }
    }
  }
  return filteredChildrenAndParents;
}
.as-console-wrapper { max-height: 100% !important; top: 0; }
Alex L
  • 4,168
  • 1
  • 9
  • 24
  • Thanks I tried it a bit and it looks really promising. I got couple of questions, when you say setup object for later, is it `myObj`? Now I'm trying to plug this in my lib and the datagrid lib uses the `filter` method with `dataView.setFilter(this.customLocalFilter.bind(this));` and then `customLocalFilter(item: any, columnFilters: any)` where I deal with regex to get operator, search string and return boolean value if found.. would I just plug the `filteredChildren` in that case? It seems that I need to run the `initFiltered.map...` after the `filter` ran, is that correct? – ghiscoding Apr 08 '20 at 18:52
  • Couldn't write in 1 comment, so here's the rest of my question... if I need to run the `initFiltered.map...` after the `filter` is done, then I need to overwrite my original dataset, which I could do but it would be nice if I could just put everything in the `filter. To give you an idea of where that will be used, my code has `dataView.setFilter(this.customLocalFilter.bind(this));` which calls the dataview `filter` on this [line](https://github.com/ghiscoding/Angular-Slickgrid/blob/30cf1b4c9c0ed0d38f93acf27a15ad3d196a776b/src/app/modules/angular-slickgrid/services/filter.service.ts#L249) – ghiscoding Apr 08 '20 at 18:55
  • Your answer seems all good, most probably will accept it soon, I'm now basically just looking for guidance in how to plug that with my code in the lib. You can see a live demo [Angular-Slickgrid](https://ghiscoding.github.io/Angular-Slickgrid/#/clientside), there's filter on top of each grid column, or another demo in my other lib [Slickgrid - Tree Data](https://ghiscoding.github.io/slickgrid-universal/#/example06). – ghiscoding Apr 08 '20 at 19:00
  • 1
    Yeah myObj is the object used to get the parents later in the while loop. I also re-wrote it now with for-loops and it should cut down some memory allocation and some iterations to make it faster etc. too. I can also put it into a function so that you can just call it once and make integration with your library easier. I'll try to look at your demos and see how to integrate :) – Alex L Apr 08 '20 at 19:53
  • Thanks, I'm basically looking at how to connect this into the `setFilter` which simply sets a simple JS `filter` method, doing so would work with current code, that is the best option. The second option is to run your method separately and overwrite the dataset after your method ran, but that would also mean having to keep the original dataset into a separate temp variable which I would try to avoid, so using the `filter` is the best option..if ever possible – ghiscoding Apr 08 '20 at 20:02
  • 1
    updated my question (scroll to the end) if that helps.. thanks a lot – ghiscoding Apr 08 '20 at 20:10
  • 1
    Ok will take a look tomorrow, my brain is done for the day :) – Alex L Apr 08 '20 at 20:39
  • Found a small problem with the result of your current function, if I have the conditions that returns both map files, the result set length should be 4 not 6 (documents & pdf are repeated). Possible solution is to run another iteration to remove duplicates. I updated my question to mention this. – ghiscoding Apr 08 '20 at 21:35
  • Done! Haha brain was still thinking about the problem. Just need to delete each key, value from myObj once it’s used. No further iteration needed :) – Alex L Apr 08 '20 at 21:59
  • Ahh wait, that won’t cover another edge case when we have one common first level folder but two different second level folders. The first matched file will cause the first level folder to be deleted. Rather than delete, I should just mark myObj with a field “used” or something. Better come back to it fresh tomorrow! – Alex L Apr 08 '20 at 22:08
  • 1
    Thanks for all, I see you have a GitHub account, so I took your code and created a quick demo that you can view/clone from this [repo](https://github.com/ghiscoding/SlickGrid). If you want to test directly in there, it's as simple as d/l the repo and run the html file, that's it, SlickGrid is old and was done in html/js/jquery so there's no setup... I got something working but I'd like to address 3 points that I wrote in readme. The example code is [here](https://github.com/ghiscoding/SlickGrid/blob/master/examples/example-tree-data.html). Push code if u want, I can make u collaborator too. – ghiscoding Apr 09 '20 at 00:03
  • See Pull #2 https://github.com/ghiscoding/SlickGrid/pull/2 (answer here also updated to match) - I will do another pull request to optimize it further too. I would love to be a collaborator :) – Alex L Apr 09 '20 at 10:46
  • 1
    Thanks a lot, really amazing work, thumbs up. I wish I could give you more points :) – ghiscoding Apr 09 '20 at 13:13
  • 1
    My pleasure! I enjoy solving these kinds of things. – Alex L Apr 09 '20 at 13:58
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/211307/discussion-between-ghiscoding-and-alex-l). – ghiscoding Apr 09 '20 at 17:47