0

I have several datasets based in JSON. This is because the datasets are pulled in through an API call. I can change this, if it makes sense, but it might slow down the intended function a lot.

Now, each dataset could easily 10,000 and even 100,000 records within i.e. dataset.length = 10,000.

Now in each of these datasets, they have an id column that allows me to connect them. I essentially want to select a primary dataset, and then cycle through the secondary datasets, find the first record (or even all records) where that share the same identifier. let's say primary.id = secondary.id

This is essentially like an INNER JOIN but running through with JSON and using Google Scripts.

I want to match the rows in the secondary datasets and just add them to the relevant places in the primary and output just one dataset to the file.

I wrote the following script

function mergeRecords(primary,identifiers,columns) {
 
  const args = Array.prototype.slice.call(arguments);
    args.splice(0,3);

  const argsFlat = args.flat();

  if (args.length<1) {
    throw 'Insufficient Number of datasets, please add at least 1 secondary dataset';
  }

 
  try {

    for (var i in primary) {
      var parent = primary[i];

      var a = 0;
      var child = argsFlat[a]; 
    
      while (parent[identifiers] != child[identifiers] && a<argsFlat.length) {
        a++;
      }
       
      columns.forEach(function(x) {
        parent['Child-'+x]=child[x];
       }
      );

    }

    return primary;
   } catch(err) {
        Logger.log(err);
        Logger.log(primary);
        Logger.log(arguments);
        Logger.log(argsFlat);
        Logger.log(columns);
        Logger.log(identifiers);
        Logger.log(a);
        Logger.log(parent.id+' '+child.id);
  }
}

Essentially the primary is the main dataset to compare all the others against. The identifier is the name of the key that should be the same in target secondary datasets. columns are the data properties I want to retrieve from the secondary sets.

As you can see, I've made it that arguments need to be included after the columns argument otherwise it errors out.

I am using the while method/function (honestly I'm not sure the difference, I'm still new to this) because I have found it to be quite a bit faster than using

for (var a in argsFlat) { 
 var child = argsFlat[a];
 if (parent[identifiers] == child[identifiers]) {
    columns.forEach(function(x) {
        parent['Child-'+x]=child[x];
        }
      );
 else {continue;}

even though using this would allow to append the data from the different datasets if multiple datasets have the same identifier as the parent.

This is the test source I am using:

function test() {
   var primary = {
     'records': [{
    'id':1234445,
    'name':'Yiz Segall1',
    'email':'123@123.com'},
    {
      'id':567568,
      'name':'viva1',
      'email':'old@new.com1'
    },
    {
      'id':123442,
      'name':'shompie1',
      'email':'shompie@gmail.com1'
    }]
  }

  var secondary = {
     'records': [{
    'id':1234445,
    'name':'2Yiz Segall',
    'email':'2123@123.com',
    'phone':12435435},
    {
      'id':567568,
      'name':'2viva',
      'email':'2old@new.com',
      'phone':4445555
    },
    {
      'id':123442,
      'name':'2shompie',
      'email':'2shompie@gmail.com',
      'phone':5556666
    }]
  }

  var tertiary = {
     'records': [{
    'id':1234445,
    'name':'3Yiz Segall',
    'email':'3123@123.com',
    'address':'32 wisconsin avenue'
    },
    {
      'id':567568,
      'name':'3viva',
      'email':'o3ld@new.com',
      'address':'14 pine street'
    },
    {
      'id':123442,
      'name':'s3hompie',
      'email':'3shompie@gmail.com',
      'address':'15 ny street'
    }]
  }

  var records = mergeRecords(primary.records,'id',['name','email'],secondary.records,tertiary.records);

  Logger.log(records);

}

My question is 2-fold:

  1. While the first record in the test comes back correct, the others just pull in the first data, so how do I fix?
    • this is the response I get from the test [{email=123@123.com, Child-name=2Yiz Segall, name=Yiz Segall1, Child-email=2123@123.com, id=1234445.0}, {email=old@new.com1, name=viva1, Child-name=2Yiz Segall, id=567568.0, Child-email=2123@123.com}, {name=shompie1, email=shompie@gmail.com1, Child-name=2Yiz Segall, Child-email=2123@123.com, id=123442.0}]
  2. If I wanted to make it that it would attach all data rows that had the same identifier as the primary row, I have a problem with naming. Right now, it uses parent['Child-'+x]=child[x]; which only works for the first instance of this. How do I make it child1 and then child 2 and then child 3, so it's flexible enough to adapt to however many datasets I need to include.
    • I thought of using the dataset names, but the data is converted into an array and then flattened, so it's impossible to tell where it came from, or is there an eays way to do that?

Remember that this process plays a part of preparing the data properly to output into a file, by this point, I have already run script to make 3-4 API calls.

Any help is welcome, thanks.

Jknight
  • 105
  • 11
  • I would want to parse the JSON and deal with them as objects – Cooper Apr 09 '22 at 20:47
  • When you have a specific scripting question please feel free to return and ask that question for now your quesrtion is too broad. – Cooper Apr 09 '22 at 20:49
  • 1
    https://stackoverflow.com/questions/46849286/merge-two-array-of-objects-based-on-a-key – RemcoE33 Apr 09 '22 at 20:56
  • 3
    I have to apologize for my poor English skill. Unfortunately, I cannot understand both `My question is 2-fold:` and `Remember that this process plays a part of preparing the data properly to output into a file, by this point, I have already run script to make 3-4 API calls.`. In order to correctly understand your question, can you provide the sample output value you expect using your sample input value? – Tanaike Apr 10 '22 at 00:19
  • @Cooper, how would I do that? Also, how is this too broad? I am asking 2 very specific questions about merging them together. – Jknight Apr 10 '22 at 05:08
  • @RemcoE33, this is close and I really hoped that this would work because it's very simple, however, that answer only works if there are only 2 datasets and if I want to copy across all the data properties in dataset 2 to dataset 1. and most importanlty, if the id key in the primary json is the same as the in the child json datasets, and they aren't usually. I guess if I merge all the child datasets into 1, and if I just bring in all the data, it could work. But I prefer not, as, based on the structure of the JSON datasets, it will complicate things more, but will try it. – Jknight Apr 10 '22 at 05:15
  • also @RemcoE33, I tested this out, it would work, provided the the field names in the child dataset doesn't have the same key in the primary, and unfortunately, that isn't always necessarily true. For example, there will be 2 CreatedDate keys that I want their value, the created date in the primary and the created date of the child and that won't work. And again, I don't want every key value pair to be added, although that's not a necessity, I can technically work with that. – Jknight Apr 10 '22 at 05:40
  • Hi, please consider providing your desired output, in order to clarify this. – Iamblichus Apr 11 '22 at 09:48

1 Answers1

1

As a guess. Try to change this:

parent['Child-'+x]=child[x];

With this:

if (!parent['Child-' + x]) {
    parent['Child-' + x] = child[x];
} else {
    let counter = 1;
    while (parent['Child' + counter + '-' + x]) continue;
    parent['Child' + counter + '-' + x] = child[x];
}
Yuri Khristich
  • 13,448
  • 2
  • 8
  • 23
  • Thanks @yuri Khristich. This takes care of the second problem, but how does it solve the primary problem of getting the right records across? – Jknight Apr 10 '22 at 05:20
  • Could you please show your desired output? I see what you're getting and you're saying that this is wrong result. But it still eludes me what you trying to get. How the correct result should look like? – Yuri Khristich Apr 10 '22 at 13:24