12

What's a pragmatic analog to SQL 'JOIN' for tables represented as arrays of JavaScript objects? JavaScript's Array.join and D3.js's 'd3.merge' are not the same concept.

E.g. SELECT * FROM authors LEFT JOIN books ON authors.id = books.author_id?

First table:

var authors = 
[  { id: 1, name: 'adam'},
   { id: 2, name: 'bob'},
   { id: 3, name: 'charlie'}, ...
]

Second table:

var books = 
[  { author_id: 1, title: 'Coloring for beginners'}, 
   { author_id: 1, title: 'Advanced coloring'}, 
   { author_id: 2, title: '50 Hikes in New England'},
   { author_id: 2, title: '50 Hikes in Illinois'},
   { author_id: 3, title: 'String Theory for Dummies'}, ...
]

The tables are loaded from CSV using D3.js d3.csv(), so I already have D3.js available. I'm open to other libraries, but generally prefer coding it directly if that's not too far out of the way.

I see "Native way to merge objects in Javascript", which uses RethinkDB. That seems over the top for this, but it is the idea.

prototype

6 Answers

21

Basically, like this:

// first, build an easier lookup of author data:
var authormap = {};
authors.forEach(function(author) {authormap[author.id] = author;});

// now do the "join":
books.forEach(function(book) {
    book.author = authormap[book.author_id];
});

// now you can access:
alert(books[0].author.name);
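The snippet above attaches an author to each book, so authors without books never show up. If a flat, row-per-match result closer to SQL's LEFT JOIN is wanted, a minimal sketch could look like this. The helper name leftJoin and the sample rows are made up for illustration, and it assumes all rows of a table share the same columns:

```javascript
// Hypothetical helper: one output row per (left, right) match, plus one
// row with the right-hand columns set to null when nothing matches.
function leftJoin(left, right, leftKey, rightKey) {
  // Index the right table by key; one key can map to several rows.
  // Keys are stringified because d3.csv() yields string values.
  var index = new Map();
  right.forEach(function (row) {
    var key = String(row[rightKey]);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(row);
  });

  // Columns to null out for unmatched left rows (taken from the first row).
  var rightColumns = right.length ? Object.keys(right[0]) : [];

  var result = [];
  left.forEach(function (row) {
    var matches = index.get(String(row[leftKey]));
    if (matches) {
      matches.forEach(function (match) {
        result.push(Object.assign({}, row, match));
      });
    } else {
      var unmatched = Object.assign({}, row);
      rightColumns.forEach(function (col) {
        if (!(col in unmatched)) unmatched[col] = null;
      });
      result.push(unmatched);
    }
  });
  return result;
}

// Sample data (assumed): author 4 has no books.
var sampleAuthors = [
  { id: 1, name: 'adam' },
  { id: 4, name: 'diane' }
];
var sampleBooks = [
  { author_id: 1, title: 'Coloring for beginners' },
  { author_id: 1, title: 'Advanced coloring' }
];

var rows = leftJoin(sampleAuthors, sampleBooks, 'id', 'author_id');
console.log(rows);
```

Here adam appears twice (once per book) and diane appears once with author_id and title set to null. Neither input array is modified.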
Niet the Dark Absol
  • Tucking `author` in as an object in `book` is great, was thinking too much like SQL to see the elegance of that. – prototype Mar 23 '14 at 18:17
7

You can do it with the AlaSQL JavaScript SQL library:

var res = alasql('SELECT * FROM ? authors \
       LEFT JOIN ? books ON authors.id = books.author_id',[authors, books]);

Try this example with your data in jsFiddle.

You can also load CSV data directly in the SQL expression:

alasql('SELECT * FROM CSV("authors.csv", {headers:true}) authors \
            LEFT JOIN CSV("books.csv", {headers:true}) books \
            ON authors.id = books.author_id',[], function(res) {
      console.log(res);
 });
agershun
3

I was looking for something like this as well, and solved it using a bit of functional programming. I've taken the liberty of adding a couple of objects to your initial arrays in order to deal with the "NULL" cases.

var books = [
    {author_id: 1, title: 'Coloring for beginners'},
    {author_id: 1, title: 'Advanced coloring'},
    {author_id: 2, title: '50 Hikes in New England'},
    {author_id: 2, title: '50 Hikes in Illinois'},
    {author_id: 3, title: 'String Theory for Dummies'},
    {author_id: 5, title: 'Map-Reduce for Fun and Profit'}    
];
var authors = [
    {id: 1, name: 'adam'},
    {id: 2, name: 'bob'},
    {id: 3, name: 'charlie'},
    {id: 4, name: 'diane'}
];

So now you have a book without an author and an author without a book. My solution looks like this:

var joined = books.map(function(e) {
    return Object.assign({}, e, authors.reduce(function(acc, val) {
        if (val.id == e.author_id) {
            return val
        } else {
            return acc
        }
    }, {}))
});

The map method goes through every element of books as e and returns an array whose elements are e merged with its corresponding object from the authors array. Object.assign({}, a, b) takes care of the merge without modifying the original objects.

The corresponding object for each e in books is found by applying reduce to the authors array. Starting from an empty object {} as the initial value (this is the second argument of reduce; it could also have been a null author such as {id: '', name: ''}), reduce goes through the elements of authors as val and returns whatever ends up in acc. When a book's author_id matches an author's id, the entire matched author object ends up in acc and is eventually returned by authors.reduce(...).

N.B. Using reduce isn't that efficient, because there is no way to break out of the reduce loop once the match is found; it always continues to the end of the array.
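One way around that inefficiency is Array.prototype.find, which stops at the first match. The sketch below is an alternative rather than the approach above; it assumes an ES2015 environment, and the sample arrays and variable names are made up for illustration:

```javascript
// Sample rows (assumed for illustration).
var findAuthors = [
  { id: 1, name: 'adam' },
  { id: 2, name: 'bob' }
];
var findBooks = [
  { author_id: 2, title: '50 Hikes in Illinois' },
  { author_id: 5, title: 'Map-Reduce for Fun and Profit' }
];

var joinedWithFind = findBooks.map(function (book) {
  // find() returns the first matching author and stops scanning,
  // or undefined when there is no match.
  var author = findAuthors.find(function (a) {
    return a.id == book.author_id; // loose equality, as in the reduce version
  });
  // Object.assign skips undefined sources, so unmatched books pass through as-is.
  return Object.assign({}, book, author);
});

console.log(joinedWithFind);
```

This is still a nested scan in the worst case. For large tables, a lookup object keyed by author id avoids scanning authors once per book.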

stuzero
1

This is inspired by @Niet's answer.

I ran into a problem with duplicate data, so I added the step that clones the record in the lookup table before joining it with the current record.

var authors = [{
    id: 1,
    name: 'adam'
}, {
    id: 2,
    name: 'bob'
}, {
    id: 3,
    name: 'charlie'
}];

var books = [{
    author_id: 1,
    title: 'Coloring for beginners'
}, {
    author_id: 1,
    title: 'Advanced coloring'
}, {
    author_id: 2,
    title: '50 Hikes in New England'
}, {
    author_id: 2,
    title: '50 Hikes in Illinois'
}, {
    author_id: 3,
    title: 'String Theory for Dummies'
}];

function joinTables(left, right, leftKey, rightKey) {

    rightKey = rightKey || leftKey;

    var lookupTable = {};
    var resultTable = [];
    var forEachLeftRecord = function (currentRecord) {
        lookupTable[currentRecord[leftKey]] = currentRecord;
    };

    var forEachRightRecord = function (currentRecord) {
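        // copy the matching left record (if any) into a fresh object, then overlay
        // the right record; this also avoids pushing undefined when there is no match
        var joinedRecord = _.extend({}, lookupTable[currentRecord[rightKey]], currentRecord); // using lodash extend
        resultTable.push(joinedRecord);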
    };

    left.forEach(forEachLeftRecord);
    right.forEach(forEachRightRecord);

    return resultTable;
}
var joinResult = joinTables(authors, books, 'id', 'author_id');
console.log(joinResult);

The result is

[
    {
        "id": 1,
        "name": "adam",
        "author_id": 1,
        "title": "Coloring for beginners"
    },
    {
        "id": 1,
        "name": "adam",
        "author_id": 1,
        "title": "Advanced coloring"
    },
    {
        "id": 2,
        "name": "bob",
        "author_id": 2,
        "title": "50 Hikes in New England"
    },
    {
        "id": 2,
        "name": "bob",
        "author_id": 2,
        "title": "50 Hikes in Illinois"
    },
    {
        "id": 3,
        "name": "charlie",
        "author_id": 3,
        "title": "String Theory for Dummies"
    }
] 
Isioma Nnodum
1

I had a slight variation on this case where I needed to join two arrays on a key shared by both data sets. A JSON object became too unwieldy as my data set grew, so it was much easier to concat the two arrays:

 var data = [["auth1","newbook1","pubdate1"],["auth2","newbook2","pubdate2"]];
 var currData = [["auth1","newbook3","pubdate3"],["auth2","newbook3","pubdate4"]];
 // keys (first column) of currData, in row order
 var currDataMap = currData.map(function(a){return a[0];});
 var newdata = [];
 for (var i = 0; i < data.length; i++) {
   var match = currDataMap.indexOf(data[i][0]);
   if (match > -1) {
     // append the matching row's remaining columns; push keeps newdata dense
     newdata.push(data[i].concat(currData[match].slice(1)));
   }
 }

Output:

[
   [auth1, newbook1, pubdate1, newbook3, pubdate3], 
   [auth2, newbook2, pubdate2, newbook3, pubdate4]
]

In my case, I also needed to drop rows where there was no new data, so you may want to exclude the conditional.
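For larger arrays, the repeated indexOf calls could be replaced by a single lookup table. A rough sketch, assuming the key is always the first column and is unique within the second array; the variable names and the extra unmatched row are illustrative only:

```javascript
var baseRows = [
  ["auth1", "newbook1", "pubdate1"],
  ["auth2", "newbook2", "pubdate2"],
  ["auth9", "newbook9", "pubdate9"] // no counterpart below, so it is dropped
];
var newRows = [
  ["auth1", "newbook3", "pubdate3"],
  ["auth2", "newbook3", "pubdate4"]
];

// Map from key (first column) to the remaining columns of that row.
var restByKey = new Map(newRows.map(function (row) {
  return [row[0], row.slice(1)];
}));

// Keep only rows with a match, appending the matched columns.
var mergedRows = baseRows
  .filter(function (row) { return restByKey.has(row[0]); })
  .map(function (row) { return row.concat(restByKey.get(row[0])); });

console.log(mergedRows);
```

If keys can repeat in the second array, the Map keeps the last row for a key, whereas indexOf picks the first, so the two versions would differ in that case.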

0

Not the most elegant code, but I think it is pretty simple to follow and hack if you need to. It allows for INNER, LEFT, and RIGHT joins.

You should just need to copy and paste the functions into your code to get a correct output. An example is at the bottom.

function remove_item_from_list(list_, remove_item, all = true) {
  /*
  Removes all occurrences of remove_item from list_
  */
    for (var i = list_.length; i--;) {
        if (list_[i] === remove_item) {
            list_.splice(i, 1);
        }
    }
    return list_
}

function add_null_keys(dict_, keys_to_add){
  /*
  This function will add the keys in the keys_to_add list to the dict_ object with the value null
  (keys that dict_ already has are left untouched; dict_ itself is modified)

  ex: 
  dict_ = {'key_1': 1, 'key_2': 2}
  keys_to_add = ['a', 'b', 'c']

  output:
  {'key_1': 1, 'key_2': 2, 'a': null, 'b': null, 'c': null}
  */

  //get the current keys in the dict
  var current_keys = Object.keys(dict_)
  for (var index in keys_to_add){
    var key = keys_to_add[index]
    //if the dict doesn't have the key, add the key as null
    if(current_keys.includes(key) === false){
      dict_[key] = null
    }
  }
  return dict_
}

function merge2(dict_1, dict_2, on, how_join){
  /*
  This function is where the actual comparison happens to see if two dictionaries share the same key

  We loop through the on_list to see if the various keys that we are joining on between the two dicts match.

  If all the keys match we combine the dictionaries.

  If the keys do not match and it is an inner join, an undefined object gets returned
  If the keys do not match and it is NOT an inner join, we add all the key values of the second dictionary as null to the first dictionary and return those
  */

  var join_dicts = true

  //loop through the join on key
  for (var index in on){
    var join_key = on[index]

    //if one of the join keys dont match, then we arent joining the dictionaries
    if (dict_1[join_key] != dict_2[join_key]){
      join_dicts = false
      break
    }
  }

  //check to see if we are still joining the dictionaries
  if (join_dicts === true){
    return Object.assign({}, dict_1, dict_2);
  }

  else{
    if (how_join !== 'inner'){
      //need to add null keys to dict_1, which is acting as the main side of the join
      var temp = add_null_keys(dict_1, Object.keys(dict_2))
      return temp
    }
  }
}

function dict_merge_loop_though(left_dict, right_dict, on, how_join){
  /*
  This function loops through the left_dict and compares everything in it to the right_dict

  it determines if a join happens, what the join is and returns the information

  Figuring out the left/right joins was difficult. I had to add a base-level dict (remove_val below)
  to tell whether a row joined with anything or not.
  */

  var master_list = []
  var index = 0

  //need to loop through what we are joining on 
  while(index < left_dict.length){
    //grab the left dictionary (declared with var so it does not leak into the global scope)
    var left_dict_ = left_dict[index]
    var index2 = 0

    //necessary for left/right join
    var remove_val = add_null_keys(left_dict_, Object.keys(right_dict[index2]))
    var temp_list = [remove_val]

    while (index2 < right_dict.length){
      //get the right dictionary so we can compare each dictionary to each other
      var right_dict_ = right_dict[index2]

      //inner join the two dicts
      if (how_join === 'inner'){
        var temp_val = merge2(left_dict_, right_dict_, on, how_join)

        //if whats returned is a dict, add it to the master list
        if (temp_val != undefined){
          master_list.push(temp_val)
        }
      }

      //means we are right/left joining
      else{

        //left and right joins run the same comparison here: merge() has already
        //swapped the two tables for a right join, so left_dict_ is always the row
        //from the side being preserved. Passing the arguments the other way round
        //would leave unmatched rows from the other table in temp_list.
        var temp_val = merge2(left_dict_, right_dict_, on, how_join)
        temp_list.push(temp_val)
      }

      //increment this guy
      index2++
    }


    //Logic for left/right joins: decide what to add to the master list
    if (how_join !== 'inner'){
      // remove the remove val from the list. All that remains is what should be added
      //to the master return list. If the length of the list is 0 it means that there was no
      //join and that we should add the remove val (with the extra keys being null) to the master
      //return list
      temp_list = remove_item_from_list(temp_list, remove_val)

      if (temp_list.length == 0){
        master_list.push(remove_val)
      }
      else{
        master_list = master_list.concat(temp_list); 
      }
    }
    //increment to move onto the next thing
    index++

  }

  return master_list
}

function merge(left_dict, right_dict, on = [], how = 'inner'){
  /*
  This function will merge two dictionaries together
  You provide a left dictionary, a right dictionary, a list of what key to join on and 
  what type of join you would like to do (right, left, inner)

  a list of the merged dictionaries is returned
  */

  //get the join type and initialize the master list of dictionaries that will be returned
  var how_join = how.toLowerCase()
  var master_list = []

  //inner, right, and left joins are actually pretty similar in theory. The only major difference between
  //left and right joins is the order that the data is processed. So the only difference is we call the
  //merging function with the dictionaries in a different order
  if (how_join === 'inner'){
    master_list = dict_merge_loop_though(left_dict, right_dict, on, how_join)
  }

  else if (how_join === 'left'){
    master_list = dict_merge_loop_though(left_dict, right_dict, on, how_join)
  }

  else if (how_join === 'right'){
    master_list = dict_merge_loop_though(right_dict, left_dict, on, how_join)
  }

  else{
    console.log('---- ERROR ----')
    console.log('The "how" merge type is not correct. Please make sure it is either "inner", "left" or "right"')
    console.log('---- ERROR ----')
  }

  return master_list
} 

/*
-------------------- EXAMPLE --------------------
var arr1 = [
    {'id': 1, 'test': 2, 'text':"hello", 'oid': 2},
    {'id': 1, 'test': 1, 'text':"juhu", 'oid': 3},
    {'id': 3, 'test': 3, 'text':"wohoo", 'oid': 4},
    {'id': 4, 'test': 4, 'text':"yeehaw", 'oid': 1}
];

var arr2 = [
    {'id': 1,'test': 2, 'name':"yoda"},
    {'id': 1,'test': 1, 'name':"herbert"},
    {'id': 3, 'name':"john"},
    {'id': 4, 'name':"walter"},
    {'id': 5, 'name':"clint"}
];

var test = merge(arr1, arr2, ['id', 'test'], 'left')
for (var index in test){
  var dict_ = test[index]
  console.log(dict_)
}

*/
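For comparison, the same three join types can be sketched more compactly with a lookup table instead of nested loops. This is not a rewrite of the functions above, just an illustration under a few assumptions: the name hashJoin and the sample rows are hypothetical, join columns have the same names in both tables, and key values are compared by type and value (so 1 and '1' do not match, unlike the != comparison above).

```javascript
// Hypothetical helper: hash join on one or more shared column names.
// how is 'inner', 'left' or 'right'; the input arrays are not modified.
function hashJoin(leftRows, rightRows, on, how) {
  how = (how || 'inner').toLowerCase();
  if (how === 'right') {
    // A right join is a left join with the tables swapped.
    var swapped = leftRows; leftRows = rightRows; rightRows = swapped;
  }

  // Build a composite key string from the join columns.
  function keyOf(row) {
    return JSON.stringify(on.map(function (col) { return row[col]; }));
  }

  // Index the non-preserved table by composite key and collect its columns.
  var index = new Map();
  var otherColumns = new Set();
  rightRows.forEach(function (row) {
    var key = keyOf(row);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(row);
    Object.keys(row).forEach(function (col) { otherColumns.add(col); });
  });

  var result = [];
  leftRows.forEach(function (row) {
    var matches = index.get(keyOf(row)) || [];
    matches.forEach(function (match) {
      result.push(Object.assign({}, row, match));
    });
    if (matches.length === 0 && how !== 'inner') {
      // Preserve the unmatched row, filling the other table's columns with null.
      var filled = Object.assign({}, row);
      otherColumns.forEach(function (col) {
        if (!(col in filled)) filled[col] = null;
      });
      result.push(filled);
    }
  });
  return result;
}

// Sample rows (assumed for illustration).
var joinLeft = [
  { id: 1, test: 2, text: 'hello' },
  { id: 3, test: 3, text: 'wohoo' }
];
var joinRight = [
  { id: 1, test: 2, name: 'yoda' },
  { id: 5, test: 5, name: 'clint' }
];

console.log(hashJoin(joinLeft, joinRight, ['id', 'test'], 'left'));
```

With these rows, the left join keeps the 'wohoo' row with name set to null, the inner join returns only the 'hello'/'yoda' row, and the right join keeps the 'clint' row with text set to null.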