90

This question is concerning joining two databases in Google spreadsheet using =QUERY function

I have a table like so in range A1:C3

a d g
b e h
c f i

I have another table

c j m
a k n
b l o

I want the final table to look like this

a d g k n
b e h l o 
c f i j m

I can do this by using a vlookup function pretty easily in cell D1 and paste it down and across, but my dataset is huge. I would need a whole page of vlookups and Google Spreadsheet tells I'm at my limit in complexities.

I look at the Google's Query Language reference... there doesn't seem to be an type of "join" functions mentioned. You would think it would be an easy "join on A" type operation.

Can anybody solves this without a vlookup?

TheMaster
  • 45,448
  • 6
  • 62
  • 85
jason
  • 3,811
  • 18
  • 92
  • 147
  • I am guessing that `index` and `match` would have the same complexity as vlookups, but you can try [this implementation](https://docs.google.com/spreadsheets/d/1EZc1kV4YQ2Ld8zagZ725xqSlCZKYOlgvV_eEejJ6K8s/edit#gid=0). If this does not work, then I think custom function needs to be written. – Konstant Apr 16 '14 at 01:40
  • @Konstant. Thanks for the suggestion. Your solution also works. I think this question became a lot less relevant since the new Google sheets. Supposedly they have eliminated the number of complexities limits, but I still notice some problems for bigger sheets. – jason Apr 16 '14 at 07:24
  • can you share me a sample sheet (with huge data) if possible. I will try to write a custom function for the same. – Konstant Apr 16 '14 at 08:18
  • I can't share the data. I would think you recreate it with the `rand()` function – jason Apr 16 '14 at 11:40
  • I just wanted to know the approx number of rows you have. – Konstant Apr 16 '14 at 13:51
  • @Konstant about 20k rows – jason Apr 17 '14 at 01:12
  • Could you mention how you'd do this for smaller data sets (the `vlookup` / `index` / `match` solutions)? – ShreevatsaR Feb 04 '16 at 18:43
  • 1
    @ ruben. When is google going to rewrite their query function? seems like more and more people are using it, time to make it more robust! – jason Mar 22 '17 at 10:54
  • @jason I didn't get a notification because the space between `@` and my displayname. AFAIK Google didn't announced that but it's worth to note that the official help article was made "more robust". I will add to my answer. – Rubén Jan 11 '18 at 15:56
  • Didn't know about `vlookup`, and this is exactly what I was looking for. Thank you! – Dennis Golomazov Nov 04 '19 at 16:11
  • You may also be interested in this answer https://stackoverflow.com/questions/65363849/google-sheets-query-left-join/65368321#65368321 which demonstrates a one-to-many join (for small number of columns) – Tom Sharpe Dec 20 '20 at 14:56

6 Answers6

72

Short answer

Google QUERY Language version 0.7 (2016) doesn't include a JOIN (LEFT JOIN) operator but this could be achieved by using an array formula which result could be used as input for the QUERY function or for other uses.

Explanation

Array formulas and the array handling features of Google Sheets make possible to make a JOIN between two simple tables. In order to make easier to read, the proposed formula use named ranges instead of range references.

Named Ranges

  • table1 : Sheet1!A1:C3
  • table2 : Sheet2!A1:C3
  • ID : Sheet1!A1:A3

Formula

=ArrayFormula(
   {
     table1,
     vlookup(ID,table2,COLUMN(Indirect("R1C2:R1C"&COLUMNS(table2),0)),0)
   }
)

Remarks:

  • Using open ended ranges is possible but this could make the spreadsheet slower.
  • To speed up the recalculation time :
  1. Replace Indirect("R1C2:R1C"&COLUMNS(table2),0) by an array of constants from 2 to number of columns of table2.
  2. Remove the empty rows from the spreadsheet

Example

See this sheet for an example

Note

On 2017 Google improved the official help article in English about QUERY, QUERY function. It still doesn't include yet topics like this but could be helpful to understand how it works.

ctrl-alt-delor
  • 7,506
  • 5
  • 40
  • 52
Rubén
  • 34,714
  • 9
  • 70
  • 166
  • 2
    For now this is the only answer here that actually works for large datasets without pasting formulas in each cell. It's really helpful. Thanks! – dmitrych Apr 03 '16 at 01:58
  • 1
    This is very clever. Is it possible to make this into an INNER JOIN? – cjm2671 Oct 13 '16 at 10:44
  • 1
    @cjm2671: Thanks. If you didn't do this yet, please post a new question. This could be obvious, but anyway... If you add a link here I'll be able to find it easily :) – Rubén Oct 13 '16 at 13:47
  • 1
    best answer I got so far – Sarvar Nishonboyev May 05 '20 at 19:51
  • Thank you @Rubén, this is gold! Can you expand on what you mean by "...by an array of constants from 2 to number of columns of table2". I've tried to use {2,2} and variants as the substitute for the INDIRECT(...) without success – Oren Pinsky Dec 08 '20 at 21:01
  • @OrenPinsky I suggest you to post a new question including sample data an the formula that you tried. – Rubén Dec 08 '20 at 21:09
  • Found the solution @Rubén. I think you meant "Replace Indirect(.." by "Replace COLUMN(Indirect(..". I was able to get the result I needed using the following: =ArrayFormula( { table1!A1:C3, vlookup(table1!A1:A3,table2!A1:C3,{2,3},0) } ) – Oren Pinsky Dec 08 '20 at 21:12
  • In some languages a backslash should replace a comma. E.g.: =ArrayFormula({table1!A1:C3\ vlookup(table1!A1:A3;table2!A1:C3;COLUMN(Indirect("R1C2:R1C"&COLUMNS(table2!A1:C3);0));0)}) – Cleber Jorge Amaral Sep 20 '21 at 13:59
  • @CleberJorgeAmaral Rather than depending on the language depends on the regional setting (this is explained on the second link in the answer body) – Rubén Sep 20 '21 at 15:34
21

So, this answers how you do it WITH a Vlookup-function, but in only one cell.
In your example, given that each table of data has the following cell references:

Table1: Sheet1!A1:C3

a d g
b e h
c f i

Table2: Sheet2!A1:C3

c j m
a k n
b l o

This is how the formula should be constructed.

Join-formula

=ArrayFormula(
   {
     Sheet1!A1:C,
     vlookup(Sheet1!A1:A, {Sheet2!A1:A, Sheet2!B1:C}, {2,3}, false)
   }
)

The key to get this formula to work, is to understand how to use curly brackets in the Vlookup Range. You basically define the first cell reference of the Range as the column which is to be a match to the Vlookup Search_Key. The rest of the cell references in the Range is in relation to the columns which you would like to join.

The Index is written as {2,3} to return the second and third column of the Range (the Range consists of a total of 3 columns); curly brackets has nothing to do with Arrayformula in the Vlookup Index, but is necessary to return multiple columns from the Vlookup function. The reason to not write {1,2,3} is because you would not like to include the column which is being used for the purpose of joining.

Example where the column in table2 used for joining, is located in a different column (to the right of the data which is to be joined)

This kind of Join-formula can be utilized even if the join-column in the second table is located as the third column of that table. Let's say that the raw-data in this example would look like this:

Table1 (Sheet1):

a d g
b e h
c f i

Table2 (Sheet2):

j m c
k n a
l o b

If you write the formula like this, you'll still get the desired outcome (as displayed in the table of joined data):

=ArrayFormula(
   {
     Sheet1!A1:C,
     vlookup(Sheet1!A1:A, {Sheet2!C1:C, Sheet2!A1:B}, {2,3}, false)
   }
)

The table of joined data:

a d g k n
b e h l o 
c f i j m

In the Join-formula, notice that the third column of Table2 is located as the first cell reference in the Vlookup Range!
The reason to why this works, is because when you use curly brackets in the Range (in conjunction with Arrayformula), the Vlookup Search_Key will NOT look for a column as a common denominator within the raw-data, instead it will use the Array within curly brackets as a reference to find a column as a common denominator (by default this is the first column of the Range).

I've written a comprehensive guide about this topic called:

'Mastering Join-formulas in Google Sheets'

Community
  • 1
  • 1
Viktor
  • 279
  • 3
  • 5
19

You can use ARRAYFORMULA or YOU can just drag this formula: after an import or QUERY-ing the first table; in the D column:

=QUERY(Sheet2!A1:C3, "Select B,C WHERE A='" & A1 & "'", 0)
Benny
  • 607
  • 7
  • 20
White_King
  • 748
  • 7
  • 21
5

I solved this by using Javascript LINQ (language integrated query).

It lets you specify Javascript with complex join conditions. You can also perform other SQL queries such as Grouping, Projecting, Sorting and Filtering your sheets as if they were database tables. Look at the links below.

Note that in the LINQ query language I replaced all spaces in column names with underscores to make them valid JS identifiers.

https://docs.google.com/spreadsheets/d/1DHtQlQUlo-X_YVfo-Wo-b7315sSk2pxL5ci4Y9lxvZo/edit?usp=sharing

https://script.google.com/d/1R5L2ReHJrBRwyoSoVOFLzEQZiGtxidPfPkAeVownt7SWX6TpacY7gA7j/edit?usp=sharing

user2738245
  • 121
  • 1
  • 1
1

If you can map each "index" (a, b, c) to a specific row or column, then you could use the INDEX function.

In this case, you could probably map 'a' to column A (or row 1), 'b' to column B (or row 2), and so on.

Also, Merge Tables seem to address this exact use case.

Marco Roy
  • 4,004
  • 7
  • 34
  • 50
0

With the 'other' table in A5:C7, please try:

=query({A1:C3,query(sort(A5:C7,1,TRUE),"Select Col2,Col3")})
pnuts
  • 58,317
  • 11
  • 87
  • 139