20

I have data that is stored on a client's browser in-memory. For example, let's say the dataset is as follows:

"name" (string),     "age" (int32),   "isAdult" (bool)
"Tom"          ,     29               1
"Tom"          ,     14               0
"Dina"         ,     20               1

I would like to run non-trivial SQL statements on this data in javascript, such as:

SELECT name, GROUP_CONCAT(age ORDER BY age) ages
FROM arrowData a1 JOIN arrowData a2 USING (name)
WHERE a1.isAdult != a2.isAdult

And I would get:

"name" (string),      "ages" (string)
"Tom"                 "14,29"

The data that I have in javascript is stored in as apache Arrow (also used in connection with Perspective), and I'd like to execute SQL on that apache Arrow data as well. As a last resort, I think it would be possible to use sqllite in wasm, but I'm hoping there might be a simpler way where I can query the Arrow data directly, without having to move it all into a sqllite store in order to execute a query on it.

Are there any ways to do this?

Josh Lee
  • 171,072
  • 38
  • 269
  • 275
David542
  • 104,438
  • 178
  • 489
  • 842

3 Answers3

0

You can use Alasql to do some of what you want, but it does not support grouping.

var data = [
  {
    name: 'Tom',
    age: 29,
    isAdult: 1
  },
  {
    name: 'Tom',
    age: 14,
    isAdult: 0
  },
  {
    name: 'Dina',
    age: 20,
    isAdult: 1
  }
];

var res = alasql('SELECT name, age from ? a1 JOIN ? a2 WHERE a1.isAdult != a2.isAdult AND a1.name = a2.name', [data, data]);
  
document.getElementById('result').textContent = JSON.stringify(res);
<script src="https://cdn.jsdelivr.net/alasql/0.2/alasql.min.js"></script> 

<span id="result"></span>
Todd Chaffee
  • 6,754
  • 32
  • 41
0

It is good stuff what you are looking for. :) Sadly thanks to some trends in ~2010 as far as I know there is no actively maintained and supported API for this. But...

If you want to have full ANSI SQL on the client side in memory and you are willing to populate the database you could run the mentioned SQLite. Maybe this the only fulfilling option for you (if you could not leave some of the requirements).

If you could allow the luxury to copy data you could check out the AlaSQL project it does support join-s and some of the ANSI SQL features, but it is not complete and it contains known disruptive bugs:

Please be aware that AlaSQL has bugs. Beside having some bugs, there are a number of limitations:

AlaSQL has a (long) list of keywords that must be escaped if used for column names. When selecting a field named key please write SELECT key FROM ... instead. This is also the case for words like value, read, count, by, top, path, deleted, work and offset. Please consult the full list of keywords.

It is OK to SELECT 1000000 records or to JOIN two tables with 10000 records in each (You can use streaming functions to work with longer datasources - see test/test143.js) but be aware that the workload is multiplied so SELECTing from more than 8 tables with just 100 rows in each will show bad performance. This is one of our top priorities to make better.

Limited functionality for transactions (supports only for localStorage) - Sorry, transactions are limited, because AlaSQL switched to more complex approach for handling PRIMARY KEYs / FOREIGN KEYs. Transactions will be fully turned on again in a future version.

A (FULL) OUTER JOIN and RIGHT JOIN of more than 2 tables will not produce expected results. INNER JOIN and LEFT JOIN are OK.

Please use aliases when you want fields with the same name from different tables (SELECT a.id AS a_id, b.id AS b_id FROM ?).

At the moment AlaSQL does not work with JSZip 3.0.0 - please use version 2.x.

JOINing a sub-SELECT does not work. Please use a with structure (Example here) or fetch the sub-SELECT to a variable and pass it as an argument (Example here).

AlaSQL uses the FileSaver.js library for saving files locally from the browser. Please be aware that it does not save files in Safari 8.0.

There are probably many others. Please help us fix them by submitting an issue. Thank you!

We planned to use it in one project, but there were more problems than solutions (for us) while introducing the project to our stack. So we backed out of from it. So I do not have production experience with this piece software...

At older times I hoped that Google Gears will support something like the desired function but partly it got replaced by HTML5 client side storage and sadly the project got discontinued.

The HTML5 WebSQL Database would have been perfect for your use-case, but it is sadly depricated. Tho most (?) browsers still support it in 2019. You can check some examples here. If you can allow yourself to build on a depricated API this could be the solution, but I do not really recommend it as it is not guaranteed that it will work...

When our project run having the same problems we ended up having to use the localStorage and program every "SELECT" manually, which of course was not at all ANSI SQL like...

If we roll back to the original problem "[SQL] query the Arrow data directly" I have no adapter in mind to use it as SQL... These kind of operations still tend to be on the server side and with the wasm SQLite I think those are the options.

Hash
  • 4,647
  • 5
  • 21
  • 39
0

There is DuckDB Wasm now which can run SQL on arrow tables.

DuckDB-Wasm is an in-process analytical SQL database for the browser. It is powered by WebAssembly, speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js.

ns15
  • 5,604
  • 47
  • 51