
I have a database table that keeps a column for versioning of the entries. Client applications should be able to query the server with their current versions and IDs, and the server should respond with the rows that have a higher version for each of the entries. I am not an expert on MySQL, so I cannot see how to achieve this. I have tried various things, but I am currently far from producing anything that works efficiently.

Example:

Server data:

mysql> SELECT id, version FROM my_data;
+-----+---------+
| id  | version |
+-----+---------+
|   1 |       0 |
|   2 |       1 |
|   3 |       2 | <-- The JSON data below has lower version, so this row should be selected.
|   4 |       0 |
|   5 |       1 |
|   6 |       0 |
|   7 |       1 | <-- The JSON data below has lower version, so this row should be selected.
|   8 |       1 |
|   9 |       4 | <-- The JSON data below has lower version, so this row should be selected.
|  10 |       1 |
+-----+---------+
10 rows in set (0.00 sec)

Data sent from the client:

The client then queries the server with the following data in JSON (it could be any format, but I have JSON in my case). The server side is PHP, and I need to parse this JSON data and include it in the query somehow. This is the data the client currently contains.

{
    "my_data": [
        {
            "id": 1,
            "version": 0
        },
        {
            "id": 2,
            "version": 1
        },
        {
            "id": 3,
            "version": 0
        },
        {
            "id": 4,
            "version": 0
        },
        {
            "id": 5,
            "version": 1
        },
        {
            "id": 6,
            "version": 0
        },
        {
            "id": 7,
            "version": 0
        },
        {
            "id": 8,
            "version": 1
        },
        {
            "id": 9,
            "version": 2
        },
        {
            "id": 10,
            "version": 1
        }
    ]
}

In this example I want the MySQL query to return 3 rows, namely those with ids 3, 7 and 9, because the client versions are lower than the server versions and the client thus needs to fetch updated data for them. How can I achieve this in a single, simple query? I do not want to run one query for each row, even if that is possible.

Desired result from the sample data:

The resulting data should be the rows in which the version in the database on the server side is greater than the version for the corresponding id in the JSON data set.

mysql> <INSERT PROPER QUERY HERE>;
+-----+---------+
| id  | version |
+-----+---------+
|   3 |       2 |
|   7 |       1 |
|   9 |       4 |
+-----+---------+
3 rows in set (0.00 sec)

5 Answers


NOTE: This does not use PDO, just query string generation; it can be switched easily.

To check each version you can add an OR condition for each id, but first check that the JSON is not empty:

$jsonData = json_decode($inputJson, true);
$jsonData = $jsonData['my_data'];
$string = '';
if (!empty($jsonData)) {
    $conditions = array();
    foreach ($jsonData as $data) {
        $conditions[] = '(id='.$data['id'].' and version>'.$data['version'].')';
    }
    $string = 'select * from my_data where ' . implode(' or ', $conditions);
}

result:

select * from my_data where (id=1 and version>0) or (id=2 and version>1) or ...
  • The intention was to create the query string – ka_lin Nov 14 '15 at 17:08
  • yes, but don't forget to add a warning on SQL injection when you send answer with crafted unsafe sql query :P – Blag Nov 14 '15 at 17:11
  • This approach works fine and is faster than using a `TEMPORARY TABLE` approach for small data sets, see test results in my answer. For larger data sets this is not feasible, however, thus it might be a better to use a temporary table in those cases. – Krøllebølle Nov 21 '15 at 12:18
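As the comments note, interpolating raw values into the SQL string invites injection. A parameterized variant of the same idea might look like this (a sketch using PDO; $pdo and $inputJson are assumed to be your connection and the raw client JSON):

$jsonData = json_decode($inputJson, true);

$conditions = array();
$params = array();
foreach ($jsonData['my_data'] as $data) {
    // One (id, version) placeholder pair per entry keeps the values out of the SQL string.
    $conditions[] = '(id = ? AND version > ?)';
    $params[] = (int) $data['id'];
    $params[] = (int) $data['version'];
}

if (!empty($conditions)) {
    $stmt = $pdo->prepare('SELECT id, version FROM my_data WHERE ' . implode(' OR ', $conditions));
    $stmt->execute($params);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
}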
SELECT id,version FROM my_data WHERE `id` = $APP_ID AND `version` > $APP_VERSION;

Replace $APP_ID with an actual item ID and $APP_VERSION with the corresponding version from the incoming JSON. Note that this runs one query per entry.


Something like that, no?

try {
  $bdd = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'password');
}
catch (Exception $e) {
  die('Error: ' . $e->getMessage());
}

$output = array();

$req = $bdd->prepare('SELECT `id`, `version` FROM my_data WHERE `id` = :id AND `version` > :version');

// $yourJson is the decoded "my_data" array from the client.
foreach ($yourJson as $object) {
  $req->execute(array('id' => $object['id'], 'version' => $object['version']));
  $data = $req->fetch();
  if (!empty($data))
    $output[] = $data;
}

echo json_encode($output);

You can do that with json_array_elements (a PostgreSQL function):

SELECT id, version 
FROM my_data AS md, json_array_elements(md.json_col->'version') AS jsonVersion
WHERE  version > jsonVersion->>'version';

json_col being the name of the JSON column.

Here are a few links that might be helpful to you.

More details in this related answer:

How do I query using fields inside the new PostgreSQL JSON datatype?

More about the implicit CROSS JOIN LATERAL in the last paragraph of this related answer: PostgreSQL unnest() with element number

Advanced example

Query combinations with nested array of records in JSON datatype

I hope this helps you find a solution.
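Note that json_array_elements exists only in PostgreSQL. If you are on MySQL 8.0.4 or later, JSON_TABLE can expand the client JSON inside a single query in a similar spirit. A sketch under that assumption ($pdo and $inputJson are placeholder names; the CAST to JSON works around binding quirks some drivers have when a string is passed where JSON is expected):

$sql = <<<'SQL'
SELECT md.id, md.version
FROM my_data AS md
JOIN JSON_TABLE(
    CAST(:client_json AS JSON),
    '$.my_data[*]' COLUMNS (
        id INT PATH '$.id',
        version INT PATH '$.version'
    )
) AS client ON md.id = client.id
WHERE md.version > client.version
SQL;

$stmt = $pdo->prepare($sql);
$stmt->execute(array('client_json' => $inputJson));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);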


Main result from the tests and discussion below: using the multiple OR query (as suggested by @KA_lin) is faster for small data sets (n < 1000 or so). That approach scales badly for larger data sets, however, so I will probably stick with the TEMPORARY TABLE approach below in case my data set should grow large in the future. The overhead for this is not that high.

CREATE TEMPORARY TABLE my_data_virtual(id INTEGER NOT NULL, version TINYINT(3) NOT NULL);

INSERT INTO my_data_virtual VALUES
    (1,0), (2,1), (3,0), (4,0), (5,1),
    (6,0), (7,0), (8,1), (9,2), (10,1);

SELECT md.id, md.version
    FROM my_data AS md
    INNER JOIN my_data_virtual AS mdv
        ON md.id = mdv.id AND md.version > mdv.version;
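In PHP, the three steps (create, fill, join) could be wired up roughly like this (a sketch; $pdo is an assumed PDO connection and $inputJson the raw client JSON, and the temporary table lives only for that connection):

$entries = json_decode($inputJson, true);
$entries = $entries['my_data'];

$pdo->exec('CREATE TEMPORARY TABLE my_data_virtual (id INTEGER NOT NULL, version TINYINT(3) NOT NULL)');

// One (?, ?) pair per entry, bound in order, for a single multi-row INSERT.
$placeholders = implode(', ', array_fill(0, count($entries), '(?, ?)'));
$params = array();
foreach ($entries as $entry) {
    $params[] = (int) $entry['id'];
    $params[] = (int) $entry['version'];
}
$stmt = $pdo->prepare('INSERT INTO my_data_virtual VALUES ' . $placeholders);
$stmt->execute($params);

// Only rows whose server-side version is newer survive the join.
$rows = $pdo->query(
    'SELECT md.id, md.version
     FROM my_data AS md
     INNER JOIN my_data_virtual AS mdv
         ON md.id = mdv.id AND md.version > mdv.version'
)->fetchAll(PDO::FETCH_ASSOC);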

I ran a series of tests using the MySQLdb and timeit modules in Python. I created 5 databases: test_100, test_500, test_1000, test_5000 and test_10000. Each database was given a single table, data, which contained the following columns.

+-------------+---------+------+-----+---------+----------------+
| Field       | Type    | Null | Key | Default | Extra          |
+-------------+---------+------+-----+---------+----------------+
| id          | int(11) | NO   | PRI | NULL    | auto_increment |
| version     | int(11) | NO   |     | 0       |                |
| description | text    | YES  |     | NULL    |                |
+-------------+---------+------+-----+---------+----------------+

The data tables were then filled with random versions from 0 to 5 and a semi-random amount of lorem ipsum text. The test_100.data table got 100 rows, the test_500.data table got 500 rows, and so forth. I then ran tests for both the query using nested OR statements and the query using a temporary table, with all ids and random versions between 0 and 5.

Results

Results for nested OR query. Number of repeats for each n was 1000.

+----------+-------------+-------------+-------------+-------------+-------------+
|          | n = 100     | n = 500     | n = 1000    | n = 5000    | n = 10000   |
+----------+-------------+-------------+-------------+-------------+-------------+
| max      | 0.00719     | 0.02213     | 0.04325     | 1.75707     | 8.91687     |
| min      | 0.00077     | 0.00781     | 0.02696     | 0.63565     | 5.29613     |
| median   | 0.00100     | 0.00917     | 0.02996     | 0.82732     | 5.92217     |
| average  | 0.00111     | 0.01001     | 0.03057     | 0.82540     | 5.89577     |
+----------+-------------+-------------+-------------+-------------+-------------+

Results for temporary table query. Number of repeats for each n was 1000.

+----------+-------------+-------------+-------------+-------------+-------------+
|          | n = 100     | n = 500     | n = 1000    | n = 5000    | n = 10000   |
+----------+-------------+-------------+-------------+-------------+-------------+
| max      | 0.06352     | 0.07192     | 0.08798     | 0.28648     | 0.26939     |
| min      | 0.02119     | 0.02027     | 0.03126     | 0.07677     | 0.12269     |
| median   | 0.03075     | 0.03210     | 0.043833    | 0.10068     | 0.15839     |
| average  | 0.03121     | 0.03258     | 0.044968    | 0.10342     | 0.16153     |
+----------+-------------+-------------+-------------+-------------+-------------+

It seems that using nested OR queries is faster up to about n = 1000. From there on, the nested OR query scales badly and the temporary table approach wins solidly. In my case I am likely to have a maximum of around 1000 rows, so it seems that I can choose between these two approaches relatively freely.

I will probably go for the temporary table approach in case my data set should become larger than expected. The payload is small in any case.

Notes

  • Since the timeit module in Python is a bit ticklish, the database connection is opened and closed for each run/repeat. This might add some overhead to the timings.
  • The queries for the temporary table approach were done in 3 steps: one for creating the temporary table, one for inserting the data and one for joining the tables.
  • The creation of the query strings is not part of the timing; they are created outside of the Python timeit call.
  • Since both the versions in the inserted data and the random "client" data are randomly chosen between 0 and 5, it is likely that between 33 % and 50 % of the rows are selected. I have not verified this. It is also not really my actual case, as the client will at any point have almost the same data as the server.
  • I tried adding WHERE id IN (1,2,3...,10) to both the temporary table approach and the nested OR approach, but it neither sped things up nor slowed them down in any of the tests, except for the larger data sets with the multiple OR approach, where the times were slightly lower than without this WHERE clause.