Earlier today I posted this question related to the conversion of a JSON array into a table, and was lucky enough to find the solution after some further searching.

Now, after searching even more than the previous time, I'm still stuck (I did see some entries in this forum, but they don't specifically resolve my problem).

There are some cases in which I need to respond to a request with the result of a select that returns a variable number of records (could be thousands), each having about 20 columns.

Now, the way I found to build JSON out of the select (by adding FOR JSON AUTO) works very nicely and indeed creates an array of records, each having every column value preceded by its column name.

This, however, makes the result several times larger than needed (I'm thinking about network traffic, especially when it is not over a LAN).

To overcome this, I split the response into two parts, a Header and a Body: the Header contains the list of column names in the response (in the correct order), while the Body contains, for each record, the list of values (matching the number and order of the Header).

Example:

If the source table looks like this:

  A     |      B      |         C
--------+-------------+--------------------
 11     |     22      |   2018-04-07 12:44
 33     |     44      |   2017-02-21 18:55
 55     |     66      |   2016-11-12 00:03

and the Body of the response should contain the values of columns "A", "B" and "C", the response would look as follows:

{"Response": {
               "Header":["A","B","C"],
                "Body":[["11","22","2018-04-07 12:44"],
                        ["33","44","2017-02-21 18:55"],
                        ["55","66","2016-11-12 00:03"]
                       ]
             }
}

Unfortunately, I'm not finding a way to produce the contents of the Body without the "A", "B" and "C" names.

Update

I want to stress that the different columns within a table record may be of different types, so I would convert them all to strings. See the updated example table and the expected result.

  • You are asking for a completely different result, not the JSON representation of the table data. A *meaningful* result would be `[{"A":11,"B":22}, {"A":33,"B":44},...]`. Looks like you are trying to construct some *web page response* in *SQL*? Why not do that on the service itself? I won't say `REST service` because that shape is unsuitable for REST. It only makes sense as data for producing table reports – Panagiotis Kanavos Jul 18 '18 at 12:24
  • @PanagiotisKanavos, thank you for your comment. What I'm looking for is to get, for each table record, an array of strings (I'll update the question to represent that - thanks for triggering this correction). – FDavidov Jul 18 '18 at 12:27
  • Do so on the client. The result will defeat the very purpose of JSON though. *Instead* of the representation of objects you'll have something that will require extra processing to convert it back to objects – Panagiotis Kanavos Jul 18 '18 at 12:28
  • Network bandwidth shouldn't be a concern - after all, responses are compressed. If you have a **lot** of data though, perhaps you *shouldn't* be using JSON at all. Instead of inventing a new format check protobuf, Avro or one of the many big data formats – Panagiotis Kanavos Jul 18 '18 at 12:32
  • @PanagiotisKanavos, well, I'm used (for many years already) to consider everything, including NW load. As per your assumption that I'm inventing something new, I must say that I completely disagree. JSON is great for data transfer, and the only question is how to best **_MODEL_** the data to ease the load on all the parts forming the final system (and yes, in some cases you can distribute the processing load all over the client applications; in my case, the JavaScript handling large amounts of data -several MB- in the above mentioned format completes the task in a small fraction of a second). – FDavidov Jul 18 '18 at 12:43
  • `FOR JSON` cannot generate plain jane arrays. You can generate them with [various workarounds](https://stackoverflow.com/q/37708638/4137916), but none of those are pretty. Meanwhile, client code is perfectly poised to do this: an SQL data reader gives you column names, and the row values without column names. Turning this into the JSON you want is almost trivial, assuming a client that itself has access to JSON libraries. (As a bonus, this minimizes network traffic, because native TDS values are smaller than a JSON string.) – Jeroen Mostert Jul 18 '18 at 12:50
  • @FDavidov that's what Avro and Protobuf are about. They are built specifically to handle big loads, far bigger than a few MBs. As for modeling to ease the load on all the parts - that's exactly the opposite here. I work for the 4th largest OTA (online travel agency) in Europe. There's nothing worse than *custom* compressions or separating tags from values like eg Amadeus does. The responses *are still* several MB in size. It increases the processing load a LOT without significant reductions in size - after all the responses are gzip compressed to similar sizes! – Panagiotis Kanavos Jul 18 '18 at 13:26
  • @FDavidov the result though is that you *CAN'T* process the data as it comes. You can't even use the response once you have it *and* paid the deserialization cost. You have to keep it in memory while you transform it to bring it back to a usable form. This is measured in *extra servers* needed to serve the same traffic. – Panagiotis Kanavos Jul 18 '18 at 13:29
  • @FDavidov as for compression, the gzip compression used by web servers means that 6MB of text may end up as 200KB of data, depending on the server's compression settings. ASP.NET Core added [Brotli compression](https://blogs.msdn.microsoft.com/dotnet/2017/07/27/introducing-support-for-brotli-compression/) with even better performance. That's a *lot* better than any savings you can get by eliminating headers – Panagiotis Kanavos Jul 18 '18 at 13:37

2 Answers


As @Jeroen-Mostert notes, this is pretty simple to do in procedural code. You can even have SQL Server do it using SQL CLR. Because this shape is not natural for FOR JSON queries, a CLR-based solution is probably better than a TSQL one.

The code below concatenates the results into a single string, but you could change that to stream the results over multiple rows (like FOR JSON queries do), or add GZip compression.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void RunQuery (string sql)
    {
        /*
         {"Response":{"Header":["A","B","C"],
                "Body":[["11","22","2018-04-07 12:44"],
                        ["33","44","2017-02-21 18:55"],
                        ["55","66","2016-11-12 00:03"]
                       ]
             }
         }
         * */
        using (var con = new SqlConnection("Context Connection=true"))
        {
            var sb = new StringBuilder();
            con.Open();
            var cmd = con.CreateCommand();
            cmd.CommandText = sql;
            using (var rdr = cmd.ExecuteReader())
            {
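                // Header: the column names, in result-set order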
                sb.Append("{\"Response\":{\"Header\":[");
                for (int i = 0; i < rdr.FieldCount; i++)
                {
                    var fn = rdr.GetName(i);
                    sb.Append('"').Append(fn).Append('"');
                    if (i + 1 < rdr.FieldCount)
                        sb.Append(',');
                }
                sb.Append("],\"Body\":[");

                //SqlContext.Pipe.Send(sb.ToString());

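                // Body: one array of stringified values per row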
                if (rdr.Read())
                {
                    while (true)
                    {

                        sb.Append('[');
                        for (int i = 0; i < rdr.FieldCount; i++)
                        {
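                            // NOTE: the value is not JSON-escaped here; quotes or
                            // backslashes in the data would produce invalid JSON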
                            var val = rdr[i].ToString();
                            sb.Append('"').Append(val).Append('"');
                            if (i + 1 < rdr.FieldCount)
                                sb.Append(',');
                        }
                        sb.Append(']');
                        if (rdr.Read())
                        {
                            sb.Append(',');
                        }
                        else
                        {
                            break;
                        }

                    }
                }
                sb.Append("]}}");

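                // Return the finished JSON document as a single NVARCHAR(MAX) column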
                var md = new SqlMetaData("JSON", SqlDbType.NVarChar, -1);
                var r = new SqlDataRecord(md);
                r.SetString(0, sb.ToString());
                SqlContext.Pipe.SendResultsStart(r);
                SqlContext.Pipe.SendResultsRow(r);
                SqlContext.Pipe.SendResultsEnd();

            }
        }
    }
}
David Browne - Microsoft
  • Except that SQLCLR is effectively deprecated in SQL Server 2017 due to the security restrictions. One of the reasons is ill-behaved code that caches a lot of data, stealing it from the server's buffers like *this* code does. For 6MB worth of data this code would have to cache at least that much in the StringBuilder. The actual behaviour would be far worse - the StringBuilder would have to reallocate its memory several times, resulting in severe memory waste. – Panagiotis Kanavos Jul 18 '18 at 13:42
  • Both values and column names ought to be escaped to ensure the result is always valid JSON. (Obviously this is usually more relevant for values than for column names, but still.) – Jeroen Mostert Jul 18 '18 at 13:44
  • On the other hand, generating this on the *client*, an MVC or Web API controller, would be far easier and cheaper. One could use libraries like JSON.NET to easily generate individual elements and even emit each record to the response stream instead of caching everything before responding. – Panagiotis Kanavos Jul 18 '18 at 13:46
  • David, thank you for the proposed solution (and code of course). I thought of attaching a JAVA stored procedure, but (independently of what @PanagiotisKanavos states about deprecation), I would rather **pay** the price on transferred data size than add a sort of _foreign_ element to the overall solution. Still, will keep this proposal in mind in case it turns out to be the best. Once again, thanks!! – FDavidov Jul 18 '18 at 13:47
  • Even then, it would probably be easier and faster to simply use a Protobuf formatter like [this one in ASP.NET Core](https://damienbod.com/2017/06/30/using-protobuf-media-formatters-with-asp-net-core/) – Panagiotis Kanavos Jul 18 '18 at 13:48
  • @FDavidov check the Protobuf formatter first. How are you going to use that data anyway? Export it to a file or return it in an HTTP response? If it's for exporting, you can write a small program to generate and export the data to a file. If it's for HTTP, especially ASP.NET Core, you could add multiple formatters to easily support protobuf *and* your own format and let the client request what it wants – Panagiotis Kanavos Jul 18 '18 at 13:50
  • The platform of the project includes: JavaScript-based client (AngularJS + JQuery + Bootstrap), Java Servlet (broker between the client and the DB) and SQL Server (where access to data takes place ONLY via Stored Procedures). I'm not sure it would be wise to include some other link to the chain (you know the saying: _"The simplest way to weaken a chain is to make it longer"_). – FDavidov Jul 18 '18 at 14:12
  • 1
    @FDavidov inventing a new file format is like adding another untested link. The servlet can easily generate any format, including what you posted. – Panagiotis Kanavos Jul 18 '18 at 15:05
  • @PanagiotisKanavos true, but that would make the Servlet "**aware**" of the JSON structure (for each interaction it could be different), making it yet another piece that would need to be maintained. In the current implementation, the Servlet did not undergo any change since its first implementation, and that's something I (personally) consider a great thing. – FDavidov Jul 19 '18 at 04:59
  • "make the Servlet "aware" of the JSON structure" not so. Like in the code I posted the JS client can send the query, and the Java Servlet can discover the result set metadata for the result, and dynamically generate the JSON structure from that. – David Browne - Microsoft Jul 19 '18 at 13:30
  • David, I had an additional consideration of the problem... The issue is mainly focused on the fact that I practice very **explicit** naming for variables, tables and column names (for instance, a timestamp column would be named `Decommissioning_Timestamp`; imagine a JSON with thousands of records, all having this name in it). Instead, I could simply NUMBER the columns (like `{"1":"2018-07-22 13:48",...}`). This would of course require the server and client application to **_know_** the mapping (number vs. name), but it would significantly contribute to keeping the overall size small. – FDavidov Aug 10 '18 at 04:45
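As a minimal sketch of what David Browne describes in the comments above: assuming a plain JDBC ResultSet in the servlet, the metadata gives the column names and the rows give bare values, so the Header/Body shape from the question can be generated without the servlet knowing any schema (all names here, like `writeHeaderBodyJson`, are hypothetical):

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public class HeaderBodyJson {

    // Hypothetical helper: serializes any ResultSet into the
    // {"Response":{"Header":[...],"Body":[[...],...]}} shape from the question.
    public static String writeHeaderBodyJson(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int cols = md.getColumnCount();
        StringBuilder sb = new StringBuilder("{\"Response\":{\"Header\":[");

        // Header: column names discovered from the result-set metadata
        for (int i = 1; i <= cols; i++) {
            if (i > 1) sb.append(',');
            sb.append('"').append(escape(md.getColumnLabel(i))).append('"');
        }
        sb.append("],\"Body\":[");

        // Body: every value stringified, one array per row
        boolean firstRow = true;
        while (rs.next()) {
            if (!firstRow) sb.append(',');
            firstRow = false;
            sb.append('[');
            for (int i = 1; i <= cols; i++) {
                if (i > 1) sb.append(',');
                String val = rs.getString(i); // all types converted to strings
                sb.append('"').append(escape(val == null ? "" : val)).append('"');
            }
            sb.append(']');
        }
        return sb.append("]}}").toString();
    }

    // Minimal JSON string escaping (backslashes and quotes only)
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}

Since the structure is discovered from the metadata, the servlet stays unaware of any particular schema: it works for whatever result set the stored procedure returns.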

This, however, makes the result several times larger than needed (I'm thinking about network traffic, especially when it is not over a LAN).

Given that:

  1. your goal is to reduce network traffic,
  2. you are using a servlet to handle the DB communication, and
  3. there are multiple datatypes involved,

you can try...

  1. simply not using JSON to return the result set to the servlet. Converting the result set to JSON is resource-intensive, requiring additional CPU, RAM, and time. And while JSON is a more efficient text format than XML, it is not necessarily more efficient than sticking with the native TDS format. Consider that integers of 0 to 3 digits will be more efficient as text, 4 digits will be equally efficient, and 5 digits or more are less efficient. So for INT fields, which format is more efficient depends on the actual results. On the other hand, DATETIME fields are 8 bytes natively, but when serialized as text for JSON, they become 23 bytes each. This, of course, assumes that you are converting the JSON to VARCHAR. If you are unable to convert to VARCHAR to return the JSON to the servlet due to having Unicode strings in the results, then those string dates take up 46 bytes each (SQL Server uses UTF-16 only for Unicode; there is no UTF-8 option). And that also changes the difference for INT values such that only values of 0 or 1 digits are more efficient as text, 2 digits are equivalent, and 3 or more digits are less efficient.

    The overall point being: you need to thoroughly test both approaches because I (and apparently others on here) suspect that you would, at best, end up with an equally efficient transport, yet would have paid the price in terms of more convoluted code and more system resources. Hence a net negative. More than likely, though, this approach will also be slower, on top of paying that price.

    So, as has already been suggested in various comments by @PanagiotisKanavos, @JeroenMostert, and @DavidBrowne: just convert to JSON in the servlet.

  2. compressing the returned, standard JSON via the built-in COMPRESS function (introduced in SQL Server 2016). By simply adding a single function to the query you can reduce the returned JSON by over 90%. For example:

    DECLARE @Results VARCHAR(MAX);
    SET @Results = (SELECT * FROM sys.objects FOR JSON AUTO);
    SELECT @Results AS [JSON],
           DATALENGTH(@Results) AS [UncompressedBytes],
           DATALENGTH(COMPRESS(@Results)) AS [CompressedBytes];
    

    returns:

    [{"name":"sysrscols","object_id":3,"schema_id":4,"parent_object_id":0,"type":"S","type_desc":"SYSTEM_TABLE","create_date":"2017-08-22T19:38:02.860","modify_date":"2017-08-22T19:38:02.867","is_ms_shipped":true,"is_published":false,"is_schema_published":false},...

    29521

    2594

    This dramatically reduces the network traffic without convoluting the code or coming up with a proprietary format.

    The compression is done using GZip, which appears to be rather easy to handle in Java:

    Java GZIP Example – Compress and Decompress File

    Proving that you really can have your JSON and compress it too 😸
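    As an illustration only (assuming the servlet reads the COMPRESS output as a VARBINARY column via JDBC, e.g. `rs.getBytes(1)`), decompressing in Java needs nothing beyond the standard `java.util.zip.GZIPInputStream`:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;

    public class CompressedJson {

        // Inflates the VARBINARY produced by SQL Server's COMPRESS().
        // COMPRESS emits standard GZIP (RFC 1952) data, so
        // java.util.zip can read it directly.
        public static String decompress(byte[] compressed) throws IOException {
            try (GZIPInputStream gis =
                     new GZIPInputStream(new ByteArrayInputStream(compressed));
                 ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = gis.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                // The example above compressed a VARCHAR(MAX) value, so the
                // bytes are in the database's code page (often windows-1252);
                // NVARCHAR input would decode as UTF-16LE instead.
                return new String(out.toByteArray(), "windows-1252");
            }
        }
    }

    Note that the charset used to decode the inflated bytes must match the input of COMPRESS: VARCHAR comes back in the database's code page, NVARCHAR as UTF-16LE.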


P.S. Regarding your comment, on David's answer, of:

I thought of attaching a JAVA stored procedure

No, this is not an option with SQL Server.

P.P.S. Regarding the comment that SQLCLR is deprecated, even just "effectively", please see my post:

SQLCLR vs SQL Server 2017, Part 8: Is SQLCLR Deprecated in Favor of Python or R (sp_execute_external_script)?

Solomon Rutzky
  • Solomon, thanks for your post. About being "resource-intensive" (conversion to JSON), I thought that was the case but, to my surprise, Microsoft did a great job when coding the new functions (SQL Server 2016). For instance, converting a 40,000-record table into a JSON array (almost a 7MB string!!!) takes about 0.15 seconds. As far as I'm concerned, this is quite efficient indeed. – FDavidov Aug 15 '18 at 12:32
  • @FDavidov Even if it is rather efficient, that still doesn't mean that it is _more_ efficient than the native transport method. But again, if you are inclined to go the JSON route, then my option #2 would still apply. – Solomon Rutzky Aug 15 '18 at 13:08
  • Solomon, thank you again for your inputs. As for option #2, the JAVA servlet sits next to the DB, so I would be saving bandwidth only on the connections within the server farm, not on the connection between the server farm and the client. I'll stay with the current approach for the time being and see what happens once we go live. Again, I appreciate your time and effort. Cheers!!! – FDavidov Aug 16 '18 at 15:07
  • @FDavidov While I was not expecting that issue, I suspect it is still solvable easily enough. It seems like you can pass the GZipped binary (possibly Base64 encoded first) on to the browser. Take a look at this S.O. answer, [JavaScript implementation of Gzip](https://stackoverflow.com/a/5633128/577765), and this GitHub project, [gzip-js](https://github.com/beatgammit/gzip-js). And there might be other, similar projects / answers that are even better. – Solomon Rutzky Aug 16 '18 at 15:23
  • Solomon, Base64, by definition, makes a binary string about 33% bigger. In any case, ZIP -> Convert (to Base64) -> transport -> Convert back (to binary) -> UNZIP... Hummmm... Don't know about that. In any case, I'm up-voting your answer as it may give ideas to others as well. One more time, thank you Sir!!! – FDavidov Aug 18 '18 at 11:33
  • @FDavidov Is it possible to transmit the binary as is, without Base64 encoding? Either way, even if there is a 33% increase, that is based on the size that is already 90% smaller. So if starting at 100 bytes, GZipped is then 10 bytes, then Base64 encode that and it will be 13 - 14 bytes, which is still 86 - 87% smaller than the original. Or am I missing something? – Solomon Rutzky Aug 18 '18 at 14:28