TL/DR;
Seriously, just add framing to your wire protocol. E.g. even HTTP responses do this (e.g. via the content length headers, and maybe chunked encoding)
UPDATE:
Instead of handrolling you can go with Boost JSON as I added in another answer
The first approach is flawed, because you are using "async_read_until" yet treat the operation as if it were synchronous.
The second problem is, neither json::parse
nor json::accept
can report the location of a complete/broken parse. This means that you really do need framing in your wire protocol, because you CANNOT detect message boundaries.
The rest of this answer will first dive in to expose how the limitations of the nlohmann::json
library make your task impossible¹.
So even though it's commendable for you to use an existing library, we look for alternatives.
Making It Work(?)
You could use the approach that Beast uses (http::read(s, buf, http::message<>
). That is: have a reference to the entire buffer.
flat_buffer buf;
http::request<http::empty_body> m;
read(s, buf, m); // is a SyncStream like socket
Here, read is a composed operation over the message as well as the buffer. This makes it easy to check the completion criteria. In our case, let's make a reader that also serves as a match-condition:
template <typename DynamicBuffer_v1>
struct JsonReader {
DynamicBuffer_v1 _buf;
nlohmann::json message;
JsonReader(DynamicBuffer_v1 buf) : _buf(buf) {}
template <typename It>
auto operator()(It dummy, It) {
using namespace nlohmann;
auto f = buffers_begin(_buf.data());
auto l = buffers_end(_buf.data());
bool ok = json::accept(f, l);
if (ok) {
auto n = [&] {
std::istringstream iss(std::string(f, l));
message = json::parse(iss);
return iss.tellg(); // detect consumed
}();
_buf.consume(n);
assert(n);
std::advance(dummy, n);
return std::pair(dummy, ok);
} else {
return std::pair(dummy, ok);
}
}
};
namespace boost::asio {
template <typename T>
struct is_match_condition<JsonReader<T>> : public boost::true_type { };
}
This is peachy and works on the happy path. But you run into big trouble on edge/error cases:
- you can't distinguish incomplete data from invalid data, so you MUST assume that unaccepted input is just incomplete (otherwise you would never wait for data to be complete)
- you will wait until infinity for data to become "valid" if it's just invalid or
- worse still: keep reading indefinitely, possibly running out of memory (unless you limit the buffer size; this could lead to a DoS)
- perhaps worst of all, if you read more data than the single JSON message (which you can not in general prevent in the context of stream sockets), the original message will be rejected due to "excess input". Oops
Testing It
Here's the test cases that confirm the analysis conclusios predicted:
Live On Compiler Explorer
#include <boost/asio.hpp>
#include <nlohmann/json.hpp>
#include <iostream>
#include <iomanip>
template <typename Buffer>
struct JsonReader {
static_assert(boost::asio::is_dynamic_buffer_v1<Buffer>::value);
Buffer _buf;
nlohmann::json message;
JsonReader() = default;
JsonReader(Buffer buf) : _buf(buf) {}
template <typename It>
auto operator()(It dummy, It) {
using namespace nlohmann;
auto f = buffers_begin(_buf.data());
auto l = buffers_end(_buf.data());
bool ok = json::accept(f, l);
if (ok) {
auto n = [&] {
std::istringstream iss(std::string(f, l));
message = json::parse(iss);
return iss.tellg(); // detect consumed
}();
_buf.consume(n);
assert(n);
//std::advance(dummy, n);
return std::pair(dummy, ok);
} else {
return std::pair(dummy, ok);
}
}
};
namespace boost::asio {
template <typename T>
struct is_match_condition<JsonReader<T>> : public boost::true_type { };
}
static inline void run_tests() {
std::vector<std::string> valid {
R"({})",
R"({"a":4, "b":5})",
R"([])",
R"([4, "b"])",
},
incomplete {
R"({)",
R"({"a":4, "b")",
R"([)",
R"([4, ")",
},
invalid {
R"(})",
R"("a":4 })",
R"(])",
},
excess {
R"({}{"a":4, "b":5})",
R"([]["a", "b"])",
R"({} bogus trailing data)",
};
auto run_tests = [&](auto& cases) {
for (std::string buf : cases) {
std::cout << "Testing " << std::left << std::setw(22) << buf;
bool ok = JsonReader { boost::asio::dynamic_buffer(buf) }
(buf.begin(), buf.end())
.second;
std::cout << " -> " << std::boolalpha << ok << std::endl;
if (ok && !buf.empty()) {
std::cout << " -- remaining buffer " << std::quoted(buf) << "\n";
}
}
};
std::cout << " ----- valid test cases \n";
run_tests(valid);
std::cout << " ----- incomplete test cases \n";
run_tests(incomplete);
std::cout << " ----- invalid test cases \n";
run_tests(invalid);
std::cout << " ----- excess input test cases \n";
run_tests(excess);
}
template <typename SyncReadStream, typename Buffer>
static void read(SyncReadStream& s, Buffer bufarg, nlohmann::json& message) {
using boost::asio::buffers_begin;
using boost::asio::buffers_end;
JsonReader reader{bufarg};;
read_until(s, bufarg, reader);
message = reader.message;
}
int main() {
run_tests();
}
Prints
----- valid test cases
Testing {} -> true
Testing {"a":4, "b":5} -> true
Testing [] -> true
Testing [4, "b"] -> true
----- incomplete test cases
Testing { -> false
Testing {"a":4, "b" -> false
Testing [ -> false
Testing [4, " -> false
----- invalid test cases
Testing } -> false
Testing "a":4 } -> false
Testing ] -> false
----- excess input test cases
Testing {}{"a":4, "b":5} -> false
Testing []["a", "b"] -> false
Testing {} bogus trailing data -> false
Looking For Alternatives
You could roll your own as I did in the past:
Or we can look at another library that DOES allow us to either detect boundaries of valid JSON fragments OR detect and leave trailing input.
Hand-Rolled Approach
Here's a simplistic translation to more modern Spirit X3 of that linked answer:
// Note: first iterator gets updated
// throws on known invalid input (like starting with `]' or '%')
template <typename It>
bool tryParseAsJson(It& f, It l)
{
try {
return detail::x3::parse(f, l, detail::json);
} catch (detail::x3::expectation_failure<It> const& ef) {
throw std::runtime_error("invalid JSON data");
}
}
The crucial point is that this *in addition to return true/false will update the start iterator according to how far it consumed the input.
namespace JsonDetect {
namespace detail {
namespace x3 = boost::spirit::x3;
static const x3::rule<struct value_> value{"value"};
static auto primitive_token
= x3::lexeme[ x3::lit("false") | "null" | "true" ];
static auto expect_value
= x3::rule<struct expect_value_> { "expect_value" }
// array, object, string, number or other primitive_token
= x3::expect[&(x3::char_("[{\"0-9.+-") | primitive_token | x3::eoi)]
>> value
;
// 2.4. Numbers
// Note our spirit grammar takes a shortcut, as the RFC specification is more restrictive:
//
// However non of the above affect any structure characters (:,{}[] and double quotes) so it doesn't
// matter for the current purpose. For full compliance, this remains TODO:
//
// Numeric values that cannot be represented as sequences of digits
// (such as Infinity and NaN) are not permitted.
// number = [ minus ] int [ frac ] [ exp ]
// decimal-point = %x2E ; .
// digit1-9 = %x31-39 ; 1-9
// e = %x65 / %x45 ; e E
// exp = e [ minus / plus ] 1*DIGIT
// frac = decimal-point 1*DIGIT
// int = zero / ( digit1-9 *DIGIT )
// minus = %x2D ; -
// plus = %x2B ; +
// zero = %x30 ; 0
static auto number = x3::double_; // shortcut :)
// 2.5 Strings
static const x3::uint_parser<uint32_t, 16, 4, 4> _4HEXDIG;
static auto char_ = ~x3::char_("\"\\") |
x3::char_(R"(\)") >> ( // \ (reverse solidus)
x3::char_(R"(")") | // " quotation mark U+0022
x3::char_(R"(\)") | // \ reverse solidus U+005C
x3::char_(R"(/)") | // / solidus U+002F
x3::char_(R"(b)") | // b backspace U+0008
x3::char_(R"(f)") | // f form feed U+000C
x3::char_(R"(n)") | // n line feed U+000A
x3::char_(R"(r)") | // r carriage return U+000D
x3::char_(R"(t)") | // t tab U+0009
x3::char_(R"(u)") >> _4HEXDIG ) // uXXXX U+XXXX
;
static auto string = x3::lexeme [ '"' >> *char_ >> '"' ];
// 2.2 objects
static auto member
= x3::expect [ &(x3::eoi | '"') ]
>> string
>> x3::expect [ x3::eoi | ':' ]
>> expect_value;
static auto object
= '{' >> ('}' | (member % ',') >> '}');
// 2.3 Arrays
static auto array
= '[' >> (']' | (expect_value % ',') >> ']');
// 2.1 values
static auto value_def = primitive_token | object | array | number | string;
BOOST_SPIRIT_DEFINE(value)
// entry point
static auto json = x3::skip(x3::space)[expect_value];
} // namespace detail
} // namespace JsonDetect
Obviously you put the implementation in a TU, but on Compiler Explorer we can't: Live On Compiler Explorer, using an adjusted JsonReader prints:
SeheX3Detector
==============
----- valid test cases
Testing {} -> true
Testing {"a":4, "b":5} -> true
Testing [] -> true
Testing [4, "b"] -> true
----- incomplete test cases
Testing { -> false
Testing {"a":4, "b" -> false
Testing [ -> false
Testing [4, " -> false
----- invalid test cases
Testing } -> invalid JSON data
Testing "a":4 } -> true -- remaining `:4 }`
Testing ] -> invalid JSON data
----- excess input test cases
Testing {}{"a":4, "b":5} -> true -- remaining `{"a":4, "b":5}`
Testing []["a", "b"] -> true -- remaining `["a", "b"]`
Testing {} bogus trailing data -> true -- remaining ` bogus trailing data`
NlohmannDetector
================
----- valid test cases
Testing {} -> true
Testing {"a":4, "b":5} -> true
Testing [] -> true
Testing [4, "b"] -> true
----- incomplete test cases
Testing { -> false
Testing {"a":4, "b" -> false
Testing [ -> false
Testing [4, " -> false
----- invalid test cases
Testing } -> false
Testing "a":4 } -> false
Testing ] -> false
----- excess input test cases
Testing {}{"a":4, "b":5} -> false
Testing []["a", "b"] -> false
Testing {} bogus trailing data -> false
Note how we now achieved some of the goals.
- accepting trailing data - so we don't clobber any data after our message
- failing early on some inputs that cannot possibly become valid JSON
- However, we can't fix the problem of waiting indefinitely on /possibly/ incomplete valid data
- Interestingly, one of our "invalid" test cases was wrong (!). (It is always a good sign when test cases fail). This is because "a" is actually a valid JSON value on its own.
Conclusion
In the general case it is impossible to make such a "complete message" detection work without at least limiting buffer size. E.g. a valid input could start with a million spaces. You don't want to wait for that.
Also, a valid input could open a string, object or array², and not terminate that within a few gigabytes. If you stop parsing before hand you'll never know whether it was ultimately a valid message.
Though you'll inevitably have to deal with network timeout anyways you will prefer to be proactive about knowing what to expect. E.g. send the size of the payload ahead of time, so you can use boost::asio::transfer_exactly
and validate precisely what you expected to get.
¹ practically. If you don't care about performance, you could iteratively run accept
on increasing lengths of buffer
² god forbid, a number like 0000....00001 though that's subject to parser implementation differences