Reading JSON file with C++ and BOOST

Question

An HTTP server sends me a JSON response (a string) like this:

{
    "folders" :
    [{
            "id" : 109,
            "parent_id" : 110,
            "path" : "\/1\/105\/110\/"
        },
        {
            "id" : 110,
            "parent_id" : 105,
            "path" : "\/1\/105\/"
        }
    ],

    "files" :
    [{
            "id" : 26,
            "parent_id" : 105,
            "name" : "picture.png",
            "hash" : "md5_hash",
            "path" : "\/1\/105\/"
        },
        {
            "id" : 25,
            "parent_id" : 110,
            "name" : "another_picture.jpg",
            "hash" : "md5_hash",
            "path" : "\/1\/105\/110\/"
        }
    ]
}

I want to compare this "tree of a remote folder" with a local folder tree (for example, a string vector containing the locations of my local files), so I thought of converting this JSON into a map<string, vector<map<string, string>>> (I don't know if this is possible).

I'm developing a tool to synchronize files between a local and a remote folder. I'm using Boost to list the local folder, and I want to compare the local listing with the remote listing (the JSON response) to generate actions (download missing files that don't exist in the local folder, upload files that don't exist in the remote folder).
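Once both listings are reduced to the same form, the comparison described above boils down to two set differences. A minimal sketch, assuming both sides have already been normalized to relative file paths (the names `SyncActions` and `plan_sync` are hypothetical, and files present on both sides but modified, e.g. with a different MD5 hash, are not detected here):

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: which paths to download and which to upload.
struct SyncActions {
    std::vector<std::string> download; // present remotely, missing locally
    std::vector<std::string> upload;   // present locally, missing remotely
};

// Assumes both sets hold paths normalized the same way, e.g. "/1/105/picture.png".
SyncActions plan_sync(std::set<std::string> const& local,
                      std::set<std::string> const& remote) {
    SyncActions actions;
    // std::set_difference needs sorted input; std::set iterates in sorted order.
    std::set_difference(remote.begin(), remote.end(), local.begin(), local.end(),
                        std::back_inserter(actions.download));
    std::set_difference(local.begin(), local.end(), remote.begin(), remote.end(),
                        std::back_inserter(actions.upload));
    return actions;
}
```

Using `std::set` keeps both listings sorted automatically; with a `std::vector` of paths you would have to sort each one before calling `std::set_difference`.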

With this function, which I found in another question:

#include <boost/property_tree/ptree.hpp>
#include <iostream>
#include <string>

// Recursively prints every key and value of a property tree
void print(boost::property_tree::ptree const& pt)
{
    using boost::property_tree::ptree;
    ptree::const_iterator end = pt.end();
    for (ptree::const_iterator it = pt.begin(); it != end; ++it)
    {
        std::cout << it->first << ": " << it->second.get_value<std::string>() << std::endl;
        print(it->second);
    }
}

I succeeded in printing something like this:

folders:
:
id: 109
parent_id: 110
name: 2011_pictures
:
id: 110
parent_id: 105
name: Aminos
files:
id: 26
parent_id: 105
name: logo.png
:
id: 5
parent_id: 109
name: me.jpg

I want to know if it is possible to generate a map<string, vector<map<string, string>>> from this result. It would have 2 keys, "folders" and "files", and each of those keys would give access to a vector of maps containing the information for each object (file or folder). If this is feasible, it will reduce the complexity of the task (comparing two folder listings).

Example: T["folders"][0]["id"] would return "109"; T["files"][0]["name"] would return "logo.png".

UPDATE: this question is old, but here is some advice: use RapidJSON whenever you need to deal with JSON in C++.

You could use property_tree from boost or the json spirit parser. I am sure that you will find some answered questions here on SO. — mkaes, Jan 02 '15 at 17:47
Possible duplicate of [Reading json file with boost](http://stackoverflow.com/questions/15206705/reading-json-file-with-boost). — jww, Jan 02 '15 at 18:21

score 4 · Accepted Answer · edited May 23 '17 at 12:15

Because the data structure in the other answer was deemed "very complex" and the target data structure was suggested to be:

struct Data {
    struct Folder { int id, parent_id; std::string path; };
    struct File   { int id, parent_id; std::string path, name, md5_hash; };

    using Folders = std::vector<Folder>;
    using Files   = std::vector<File>;

    Folders folders;
    Files   files;
};

I ended up writing a transformation from generic "JSON" to that data structure (see the other answer: Reading JSON file with C++ and BOOST).

However, perhaps the OP will be more pleased if we "skip the middle man" and parse the JSON directly into the Data structure shown. This "simplifies" the grammar by making it specific to this type of document only:

start    = '{' >> 
           (folders_ >> commasep) ^
           (files_ >> commasep)
         >> '}';

folders_ = prop_key(+"folders") >> '[' >> -(folder_ % ',') >> ']';
files_   = prop_key(+"files")   >> '[' >> -(file_   % ',') >> ']';

folder_  = '{' >> (
                (prop_key(+"id")        >> int_  >> commasep) ^
                (prop_key(+"parent_id") >> int_  >> commasep) ^
                (prop_key(+"path")      >> text_ >> commasep)
            ) >> '}';
file_    = '{' >> (
                (prop_key(+"id")        >> int_  >> commasep) ^
                (prop_key(+"parent_id") >> int_  >> commasep) ^
                (prop_key(+"path")      >> text_ >> commasep) ^
                (prop_key(+"name")      >> text_ >> commasep) ^
                (prop_key(+"hash")      >> text_ >> commasep)
            ) >> '}';

prop_key = lexeme ['"' >> lazy(_r1) >> '"'] >> ':';
commasep = &char_('}') | ',';

This grammar allows:

insignificant whitespace,
re-ordering of properties within objects,
and omitted object properties.

Benefits:

early checking of property value types
lower compile times
less code indeed: 37 fewer LoC (not counting the sample JSON lines, that's ~22%)

That last benefit has a flip side: if ever you want to read slightly different JSON, now you need to muck with the grammar instead of just writing a different extraction/transform. At 37 lines of code, my preference is with the other answer but I'll leave it to you to decide.

Here's the same demo program using this grammar directly:

Live On Coliru

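//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/regex/pending/unicode_iterator.hpp> // boost::utf8_output_iterator
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>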

namespace qi = boost::spirit::qi;

static std::string const sample = R"(
    {
        "folders" :
        [{
                "id" : 109,
                "parent_id" : 110,
                "path" : "\/1\/105\/110\/"
            },
            {
                "id" : 110,
                "parent_id" : 105,
                "path" : "\/1\/105\/"
            }
        ],

        "files" :
        [{
                "id" : 26,
                "parent_id" : 105,
                "name" : "picture.png",
                "hash" : "md5_hash",
                "path" : "\/1\/105\/"
            },
            {
                "id" : 25,
                "parent_id" : 110,
                "name" : "another_picture.jpg",
                "hash" : "md5_hash",
                "path" : "\/1\/105\/110\/"
            }
        ]
    })";

struct Data {
    struct Folder { int id, parent_id; std::string path; };
    struct File   { int id, parent_id; std::string path, name, md5_hash; };

    using Folders = std::vector<Folder>;
    using Files   = std::vector<File>;

    Folders folders;
    Files   files;
};

BOOST_FUSION_ADAPT_STRUCT(Data::Folder, (int,id)(int,parent_id)(std::string,path))
BOOST_FUSION_ADAPT_STRUCT(Data::File,   (int,id)(int,parent_id)(std::string,path)(std::string,name)(std::string,md5_hash))
BOOST_FUSION_ADAPT_STRUCT(Data,         (Data::Folders,folders)(Data::Files,files))

namespace folder_info { // adhoc JSON parser

    template <typename It, typename Skipper = qi::space_type>
    struct grammar : qi::grammar<It, Data(), Skipper>
    {
        grammar() : grammar::base_type(start) {
            using namespace qi;

            start    = '{' >> 
                       (folders_ >> commasep) ^
                       (files_ >> commasep)
                     >> '}';

            folders_ = prop_key(+"folders") >> '[' >> -(folder_ % ',') >> ']';
            files_   = prop_key(+"files")   >> '[' >> -(file_   % ',') >> ']';

            folder_  = '{' >> (
                            (prop_key(+"id")        >> int_  >> commasep) ^
                            (prop_key(+"parent_id") >> int_  >> commasep) ^
                            (prop_key(+"path")      >> text_ >> commasep)
                        ) >> '}';
            file_    = '{' >> (
                            (prop_key(+"id")        >> int_  >> commasep) ^
                            (prop_key(+"parent_id") >> int_  >> commasep) ^
                            (prop_key(+"path")      >> text_ >> commasep) ^
                            (prop_key(+"name")      >> text_ >> commasep) ^
                            (prop_key(+"hash")      >> text_ >> commasep)
                        ) >> '}';

            prop_key = lexeme ['"' >> lazy(_r1) >> '"'] >> ':';
            commasep = &char_('}') | ',';

            ////////////////////////////////////////
            // Bonus: properly decoding the string:
            text_   = '"' >> *ch_ >> '"';

            ch_ = +(
                    ~char_("\"\\")) [ _val += _1 ] |
                       qi::lit("\x5C") >> (               // \ (reverse solidus)
                       qi::lit("\x22") [ _val += '"'  ] | // "    quotation mark  U+0022
                       qi::lit("\x5C") [ _val += '\\' ] | // \    reverse solidus U+005C
                       qi::lit("\x2F") [ _val += '/'  ] | // /    solidus         U+002F
                       qi::lit("\x62") [ _val += '\b' ] | // b    backspace       U+0008
                       qi::lit("\x66") [ _val += '\f' ] | // f    form feed       U+000C
                       qi::lit("\x6E") [ _val += '\n' ] | // n    line feed       U+000A
                       qi::lit("\x72") [ _val += '\r' ] | // r    carriage return U+000D
                       qi::lit("\x74") [ _val += '\t' ] | // t    tab             U+0009
                       qi::lit("\x75")                    // uXXXX                U+XXXX
                            >> _4HEXDIG [ append_utf8(qi::_val, qi::_1) ]
                    );

            BOOST_SPIRIT_DEBUG_NODES((files_)(folders_)(file_)(folder_)(start)(text_))
        }
    private:
        qi::rule<It, Data(),            Skipper> start;
        qi::rule<It, Data::Files(),     Skipper> files_;
        qi::rule<It, Data::Folders(),   Skipper> folders_;
        qi::rule<It, Data::File(),      Skipper> file_;
        qi::rule<It, Data::Folder(),    Skipper> folder_;
        qi::rule<It, void(const char*), Skipper> prop_key;

        qi::rule<It, std::string()> text_, ch_;
        qi::rule<It> commasep;

        struct append_utf8_f {
            template <typename...> struct result { typedef void type; };
            template <typename String, typename Codepoint>
            void operator()(String& to, Codepoint codepoint) const {
                auto out = std::back_inserter(to);
                boost::utf8_output_iterator<decltype(out)> convert(out);
                *convert++ = codepoint;
            }
        };
        boost::phoenix::function<append_utf8_f> append_utf8;
        qi::uint_parser<uint32_t, 16, 4, 4> _4HEXDIG;
    };

    template <typename Range, typename It = typename boost::range_iterator<Range const>::type>
    Data parse(Range const& input) {
        grammar<It> g;

        It first(boost::begin(input)), last(boost::end(input));
        Data parsed;
        bool ok = qi::phrase_parse(first, last, g, qi::space, parsed);

        if (ok && (first == last))
            return parsed;

        throw std::runtime_error("Remaining unparsed: '" + std::string(first, last) + "'");
    }
}

int main()
{
    auto parsed = folder_info::parse(sample);

    for (auto& e : parsed.folders) 
        std::cout << "folder:\t" << e.id << "\t" << e.path << "\n";
    for (auto& e : parsed.files) 
        std::cout << "file:\t"   << e.id << "\t" << e.path << "\t" << e.name << "\n";
}

Output:

folder: 109 /1/105/110/
folder: 110 /1/105/
file:   26  /1/105/ picture.png
file:   25  /1/105/110/ another_picture.jpg

Hello, thank you very much for your efforts, that's very impressive. But in fact, in my JSON there is other information for folder or file objects (like the date of creation, modification, the owner, the size etc.), although the most important for me are those I have put in the example (name, pid, id etc.). So do you think that your code will work in that case? I will wait for your answer, because I will be obliged to work with the result of the "print" function and try to extract the information about files or folders and put it in a vector, for example. Once again, thank you and have a good day! — Aminos, Jan 04 '15 at 13:02
Oh well. I'm pretty surprised that you accepted this answer then, since the other answer does exactly the same, except it **does** accept and ignore "other" JSON content. Did you miss the update that defined `extract_from`? It uses exactly the same data structure - the one you suggested in the question. — sehe, Jan 04 '15 at 16:41
Out of pure evil curiosity, here's a test that mixes the grammar rules to achieve a "hybrid" approach, **[Live On Coliru](http://coliru.stacked-crooked.com/a/bef6f8422cb48282)**. It leads to **[ridiculous explosion](http://paste.ubuntu.com/9671836/)** in the grammar rules, is very hard to maintain/test (I spent a handsome ~45 minutes to debug the correct operation and I don't think I covered all cases). ... — sehe, Jan 04 '15 at 16:48
... It is up by ~40 lines of code. Oh, and the runtime performance is abysmal (you get [~10,000 lines of debug output](http://coliru.stacked-crooked.com/a/c4b9239a3a7897c1), which shows the ridiculous amount of backtracking). Finally, you get spurious entries in the `files` and `folders` vectors, which you need to filter out manually (see `main`). — sehe, Jan 04 '15 at 16:50
So all in all, **[the other answer](http://stackoverflow.com/a/27749360/85371)** is a **clear** win here, where the separation of concerns keeps all the accidental complexity at bay. It has none of these downsides — sehe, Jan 04 '15 at 16:52
Under Visual Studio 2010, the compilation failed. Is that because you use some features of the latest version of C++? The first error is located here: namespace qd_json {using text = std::string;<==== — Aminos, Jan 04 '15 at 18:58
Just use typedefs. Look at a page like [wikipedia:c++11](http://en.wikipedia.org/wiki/C%2B%2B11) for any c++11 constructs not yet supported in that version. — sehe, Jan 04 '15 at 19:19
@Aminos this one might not be obvious: `template <typename Sig> struct result { typedef void type; };` will work in c++03 — sehe, Jan 04 '15 at 19:46

sehe · Answer 2 · 2015-01-06T13:52:15.557

Disclaimer: The sample below is not a full blown JSON parser. Consider using a library that supports your needs. You can see a more evolved JSON parser here https://github.com/sehe/spirit-v2-json

A quick-and-dirty Spirit grammar (assuming you don't need too much conformance) would be:

    text_   = '"' >> raw [*('\\' >> char_ | ~char_('"'))] >> '"'; // ¹
    value_  = null | bool | text_ | double_ | object_ | array_; // ²
    member_ = text_ >> ':' >> value_;
    object_ = '{' >> -(member_ % ',') >> '}';
    array_  = '[' >> -(value_  % ',') >> ']';

    // ¹ as a bonus I added utf8 escape decoding in the full sample
    // ² as another bonus I threw in the missing `null` and `bool` types

Which translates into C++ types without further effort using an AST like:

using text   = std::string;
using value  = boost::make_recursive_variant<
        null,
        bool,
        text,                                      // "string" (roughly!)
        double,                                    // number
        std::map<text, boost::recursive_variant_>, // object
        std::vector<boost::recursive_variant_>     // array
    >::type;
using member = std::pair<text, value>;
using object = std::map<text, value>;
using array  = std::vector<value>;

If you have two qd_json::value objects, you can just compare them:

qd_json::value local_tree, remote_tree;
if (local_tree == remote_tree)
{
    std::cout << "the tree is unchanged\n";
}

Here's a demo program:

Updated Demo

The demonstration was updated to show you how to get to the "user-friendly" data structure you suggested in the edit of the question:

int main() {
    auto json = qd_json::parse(sample);

    // extract into user friendly datastructure from the question:
    auto extracted = Data::extract_from(json);

    for (auto& e : extracted.folders) std::cout << "folder:\t" << e.id << "\t" << e.path << "\n";
    for (auto& e : extracted.files)   std::cout << "file:\t"   << e.id << "\t" << e.path << "\t" << e.name << "\n";
}

Live On Coliru

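#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/regex/pending/unicode_iterator.hpp> // boost::utf8_output_iterator
#include <boost/variant.hpp>
#include <cassert>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>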

namespace qi = boost::spirit::qi;

static std::string const sample = R"(
    {
        "folders" :
        [{
                "id" : 109,
                "parent_id" : 110,
                "path" : "\/1\/105\/110\/"
            },
            {
                "id" : 110,
                "parent_id" : 105,
                "path" : "\/1\/105\/"
            }
        ],

        "files" :
        [{
                "id" : 26,
                "parent_id" : 105,
                "name" : "picture.png",
                "hash" : "md5_hash",
                "path" : "\/1\/105\/"
            },
            {
                "id" : 25,
                "parent_id" : 110,
                "name" : "another_picture.jpg",
                "hash" : "md5_hash",
                "path" : "\/1\/105\/110\/"
            }
        ]
    })";

namespace qd_json { // quick and dirty JSON handling
    struct null {
        bool operator==(null) const { return true; }
    };

    inline static std::ostream& operator<<(std::ostream& os, null) { return os << "null"; }

    using text   = std::string;
    using value  = boost::make_recursive_variant<
            null,
            text,                                      // "string" (roughly!)
            double,                                    // number
            std::map<text, boost::recursive_variant_>, // object
            std::vector<boost::recursive_variant_>,    // array
            bool
        >::type;
    using member = std::pair<text, value>;
    using object = std::map<text, value>;
    using array  = std::vector<value>;

    template <typename It, typename Skipper = qi::space_type>
    struct grammar : qi::grammar<It, value(), Skipper>
    {
        grammar() : grammar::base_type(value_) {
            using namespace qi;

            text_   = '"' >> raw [*('\\' >> char_ | ~char_('"'))] >> '"';
            null_   = "null" >> attr(null{});
            bool_   = "true" >> attr(true) | "false" >> attr(false);
            value_  = null_ | bool_ | text_ | double_ | object_ | array_;
            member_ = text_ >> ':' >> value_;
            object_ = '{' >> -(member_ % ',') >> '}';
            array_  = '[' >> -(value_  % ',') >> ']';

            ////////////////////////////////////////
            // Bonus: properly decoding the string:
            text_   = lexeme [ '"' >> *ch_ >> '"' ];

            ch_ = +(
                    ~char_("\"\\")) [ _val += _1 ] |
                       qi::lit("\x5C") >> (               // \ (reverse solidus)
                       qi::lit("\x22") [ _val += '"'  ] | // "    quotation mark  U+0022
                       qi::lit("\x5C") [ _val += '\\' ] | // \    reverse solidus U+005C
                       qi::lit("\x2F") [ _val += '/'  ] | // /    solidus         U+002F
                       qi::lit("\x62") [ _val += '\b' ] | // b    backspace       U+0008
                       qi::lit("\x66") [ _val += '\f' ] | // f    form feed       U+000C
                       qi::lit("\x6E") [ _val += '\n' ] | // n    line feed       U+000A
                       qi::lit("\x72") [ _val += '\r' ] | // r    carriage return U+000D
                       qi::lit("\x74") [ _val += '\t' ] | // t    tab             U+0009
                       qi::lit("\x75")                    // uXXXX                U+XXXX
                            >> _4HEXDIG [ append_utf8(qi::_val, qi::_1) ]
                    );

            BOOST_SPIRIT_DEBUG_NODES((text_)(value_)(member_)(object_)(array_)(null_)(bool_))
        }
    private:
        qi::rule<It, text()>            text_, ch_;
        qi::rule<It, null()>            null_;
        qi::rule<It, bool()>            bool_;
        qi::rule<It, value(),  Skipper> value_;
        qi::rule<It, member(), Skipper> member_;
        qi::rule<It, object(), Skipper> object_;
        qi::rule<It, array(),  Skipper> array_;

        struct append_utf8_f {
            template <typename...> struct result { typedef void type; };
            template <typename String, typename Codepoint>
            void operator()(String& to, Codepoint codepoint) const {
                auto out = std::back_inserter(to);
                boost::utf8_output_iterator<decltype(out)> convert(out);
                *convert++ = codepoint;
            }
        };
        boost::phoenix::function<append_utf8_f> append_utf8;
        qi::uint_parser<uint32_t, 16, 4, 4> _4HEXDIG;
    };

    template <typename Range, typename It = typename boost::range_iterator<Range const>::type>
    value parse(Range const& input) {
        grammar<It> g;

        It first(boost::begin(input)), last(boost::end(input));
        value parsed;
        bool ok = qi::phrase_parse(first, last, g, qi::space, parsed);

        if (ok && (first == last))
            return parsed;

        throw std::runtime_error("Remaining unparsed: '" + std::string(first, last) + "'");
    }

}

#include <boost/range/algorithm.hpp>
#include <boost/range/adaptors.hpp>
#include <boost/range/algorithm_ext/push_back.hpp>

struct Data {
    struct Folder { int id, parent_id; std::string path; };
    struct File   { int id, parent_id; std::string path, name, md5_hash; };

    using Folders = std::vector<Folder>;
    using Files   = std::vector<File>;

    Folders folders;
    Files   files;

    static Data extract_from(qd_json::value const& json) {
        using namespace boost::adaptors;

        return {
            boost::copy_range<Folders>(arr(obj(json).at("folders")) | transformed(obj) | transformed(&Data::extract_folder)),
            boost::copy_range<Files>  (arr(obj(json).at("files"))   | transformed(obj) | transformed(&Data::extract_file)),
        };
    }
 private:
    static Folder extract_folder(qd_json::object const& obj) {
        return {
            id   (obj.at("id")),
            id   (obj.at("parent_id")),
            text (obj.at("path"))
        };
    }
    static File extract_file(qd_json::object const& obj) {
        return {
            id   (obj.at("id")),
            id   (obj.at("parent_id")),
            text (obj.at("path")),
            text (obj.at("name")),
            text (obj.at("hash")),
        };
    }

    static int             id  (qd_json::value const&v) { return boost::get<double>(v); };
    static std::string     text(qd_json::value const&v) { return boost::get<qd_json::text>(v); };
    static qd_json::array  arr (qd_json::value const&v) { return boost::get<qd_json::array>(v); };
    static qd_json::object obj (qd_json::value const&v) { return boost::get<qd_json::object>(v); };
};

int main()
{
    auto json = qd_json::parse(sample);

    // compare json documents
    qd_json::value clone = json;
    assert(json == clone);

    // extract into user friendly datastructure from the question:
    auto extracted = Data::extract_from(json);

    for (auto& e : extracted.folders) std::cout << "folder:\t" << e.id << "\t" << e.path << "\n";
    for (auto& e : extracted.files)   std::cout << "file:\t"   << e.id << "\t" << e.path << "\t" << e.name << "\n";
}

Output:

folder: 109 /1/105/110/
folder: 110 /1/105/
file:   26  /1/105/ picture.png
file:   25  /1/105/110/ another_picture.jpg

Thank you for your answer, but sorry, it's too complicated for me to use and I didn't succeed in compiling it. What I have finally decided to do is to use the result of the "print" function I posted at the top, grab the "path" line of each file, then replace the IDs (numbers) with the names of the folders... It's easier than putting the result in a "very complex" data structure and thinking of the algorithm. — Aminos, Jan 03 '15 at 12:45
Thinking is always the hard part :) Anyways, you didn't specify much (no code, no algorithms) and I just don't recommend Boost Property Tree for "JSON". For fun: [here's how **"very complex"** it would be](http://paste.ubuntu.com/9664964/) to print that text using my data structure: [see it **Live On Coliru**](http://coliru.stacked-crooked.com/a/bce09803fc43fbfe) — sehe, Jan 03 '15 at 13:19
The [demonstration **was updated**](http://coliru.stacked-crooked.com/a/9dbd44c2e36302ef) to show you how to get to the _user-friendly_ data structure you suggested in the edit of the question, although I'd recommend [using `map` etc. so you could **lookup by e.g. `id`** or `name`](http://coliru.stacked-crooked.com/a/ab305f3b84a19d5e) instead. (Oh, as a bonus I implemented string (utf8) escape decoding). — sehe, Jan 03 '15 at 21:21
And with a [contrasting answer](http://stackoverflow.com/a/27760376/85371) that skips the "complex" generic JSON structure altogether, I think I've earned the [tag:ridiculously-comprehensive] badge once more :) /cc @MooingDuck — sehe, Jan 03 '15 at 22:58
@Aminos Here's the same [example JSON with ignorable content](http://paste.ubuntu.com/9671885/) (from the other answer comments) being parsed without any adjustments (except ignoring errors in `extract_from`) **[Live On Coliru](http://coliru.stacked-crooked.com/a/5766a39aa47f7043)** — sehe, Jan 04 '15 at 17:08
_// ² as another bonus I threw in the missing `null` and `bool` types_ due to [this answer](http://stackoverflow.com/a/27799928/85371) — sehe, Jan 06 '15 at 13:52
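Two ideas from this thread fit together: Aminos's plan of replacing the ids in each `path` with folder names, and the suggestion to index by `id` with a `map`. A minimal sketch, assuming an `id -> folder name` map has already been filled from the folder objects (the `resolve_path` helper and the folder names in the usage are hypothetical):

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: rewrites an id-based path such as "/1/105/110/"
// into a name-based one using an id -> folder name index.
// Ids missing from the index are kept as-is.
std::string resolve_path(std::string const& id_path,
                         std::map<int, std::string> const& names) {
    std::istringstream in(id_path);
    std::string segment, result = "/";
    while (std::getline(in, segment, '/')) {
        if (segment.empty()) continue; // skip the leading '/'
        // Assumes every segment is a numeric id; std::stoi throws otherwise.
        auto it = names.find(std::stoi(segment));
        result += (it != names.end() ? it->second : segment) + "/";
    }
    return result;
}
```

Looking ids up in a `std::map` avoids scanning the whole folder vector for every path segment.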

score 0 · Answer 3 · answered Jan 02 '15 at 22:42 by Inverse

Use Boost's built-in JSON parser to read the response into a property tree:

http://www.boost.org/doc/libs/1_57_0/doc/html/boost_propertytree/parsers.html#boost_propertytree.parsers.json_parser
