
C/C++ is well known for being faster than Python in many cases. I ran a test to check this.

I have a large (beautified) JSON file with 2,200 lines. The test consisted of reading the file, deserializing the data into memory (using dictionaries as the data structure), and displaying the content.

I performed the test both in Python, using the built-in json library, and in C++, using the external nlohmann JSON library.

After a few runs, I was shocked to see that C++ takes 0.01 seconds while Python 3 takes about 0.001 seconds, which makes Python almost 10 times faster!

I searched the docs, but I did not find any information about what language the json library is written in.

C++:

#include <fstream>    // std::ifstream (was missing from the original includes)
#include <iostream>
#include "nlohmann/json.hpp"

using json = nlohmann::json;

int main()
{
    std::ifstream input("input.json");
    if (!input) {
        std::cerr << "could not open input.json\n";
        return 1;
    }

    json json_data;
    input >> json_data;                    // parse the whole file into a JSON value

    std::cout << json_data << std::endl;   // print it back out

    return 0;
}

And Python:

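import json
from time import perf_counter

t1 = perf_counter()  # higher-resolution clock than time.time()
with open('output.json', 'r') as f:
    data = json.load(f)  # don't shadow the file object with the parsed data

    print(data)
t2 = perf_counter()
elapsed = t2 - t1

print(f'elapsed time: {elapsed}')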

Final question: is Python's json library written in a low-level language, and is that the main reason for its performance, or is it pure Python?

AMC
Alex M.M.
  • Which Python interpreter? You can see the CPython version here: https://github.com/python/cpython/blob/master/Modules/_json.c – jonrsharpe Apr 05 '20 at 18:18
  • [Take a look](https://github.com/python/cpython/blob/master/Modules/_json.c). – Olvin Roght Apr 05 '20 at 18:19
  • Have a look here: https://github.com/python/cpython/blob/3.8/Lib/json/__init__.py – Azrion Apr 05 '20 at 18:20
  • @Azrion Do note that your link is the high-level _wrapper_ for CPython's `json` implementation. That module ultimately delegates to `_json`, which is the C module linked above. – Brian61354270 Apr 05 '20 at 18:22
  • "After a few runs, I had the shock to see that C++ takes 0.01 seconds and python 3 takes about 0.001 seconds, which is almost 10 times faster!" - I'll bet that you timed an unoptimized debug build and not an optimized release build. – Jesper Juhl Apr 05 '20 at 18:22
  • Did you turn on optimizations for the C++ code? – 463035818_is_not_an_ai Apr 05 '20 at 18:24
  • Note: The C++ iostream library is notoriously slow. If you want extreme speed, that's not what you want to use; you want to use lower-level functions directly in that case (you'll gain speed but give up on readability and simplicity; it's always a tradeoff). – Jesper Juhl Apr 05 '20 at 18:37
  • If you are timing it by reading files and also by printing out the result (based on what I'm seeing in your Python code), you're doing it wrong and your benchmark is severely flawed. You're not just timing serialization/deserialization; you're timing I/O as well, and in all cases I've come across, C++ I/O is buffered. In your Python code, you're timing how long `fopen` + `print` + `json.load` + the implicit/hidden `fclose` takes. – Brandon Apr 05 '20 at 18:52
  • @Brandon (micro) benchmarking is hard ;) – Jesper Juhl Apr 05 '20 at 18:55
  • I don't understand the point of these micro-benchmarks. – AMC Apr 05 '20 at 20:47
  • @JesperJuhl: Note: [Much of that slowness is due to being tied to C's `stdio` buffering; it can be desynchronized to speed things up](https://stackoverflow.com/q/9371238/364696). – ShadowRanger Apr 05 '20 at 21:52
  • @Brandon: Pedantic note: In Python 3, `fopen`/`fclose` aren't used; it's implemented in terms of the OS native primitives (e.g. `open`/`close`, or `CreateFile`/`CloseHandle`), with buffering implemented in the Python interpreter itself. `fopen`/`fclose` was how it worked in Python 2, but not anymore. – ShadowRanger Apr 05 '20 at 21:55
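Following the comments about what the Python snippet actually measures, here is a minimal sketch that times reading, parsing, and formatting as separate phases with `time.perf_counter`, and never writes the large formatted text to the terminal. The helper name `time_phases` and the sample data are illustrative assumptions; the sketch writes its own temporary `input.json` so it runs anywhere.

```python
import json
import os
import tempfile
import time


def time_phases(path: str) -> dict:
    """Hypothetical helper: seconds spent reading, parsing, and formatting one JSON file."""
    t0 = time.perf_counter()
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()          # phase 1: file I/O only
    t1 = time.perf_counter()
    data = json.loads(text)      # phase 2: deserialization only
    t2 = time.perf_counter()
    str(data)                    # phase 3: the string print(data) would build, without writing it out
    t3 = time.perf_counter()
    return {"read": t1 - t0, "parse": t2 - t1, "format": t3 - t2}


if __name__ == "__main__":
    # Assumed stand-in for the asker's 2200-line file.
    sample = {"users": [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(200)]}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sample, f, indent=4)  # "beautified", like the asker's file
        for phase, seconds in time_phases(path).items():
            print(f"{phase:>6}: {seconds * 1e3:.3f} ms")
```

A single run like this still includes cold-cache and first-call effects, so repeating it (or using `timeit`) gives more stable numbers; the same phase split applies to the C++ version.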

2 Answers


A poorly written library can give you abysmal speed, no matter what language it is written in.

There are a few specialized, highly optimized JSON parsers in C++, including RapidJSON and simdjson; see this recent comparison:

https://lemire.me/blog/2020/03/31/we-released-simdjson-0-3-the-fastest-json-parser-in-the-world-is-even-better/

FangQ

C/C++ is well known for being in many cases faster than python.

Not in many cases, always.

Of course, if your C/C++ code is badly written, it can be as slow as you want.

I performed the test both in python using the built-in json library and in C++ using the external nlohmann JSON library.

The nlohmann JSON library is slower than other alternatives. It is definitely possible that it is slower than CPython's implementation. Use another library if you need speed.

Having said that, please note that benchmarking is hard. It may be the case that, as @Jesper and @idclev mention, you are simply missing optimizations when compiling the C++ code.

is the json library by any chance written in any low level language and this is the main reason for performance, or is it just pure Python?

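Yes. As @jonrsharpe pointed out, CPython's json package is a thin Python layer over the `_json` C extension module, which implements the scanner and encoder; the pure-Python versions are only fallbacks for when `_json` is unavailable.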
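To see the effect of the C accelerator directly, here is a minimal sketch that decodes the same text with the default `json.JSONDecoder` and with a decoder rewired to CPython's pure-Python fallbacks, `json.scanner.py_make_scanner` and `json.decoder.py_scanstring`. These names are CPython implementation details rather than documented API, so treat this as an experiment on CPython 3.x under that assumption. Object keys are still scanned by whichever `scanstring` the module picked at import time, so the "pure" path is only mostly pure Python.

```python
import json
import json.decoder
import json.scanner
import timeit


def c_accelerator_available() -> bool:
    # In CPython these names are bound to functions from the _json C module,
    # or to None when _json could not be imported.
    return json.scanner.c_make_scanner is not None and json.decoder.c_scanstring is not None


def make_pure_python_decoder() -> json.JSONDecoder:
    # Assumption (CPython internals): the scanner reads parse_string when it is built,
    # so swap parse_string first, then rebuild scan_once with the pure-Python scanner.
    dec = json.JSONDecoder()
    dec.parse_string = json.decoder.py_scanstring
    dec.scan_once = json.scanner.py_make_scanner(dec)
    return dec


def best_time(func, number: int = 20, repeat: int = 5) -> float:
    # Best-of-N average seconds per call, discarding noisy runs.
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


if __name__ == "__main__":
    # Hypothetical data with nested dicts and lists, similar in shape to a typical config file.
    text = json.dumps(
        {"records": [{"id": i, "name": f"item {i}", "tags": ["a", "b"], "score": i * 0.5}
                     for i in range(300)]},
        indent=4,
    )
    c_dec = json.JSONDecoder()
    py_dec = make_pure_python_decoder()
    assert c_dec.decode(text) == py_dec.decode(text)  # same result either way

    print("C accelerator available:", c_accelerator_available())
    print(f"default decoder:     {best_time(lambda: c_dec.decode(text)) * 1e3:.3f} ms")
    print(f"pure-Python scanner: {best_time(lambda: py_dec.decode(text)) * 1e3:.3f} ms")
```

On CPython the pure-Python line is typically noticeably slower, which supports the answer's point that the C module, not the Python wrapper, does the heavy lifting.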

Acorn
  • It's technically correct that C/C++ can always run faster. But from a user's perspective more often both are fast enough, and sometimes Python is faster because it's more natural to use an optimized library or data structure. – maxy Apr 05 '20 at 18:45
  • @maxy "*from a user's perspective more often both are fast enough*" That depends on what your field is. In any case, it is clear that OP cares about performance even on the millisecond scale. – Acorn Jan 15 '21 at 07:57
  • @maxy "*sometimes Python is faster because it's more natural to use an optimized library or data structure*" That's false. Writing fast low-level code (and that includes using the hundreds of libraries written for that purpose) is the raison d'etre of C/C++. Not to mention that using C/C++ libraries is as easy as using the fast Python ones (the C-based ones): both require compilation of native code. – Acorn Jan 15 '21 at 08:02