I am processing large text files using Python. Each line of a file is a complete JSON message and can be very long. I need to insert information about each line into a database. This information is very simple: the length of the line plus the unique ID that each message contains. Each line has the form
{"field1":"val1", ..., "ID":"12345", ..., "fieldK":"valK"}
and I need to extract "12345" from the message.
Right now I load the entire string using json.loads(), then find the ID and ignore the rest.
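A full-parse approach of this kind might look roughly like the sketch below. It is a simplified illustration, not the actual code: `extract_records` is a hypothetical name, and it assumes lines arrive as `str`, that the trailing line ending is not counted in the length, and that length is measured in characters rather than bytes.

```python
import json
from typing import Iterable, Iterator, Tuple


def extract_records(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (ID, length) for each JSON line by fully parsing it.

    Hypothetical assumptions: the trailing line ending is not counted,
    length is in characters, and blank lines are skipped.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # json.loads builds the whole message object, although only "ID" is used.
        message = json.loads(line)
        yield message["ID"], len(line)


if __name__ == "__main__":
    sample = ['{"field1":"val1","ID":"12345","fieldK":"valK"}\n']
    for record_id, length in extract_records(sample):
        print(record_id, length)
```

The cost here is dominated by `json.loads` constructing every field of every message, which is the overhead the question is asking how to avoid.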
My code is too slow and I need to speed it up. I am trying to see if there is a way of extracting "ID" faster than loading the whole string. One option is to search the string for "ID" and then parse the :"12345" that follows it. But this might be brittle if the substring "ID" happens to appear somewhere else in the message.
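The string-search idea and its brittleness can be illustrated with a small sketch. The regex `ID_PATTERN` and the function `find_id_by_search` are hypothetical; the pattern assumes the ID value is a plain JSON string with no escaped quotes. It is tighter than a bare `str.find('"ID"')` because it requires a colon and a string value after the key, yet it still returns the wrong value when a nested object also has an "ID" key.

```python
import json
import re
from typing import Optional

# Hypothetical pattern: the key "ID", a colon, then a string value.
# Assumes the value contains no escaped quote characters.
ID_PATTERN = re.compile(r'"ID"\s*:\s*"([^"]*)"')


def find_id_by_search(line: str) -> Optional[str]:
    """Return the first "ID" string value found by text search, or None."""
    match = ID_PATTERN.search(line)
    return match.group(1) if match else None


# A top-level ID plus a nested object that also uses the key "ID".
line = '{"meta":{"ID":"999"},"ID":"12345"}'
print(find_id_by_search(line))  # text search hits the nested key first: 999
print(json.loads(line)["ID"])   # a full parse returns the top-level value: 12345
```

Searching only finds the first textual occurrence; it has no notion of nesting depth or of whether the match sits inside a string value, which is why it is not as robust as a real parse.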
So is there a way of partially loading the line to find the ID that would be as robust as, but faster than, loading the whole line?