0

Currently, I have a Python program that calls a bash script to parse a log file. The file contains JSON data, but each line also carries non-JSON data. For example, every line has this format, with a different JSON payload:

<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa '{"jsondata": "1.0", "result": 1, "id": 1234}'

My goal is to count how many times this message occurs, possibly how many times another message occurs on the line after it, and to make sure it is formatted correctly.

I have been using a bash script that greps for regular expressions matching the expected format. The problem is that the JSON fields may arrive in a different order, in which case my regex no longer matches. For example, the line above may come in as:

<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa '{"jsondata": "1.0","id": 1234, "result": 1}'

I could also do this in Python with the json decoder, but since the log file is not a true JSON file, I don't think that would work. What's the best but simplest way to do this? Preferably with Python or some command-line scripting. I'm on Ubuntu 16.04.

My expected input is a log file whose lines look like the ones above. My expected output is a count of how many lines are formatted as above (the same keys in any order, with different values), plus a count of how many times a specific JSON message occurred (each line carries a different JSON message), even if the JSON keys are not in the same order.

Celi Manu
  • Have you tried anything? – depperm Feb 03 '17 at 16:58
  • You can use a regex to extract the JSON from the string. Example: r'(?P{.+?})' That regex should only match the JSON portion. – Alex Luis Arias Feb 03 '17 at 17:01
  • Can you give a verifiable input and an expected output. Your current information is unclear. – Inian Feb 03 '17 at 17:03
  • Yes, I wrote that I tried grepping with a regex matching the JSON, but the JSON keys may come in out of order, so that won't work all of the time. I also wrote that I expect to be able to count the number of times a certain JSON message appears, as well as check whether all the correct keys are in that JSON. – Celi Manu Feb 03 '17 at 17:11
  • JSON fields are naturally unordered. You should extract the string within the single quotes. – OneCricketeer Feb 03 '17 at 17:12
  • A similar question, using `jq` to first parse out the JSON from the textual string and then interpret it as JSON, is asked at [In bash, how can I parse multiple newline delimited json objects from a log file?](http://stackoverflow.com/questions/41904454/in-bash-how-can-i-parse-multiple-newline-delimited-json-objects-from-a-log-file) – Charles Duffy Feb 03 '17 at 17:21

2 Answers

3

Here's an example of doing the parsing in Python:

import re
import json

s = """<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa '{"jsondata":"1.0","id": 1234, "result": 1}'"""

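# Greedy match: from the first "{" to the last "}" on the line.
# .group(0) also works on Python < 3.6, where match[0] is unavailable.
match = re.search(r'\{.+\}', s)
yourDict = json.loads(match.group(0))

yourDict['id']
# 1234
yourDict['result']
# 1
yourDict['jsondata']
# '1.0'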
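To get from parsing a single line to the counts the question asks for, one possible sketch follows. It assumes the JSON object is the only brace-delimited span on each line. The names `count_messages`, `JSON_SPAN`, and `sample` are made up for illustration and are not part of the answer above.

```python
import json
import re
from collections import Counter

# Assumption: the JSON object is the only {...} span on the line.
JSON_SPAN = re.compile(r"\{.+\}")  # greedy: first "{" to last "}"


def count_messages(lines):
    """Count identical JSON messages, ignoring the order of their keys."""
    counts = Counter()
    for line in lines:
        match = JSON_SPAN.search(line)
        if match is None:
            continue  # no JSON-looking span on this line
        try:
            payload = json.loads(match.group(0))
        except ValueError:
            continue  # the span was not valid JSON
        # sort_keys=True yields the same string for the same keys and
        # values, whatever order they appeared in on the log line.
        counts[json.dumps(payload, sort_keys=True)] += 1
    return counts


# Hypothetical sample lines, taken from the question's two examples.
sample = [
    """<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa '{"jsondata": "1.0", "result": 1, "id": 1234}'""",
    """<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa '{"jsondata": "1.0","id": 1234, "result": 1}'""",
    "a line without any JSON",
]
counts = count_messages(sample)
print(counts)
```

Because an open file iterates line by line, the same function should also accept a file object directly, for example `count_messages(open(path))` with `path` pointing at the log.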
Alex Luis Arias
  • Thank you. This answer should be marked as correct. – vdkotian Jun 28 '18 at 07:57
  • I have a similar problem, I need to find json where each key value is in single line. """<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa \n {"key1": "value",\n "key2": "value2",\n "key3": "value3"} – vdkotian Jun 29 '18 at 14:30
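For the multi-line case raised in the comment above, a line-oriented regex is awkward. One option that may help is `json.JSONDecoder.raw_decode`, which parses a JSON value from the start of a string and ignores whatever follows it. The sketch below assumes the first `{` in the record opens the JSON object; `text` is a hypothetical record modelled on that comment.

```python
import json

# Hypothetical record: a log prefix followed by JSON spread over several lines.
text = """<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa
 {"key1": "value",
 "key2": "value2",
 "key3": "value3"}"""

decoder = json.JSONDecoder()
start = text.find("{")  # assumption: the first "{" opens the JSON object

# raw_decode returns the parsed object and the index (relative to the
# string it was given) where the JSON value ended.
obj, end = decoder.raw_decode(text[start:])
print(obj["key2"])
```

Newlines inside the object are ordinary JSON whitespace, so the decoder handles them without any extra work.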
1

Your log file wraps the JSON data in single quotes. Use that to your advantage: capture the quoted string and hand it to a JSON parser, rather than pulling the JSON apart with a regex.

# coding=utf8

import re, json

regex = r"\<\d+\>\d \d+-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}-\d{2}:\d{2} (?:[\w ]+)'([^']+)'"

test_str = "<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa '{\"jsondata\": \"1.0\",\"id\": 1234, \"result\": 1}'"

for match in re.findall(regex, test_str):
    # each match is the text between the single quotes, i.e. the raw JSON
    j = json.loads(match)
    print(j['id'])
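Building on the quoted-payload idea, the "same keys in any order" check from the question could be sketched roughly as below. It assumes the payload is the last single-quoted span on the line and contains no single quotes itself. `EXPECTED_KEYS`, `QUOTED_PAYLOAD`, and `has_expected_keys` are hypothetical names, and the key set is simply read off the question's sample line.

```python
import json
import re

# Assumption: the payload is the last single-quoted span on the line
# and never contains a single quote itself.
QUOTED_PAYLOAD = re.compile(r"'([^']+)'\s*$")

# Hypothetical: the keys seen in the question's sample line.
EXPECTED_KEYS = {"jsondata", "result", "id"}


def has_expected_keys(line):
    """Return True if the line carries a JSON object with exactly EXPECTED_KEYS."""
    match = QUOTED_PAYLOAD.search(line)
    if match is None:
        return False
    try:
        payload = json.loads(match.group(1))
    except ValueError:
        return False  # quoted text was not valid JSON
    # Comparing sets of keys makes the check independent of key order.
    return isinstance(payload, dict) and set(payload) == EXPECTED_KEYS


lines = [
    "<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa '{\"jsondata\": \"1.0\",\"id\": 1234, \"result\": 1}'",
    "<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa '{\"jsondata\": \"1.0\", \"id\": 1234}'",
    "<123>1 2017-01-23T10:53:56.111-11:12 blaa blaa 'not json'",
]
well_formed = sum(has_expected_keys(line) for line in lines)
print(well_formed)
```

Only the first sample line passes here: the second is missing a key and the third is not JSON at all.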
OneCricketeer
  • While I closed the question as a dupe, this is a useful answer -- inclined to move it over to the other Q? – Charles Duffy Feb 03 '17 at 17:23
  • Maybe, but other question doesn't have same log format. – OneCricketeer Feb 03 '17 at 17:29
  • The differences strike me as entirely trivial. If there were quoting and escaping going on that had to be undone, I could see them being usefully distinguished, but that doesn't strike me as the case here -- if we accepted these as two different, usefully distinct questions, we'd need to accept *every* distinct pattern of "how do I extract content from a line of text including some JSON?" as worthy of its own distinct question. – Charles Duffy Feb 03 '17 at 17:36