How to remove comment lines from a JSON file in Python

Question

I am getting a JSON file in the following format:

// 20170407
// http://info.employeeportal.org

{
 "EmployeeDataList": [
{
 "EmployeeCode": "200005ABH9",
 "Skill": CT70,
 "Sales": 0.0,
 "LostSales": 1010.4
} 
 ]
}

I need to remove the extra comment lines present in the file.

I tried the following code:

import json
import commentjson

with open('EmployeeDataList.json') as json_data:
    employee_data = json.load(json_data)
    '''employee_data = json.dump(json.load(json_data))'''
    '''employee_data = commentjson.load(json_data)'''
    print(employee_data)

I am still not able to remove the comments from the file and bring the JSON file into the correct format.

I don't understand where things are going wrong. Any direction in this regard is highly appreciated. Thanks in advance.

`//` comments are not allowed in JSON. So what you have is not valid JSON. You will have to remove the comments before parsing. — Klaus D., Apr 09 '17 at 03:58
@Klaus D... This is the way the JSON file is generated. Is there a way to remove the comment lines from the file and bring it into the correct format? While searching on the internet I also came across JSON5, but I was not able to work out how to use it. – Eupheus, Apr 09 '17 at 04:19
@user4569636: Your file can't easily be turned into valid JSON. It contains not only comments, but references to variables: `"Skill": CT70`. — Blender, Apr 09 '17 at 04:25
What you have is probably JSON5, which contains comments and variables, as @Blender mentions. This appears to be only parsable by Javascript, not Python. https://github.com/json5/json5 — OneCricketeer, Apr 09 '17 at 04:37
@cricket_007: I don't think JSON5 has variables. It's still parseable with Python, you just have to decide what to do with the variables. — Blender, Apr 09 '17 at 04:45
@Blender Could be HOCON, then. https://github.com/typesafehub/config#using-hocon-the-json-superset — OneCricketeer, Apr 09 '17 at 04:47

Blender · Answer 1 · 2017-04-09T04:22:16.873

6

You're not using commentjson correctly. It has the same interface as the json module:

import commentjson

with open('EmployeeDataList.json', 'r') as handle:
    employee_data = commentjson.load(handle)

print(employee_data)

Although in this case, your comments are simple enough that you probably don't need to install an extra module to remove them:

import json

with open('EmployeeDataList.json', 'r') as handle:
    fixed_json = ''.join(line for line in handle if not line.startswith('//'))
    employee_data = json.loads(fixed_json)

print(employee_data)

Note that the second snippet uses json.loads instead of json.load, since you're parsing a string instead of a file object.
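As a hedged sketch, the second approach can be wrapped in a small helper and checked against an in-memory sample. The helper name `load_json_skipping_comment_lines` is hypothetical, and the sample quotes `"CT70"` on the assumption that the real file is valid JSON once the comment lines are gone; an unquoted `CT70` would still make `json.loads` raise an error.

```python
import io
import json


def load_json_skipping_comment_lines(handle):
    """Parse JSON from a text handle, ignoring lines that start with //.

    Hypothetical helper for illustration, not part of the json module.
    """
    fixed_json = ''.join(line for line in handle if not line.startswith('//'))
    return json.loads(fixed_json)


# In-memory stand-in for EmployeeDataList.json (assumes "CT70" is quoted).
sample = io.StringIO(
    '// 20170407\n'
    '// http://info.employeeportal.org\n'
    '\n'
    '{"EmployeeDataList": [{"EmployeeCode": "200005ABH9", "Skill": "CT70"}]}\n'
)
employee_data = load_json_skipping_comment_lines(sample)
print(employee_data["EmployeeDataList"][0]["Skill"])
```

Passing a real file works the same way: open it with `open(...)` and hand the file object to the helper.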

edited Apr 09 '17 at 04:22

answered Apr 09 '17 at 04:16

Blender

@user4569636 How so? His second proposed solution should fix the example you posted. – spicypumpkin Apr 09 '17 at 04:33
@Posh_Pumpkin: The "JSON" contains variables (`"Skill": CT70`), not just comments. – Blender Apr 09 '17 at 04:35
@Posh_Pumpkin See my answer – OneCricketeer Apr 09 '17 at 04:53
@Blender.. Thanks. The second solution does solve it, but there is an additional "u'EmployeeCode": u'200005ABH9" added and the order of the keys in the JSON got changed. – Eupheus Apr 09 '17 at 05:04
@user4569636: The `u` prefix just denotes a unicode string, it's not a problem. As for the order, objects in JavaScript and dictionaries in Python are both unordered. You can have `json.loads` use Python's `OrderedDict` if the order matters: https://stackoverflow.com/questions/6921699/can-i-get-json-to-load-into-an-ordereddict-in-python – Blender Apr 09 '17 at 06:20
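A minimal sketch of the `OrderedDict` suggestion, using the standard `object_pairs_hook` parameter of `json.loads`. The sample string is made up for illustration. On Python 3.7 and later, plain dictionaries also keep insertion order, so the hook mainly matters on older interpreters.

```python
import json
from collections import OrderedDict

# Made-up sample; the keys appear in the same order as in the question.
text = '{"EmployeeCode": "200005ABH9", "Sales": 0.0, "LostSales": 1010.4}'

# object_pairs_hook receives the key/value pairs in document order.
record = json.loads(text, object_pairs_hook=OrderedDict)
keys = list(record.keys())
print(keys)
```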

score 1 · Answer 2 · edited Apr 09 '17 at 05:01

1

Try JSON-minify:

JSON-minify minifies blocks of JSON-like content into valid JSON by removing all whitespace and JS-style comments (single-line // and multiline /* .. */).
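For readers who want to see the idea without installing anything, here is a rough standard-library sketch of string-aware comment stripping. It is not JSON-minify's implementation, the function name `strip_js_comments` is hypothetical, and unlike JSON-minify it leaves whitespace alone.

```python
import json
import re

# Match a complete JSON string first so that // or /* inside it is skipped,
# then a single-line comment, then a multi-line comment.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)


def strip_js_comments(text):
    """Remove // and /* */ comments that sit outside string literals."""
    def keep_strings(match):
        token = match.group(0)
        # Strings are returned unchanged; comments are replaced by nothing.
        return token if token.startswith('"') else ''
    return TOKEN.sub(keep_strings, text)


sample = (
    '// header comment\n'
    '{"url": "http://bar.com", /* inline note */ "n": 1} // trailing\n'
)
data = json.loads(strip_js_comments(sample))
print(data)
```

Matching strings before comments is what keeps the `//` inside `"http://bar.com"` from being treated as the start of a comment.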

edited Apr 09 '17 at 05:01

xlm

answered Apr 09 '17 at 04:06

hailinzeng

Tomas Ruiz · Answer 3 · 2021-03-09T10:12:41.167

1

I usually read the JSON as a normal file, delete the comments and then parse it as a JSON string. It can be done in one line with the following snippet:

import json
with open(path,'r') as f: jsonDict = json.loads(''.join(row for row in f if not row.lstrip().startswith("//")))

IMHO it is very convenient because it does not need CommentJSON or any other non-standard library.

edited Mar 09 '21 at 10:12

answered Feb 10 '18 at 13:15

Tomas Ruiz

Interesting answer but it will fail on lines containing `//` other than at the start e.g. `{"foo": "http://bar.com"}`. Better to write `if not row.lstrip().startswith("//")`. Also, if you remove `.readlines()` and the square brackets then the lines will be iterated lazily (but still joined before passing to `loads`) which will be slightly more efficient. – Jim Oldfield Mar 08 '21 at 09:57
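The pitfall described in this comment can be illustrated with a short sketch. The earlier form of the answer is not shown on this page, so the `'//' not in row` filter below is an assumption about what a too-broad check looks like; the sample lines are made up.

```python
import json

lines = [
    '// generated 20170407\n',
    '{"foo": "http://bar.com"}\n',
]

# Too-broad filter: drops every line containing // anywhere, data included.
naive = ''.join(row for row in lines if '//' not in row)

# Line-start filter: drops only lines whose first non-blank characters are //.
careful = ''.join(row for row in lines if not row.lstrip().startswith('//'))

print(repr(naive))
print(json.loads(careful))
```

With the too-broad filter nothing is left to parse, while the line-start filter keeps the URL intact.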

score 0 · Answer 4 · answered Apr 09 '17 at 04:05

0

Well, that's not valid JSON, so just open it like you would a text document, then delete anything from `//` to `\n`.

with open("EmployeeDataList.json", "r") as rf:
    with open("output.json", "w") as wf:
        for line in rf:
            # skip lines that begin with a // comment
            if line[0:2] == "//":
                continue
            wf.write(line)

answered Apr 09 '17 at 04:05

spicypumpkin

You can have two `open` on the same line – OneCricketeer Apr 09 '17 at 04:26
@cricket_007 huh? Where? – spicypumpkin Apr 09 '17 at 04:31
I'm just saying the second `open() as wf` can be moved to the first line – OneCricketeer Apr 09 '17 at 04:34
@cricket_007 Oh wait sorry I totally misread your comment, mb – spicypumpkin Apr 09 '17 at 04:35
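A small sketch of the suggestion discussed in these comments, with both files opened by one `with` statement. The temporary directory and the sample file contents are made up so the snippet can run on its own.

```python
import os
import tempfile

# Made-up input file in a temporary directory, for illustration only.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "EmployeeDataList.json")
dst = os.path.join(workdir, "output.json")

with open(src, "w") as f:
    f.write('// 20170407\n{"Sales": 0.0}\n')

# Both files are opened by a single with statement and closed together.
with open(src, "r") as rf, open(dst, "w") as wf:
    for line in rf:
        if line.startswith("//"):
            continue
        wf.write(line)

with open(dst) as f:
    result = f.read()
print(result)
```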

score 0 · Answer 5 · answered Apr 09 '17 at 04:53

Your file is parsable using HOCON.

pip install pyhocon

>>> from pyhocon import ConfigFactory
>>> conf = ConfigFactory.parse_file('data.txt')
>>> conf
ConfigTree([('EmployeeDataList',
             [ConfigTree([('EmployeeCode', '200005ABH9'),
                          ('Skill', 'CT70'),
                          ('Sales', 0.0),
                          ('LostSales', 1010.4)])])])
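If installing pyhocon is not an option, one possible standard-library workaround is to drop the comment lines and put quotes around bare values such as `CT70` before calling `json.loads`. This is only a sketch: `quote_bare_values` is a hypothetical helper, treating bare tokens as strings is an assumption about what the data means, and the regular expression is not string-aware, so it could misfire on string values that themselves contain a colon followed by a word.

```python
import json
import re

# Literals that json.loads accepts unquoted; everything else gets quoted.
JSON_LITERALS = {"true", "false", "null", "NaN", "Infinity"}

# A colon, optional whitespace, then a bare identifier followed by , } or ].
BARE_VALUE = re.compile(r'(:\s*)([A-Za-z_][A-Za-z0-9_]*)(?=\s*[,}\]])')


def quote_bare_values(text):
    """Wrap unquoted identifier values in double quotes (hypothetical helper)."""
    def fix(match):
        prefix, token = match.group(1), match.group(2)
        if token in JSON_LITERALS:
            return match.group(0)
        return '{}"{}"'.format(prefix, token)
    return BARE_VALUE.sub(fix, text)


raw = (
    '// 20170407\n'
    '// http://info.employeeportal.org\n'
    '\n'
    '{\n "EmployeeDataList": [\n{\n'
    ' "EmployeeCode": "200005ABH9",\n'
    ' "Skill": CT70,\n'
    ' "Sales": 0.0,\n'
    ' "LostSales": 1010.4\n'
    '}\n ]\n}\n'
)
no_comments = ''.join(
    line for line in raw.splitlines(keepends=True)
    if not line.lstrip().startswith('//')
)
data = json.loads(quote_bare_values(no_comments))
print(data["EmployeeDataList"][0]["Skill"])
```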

score -1 · Answer 6 · answered Apr 09 '17 at 04:04

-1

If it is the same number of lines every time you can just do:

with open('EmployeeDataList.NOTjson', "r") as fh:
    rawText = fh.read()
# split at the first 3 newlines and keep only what follows them
json_data = rawText.split("\n", 3)[3]

This way json_data is now the string of text without the first 3 lines.

answered Apr 09 '17 at 04:04

kpie
