I made a mistake while manipulating 100 JSON files. I'm not sure what happened, but most of them now have a random number of their last characters repeated at the end (as per image below). Is there a way to clean a JSON file by deleting characters, starting from the last one, until the file is valid JSON again?

enter image description here

LBedo
  • 141
  • 8
  • 1
    The short answer is `no`. It looks like you've overwritten part of the file with a shorter version, leaving the original. You are supposed to parse the file with the `json` library, make changes to the resulting object and rewrite the file as text. – quamrana Mar 30 '23 at 08:10
  • 1
    Please include your code and the output as text and not as images. – ewokx Mar 30 '23 at 08:11
  • @ewokx there is no code and no output to show unfortunately. That's what I'm actually looking for... – LBedo Mar 30 '23 at 08:13

1 Answer

1

You can use regular expressions. An alternative would be string manipulation, but in this case regex is quicker to write, especially for one-time-use code.

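import re

files = ['a.json', 'b.json', ...]  # populate as needed

# Capture everything up to and including the last '}]}}'.
pattern = re.compile(r'([\s\S]+\}\]\}\})[\s\S]*')

for filename in files:
    with open(filename, 'r') as file:
        content = file.read()

    match = pattern.match(content)
    if match is None:
        continue  # no '}]}}' in this file, so leave it untouched

    with open(filename, 'w') as file:
        file.write(match.group(1))

This regex has several parts. [\s\S] matches any character, including newlines (which . does not match by default). The greedy [\s\S]+ matches as much as possible, so the group extends to the last }]}} in the file. The trailing [\s\S]* matches whatever follows (the text we don't want), including nothing at all, so an undamaged file is written back unchanged. The pattern is a raw string (r'...') so that Python passes the backslashes to the regex engine untouched.

We then parenthesise the part we do want to keep, ([\s\S]+\}\]\}\}), extract it using .group(1), and write it to the file. Note that this only helps when the unwanted text does not itself contain }]}}: because the match is greedy, it keeps everything up to the last occurrence.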
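Before overwriting any files, it may be worth checking what the greedy capture does on a plain string. The sketch below is only an illustration: the helper name trim and the sample strings are made up, and it assumes, as the code above does, that a clean file ends in }]}}.

```python
import re

# Hypothetical helper and sample data, for illustration only.
PATTERN = re.compile(r'([\s\S]+\}\]\}\})')

def trim(text):
    """Return text up to and including the last '}]}}', or None if it never occurs."""
    match = PATTERN.match(text)
    return match.group(1) if match else None

clean = '{"a": {"b": [{"c": 1}]}}'

# Stray characters after the last '}]}}' are dropped.
print(trim(clean + '}}') == clean)

# A tail that itself ends in '}]}}' is kept, because the greedy part
# always runs to the last occurrence.
print(trim(clean + '}]}}') == clean)
```

The first check prints True and the second prints False, so it is worth comparing a few cleaned files against json.loads before trusting the result.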

For more information, see "Reference - What does this regex mean?". In future, I would suggest manipulating JSON with the built-in json library.
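As a sketch of what the json library can do here: json.JSONDecoder.raw_decode parses one JSON document from the start of a string and returns the index where that document ended, so everything after that index can be discarded. This assumes the damage is limited to extra text after an otherwise complete document; the function name strip_trailing_garbage and the sample string are hypothetical.

```python
import json

def strip_trailing_garbage(text):
    """Return the first complete JSON document in text, dropping anything after it."""
    # raw_decode does not skip leading whitespace, so find the first real character.
    start = len(text) - len(text.lstrip())
    # raw_decode returns (parsed_object, index_where_the_document_ended).
    _, end = json.JSONDecoder().raw_decode(text, start)
    return text[start:end]

# Made-up example: a valid document followed by a repeated tail.
sample = '{"a": {"b": [{"c": 1}]}}' + '1}]}}'
cleaned = strip_trailing_garbage(sample)
print(cleaned)
```

To apply it to files, read each file, pass its content through the function, and write the result back. If the file does not start with a valid JSON document at all, raw_decode raises json.JSONDecodeError, which can be caught to skip that file.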

Mous
  • 953
  • 3
  • 14