
I am trying to load JSON file data into a dataframe, filter out a few records, and write the result back to a file. My file contains one JSON record per line, and each record has a URL in it. This is the sample data in the input file:

{"site_code":"111","site_url":"https://www.site111.com"}
{"site_code":"222","site_url":"https://www.site333.com"}
{"site_code":"333","site_url":"https://www.site333.com"}

Sample code I used

import pandas as pd
sites = pd.read_json('sites.json', lines=True)
modified_sites = sites[sites['site_code']!=222]
modified_sites.to_json('modified_sites.json',orient='records',lines=True)

But the generated file contains escaped forward slashes:

{"site_code":111,"site_url":"https:\/\/www.site111.com"}
{"site_code":333,"site_url":"https:\/\/www.site333.com"}

How can I avoid it and get the following data in the generated file?

{"site_code":111,"site_url":"https://www.site111.com"}
{"site_code":333,"site_url":"https://www.site333.com"}

Note: I referred to the following question, but it was not helpful for my case:

  1. pandas to_json() redundant backslashes
Raghu Molabanti
  • This is not a solution to your problem, but it at least tries to explain why forward slashes may be escaped in JSON: https://stackoverflow.com/questions/1580647/json-why-are-forward-slashes-escaped – Gealber Jul 21 '21 at 14:06
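As a quick check of the point in the comment above: JSON allows `/` to be written as `\/`, so both spellings should decode to the same value. A minimal sketch using only the standard `json` module (the variable names `escaped` and `plain` are just for illustration):

```python
import json

# The same record, once with escaped slashes and once without
escaped = '{"site_url":"https:\\/\\/www.site111.com"}'
plain = '{"site_url":"https://www.site111.com"}'

# Both spellings decode to the same Python value
assert json.loads(escaped) == json.loads(plain)
print(json.loads(escaped)["site_url"])
```

So the escaped output is still valid JSON; the escaping only matters if something downstream compares or reads the raw text.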

1 Answer


You can try replacing the escaped slashes directly in the JSON string and then saving the result to a file:

import pandas as pd

sites = pd.read_json('sites.json', lines=True)
modified_sites = sites[sites['site_code'] != 222]
# to_json() writes '/' as '\/'; serialize to a string and undo that escaping
formatted_json = modified_sites.to_json(orient='records', lines=True).replace('\\/', '/')
# Write the cleaned string; the with-block closes the file afterwards
with open('modified_sites.json', 'w') as f:
    f.write(formatted_json)
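If you would rather not rely on string replacement, one alternative is to decode each line and re-encode it with the standard `json` module, which does not escape forward slashes. This is only a sketch: the helper name `reserialize_json_lines` is made up for illustration, and it assumes the input is the JSON Lines string returned by `to_json(orient='records', lines=True)`.

```python
import json

def reserialize_json_lines(text: str) -> str:
    """Re-encode each JSON Lines record with the standard json module."""
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        record = json.loads(line)  # "\/" decodes to a plain "/"
        # Compact separators keep the output free of extra spaces
        out.append(json.dumps(record, separators=(",", ":")))
    return ("\n".join(out) + "\n") if out else ""

# Hypothetical sample input in the same shape as the escaped output above
escaped = '{"site_code":111,"site_url":"https:\\/\\/www.site111.com"}\n'
print(reserialize_json_lines(escaped), end="")
```

With the dataframe from the question, you would pass `modified_sites.to_json(orient='records', lines=True)` to the helper and write the returned string to `modified_sites.json`.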
okrn