I have tried many suggestions from around the internet but cannot get this to work. Everything prints correctly in the console, but when I store the same text in a JSON or .txt file, it is written as \uXXXX escape sequences instead of the actual characters.

test.html

<p class="verse"> अनि यस्तो हुन गयो कि उहाँ उजाड स्थानतिर प्रस्थान गर्नुभयो।</p>

test.py

import json
from bs4 import BeautifulSoup

page = (r'C:\Users\Rochak\Desktop\Beautiful_Soup\test.html')
page = open(page, encoding="utf8")
soup = BeautifulSoup(page.read(), "html.parser")
data = (soup.find('p').text)
print(data)

with open('test.json' , 'w') as outfile:
    json.dump(data, outfile, sort_keys=True, indent= 4)

test.json

" \u0905\u0928\u093f \u092f\u0938\u094d\u0924\u094b \u0939\u0941\u0928........"

console

अनि यस्तो हुन गयो कि उहाँ उजाड स्थानतिर प्रस्थान गर्नुभयो।
Lance U. Matthews

1 Answer

I believe the symbols are not written literally because of how they are encoded, but they are not lost. If I load the JSON file you created, I get:

 अनि यस्तो हुन गयो कि उहाँ उजाड स्थानतिर प्रस्थान गर्नुभयो।

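So you are saving those symbols; the \uXXXX escapes are just how they are represented inside the file. By default, json.dump escapes every non-ASCII character (ensure_ascii=True), and json.load turns the escapes back into the original text.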

I used this to read it:

import json

# json.load decodes the \uXXXX escapes back into the original characters
with open('test.json') as json_file:
    data = json.load(json_file)
    print(data)
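
If you want the file itself to be human-readable, a minimal sketch is to pass ensure_ascii=False to json.dump and open the output file with an explicit UTF-8 encoding. The helper name dump_readable and the temporary output path below are illustrative assumptions, not part of the original script; the text is hard-coded here instead of being parsed from test.html.

```python
import json
import os
import tempfile

# Sample text from the question (hard-coded here instead of parsing test.html)
text = " अनि यस्तो हुन गयो कि उहाँ उजाड स्थानतिर प्रस्थान गर्नुभयो।"


def dump_readable(obj, path):
    """Hypothetical helper: write obj as JSON without escaping non-ASCII text."""
    # An explicit encoding avoids relying on the platform default (often cp1252 on Windows)
    with open(path, "w", encoding="utf-8") as outfile:
        # ensure_ascii=False keeps the characters as-is instead of \uXXXX escapes
        json.dump(obj, outfile, ensure_ascii=False, indent=4)


# Illustrative output location; any writable path works
out_path = os.path.join(tempfile.gettempdir(), "test_readable.json")
dump_readable(text, out_path)

# Read the raw file contents back to confirm what was actually written
with open(out_path, encoding="utf-8") as infile:
    raw = infile.read()

print(raw)
```

Both forms are valid JSON and load back to the same string, so this only matters if you or another tool need to read the file directly.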