How to write Chinese text to a file in Python

Question

I am having trouble while extracting Chinese text and writing it into a file.

str = "全球紧张致富豪财富缩水 贝索斯丁磊分列跌幅前两位"
f = open('test.txt', 'w')
f.write(str)

The code above runs fine, but the code below writes gibberish to the file.

import requests
from bs4 import BeautifulSoup

f = open('data.txt', 'w')

def techSinaCrawler():
    url = "http://tech.sina.com.cn/"
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for li in soup.findAll('li', {'data-sudaclick': 'yaowenlist-1'}):
        for link in li.findAll('a'):
            href = link.get('href')
            techSinaInsideLinkCrawler(href)

def techSinaInsideLinkCrawler(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for data in soup.findAll('h1', {'id': 'main_title'}):
        str = 'main_title' + ':' + data.string
        f.write(str)
        f.write('\n')

techSinaCrawler()

Thanks for the help

[This](https://stackoverflow.com/questions/20205455/how-to-correctly-parse-utf-8-encoded-html-to-unicode-strings-with-beautifulsoup) and [this](https://stackoverflow.com/questions/7219361/python-and-beautifulsoup-encoding-issues) might be useful dealing with BeautifulSoup encoding issues. — Ramon, Aug 11 '17 at 19:19

score 0 · Answer 1 · answered Aug 11 '17 at 17:24

In Python 2, it's a good idea to use codecs.open() if you're dealing with encodings other than ASCII. That way, you don't need to manually encode everything you write. Also, os.walk() should be passed a Unicode string if you're expecting non-ASCII characters in the filenames:

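import codecs
import os

# Python 2: codecs.open() encodes every write as UTF-8
with codecs.open("c:/Users/me/filename.txt", "a", encoding="utf-8") as d:
    # A Unicode start path makes os.walk() return Unicode names
    for dirpath, subdirs, files in os.walk(u"c:/temp"):
        for f in files:
            fname = os.path.join(dirpath, f)
            print fname
            d.write(fname + "\n")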

There is no need to call d.close(); the with block already takes care of that.
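
In Python 3 the built-in open() accepts an encoding argument directly, so codecs.open() is not required there. The following is a minimal sketch under that assumption; the file name test_utf8.txt and the helpers write_lines and read_text are made up for illustration and are not part of the question's code:

```python
import os
import tempfile


def write_lines(path, lines):
    """Write each string in `lines` to `path` as UTF-8, one per line."""
    # An explicit encoding avoids depending on the platform default
    with open(path, "w", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")


def read_text(path):
    """Read the whole file back, decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as src:
        return src.read()


# Hypothetical output location inside the system temp directory
demo_path = os.path.join(tempfile.gettempdir(), "test_utf8.txt")
headline = "全球紧张致富豪财富缩水 贝索斯丁磊分列跌幅前两位"

write_lines(demo_path, ["main_title:" + headline])
round_trip = read_text(demo_path)
```

Reading the file back with the same encoding should return the original string, which is a quick way to check that nothing was mangled on the way to disk.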

score 0 · Accepted Answer · answered Aug 11 '17 at 19:13

Solved.

I changed .text to .content, that is, from

plain_text = source_code.text

to

plain_text = source_code.content

With that change the output is written as Chinese text, which is the desired result.
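
A likely explanation of why this helps: .text decodes the body with the charset that requests infers from the HTTP headers, and for a text/* response without a declared charset requests falls back to ISO-8859-1. .content returns the raw bytes, which leaves BeautifulSoup to detect the encoding from the document itself. The offline sketch below uses only the standard library, with a hypothetical byte string standing in for a response body, to show how UTF-8 bytes turn into gibberish when decoded with the wrong charset:

```python
# Stand-in for the raw bytes of a UTF-8 page (what `.content` would hold);
# this sample is an assumption, not data fetched from the site
raw_body = "全球紧张".encode("utf-8")

# Decoding with the wrong charset gives one junk character per byte
garbled = raw_body.decode("iso-8859-1")

# Decoding with the charset the bytes were written in recovers the text
decoded = raw_body.decode("utf-8")

# ISO-8859-1 maps every byte to a character, so the damage is reversible
repaired = garbled.encode("iso-8859-1").decode("utf-8")
```

With requests, an alternative to switching to .content may be to set response.encoding (for example to response.apparent_encoding) before reading response.text.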

Zain Danish
