
I am using Scrapy, and when I run my spider I get this error:
TypeError: Object of type 'bytes' is not JSON serializable
2019-09-19 11:22:09 [scrapy.utils.signal] ERROR: Error caught on signal handler: >

When I used pdb to print item['title'], I got a different error:

'ascii' codec can't encode character '\xa7' in position 227: ordinal not in range(128)

If anybody has an idea why this issue occurs, please share it with me.

shahrukh ijaz
  • Hi, this looks like an encoding issue, _i.e._ you have a bytes object that needs decoding into a string before being serialized to JSON, and it contains non-ASCII characters, which made your print fail. You might want to have a look at [this SO post](https://stackoverflow.com/questions/6224052/what-is-the-difference-between-a-string-and-a-byte-string) for example. By the way, which version of Python are you using? – pandrey Sep 19 '19 at 11:29
  • I am using Python 3.6. – shahrukh ijaz Sep 19 '19 at 11:37
  • When I try to convert it, it says the str object has no attribute decode, but when I do encode().decode('UTF-8') it shows the following error: **UnicodeEncodeError: 'ascii' codec can't encode character '\xa7' in position 143: ordinal not in range(128)** – shahrukh ijaz Sep 19 '19 at 11:39
  • It looks like `'ascii'` is being used as the default encoding, which should not be the case. Maybe try explicitly encoding in 'utf-8': `.encode('utf-8').decode('utf-8')`? Otherwise, you could use a regex to filter out non-ASCII characters, but this is not necessarily great. – pandrey Sep 19 '19 at 11:49
  • That does not work; it shows this error: **UnicodeEncodeError: 'ascii' codec can't encode character '\xa7' in position 134: ordinal not in range(128)** – shahrukh ijaz Sep 19 '19 at 11:51
  • Hmm... Could you be a little bit more specific as to what the object you are manipulating is, and which exact line of code triggers this exception? – pandrey Sep 19 '19 at 11:57
  • I solved it using a regex, thanks a lot. – shahrukh ijaz Sep 19 '19 at 12:00
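
The `TypeError` from the question can be reproduced outside Scrapy. This is only a minimal sketch, assuming the item field holds UTF-8 encoded bytes; the `item` dict and the `to_text` helper are hypothetical and not taken from the asker's spider.

```python
import json


def to_text(value, encoding="utf-8"):
    """Return a str for bytes input; leave any other value untouched."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical item: the title is bytes containing the section sign (U+00A7).
item = {"title": "Section \xa7 12".encode("utf-8")}

try:
    json.dumps(item)
except TypeError as exc:
    # json cannot serialize bytes objects, so this branch runs.
    error_message = str(exc)

# Decode bytes values to str before serializing.
clean_item = {key: to_text(value) for key, value in item.items()}
serialized = json.dumps(clean_item, ensure_ascii=False)
```

If the bytes are not actually UTF-8, the `decode` call would need the real source encoding instead.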

2 Answers

import re
text = re.sub(r'[^\x00-\x7F]', ' ', text)

I resolved the issue using this.
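
A small self-contained sketch of what that substitution does. The `strip_non_ascii` name and the sample string are made up for illustration. Note that this approach is lossy: the non-ASCII character is discarded rather than preserved.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def strip_non_ascii(text, replacement=" "):
    """Replace every non-ASCII character in `text` with `replacement`."""
    return NON_ASCII.sub(replacement, text)


sample = "See \xa7 12 of the act"  # contains the section sign
cleaned = strip_non_ascii(sample)

# The cleaned text can now be encoded as ASCII without raising.
ascii_bytes = cleaned.encode("ascii")
```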

shahrukh ijaz

The character '\xa7' is the section sign (§), which is often found in legal text (https://en.wikipedia.org/wiki/Section_sign).

This character is not in the ASCII character set. (See https://www.compart.com/en/unicode/U+00A7 and https://www.compart.com/en/unicode/charsets/containing/U+00A7)

Attempting to encode it as ASCII raises the error.

As others have suggested in the comments, try encoding it with utf-8.
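
To illustrate the difference, here is a short sketch using only built-in `str` and `bytes` methods. The variable names are illustrative.

```python
section_sign = "\xa7"  # U+00A7, the section sign

try:
    section_sign.encode("ascii")
except UnicodeEncodeError as exc:
    # ASCII only covers code points 0-127, so U+00A7 cannot be encoded.
    ascii_error = str(exc)

# UTF-8 can represent the character; it takes two bytes.
utf8_bytes = section_sign.encode("utf-8")

# Decoding with the same codec recovers the original string.
round_trip = utf8_bytes.decode("utf-8")
```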

Dan King