
I have captured a large amount of data from numerous CSV files and carved out certain pieces of it. One carved-out section contains a large amount of text in various formats, including emoticons and other non-standard characters.

When writing this data out as HTML I get errors. Currently the error is:

UnicodeDecodeError: 'charmap' codec can't decode byte 0X90 in Position: character maps to <undefined>.

The program currently builds a list (array) from a string. The list is then written to an HTML file.

Any idea how to overcome this issue in Python 3.2, or how to implement a character buffer?

UPDATE

I have tried the comments below and also done more research.

I have used this code to no avail:

MessageArray.append(Message.encode('ascii', 'ignore'))

But I got the error: TypeError: Can't convert 'bytes' object to str implicitly.

Zeki Turedi
  • What encoding is your input data in? CSV are text files, they shouldn't contain "raw" binary data. You probably just need to read the CSV files with the correct charset. – millimoose Jul 26 '12 at 14:05
  • "Code snippets can be supplied if needed." – Yes, please. – Sven Marnach Jul 26 '12 at 14:05
  • There isn't really a "raw" way to encode/decode between an internal representation of unicode strings and binary output. (Except maybe UTF-32 but that's very rarely useful.) You either go between byte arrays and byte arrays (which is not the case here because both CSV and HTML are text formats), or you have to know what encoding your input and output text is in. – millimoose Jul 26 '12 at 14:10
  • I am reading the data using wb, because I need it as binary in order to carve certain parts of the data. – Zeki Turedi Jul 26 '12 at 14:23
  • Please take a look at this issue: http://stackoverflow.com/questions/4545661/unicodedecodeerror-when-redirecting-to-file – Marcin Zaluski Jul 26 '12 at 14:56
  • @MarcinZaluski I have looked at sys.stdout previously without any luck. I am still relatively new to Python, so I am most likely going wrong somewhere. I tried the following code: `Message = sys.stdout.buffer.write(Message_split.encode('utf-8'))` – Zeki Turedi Jul 27 '12 at 06:49
  • "TypeError: Can't convert 'bytes' object to str implicitly." is just Python's way of nudging you to wrap whatever you're doing in str(...). – SilverbackNet Aug 01 '12 at 09:03
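
The charset advice in the comments above can be sketched roughly as follows. This is only an illustration, not the asker's code: the function names `read_text_lossy` and `write_html` are hypothetical, and UTF-8 is an assumed input encoding that may not match the real CSV files. The 'charmap' codec in the error message suggests that a Windows code page such as cp1252 was picked by default, and in that code page byte 0x90 has no mapping.

```python
import html


def read_text_lossy(path, encoding="utf-8"):
    """Read a file as bytes, then decode it explicitly.

    The encoding is an assumption; errors="replace" substitutes U+FFFD
    for any byte sequence that is invalid in that encoding instead of
    raising UnicodeDecodeError.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    return raw.decode(encoding, errors="replace")


def write_html(path, messages):
    """Write each message as a paragraph, forcing UTF-8 on output."""
    with open(path, "w", encoding="utf-8") as out:
        out.write('<html><head><meta charset="utf-8"></head><body>\n')
        for message in messages:
            # html.escape protects <, > and & inside the message text.
            out.write("<p>" + html.escape(message) + "</p>\n")
        out.write("</body></html>\n")
```

Passing `encoding="utf-8"` when opening the output file means the platform default codec is never consulted, which is usually where a 'charmap' error comes from on Windows.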

1 Answer


I was able to fix my issue by following @SilverbackNet's comment. This did not solve my overall problem of importing and converting raw binary data from a CSV, but it allowed me to ignore the data that was causing the errors.
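
The exact code used is not shown here. As a minimal sketch of one way to ignore the troublesome characters while keeping `str` objects in the list, the result of `encode` can be decoded straight back. The helper name `strip_non_ascii` and the sample message are hypothetical; `message_array` stands in for the `MessageArray` from the question.

```python
def strip_non_ascii(text):
    """Drop every character that cannot be represented in ASCII.

    str.encode returns bytes, so decoding back to str avoids the
    "Can't convert 'bytes' object to str implicitly" TypeError when
    the result is later concatenated with other strings.
    """
    return text.encode("ascii", "ignore").decode("ascii")


message_array = []
message = "caf\u00e9 \u263a ok"  # hypothetical message with non-ASCII text
message_array.append(strip_non_ascii(message))
```

Note that in Python 3, calling `str()` on a bytes object without an encoding returns its repr (for example `str(b"abc")` gives `"b'abc'"`), so decoding explicitly is usually the cleaner option. This approach is lossy: the non-ASCII characters are discarded, not converted.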

Zeki Turedi