Opening .dat file UnicodeDecodeError: invalid start byte

Question

I am using Python to convert a .dat file (which you can find here) to CSV so that I can use it later with NumPy or the csv reader.

import csv

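# read i2019.dat into a list of lists (opened with the platform's default encoding)
with open("./i2019.dat") as src:
    datContent = [line.split() for line in src]

# write it as a new CSV file (Python 3's csv module needs text mode with newline="")
with open("./i2019.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)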

But this results in the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 68: invalid start byte

Any help would be appreciated!
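For context, here is a small sketch of what the error message means (the helper name `explain_utf8_error` is hypothetical, not from the question). Byte 0x8d can never begin a UTF-8 character, whereas in Shift JIS the bytes 0x81 to 0x9F are lead bytes of two-byte characters, which is consistent with the accepted answer's diagnosis.

```python
def explain_utf8_error(data):
    """Describe where (and why) strict UTF-8 decoding of `data` fails."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the codec rejected.
        bad = data[exc.start]
        return f"byte 0x{bad:02x} at position {exc.start}: {exc.reason}"
    return "valid UTF-8"


# 0x8d is 0b10001101. In UTF-8, bytes of the form 0b10xxxxxx may only appear
# as continuation bytes inside a multi-byte sequence, so one that shows up
# where a new character should begin is an "invalid start byte".
print(explain_utf8_error(b"abc\x8d"))  # byte 0x8d at position 3: invalid start byte
```

To inspect the real file, read it in binary mode so nothing is decoded on the way in, e.g. `with open("./i2019.dat", "rb") as f: print(explain_utf8_error(f.read()))`.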

Abdul Niyas P M · Accepted Answer · 2022-01-12T07:46:11.513

It seems your .dat file uses the Shift JIS (Japanese) encoding, so pass `shift_jis` as the `encoding` argument to `open`.

with open("./i2019.dat", encoding="shift_jis") as f:
    datContent = [line.split() for line in f]
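Putting the `shift_jis` fix together with the CSV step, a complete conversion might look like the sketch below. `dat_to_csv` is a hypothetical helper name; it assumes columns are separated by runs of whitespace (as `str.split()` implies), skips blank lines, and writes the CSV as UTF-8, which is an assumption rather than something the question specifies. The output file is opened in text mode with `newline=""`, because under Python 3 `csv.writer` writes `str` and fails on a file opened with `"wb"`.

```python
import csv


def dat_to_csv(src_path, dst_path, encoding="shift_jis"):
    """Convert a whitespace-separated .dat file to CSV; return the row count."""
    # Decode with the file's actual encoding instead of the platform default.
    with open(src_path, encoding=encoding) as src:
        # split() with no argument splits on any whitespace run and drops the
        # trailing newline; blank lines are skipped (an assumption).
        rows = [line.split() for line in src if line.strip()]

    # Text mode plus newline="" is what the csv module expects in Python 3.
    # UTF-8 output is an assumption; choose whatever your consumer needs.
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(rows)
    return len(rows)
```

Usage: `dat_to_csv("./i2019.dat", "./i2019.csv")`. Writing UTF-8 means later readers such as `csv.reader` or `numpy.loadtxt` no longer need to know about Shift JIS.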

edited Jan 12 '22 at 07:46

answered Jan 12 '22 at 06:32

Abdul Niyas P M

Thank you so much for your reply! I tried your code, but it outputs a different message: "UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 760-761: illegal encoding". Could this be because the file I am working with has formatting issues? – Steven Oh Jan 12 '22 at 06:37
Thank you so much. That output something! Sorry, but it outputs strange characters that are not readable, for example: "㔸娠〰㘰′ぐ\u3130ぐ\u3130ぐ\u3130ぐ\u3130ぐ\u3130". What do you suggest I do? Thank you so much for your help. – Steven Oh Jan 12 '22 at 07:01
@StevenOh I think your file uses `shift_jis` encoding. I have updated my answer can you try again? – Abdul Niyas P M Jan 12 '22 at 07:46
Wow! That worked out perfectly. Thank you so much. Could you tell me how you figured out the encoding? – Steven Oh Jan 12 '22 at 07:57
@StevenOh If you are using Windows and have Notepad++ installed, [it can autodetect the encoding of the file, but note that it may not be accurate in all cases](https://stackoverflow.com/a/14247144/6699447) – Abdul Niyas P M Jan 12 '22 at 08:00
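Without an editor, one rough alternative is to try a few candidate codecs on the raw bytes and keep the first that decodes cleanly. This is a heuristic sketch, not real detection: `guess_encoding` is a hypothetical helper, and the Japanese-oriented candidate list is an assumption based on this file turning out to be Shift JIS. A clean decode only proves the bytes are *valid* in that codec, not that the resulting text is correct.

```python
def guess_encoding(data, candidates=("utf-8", "shift_jis", "cp932", "euc_jp")):
    """Return the first codec in `candidates` that decodes `data` without error.

    Heuristic only: success means the bytes are valid in that codec,
    not that the decoded text is what the author intended.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # invalid in this codec; try the next candidate
        return encoding
    return None  # nothing in the list fits


# Strict UTF-8 goes first because random non-UTF-8 bytes rarely pass it.
print(guess_encoding("東京".encode("shift_jis")))  # shift_jis
```

Pass the whole file's bytes, e.g. `with open("./i2019.dat", "rb") as f: print(guess_encoding(f.read()))`; cutting a sample short can split a multi-byte character and cause a spurious failure. UTF-16 is deliberately left out: `b"85 Z0006".decode("utf-16-le")` succeeds and returns `"㔸娠〰㘰"`, pairing ASCII bytes into CJK code points, which is the same kind of unreadable output reported in the comments above.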
