I am using Python to convert a .dat file (which you can find here) to CSV so that I can use it later with NumPy or a CSV reader.

import csv

# read i2019.dat into a list of lists (one list of fields per line)
datContent = [i.strip().split() for i in open("./i2019.dat").readlines()]

# write it as a new CSV file (text mode with newline="" is what csv.writer expects in Python 3)
with open("./i2019.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)

But this results in the following error message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 68: invalid start byte

Any help would be appreciated!

Steven Oh

1 Answer

It seems like your .dat file uses Shift JIS (Japanese) encoding, so you can pass shift_jis as the encoding argument to the open function.

datContent = [i.strip().split() for i in open("./i2019.dat", encoding='shift_jis').readlines()]
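
For completeness, here is a minimal end-to-end sketch of the whole conversion, assuming the file really is Shift JIS and that its fields are separated by whitespace. The helper name dat_to_csv and the sample file contents are hypothetical, made up only for the demo; the calls themselves are the standard open and csv.writer. Unlike the one-liner above, it skips blank lines and writes the CSV as UTF-8.

```python
import csv
import os
import tempfile


def dat_to_csv(src_path, dst_path, src_encoding="shift_jis"):
    """Copy whitespace-separated rows from src_path into a CSV at dst_path.

    Hypothetical helper: the default encoding is an assumption based on the
    answer above, not something detected from the file.
    """
    with open(src_path, encoding=src_encoding) as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for line in src:
            fields = line.split()
            if fields:  # skip blank lines instead of writing empty rows
                writer.writerow(fields)


# Demo on a made-up Shift JIS file (not the real i2019.dat).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.dat")
    dst = os.path.join(tmp, "sample.csv")
    with open(src, "w", encoding="shift_jis") as f:
        f.write("東京  2019 1.5\n\n大阪 2019   2.0\n")

    dat_to_csv(src, dst)

    with open(dst, encoding="utf-8", newline="") as f:
        rows = list(csv.reader(f))
```

For the file in the question, the call would be dat_to_csv("./i2019.dat", "./i2019.csv"). Opening the output with newline="" is what the csv module documentation recommends, so the writer controls line endings itself.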
Abdul Niyas P M
  • Thank you so much for your reply! I tried your code but it outputs a different message: "UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 760-761: illegal encoding". Would this be because the file that I am working with has formatting issues? – Steven Oh Jan 12 '22 at 06:37
  • Thank you so much. That output something! Sorry, but it outputs these weird string characters that are not readable, for example: "㔸娠〰㘰′ぐ\u3130ぐ\u3130ぐ\u3130ぐ\u3130ぐ\u3130". What do you suggest that I do? Thank you so much for your help. – Steven Oh Jan 12 '22 at 07:01
  • @StevenOh I think your file uses `shift_jis` encoding. I have updated my answer; can you try again? – Abdul Niyas P M Jan 12 '22 at 07:46
  • Wow! That worked out perfectly. Thank you so much. Would it be possible for you to tell me how you were able to figure out the encoding? – Steven Oh Jan 12 '22 at 07:57
  • @StevenOh If you are using Windows and have Notepad++ installed, [it can autodetect the encoding of the file. But note that it may not be accurate in all cases](https://stackoverflow.com/a/14247144/6699447) – Abdul Niyas P M Jan 12 '22 at 08:00
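
If no editor is at hand, the same question can be probed roughly from Python by trying a few candidate encodings on the raw bytes. This is only a sketch: the helper name decodable_encodings and the sample bytes are hypothetical, and a clean decode does not prove the encoding is right (the unreadable output quoted in the comments above came from a decode that did not raise). It narrows the list; the result still has to be checked by eye.

```python
def decodable_encodings(raw, candidates):
    """Return the candidate encodings that decode `raw` without raising.

    Hypothetical helper: passing this check is necessary, not sufficient,
    so inspect the decoded text before trusting any of the survivors.
    """
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Made-up sample: Japanese text encoded as Shift JIS.
sample = "東京 2019".encode("shift_jis")
result = decodable_encodings(sample, ["utf-8", "shift_jis"])
print(result)
```

For a real file, read it in binary mode first, for example open("./i2019.dat", "rb").read(), and pass those bytes in with whatever candidates seem plausible for the data's origin.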