I currently have data in a .csv file and am trying to move it to a temp.txt file. When I transfer the data, each line begins with b' and ends with \n', which I want to remove.

I previously had it working, but ran into problems with UTF-8 text, getting the error: UnicodeEncodeError: 'charmap' codec can't encode character '\u0336' in position 113: character maps to <undefined>

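import sys

def data(file):
    # Print the first 1000 lines; .encode() turns each str into bytes,
    # which is why the output shows up as b'...\n'
    for i in range(1000):
        print(file.readline().encode("utf-8"))

file = open(sys.argv[1], encoding="utf-8")
data(file)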

I'm currently getting results like this: b'Datahere\n'

And I would prefer to get just: Datahere

Ned_Kelly
  • If you don't want bytes, why are you calling `.encode()` on your strings? – Aran-Fey Apr 02 '19 at 08:20
  • Possible duplicate of [Reading a file without newlines](https://stackoverflow.com/questions/12330522/reading-a-file-without-newlines) – Aran-Fey Apr 02 '19 at 08:20
  • If I don't use `.encode()` after the readline, I get: UnicodeEncodeError: 'charmap' codec can't encode character '\u0336' in position 113: character maps to <undefined>. Whereas if I add it, I don't get any errors, but unfortunately I get the b' and \n' – Ned_Kelly Apr 02 '19 at 08:51
  • You are double encoding the data, producing [mojibake](https://en.wikipedia.org/wiki/Mojibake). The error suggests that `charmap` is being used to *write* this data somewhere, but you are not showing us that code, or the pertinent configuration. Probably your system encoding is not Unicode-compatible; a common fix is to configure Python explicitly to use `PYTHONIOENCODING=utf-8` or simply get rid of Windows for a fresh start in life. – tripleee Apr 02 '19 at 09:39
  • For what it's worth, CSV files *are* text files, by definition. – tripleee Apr 02 '19 at 09:39

1 Answer

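It is a bit of a hack, but you can slice the printed form of each line, `str(line)`, with [2:-3]. This gets rid of the first two characters on each line, b', and the last three, \n' (a backslash, an n and the closing quote). Slicing the bytes object itself will not help, because the b'...' wrapper is only how bytes are displayed, not part of the data. Note that any non-ASCII characters will still show up as \x.. escapes this way.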
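
A less fragile route, in line with the comments on the question, is probably to skip `.encode()` altogether: keep each line as a str, strip the trailing newline, and write to the output file with an explicit UTF-8 encoding so the platform default ("charmap" on many Windows setups) is never used. The following is only a minimal sketch, assuming the goal is to copy the first 1000 lines into temp.txt; the helper name `copy_lines` and its `limit` parameter are made up for illustration and do not come from the question.

```python
import itertools


def copy_lines(src_path, dst_path, limit=1000):
    """Copy up to `limit` lines from src_path to dst_path as UTF-8 text.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    written = 0
    # An explicit encoding on both files avoids falling back to the
    # platform default codec when reading or writing.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in itertools.islice(src, limit):
            # The line stays a str (no .encode()), so there is no b'...'
            # wrapper; rstrip drops the trailing newline before re-adding one.
            dst.write(line.rstrip("\n") + "\n")
            written += 1
    return written
```

It could be called as `copy_lines(sys.argv[1], "temp.txt")`. If the lines are printed to the console instead of written to a file, the UnicodeEncodeError can still appear, since the console encoding is a separate setting (see the `PYTHONIOENCODING` comment above).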

NAP_time