
I'm working with a CSV that I'm fetching online with `requests.get`, so for context this is how the file is being uploaded:

import pandas as pd
import requests

comments = []
body = requests.get()
for comment in body:
    comments.append([
        str(body['data']['body']).encode(encoding='utf-8')
    ])
df = pd.DataFrame(comments)[0]
requests.put('http://sample/desination.csv', data=df.to_csv(index=False))

The encoding when appending to `comments` is required because the data defaulted to latin-1 and requests expects utf-8.

The resulting csv contains 1 column with rows like: b'Presicely'

Makes sense: encoding to utf-8 converted the string to the `bytes` type.
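For what it's worth, the stuck `b'...'` prefix can be reproduced in isolation; this is a minimal sketch of what happens when a `bytes` value is later stringified (as `to_csv()` does to every cell):

```python
# Once bytes pass through str(), the b'...' repr becomes
# part of the text itself and decoding can no longer remove it.
raw = "Precisely"
encoded = raw.encode("utf-8")   # bytes: b'Precisely'
cell = str(encoded)             # str: "b'Precisely'" -- repr baked in

print(cell)                   # b'Precisely'
print(cell.startswith("b'"))  # True
```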

Now, when I later try to decode the CSV, I have the following:

import requests

data = requests.get('http://destination.csv').content
testdata = data.decode('utf-8').splitlines()
print(testdata[2])

b'Presicely'

If I don't decode:

print(data[1:20])

b'Presicely'\r\n

I was under the impression that decoding the data would eliminate the b prefixes, as most Stack Overflow answers suggest. The problem could be with how I initially upload the CSV, so I've tried tackling that a few different ways with no luck (I can't get around encoding it).

Any suggestions?

P.S. python version 3.7.7

Edit: I ended up having no luck getting this to work. `DataFrame.to_csv()` returns a string, and as lenz pointed out, the conversion to string type is likely the culprit.

Ultimately I ended up saving the data as a .txt to eliminate the need to call `to_csv()`, which let my decode work as expected, confirming our suspicion. The txt file format works for me, so I'm keeping it that way.

  • Probably there's an (implicit) `str` call somewhere, so the values really are `"b'Precisely'"` and `"b'Precisely'\r\n"`. – lenz Jul 30 '20 at 07:10
  • By serialising a list of bytes objects (rather than first serialising, then encoding the whole dump), you probably need to also decode each cell individually too. – lenz Jul 30 '20 at 07:11
  • `df.to_csv(encoding='utf-8')`? – snakecharmerb Jul 30 '20 at 08:20
  • @snakecharmerb just tried doing this both with/without decoding the body but the results were the same. – AlwaysLearning Jul 30 '20 at 12:19
  • @lenz you're right in that `to_csv` returns a `str` object, so that may be where the problem lies. However when I try to decode the entire body as such: `datadf = pd.read_csv(io.StringIO(data.decode('utf-8')))` I can then fetch a cell: `testdata = datadf.iloc[1,0]` but then that cell is already a string which can't be further decoded. Are you suggesting I convert it to another type to decode it further, on each row? – AlwaysLearning Jul 30 '20 at 12:28
  • I'm not sure what to do. But once you call `str()` on a bytes object without an `encoding=` parameter, you get to a representation like `"b'...'"`, which is not easily reverted, so you need to find a way to avoid this. Encoding individual cells doesn't seem promising to me. – lenz Jul 30 '20 at 14:53
  • Possibly relevant https://stackoverflow.com/a/55898249/5320906 – snakecharmerb Aug 01 '20 at 08:55

1 Answer


I was able to get this to work, credit to my irl friend who rubber-ducked me through the solution. It was quite simple: what I needed to do was encode the string returned by the `to_csv()` function, like so:

comments = []
body = requests.get()
for comment in body:
    comments.append([
        str(body['data']['body'])
    ])
df = pd.DataFrame(comments)[0]
csv_data = df.to_csv(index=False)
csv_data = csv_data.encode('utf-8')
requests.put('http://sample/desination.csv', data=csv_data)

I'm sure you can compress the above code, either by passing the encoding to the `to_csv()` function as a flag or by chaining `.encode()` onto its result.
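The chained form can be sketched like this (a placeholder Series stands in for the real comment data, since the original fetch URL isn't shown):

```python
import pandas as pd

# Placeholder data standing in for the scraped comments.
df = pd.Series(["Precisely", "Another comment"], name="body")

# to_csv() returns a str; chaining .encode() yields utf-8 bytes,
# which is the body type requests.put(url, data=...) accepts.
payload = df.to_csv(index=False).encode("utf-8")

print(type(payload) is bytes)  # True
```

The payload can then be uploaded as before with `requests.put('http://sample/desination.csv', data=payload)`.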

The resulting file uploaded can now be decoded properly and you can keep your csv format.