I'm working with a csv that I fetch online with requests.get. For context, this is how the file is uploaded:
import pandas as pd
import requests
comments = []
body = requests.get()
for comment in body:
    comments.append([
        str(body['data']['body']).encode(encoding='utf-8')
    ])
df = pd.DataFrame(comments)[0]
requests.put('http://sample/desination.csv', data=df.to_csv(index=False))
Encoding each comment before appending it to comments is required because requests defaulted to latin-1 for the request body, while utf-8 is expected.
The resulting csv contains one column with rows like: b'Presicely'
Makes sense, encoding to utf-8 converted the string to bytes type.
Later, when I try to decode the csv, I have the following:
import requests
data = requests.get('http://destination.csv').content
testdata = data.decode('utf-8').splitlines()
print(testdata[2])
b'Presicely'
If I don't decode:
print(data[1:20])
b'Presicely'\r\n
I was under the impression that decoding data would eliminate the b prefixes, as most Stack Overflow answers suggest. The problem could be in how I initially upload the csv, so I've tried tackling that a few different ways with no luck (I can't get around encoding it).
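To illustrate why decode leaves the prefix in place, here is a minimal sketch. It assumes the downloaded bytes look like the hypothetical payload below (a header row followed by cells that contain the text b'...'), and the helper name recover is made up for this example. The b'...' is ordinary text inside the file, so one possible way to undo it is to parse each cell back into bytes with ast.literal_eval and then decode that:

```python
import ast
import csv
import io

# Hypothetical stand-in for requests.get(...).content: every cell already
# holds the characters b'...' because str() was applied to bytes on upload.
data = b"0\r\nb'Presicely'\r\nb'caf\\xc3\\xa9'\r\n"

# decode() only converts the outer payload to str; the inner b'...' is text.
text = data.decode('utf-8')
rows = list(csv.reader(io.StringIO(text, newline='')))

def recover(cell):
    """Parse a b'...' literal back into bytes, then decode it as UTF-8."""
    value = ast.literal_eval(cell)
    return value.decode('utf-8') if isinstance(value, bytes) else str(value)

# Skip the header row and recover the original strings.
recovered = [recover(row[0]) for row in rows[1:]]
```

This only repairs a file that has already been written this way; it does not change how the upload behaves.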
Any suggestions?
P.S. python version 3.7.7
Edit: I had no luck getting this to work. DataFrame.to_csv() returns a string and, as lenz pointed out, the conversion of the bytes values to string type is likely the culprit.
Ultimately I saved the data as a .txt to eliminate the need to call to_csv(), after which my decode worked as expected, confirming our suspicion. The txt file format works for me, so I'm keeping it that way.
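For anyone who does need the csv format, a minimal sketch of the alternative, under some assumptions: the standard-library csv module stands in for to_csv() here (it also stringifies non-string values with str()), and the sample comments and the helper to_csv_text are hypothetical. The idea is to keep each value as str and encode the finished csv text once, instead of encoding every value before it is written:

```python
import csv
import io

comments = ['Presicely', 'caf\u00e9']  # hypothetical sample comments, kept as str

def to_csv_text(values, header='0'):
    """Write a single-column csv into a string, one value per row."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer)
    writer.writerow([header])
    writer.writerows([value] for value in values)
    return buffer.getvalue()

# Encoding each value first reproduces the problem: the writer calls str()
# on the bytes objects, so their b'...' repr ends up in the file.
broken = to_csv_text([c.encode('utf-8') for c in comments])

# Keeping values as str and encoding the whole text once avoids the prefix.
payload = to_csv_text(comments).encode('utf-8')
lines = payload.decode('utf-8').splitlines()
```

With pandas, the analogous step would presumably be passing df.to_csv(index=False).encode('utf-8') as the data argument of requests.put, with the DataFrame built from plain strings; I have not tested that against the original endpoint.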