import csv
import os
from datetime import datetime
from io import StringIO

import pandas as pd
import requests
from google.cloud import storage
from google.cloud.storage import Blob


def save_csv_to_cloud_storage(df, file_name, folder='output'):
    # Point the client at the service-account key and open the target bucket
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/Users/******/Desktop/*****.json'
    storage_client = storage.Client()
    bucket = storage_client.get_bucket('fluxlengow')
    # Timestamp used to make the object name unique
    dt_string = datetime.now().strftime("%Y%m%d-%H%M%S")
    # Serialise the dataframe to an in-memory CSV and upload it
    f = StringIO()
    df.to_csv(f, sep=',')
    f.seek(0)
    Blob('{}/{}_{}_.csv'.format(folder, dt_string, file_name), bucket).upload_from_file(
        f, content_type='text/csv')


def lengowToStorage():
    # Source feeds to mirror into the bucket (URLs partially masked)
    liste = [
        'https://httpnas.****.*****/******/SUP****/*******_FR.csv',
        'https://httpnas.****.*****/******/SUP****/*******_UK.csv',
        'https://httpnas.****.*****/******/SUP****/*******_IT.csv',
    ]
    for url in liste:
        # Object name is the file name without its extension
        name = url.split('/')[-1].split('.')[0]
        with requests.Session() as s:
            download = s.get(url)
            decoded_content = download.content.decode('utf-8')
            # The feed is pipe-delimited; the first row holds the headers
            cr = csv.reader(decoded_content.splitlines(), delimiter='|')
            my_list = list(cr)
            df_ = pd.DataFrame(my_list, columns=my_list[0]).drop(0)
            save_csv_to_cloud_storage(df_, file_name=name, folder='input')
            print("fetched file: {}".format(url))


lengowToStorage()

Hi guys, I'm sorry but I need your help because I'm really stuck on this encoding issue. I'm trying to send my dataframe to Cloud Storage as a CSV file. Unfortunately, when I try to save it to Storage, I get this error:

'latin-1' codec can't encode character '\u2019' in position 32318: Body ('’') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
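If I understand the hint at the end of the error, it wants the body sent as UTF-8 bytes. I guess that would look roughly like this inside save_csv_to_cloud_storage (just a sketch, I have not managed to verify it end to end; the variables are the ones from my function above):

    # Sketch: build the CSV as explicit UTF-8 bytes and upload those instead of a str body
    csv_bytes = df.to_csv(sep=',').encode('utf-8')
    blob = Blob('{}/{}_{}_.csv'.format(folder, dt_string, file_name), bucket)
    blob.upload_from_string(csv_bytes, content_type='text/csv')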

I then applied "utf-8" encoding to each column (not shown in the code), and the save to Cloud Storage does work, but my data end up in this format:

b'Antaeus Lotion Apr\xc3\xa8s Rasage 100ml'

I decode UTF-8, I encode UTF-8... but I can't get my data back into the string version I want:

'Antaeus Lotion Après Rasage 100ml'
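For reference, this is roughly what I tried to turn the byte strings back into plain text (the column handling here is only an example, not my exact code):

    # Rough sketch of my decode attempt: convert bytes columns back to plain str
    for col in df_.columns:
        if df_[col].map(lambda v: isinstance(v, bytes)).any():
            df_[col] = df_[col].str.decode('utf-8')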

If you could help, I'd be very grateful.

  • Do you get the UnicodeError for the `print()` statement, by any chance? If so, you can probably omit that line, or wrap its argument in `ascii()`, as it seems to be a progress indicator only (a minimal sketch of this follows the comments). – lenz Mar 05 '20 at 11:03
  • If not, please include the full traceback in your post. – lenz Mar 05 '20 at 11:03
  • I found this closed issue on GitHub: https://github.com/psf/requests/issues/1822 where they found a solution for a similar error message. Have you tried it? If it does not work for you, please share your full traceback as @lenz asked; it would be very useful. – Kevin Quinzel Mar 06 '20 at 00:42
  • I also found this old StackOverflow post https://stackoverflow.com/questions/51157481/unicode-encode-error-latin-1-codec-cant-encode-character-u2019?rq=1 – Kevin Quinzel Mar 06 '20 at 00:56
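A minimal sketch of the `ascii()` idea from the first comment, purely illustrative (it only changes how the progress message is printed):

    # Hypothetical: make the progress print ASCII-safe so it cannot raise an encoding error
    print(ascii("fetched file: {}".format(url)))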

1 Answer


I strongly suggest using the gcsfs package, which lets pandas write to the bucket directly, given a gs:// URL:

BUCKET_NAME = "fluxlengow"  # bucket name taken from the question

def store_dataframe(df, filename, path="news"):
    # With gcsfs installed, pandas can write straight to the gs:// URL
    url = f"gs://{BUCKET_NAME}/{path}/{filename}"
    df.to_csv(url)
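For example, assuming gcsfs is installed (pip install gcsfs) and the service-account credentials are set via GOOGLE_APPLICATION_CREDENTIALS as in the question, a hypothetical call from lengowToStorage could look like:

    # Hypothetical usage with the df_ and name built in lengowToStorage;
    # the "input" folder matches the one used in the question
    store_dataframe(df_, filename=name + '.csv', path='input')

That way the CSV never passes through a manual StringIO/Blob upload step, and pandas (which writes UTF-8 by default) plus gcsfs should take care of the file encoding for you.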