
I am trying to write a pandas dataframe as CSV to Bluemix Object Storage from a DSX Python notebook. I first save the dataframe to a 'local' CSV file. I then have a routine that attempts to write the file to Object Storage. I get a 413 response - object too large. The file is only about 3MB. Here's my code, based on a JSON example I found here: http://datascience.ibm.com/blog/working-with-object-storage-in-data-science-experience-python-edition/

import requests

def put_file(credentials, local_file_name):  
    """This function writes file content to Object Storage V3 """
    url1 = ''.join(['https://identity.open.softlayer.com', '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
        'password': {'user': {'name': credentials['name'],'domain': {'id': credentials['domain']},
        'password': credentials['password']}}}}}
    headers = {'Content-Type': 'text/csv'}
    with open(local_file_name, 'rb') as f:
        resp1 = requests.post(url=url1, data=f, headers=headers)
    return resp1  

Any help or pointers are much appreciated.

ralphearle
Ted Morris

1 Answer


This code snippet from the tutorial worked fine for me (for a 12 MB file).

import json
import requests
import numpy as np
import pandas as pd

def put_file(credentials, local_file_name):
    """Upload a local file to Bluemix Object Storage V3."""
    # Read the file as bytes so the payload is uploaded unchanged.
    with open(local_file_name, 'rb') as f:
        my_data = f.read()
    # Step 1: request an auth token from the identity service (JSON body).
    url1 = ''.join(['https://identity.open.softlayer.com', '/v3/auth/tokens'])
    data = {'auth': {'identity': {'methods': ['password'],
            'password': {'user': {'name': credentials['username'],'domain': {'id': credentials['domain_id']},
            'password': credentials['password']}}}}}
    headers1 = {'Content-Type': 'application/json'}
    resp1 = requests.post(url=url1, data=json.dumps(data), headers=headers1)
    resp1_body = resp1.json()
    # Step 2: find the public object-store endpoint for the 'dallas' region.
    for e1 in resp1_body['token']['catalog']:
        if e1['type'] == 'object-store':
            for e2 in e1['endpoints']:
                if e2['interface'] == 'public' and e2['region'] == 'dallas':
                    url2 = ''.join([e2['url'], '/', credentials['container'], '/', local_file_name])
    # Step 3: PUT the file content to that URL, passing the token as a header.
    s_subject_token = resp1.headers['x-subject-token']
    headers2 = {'X-Auth-Token': s_subject_token, 'accept': 'application/json'}
    resp2 = requests.put(url=url2, headers=headers2, data=my_data)
    print(resp2)

I created a random pandas dataframe using:

df = pd.DataFrame(np.random.randint(0,100,size=(1000000, 4)), columns=list('ABCD'))

saved it to csv

df.to_csv('myPandasData_1000000.csv',index=False)

and then put it to object store

put_file(credentials_1,'myPandasData_1000000.csv')

You can get the credentials_1 object by clicking insert to code -> Insert credentials for any object in your object store.
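
The catalog lookup in put_file assumes that the token request succeeded and that a matching endpoint exists. If the request fails, the body typically contains an error document instead of a token, and the lookup stops with a bare KeyError; if no endpoint matches, url2 is never assigned. Below is a minimal sketch of that one step as a standalone helper, assuming the response body has the shape put_file reads. The names find_object_store_url and sample_body, and the example URL, are hypothetical and not part of the tutorial.

```python
def find_object_store_url(token_body, region='dallas', interface='public'):
    """Return the object-store endpoint URL from a token response body.

    Raises ValueError with a readable message instead of a bare KeyError.
    """
    if 'token' not in token_body:
        # Assumption: a failed auth request carries no 'token' key.
        raise ValueError('No token in response: %r' % (token_body,))
    for service in token_body['token'].get('catalog', []):
        if service.get('type') != 'object-store':
            continue
        for endpoint in service.get('endpoints', []):
            if (endpoint.get('interface') == interface
                    and endpoint.get('region') == region):
                return endpoint['url']
    raise ValueError('No %s object-store endpoint for region %r'
                     % (interface, region))


# Hypothetical response body containing only the fields put_file reads.
sample_body = {'token': {'catalog': [
    {'type': 'object-store', 'endpoints': [
        {'interface': 'public', 'region': 'dallas',
         'url': 'https://example.invalid/v1/AUTH_x'},
    ]},
]}}
print(find_object_store_url(sample_body))
```

Inside put_file, the nested loops could be replaced by a call such as find_object_store_url(resp1_body), with the container and file name appended to the returned URL as before.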

Sumit Goyal
  • Thank you @Sumit Goyal - and egg on my face, I didn't realize the code sample scrolled and missed the http PUT section of the code. More coffee required in the early morning ... – Ted Morris Feb 09 '17 at 19:27
  • @TedMorris No problem :) – Sumit Goyal Feb 19 '17 at 09:28
  • @SumitGoyal hi, I tried your code but I got this error: KeyError: 'token' from the line `for e1 in resp1_body['token']['catalog']:` do you have any idea how to solve this ? – deltascience May 12 '17 at 12:16