
I'm trying to access a csv file in my Watson Data Platform catalog. I used the code generation functionality from my DSX notebook: Insert to code > Insert StreamingBody object.

The generated code was:

import os
import types
import pandas as pd
import boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share your notebook.

os.environ['AWS_ACCESS_KEY_ID'] = '******'
os.environ['AWS_SECRET_ACCESS_KEY'] = '******'
endpoint = 's3-api.us-geo.objectstorage.softlayer.net'

bucket = 'catalog-test'

cos_12345 = boto3.resource('s3', endpoint_url=endpoint)
body = cos_12345.Object(bucket,'my.csv').get()['Body']

# add missing __iter__ method so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType(__iter__, body)

df_data_2 = pd.read_csv(body)
df_data_2.head()

When I try to run this code, I get:

/usr/local/src/conda3_runtime.v27/4.1.1/lib/python3.5/site-packages/botocore/endpoint.py in create_endpoint(self, service_model, region_name, endpoint_url, verify, response_parser_factory, timeout, max_pool_connections)
    270         if not is_valid_endpoint_url(endpoint_url):
    271 
--> 272             raise ValueError("Invalid endpoint: %s" % endpoint_url)
    273         return Endpoint(
    274             endpoint_url,

ValueError: Invalid endpoint: s3-api.us-geo.objectstorage.service.networklayer.com

What is strange is that if I generate the code for SparkSession setup instead, the same endpoint is used but the Spark code runs fine.

How can I fix this issue?


I'm presuming the same issue will be encountered with the other SoftLayer endpoints, so I'm listing them here to make this question applicable to those locations as well:

  • s3-api.us-geo.objectstorage.softlayer.net
  • s3-api.dal-us-geo.objectstorage.softlayer.net
  • s3-api.sjc-us-geo.objectstorage.softlayer.net
  • s3-api.wdc-us-geo.objectstorage.softlayer.net
  • s3.us-south.objectstorage.softlayer.net
  • s3.us-east.objectstorage.softlayer.net
  • s3.eu-geo.objectstorage.softlayer.net
  • s3.ams-eu-geo.objectstorage.softlayer.net
  • s3.fra-eu-geo.objectstorage.softlayer.net
  • s3.mil-eu-geo.objectstorage.softlayer.net
  • s3.eu-gb.objectstorage.softlayer.net
  • Is that a project which was created with IBM COS as the backing storage, or a project backed by Swift object storage from which you are accessing IBM COS? That information might help to track down the problem in the code generation. – Roland Weber Feb 06 '18 at 06:25
  • It is a cos project. The files were added to my project from the wdp catalog. – Chris Snow Feb 06 '18 at 06:59

2 Answers


The solution was to prefix the endpoint with https://, changing it from this:

endpoint = 's3-api.us-geo.objectstorage.softlayer.net'

to this:

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
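For reference, here is a rough sketch of how the fix slots into the generated snippet from the question (bucket and object names are the ones from the question; the credentials and the __iter__ workaround stay as generated):

endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'  # scheme added

cos_12345 = boto3.resource('s3', endpoint_url=endpoint)  # now passes botocore's endpoint validation
body = cos_12345.Object(bucket, 'my.csv').get()['Body']
# the __iter__ workaround and pd.read_csv(body) from the generated code follow unchanged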
Chris Snow
  • Chris, Your solution saved my day even after years of your original reply! I have a web socket API serviced by a lambda. Clients call the API with a URL like `wss://xxxxxx.execute-api.us-east-2.amazonaws.com/teststage`. But when the lambda replies through the API gateway, it has to send the response to `https://xxxxxx.execute-api.us-east-2.amazonaws.com/teststage`. Otherwise the same 'invalid endpoint' error is thrown. – Raja Oct 30 '21 at 09:00
  • THIS is the solution. I searched many forums and SO questions, everybody had just typos in the URL or bad config, but this is the real solution for this problem. Thanks a lot! – Filip Happy Jun 12 '22 at 19:03

For IBM Cloud Object Storage, it should be import ibm_boto3 rather than import boto3. The original boto3 is for accessing AWS, which uses different authentication. Maybe those two have a different interpretation of the endpoint value.
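A rough sketch of what the ibm_boto3 variant looks like, assuming the newer IAM-based credentials (the API key, service instance ID, and IAM endpoint below are placeholders, not values from the question):

import ibm_boto3
from ibm_botocore.client import Config

cos = ibm_boto3.resource(
    's3',
    ibm_api_key_id='<API_KEY>',                        # placeholder
    ibm_service_instance_id='<SERVICE_INSTANCE_CRN>',  # placeholder
    ibm_auth_endpoint='https://iam.cloud.ibm.com/identity/token',  # IAM token endpoint
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.softlayer.net'
)

body = cos.Object('catalog-test', 'my.csv').get()['Body']
# then read with pandas as in the question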

Roland Weber
    In you code I see that `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` were set. That would mean that you have a older COS aka Cleversafe COS (S3 API) and not a new IBM COS (+ IAM AUTH). Can you please check that and go to your project settings and take a screenshot of the storage ? – Sven Hafeneger Feb 06 '18 at 09:20