
Is it possible to share information (such as credentials) across multiple notebooks in a DSX project, e.g. with environment variables?

For example a Cloud Foundry application in Bluemix has a control setting where environment variables can be defined, is there a similar concept for a DSX project (I couldn't see anything in the various project level settings).

2 Answers


Separate notebooks run in separate runtimes in the background, and at the moment it is not possible to share credentials among notebooks by defining environment variables. However, there are helper methods for the most common credential requirements in a project, via the "Insert to code" feature.

For example, if you have an object store associated with your project:

  1. Select the "Data" tab in the top bar.
  2. Add a file to the object store by browsing or simple drag-and-drop.
  3. Insert the credentials of that object store container into your notebook by selecting the "Insert credentials" option, right beside your file in the right-hand panel.
  4. You can then directly insert those credentials (from step 3) into any other notebook in that project.

Besides "Insert credentials", there are other "Insert to code" helpers, such as "Insert SparkR dataframe" and "Insert pandas dataframe", to speed up a data scientist's analytics workflow. Hope that was a bit helpful.
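As an illustration, the cell generated by "Insert credentials" in step 3 looks roughly like this (the variable name and field names here are hypothetical placeholders, not the exact DSX output; the real fields depend on the service):

```python
# Sketch of an "Insert credentials" cell for an object store container.
# All values are placeholders; DSX fills in the real ones for your service.
credentials_1 = {
    "auth_url": "https://identity.open.softlayer.com",
    "project_id": "xxxx",
    "region": "dallas",
    "username": "xxxx",
    "password": "xxxx",
    "container": "mycontainer",
}

# Because the cell just defines a plain dict, you can paste the same
# cell into any other notebook in the project and reuse it there.
print(credentials_1["container"])
```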

Sumit Goyal

FYI - I've added a feature request on uservoice to allow Bluemix services to be bound to a project, so that credentials can be accessed in the same way a Bluemix application accesses credentials. Please vote if you think this would be useful.


Currently, one pattern I use quite a lot is to create a notebook in my project that is used to save credentials to a file on DSX:

! echo '{ "username": "xxxx", "password": "xxxx", ... }'  > cloudant_creds.json

That file is now available to all of the notebooks in the project. NOTE: the file is saved on the Spark service file system. If you use the same Spark service in other DSX projects, those projects will also be able to access the file.

The credentials for Cloudant normally include other fields, such as host; I haven't shown those fields here to keep the example simple, and I have indicated that there are more fields with the `...`. I normally copy this JSON from the Bluemix service credentials field.
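If you prefer to avoid shell quoting issues, the same file can be written from a Python cell with `json.dump`. This is just an alternative sketch of the pattern above; the field values are placeholders to be copied from the Bluemix service credentials panel:

```python
import json

# Placeholder credentials; copy the real values from the Bluemix
# service credentials panel for your Cloudant instance.
creds = {
    "username": "xxxx",
    "password": "xxxx",
    "host": "xxxx.cloudant.com",
}

# Writes the file to the Spark service file system, where any notebook
# using the same Spark service can read it back.
with open("cloudant_creds.json", "w") as f:
    json.dump(creds, f)
```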

In your other notebooks, you would read the credentials something like this:

import json

with open('cloudant_creds.json') as data_file:
    sourceDB = json.load(data_file)

You can then refer to the credentials like this (note that `json.load` returns a dict, so the fields must be accessed with key lookups rather than attributes):

    dfReader = sqlContext.read.format("com.cloudant.spark")
    dfReader.option("cloudant.host", sourceDB["host"])

    if sourceDB.get("username"):
        dfReader.option("cloudant.username", sourceDB["username"])

    if sourceDB.get("password"):
        dfReader.option("cloudant.password", sourceDB["password"])

    df = dfReader.load(sourceDB["database"]).cache()
Chris Snow
  • Thanks, that's exactly what my end goal was, connecting to IBM Bluemix Streaming Analytic service, I've voted for the feature request. – Dan Debrunner Jan 21 '17 at 18:32
  • That will also make your notebook work only if that file exists in that location, so you are trading off reproducibility in favor of not having the credentials in a notebook code cell. Whether that's good or bad, depends on your use case (just wanted to mention here for others reading this). – Philipp Langer Jan 25 '17 at 07:43
  • I agree that sometimes it will be better to embed the credentials especially if you are only working in a small team. However, even having hard coded credentials in a notebook doesn't give you a reproducible environment. If you download, share or save a notebook to git, and you have credentials hardcoded in a notebook cell, you would most likely want the cell to be a hidden cell which means you won't be able to reproduce the environment anyway? – Chris Snow Jan 25 '17 at 08:35