16

Azure ML Experiments provide ways to read and write CSV files to Azure blob storage through the Reader and Writer modules. However, I need to write a JSON file to blob storage. Since there is no module to do so, I'm trying to do so from within an Execute Python Script module.

# Import the necessary items
from azure.storage.blob import BlobService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key='mykeyhere=='
    json_string='{jsonstring here}'

    blob_service = BlobService(account_name, account_key)

    blob_service.put_block_blob_from_text("upload","out.json",json_string)

    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,

However, this results in an error: ImportError: No module named azure.storage.blob

This implies that the azure-storage Python package is not installed on Azure ML.

How can I write to Azure blob storage from inside an Azure ML Experiment?

Here's the full error message:

Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,Caught exception while executing function: Traceback (most recent call last):
  File "C:\server\invokepy.py", line 162, in batch
    mod = import_module(moduleName)
  File "C:\pyhome\lib\importlib\__init__.py", line 37, in import_module
    __import__(name)
  File "C:\temp\azuremod.py", line 19, in <module>
    from azure.storage.blob import BlobService
ImportError: No module named azure.storage.blob

---------- End of error message from Python  interpreter  ----------
Start time: UTC 02/06/2016 17:59:47
End time: UTC 02/06/2016 18:00:00

Thanks, everyone!

UPDATE: Thanks to Dan and Peter for the ideas below. This is the progress I've made using those recommendations. I created a clean Python 2.7 virtual environment (in VS 2005), and did a pip install azure-storage to get the dependencies into my site-packages directory. I then zipped the site-packages folder and uploaded it as the Zip file, as per Dan's note below. I then added a reference to the site-packages directory to sys.path and successfully imported the required items. However, this resulted in a timeout error when writing to blob storage.
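
For anyone repeating the zipping step, here is a minimal sketch of how it could be scripted rather than done by hand. The helper name zip_directory and the paths in the usage note are hypothetical; the only thing taken from my setup is that the archive keeps site-packages as its top-level folder, which is what the ".\Script Bundle\site-packages" path in the code below relies on.

```python
import os
import zipfile


def zip_directory(source_dir, zip_path):
    """Zip source_dir so that its own folder name is the top-level archive entry.

    zip_path should live outside source_dir, otherwise the archive would
    try to include itself. Returns the number of files written.
    """
    source_dir = os.path.abspath(source_dir)
    parent = os.path.dirname(source_dir)
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the parent folder, e.g.
                # "site-packages/azure/__init__.py".
                archive.write(full_path, os.path.relpath(full_path, parent))
                count += 1
    return count
```

Usage would look something like zip_directory("env/Lib/site-packages", "site-packages.zip"), where the first path is wherever the virtual environment keeps its packages (an assumption; adjust it to your environment).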

Failure to write to Blob storage

Here is my code:

# Get access to the uploaded Python packages    
import sys
packages = r".\Script Bundle\site-packages"  # raw string so the backslashes are taken literally
sys.path.append(packages)

# Import the necessary items from packages referenced above
from azure.storage.blob import BlobService
from azure.storage.queue import QueueService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key='p8kSy3F...elided...3plQ=='

    blob_service = BlobService(account_name, account_key)
    blob_service.put_block_blob_from_text("upload","out.txt","Test to write")

    # All of the following also fail
    #blob_service.create_container('images')
    #blob_service.put_blob("upload","testme.txt","foo","BlockBlob")

    #queue_service = QueueService(account_name, account_key)
    #queue_service.create_queue('taskqueue')

    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,

And here is the new error log:

Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,C:\pyhome\lib\site-packages\requests\packages\urllib3\util\ssl_.py:79: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
Caught exception while executing function: Traceback (most recent call last):   
  File "C:\server\invokepy.py", line 169, in batch
    odfs = mod.azureml_main(*idfs)
  File "C:\temp\azuremod.py", line 44, in azureml_main
    blob_service.put_blob("upload","testme.txt","foo","BlockBlob")
  File ".\Script Bundle\site-packages\azure\storage\blob\blobservice.py", line 883, in put_blob
    self._perform_request(request)
  File ".\Script Bundle\site-packages\azure\storage\storageclient.py", line 171, in _perform_request
    resp = self._filter(request)
  File ".\Script Bundle\site-packages\azure\storage\storageclient.py", line 160, in _perform_request_worker
    return self._httpclient.perform_request(request)
  File ".\Script Bundle\site-packages\azure\storage\_http\httpclient.py", line 181, in perform_request
    self.send_request_body(connection, request.body)
  File ".\Script Bundle\site-packages\azure\storage\_http\httpclient.py", line 143, in send_request_body
    connection.send(request_body)
  File ".\Script Bundle\site-packages\azure\storage\_http\requestsclient.py", line 81, in send
    self.response = self.session.request(self.method, self.uri, data=request_body, headers=self.headers, timeout=self.timeout)
  File "C:\pyhome\lib\site-packages\requests\sessions.py", line 464, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\pyhome\lib\site-packages\requests\sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "C:\pyhome\lib\site-packages\requests\adapters.py", line 431, in send
    raise SSLError(e, request=request)
SSLError: The write operation timed out

---------- End of error message from Python  interpreter  ----------
Start time: UTC 02/10/2016 15:33:00
End time: UTC 02/10/2016 15:34:18

My current exploration points to the dependency of azure-storage on the requests Python package. requests has a known bug in Python 2.7 when calling newer SSL protocols. I'm not sure that is the cause, but I'm digging around in that area now.

UPDATE 2: This code runs perfectly fine inside of a Python 3 Jupyter notebook. Additionally, if I make the Blob Container open to public access, I can directly READ from the Container through a URL. For instance: df = pd.read_csv("https://mystorageaccount.blob.core.windows.net/upload/test.csv") easily loads the file from blob storage. However, I cannot use the azure.storage.blob.BlobService to read from the same file.
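
For reference, the public URL above follows a simple pattern, sketched below. blob_url is a hypothetical helper of my own, not part of azure-storage, and the pattern is inferred only from the example URL in this update.

```python
def blob_url(account_name, container_name, blob_name, protocol="https"):
    """Build the public URL of a blob in a publicly readable container.

    Hypothetical helper: the URL pattern is taken from the example
    https://mystorageaccount.blob.core.windows.net/upload/test.csv
    """
    return "{0}://{1}.blob.core.windows.net/{2}/{3}".format(
        protocol, account_name, container_name, blob_name
    )
```

Calling blob_url("mystorageaccount", "upload", "test.csv") reproduces the URL passed to pd.read_csv above.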

UPDATE 3: Dan, in a comment below, suggested I try from the Jupyter notebooks hosted on Azure ML. I had been running it from a local Jupyter notebook (see update 2 above). However, it fails when run from an Azure ML Notebook, and the errors point to the requests package again. I'll need to find the known issues with that package, but from my reading, the known issue is with urllib3 and only impacts Python 2.7 and NOT any Python 3.x versions. And this was run in a Python 3.x notebook. Grrr.

UPDATE 4: As Dan notes below, this may be an issue with Azure ML networking, as Execute Python Script is relatively new and just got networking support. However, I have also tested this on an Azure App Service webjob, which is on an entirely different Azure platform. (It is also on an entirely different Python distribution and supports both Python 2.7 and 3.4/5, but only at 32 bit - even on 64 bit machines.) The code there also fails, with an InsecurePlatformWarning message.

[02/08/2016 15:53:54 > b40783: SYS INFO] Run script 'ListenToQueue.py' with script host - 'PythonScriptHost'
[02/08/2016 15:53:54 > b40783: SYS INFO] Status changed to Running
[02/08/2016 15:54:09 > b40783: INFO] test.csv
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
[02/08/2016 15:54:09 > b40783: ERR ]   SNIMissingWarning
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
[02/08/2016 15:54:09 > b40783: ERR ]   InsecurePlatformWarning
[02/08/2016 15:54:09 > b40783: ERR ] D:\home\site\wwwroot\env\Lib\site-packages\requests\packages\urllib3\util\ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
[02/08/2016 15:54:09 > b40783: ERR ]   InsecurePlatformWarning
cbare
Steven Borg

3 Answers

5

Bottom Line Up Front: Use HTTP instead of HTTPS for accessing Azure storage.

When creating the BlobService, pass in protocol='http' to force the service to communicate over HTTP. Note that your container must be configured to allow requests over HTTP (which it is by default).

client = BlobService(STORAGE_ACCOUNT, STORAGE_KEY, protocol="http")

History and credit:

I posted a query on this topic to @AzureHelps and they opened a ticket on the MSDN forums: https://social.msdn.microsoft.com/Forums/azure/en-US/46166b22-47ae-4808-ab87-402388dd7a5c/trouble-writing-blob-storage-file-in-azure-ml-experiment?forum=MachineLearning&prof=required

Sudarshan Raghunathan replied with the magic. Here are the steps to make it easy for everyone to duplicate my fix:

  1. Download azure.zip which provides the required libraries: https://azuremlpackagesupport.blob.core.windows.net/python/azure.zip
  2. Upload it as a Dataset to Azure ML Studio
  3. Connect it to the Zip input on an Execute Python Script module
  4. Write your script as you would normally, being sure to create your BlobService object with protocol='http'
  5. Run the Experiment - you should now be able to write to blob storage.

Some example code can be found here: https://gist.github.com/drdarshan/92fff2a12ad9946892df

The code I used is the following. It doesn't first write the data to the file system, but sends it as a text stream.

from azure.storage.blob import BlobService

def azureml_main(dataframe1 = None, dataframe2 = None):
    account_name = 'mystorageaccount'
    account_key='p8kSy3FACx...redacted...ebz3plQ=='
    container_name = "upload"
    json_output_file_name = 'testfromml.json'
    json_orient = 'records' # Can be index, records, split, columns, values
    json_force_ascii=False

    blob_service = BlobService(account_name, account_key, protocol='http')

    blob_service.put_block_blob_from_text(container_name,json_output_file_name,dataframe1.to_json(orient=json_orient, force_ascii=json_force_ascii))

    # Return value must be of a sequence of pandas.DataFrame
    return dataframe1,

Some thoughts:

  1. I would prefer if the azure Python libraries were imported by default. Microsoft imports hundreds of 3rd party libraries into Azure ML as part of the Anaconda distribution. They should also include those necessary to work with Azure. We're in Azure, we've committed to Azure. Embrace it.
  2. I don't like that I have to use HTTP, instead of HTTPS. Granted, this is internal Azure communication, so it's likely no big deal. However, most of the documentation suggests the use of SSL / HTTPS when working with blob storage, so I'd prefer to be able to do that.
  3. I still get random timeout errors in the Experiment. Sometimes the Python code will execute in milliseconds, other times it runs for 60 seconds or so and then times out. This makes running it in an experiment very frustrating at times. However, when published as a Web Service I do not seem to have this problem.
  4. I would prefer that the experience from my local code more closely matched Azure ML. Locally, I can use HTTPS and never time out. It's blazing fast, and easy to write. But moving to an Azure ML experiment means some debugging, nearly every time.
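
One way to soften the random timeouts in point 3 might be to retry the upload. This is only a sketch under assumptions: put_text_with_retry is a hypothetical helper of mine, it retries on any exception (not just timeouts), and it assumes that repeating the same put_block_blob_from_text call is harmless because the blob is simply written again. I have not measured whether it actually helps inside an Experiment.

```python
import time


def put_text_with_retry(blob_service, container_name, blob_name, text,
                        attempts=3, delay_seconds=2.0):
    """Call blob_service.put_block_blob_from_text, retrying on failure.

    blob_service is any object exposing put_block_blob_from_text, such as
    the BlobService created above. Returns the number of the attempt that
    succeeded and re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            blob_service.put_block_blob_from_text(container_name, blob_name, text)
            return attempt
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay_seconds)  # brief pause before trying again
```

In the script above, the direct call would become put_text_with_retry(blob_service, container_name, json_output_file_name, dataframe1.to_json(orient=json_orient, force_ascii=json_force_ascii)).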

Huge props to Dan, Peter and Sudarshan, all from Microsoft, for their help in resolving this. I very much appreciate it!

Steven Borg
1

You are going down the correct path. The Execute Python Script module is meant for custom needs just like this. Your real issue is how to import existing Python script modules. The complete directions can be found here, but I will summarize for SO.

You will want to take the Azure Python SDK and zip it up, upload, then import into your module. I can look into why this is not there by default...

https://azure.microsoft.com/en-us/documentation/articles/machine-learning-execute-python-scripts/

Importing existing Python script modules

A common use-case for many data scientists is to incorporate existing Python scripts into Azure Machine Learning experiments. Instead of concatenating and pasting all the code into a single script box, the Execute Python Script module accepts a third input port to which a zip file that contains the Python modules can be connected. The file is then unzipped by the execution framework at runtime and the contents are added to the library path of the Python interpreter. The azureml_main entry point function can then import these modules directly.

As an example, consider the file Hello.py containing a simple “Hello, World” function.

Figure 4. User-defined function.

Next, we can create a file Hello.zip containing Hello.py:

Figure 5. Zip file containing user-defined Python code.

Then, upload this as a dataset into Azure Machine Learning Studio. We can then create and run a simple experiment that uses the module:

Figure 6. Sample experiment with user-defined Python code uploaded as a zip file.

The module output shows that the zip file has been unpackaged and the function print_hello has indeed been run.

Figure 7. User-defined function in use inside the Execute Python Script module.
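
Since the screenshots did not survive here, the walkthrough can be approximated locally with a rough sketch. Everything in it is a stand-in: the body of print_hello is my guess at a simple "Hello, World" function, and the unzip-then-extend-sys.path step only imitates what the documentation says the execution framework does at runtime. The "Script Bundle" folder name comes from the template comment quoted elsewhere on this page.

```python
import importlib
import os
import sys
import zipfile

# Stand-in for the Hello.py shown in Figure 4 (the real body is not in this post).
HELLO_SOURCE = (
    "def print_hello(name):\n"
    "    message = 'Hello, ' + name + '!'\n"
    "    print(message)\n"
    "    return message\n"
)


def build_and_load_hello(work_dir):
    """Create Hello.py and Hello.zip in work_dir, then import Hello from the unzipped copy."""
    source_path = os.path.join(work_dir, "Hello.py")
    with open(source_path, "w") as handle:
        handle.write(HELLO_SOURCE)

    # Figure 5: a zip file containing the user-defined module.
    zip_path = os.path.join(work_dir, "Hello.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.write(source_path, "Hello.py")

    # Local imitation of the runtime: unzip and add the folder to the library path.
    bundle_dir = os.path.join(work_dir, "Script Bundle")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(bundle_dir)
    sys.path.insert(0, bundle_dir)

    return importlib.import_module("Hello")
```

Inside Azure ML only the import and the call are needed in azureml_main, because the unzip and path setup are done for you.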

Dan Ciborowski - MSFT
  • Dan, I appreciate the response. It's a great intro into setting things up with a zip file, but only hints at how to address the core issue of not being able to import azure.storage.blob effectively. I did take the actual code from Github and brought it in so I could reference it. This successfully allowed me to reference azure.storage.blob, however, that isn't sufficient because all requests are timing out. I'll address it more in a comment to my original post. But I really do want to say thanks, Dan. This is very helpful, especially for those with similar questions following up later. – Steven Borg Feb 09 '16 at 00:13
  • I don't see an update to the question... But does the container you are trying to upload the blob to exist? Your code will not create the container if it is not already there. Might need to add `blob_service.create_container('mycontainer')`? Hope this might help. – Dan Ciborowski - MSFT Feb 09 '16 at 05:27
  • Thanks again, Dan. I've now updated the question, and noted the issues with more clarity. The container exists, and I can't even create the container from Azure ML. I hope my update makes it clearer. Thanks again for your work. I love Azure and I love Python. I really want to be able to use Python on the Azure PaaS and SaaS services! – Steven Borg Feb 10 '16 at 17:09
  • Can you create a new python 3 notebook in your workspace, and run your code from within the notebook? I am checking to see if this is a network I/O issue with `Execute Python Script`, and it will be good to tell the people I speak with that we know your code worked from a notebook. Thanks – Dan Ciborowski - MSFT Feb 10 '16 at 17:16
  • Yes it works great in a Python 3 Jupyter notebook. Image hosted above. – Steven Borg Feb 10 '16 at 18:24
  • As an aside, I built it first in a Python 2 Jupyter notebook, since Azure ML only supports Python 2. But I tried it in 3.4 64-bit above, and you can see it works. (It also works, in Python 2 (32 and 64 bit) and Python 3 (32 and 64 bit) when run from a local machine -- I haven't yet verified if it will work from an Azure VM, but I'm suspecting that the problem is inter-Azure networking security preventing it from working in Python 2. There's a known issue with the `requests` framework (actually `urllib3`) which `azure-storage` uses.) – Steven Borg Feb 10 '16 at 18:37
  • STOP THE PRESSES! Whoops! I misread and created it from a local Jupyter notebook. Moving it to the Jupyter notebook hosted in Azure ML causes the same problem I'm exploring from the logs... Wish I would have done this FIRST, since spelunking logs is a pain. Note that problem is with the `requests` package. (image added above) – Steven Borg Feb 10 '16 at 18:41
  • Dan, to help with troubleshooting, I should note that I've tried to run this from an Azure App Service, as well. It fails there, too. I have a webjob, and I'll update it above. Sorry, this takes the post far afield, so maybe I should create a separate post in the Azure App Service section, but it belongs here for troubleshooting. – Steven Borg Feb 10 '16 at 18:54
  • 1
    Dan, it was HTTP vs HTTPS. I posted the solution. Thank you for all your help. I'd still love support for SSL inside of Azure, but that's really just a nitpick since we're all inside your very secure network. I can't tell you how thankful I have been for all of your help in this. It was frustrating for me, and your help really got me going in good directions, and gave me a boost of enthusiasm to keep troubleshooting. – Steven Borg Feb 10 '16 at 22:35
1

As far as I know, you can use other packages via a zip file that you provide to the third input. The comments in the Python template script in Azure ML say:

If a zip file is connected to the third input port is connected, it is unzipped under ".\Script Bundle". This directory is added to sys.path. Therefore, if your zip file contains a Python file mymodule.py you can import it using: import mymodule

So you can package azure-storage-python as a zip file and upload it to your workspace: click New, click Dataset, then select From local file and choose the Zip file option.

For reference, you can find more information in the section How to Use Execute Python Script of the Execute Python Script documentation.

Peter Pan
  • Thanks, Peter. I updated my post to include the problems I'm getting when I use that feature to import arbitrary Python code. Hope that makes it clearer. One thing I question, however, is why Microsoft doesn't simply include the Azure Python packages as part of the default distribution? Seems strange. A somewhat ironic thing is that the azureml Python package built by Microsoft isn't included in Azure ML experiments... :-D Azure ML treats the azureml package as a 3rd party package! – Steven Borg Feb 10 '16 at 17:12
  • network access from the python module is newer, which is why it was previously not included... I am trying to get it put in though :-) – Dan Ciborowski - MSFT Feb 10 '16 at 17:18
  • Dan, I can reach out through the network easily. In fact, I can use `requests` to reach out to other urls. It's just access to Azure storage accounts (and maybe other azure resources) that appears to be blocked. But I agree that it strongly looks like a network issue. As a practice, however, I think that `azure-storage`, and the other azure-specific packages, should be loaded by Microsoft by default. You load hundreds of other 3rd party packages for ease of use, why not your own? – Steven Borg Feb 10 '16 at 18:48
  • Peter, thanks for your answer and help. Ended up being that HTTPS is not supported, and that's the default. If you think the answer is OK, do you mind an upvote? – Steven Borg Feb 10 '16 at 22:33
  • @StevenBorg It's OK. – Peter Pan Feb 11 '16 at 01:17