
Hi, I am trying to start an Azure ML algorithm by executing a Python script that queries data from a Table storage account. I do it like this:

entities_Azure=table_session.query_entities(table_name=table_name, 
                                                filter="PartitionKey eq '" + partitionKey + "'",
                                                select='PartitionKey,RowKey,Timestamp,value',
                                                next_partition_key = next_pk,
                                                next_row_key = next_rk, top=1000)  

I pass in the required variables when calling the function this bit of code sits in, and I make the function available by including a zip file in Azure ML.
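As an aside on how the filter string is built by concatenation: a small helper (my own sketch, not part of the Azure SDK) can escape embedded single quotes, which OData string literals escape by doubling:

```python
def partition_filter(partition_key):
    """Build an OData $filter clause for one PartitionKey.

    Sketch only, not an SDK function: OData escapes a literal
    single quote inside a string by doubling it.
    """
    escaped = partition_key.replace("'", "''")
    return "PartitionKey eq '" + escaped + "'"

print(partition_filter("O'Neill"))  # PartitionKey eq 'O''Neill'
```

The result can then be passed as the `filter` argument to `query_entities`.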

I assume the error is due to the query taking too long, or something like that, but it has to take a long time because I might have to query a lot of data. I looked at the SO post Windows Azure Storage Table connection timed out, which I think is a similar issue to do with hitting specified thresholds for these queries, but I don't know how to avoid that. The program only runs for about 1.5 minutes before timing out.

Any ideas as to why this is happening and how I might be able to solve it?

Edit:

As per Peter Pan - MSFT's advice, I ran a more specific query:

entities_Azure=table_service.query_entities(table_name='#######',select='PartitionKey,RowKey,Timestamp,value', next_partition_key = None, next_row_key = None, top=2)

This returned the following error log:

Error 0085: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
data:text/plain,Caught exception while executing function: Traceback (most recent call last):
  File "C:\server\invokepy.py", line 169, in batch
    odfs = mod.azureml_main(*idfs)
  File "C:\temp\azuremod.py", line 61, in azureml_main
    entities_Azure=table_service.query_entities(table_name='######',select='PartitionKey,RowKey,Timestamp,value', next_partition_key = None, next_row_key = None, top=2)
  File "./Script Bundle\azure\storage\table\tableservice.py", line 421, in query_entities
    response = self._perform_request(request)
  File "./Script Bundle\azure\storage\storageclient.py", line 171, in _perform_request
    resp = self._filter(request)
  File "./Script Bundle\azure\storage\table\tableservice.py", line 664, in _perform_request_worker
    return self._httpclient.perform_request(request)
  File "./Script Bundle\azure\storage\_http\httpclient.py", line 181, in perform_request
    self.send_request_body(connection, request.body)
  File "./Script Bundle\azure\storage\_http\httpclient.py", line 145, in send_request_body
    connection.send(None)
  File "./Script Bundle\azure\storage\_http\requestsclient.py", line 81, in send
    self.response = self.session.request(self.method, self.uri, data=request_body, headers=self.headers, timeout=self.timeout)
  File "C:\pyhome\lib\site-packages\requests\sessions.py", line 456, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\pyhome\lib\site-packages\requests\sessions.py", line 559, in send
    r = adapter.send(request, **kwargs)
  File "C:\pyhome\lib\site-packages\requests\adapters.py", line 382, in send
    raise SSLError(e, request=request)
SSLError: The write operation timed out
---------- End of error message from Python interpreter ----------
Start time: UTC 11/18/2015 11:39:32 End time: UTC 11/18/2015 11:40:53

Hopefully this brings more insight to the situation!

  • Two points need to be confirmed: 1. Did you run this table-service code successfully in your local dev environment? 2. Could you please try code like "table_service.query_entities(table_name='people', filter="PartitionKey eq 'Smith' and RowKey eq 'Jeff'", select='PartitionKey,RowKey')" rather than your code above? – Will Shao - MSFT Nov 25 '15 at 02:36
  • Hi, sorry for the delay. I tried a query with the structure that you gave and got the same error! The queries work perfectly in my local dev environment, it only breaks when I try to run it from within the azure ML workspace... – HStro Dec 04 '15 at 18:55

2 Answers


I tried to fill a table storage account with data I generated myself and to reproduce your issue with a query like yours, but could not.

I did find a table storage query timeout in the REST API (the Azure Storage SDK for Python wraps the REST API). The "Query Timeout and Pagination" page for the Table Service REST API (https://msdn.microsoft.com/en-us/library/azure/dd894042.aspx) says:

A query against the Table service may return a maximum of 1,000 items at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 items, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes headers which provide the developer with continuation tokens to use in order to resume the query at the next item in the result set. Continuation token headers may be returned for a Query Tables operation or a Query Entities operation.

Note that the total time allotted to the request for scheduling and processing the query is 30 seconds, including the five seconds for query execution.

It is possible for a query to return no results but to still return a continuation header.

I think the issue was caused by hitting these specified thresholds.
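To read more than 1,000 entities despite these limits, the query has to be resumed with the continuation tokens. A minimal, SDK-agnostic sketch of that loop (the `fetch_page` callable and its tuple return value are my assumptions, since the attribute carrying the tokens differs between SDK versions, e.g. `x_ms_continuation` in older releases):

```python
def query_all_entities(fetch_page, top=1000):
    """Drain a Table storage query by following continuation tokens.

    Sketch only: fetch_page(next_pk, next_rk, top) is assumed to wrap
    table_service.query_entities(...) and return a tuple
    (entities, next_partition_key, next_row_key); reading the tokens
    off the SDK response varies between SDK versions.
    """
    results = []
    next_pk = next_rk = None
    while True:
        entities, next_pk, next_rk = fetch_page(next_pk, next_rk, top)
        results.extend(entities)
        # The service returns at most `top` (max 1,000) items per call and
        # may stop after ~5 seconds; empty continuation tokens mean done.
        if not next_pk and not next_rk:
            return results
```

Each round trip then stays under the five-second execution window, and the loop simply ends when the service stops returning continuation tokens.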

Also, I used the Reader module under Data Input and Output, set the data source to Azure Table, and read 1,000 entities quickly and successfully in an Azure ML Studio experiment.


For this scenario, I suggest using a more specific query filter against your table storage, such as the following:

entities_Azure=table_session.query_entities(table_name=table_name,
      filter="PartitionKey eq '" + partitionKey + "' and RowKey eq '" + rowkey + "'",
      select='PartitionKey,RowKey,Timestamp,value',
      next_partition_key = next_pk,
      next_row_key = next_rk, top=1000)

We can use this code to determine whether the problem is a connection issue or a thresholds issue.

If you have any concerns, please feel free to let me know.

  • Firstly huge thanks for going to the trouble of trying to recreate the error! I really appreciate it. I tried the Reader, but the issue is in how the data gets uploaded and sorted in the Table. It is ordered by partition key so taking the last N rows would most likely end up with N rows of only one partition key entry. – HStro Nov 18 '15 at 11:00
  • Continuation: I wrote this code in Spyder first, and used the next_row/partition key attributes to get more than 1000 entities. This was ok over there but it seems like the 30 second limit might be causing the problem in Azure ML, could this be correct? How would I pass that? I am trying the specified query now as you advised. Thanks! – HStro Nov 18 '15 at 11:01
  • I ran the code and included a run down of what I did/what happened in the original question. Does that help? Still getting the Timeout! – HStro Nov 18 '15 at 12:29

I ran into a very similar problem in Access Azure blob storage from within an Azure ML experiment. I didn't realize the problems were similar when I first posted, but it became very clear as the debugging and help continued.

Bottom line: the SSLError with a timeout occurs when azure.storage.* is accessed over HTTPS/SSL. If you change the creation of the TableService to force the use of HTTP (protocol='http'), the timeout errors will cease.

table_service = TableService(account_name='myaccount', account_key='mykey',protocol='http')

The full analysis can be found in the StackOverflow post above, but I saw this question and felt I should mention the fix directly here to help with searching. It applies to azure.storage.table, azure.storage.blob, azure.storage.page and azure.storage.queue.

PS. Yes, I know that using HTTP isn't optimal; however, everything here runs within Azure, and when you leave Azure ML (or Azure App Service) you can switch back to HTTPS.
