
I have a Python job that uses Beautiful Soup to scrape data from the web. I have tried executing the script using U-SQL, but I keep receiving a generic error message:

An unhandled exception from user code has been reported

I haven't explored the error too much as I am not sure if it is possible to scrape the web through U-SQL.

Is this possible using U-SQL? If not, which Azure resource can I use to schedule this script and store the results in Azure Data Lake Store?

  • Hey man, I need to do the exact same thing. I am lost on how to achieve this and which tools and systems to use. I have a separate thread and discussion [here](https://stackoverflow.com/questions/51091813/etl-from-secure-websites-to-sql-database-on-azure) – Wesley Jeftha Jul 01 '18 at 03:24

2 Answers


Hi, I'm a PM from the Azure Data Lake team and I'd love to help out with this. I just need some clarification first about what you're trying to do. Could you reach out to me at mabasile(at)microsoft.com with the job ID of the failed job? (Any sensitive information can of course be scrubbed out.) That'll be the best way to figure out exactly what you're trying to do and whether it's possible on ADL.

Thanks, and I hope to hear from you soon!
Matt Basile
Azure Data Lake Analytics

Update: Confirming Michael Rys's answer: you cannot call external services from U-SQL. If ADLA scaled a job out to hundreds of vertices and each vertex made its own call, you could end up DDoSing the service, so ADLA blocks external calls.

mabasile_MSFT

Also, it would normally be helpful if you provided the complete error code and explained exactly how you want to scrape the web.

For now I will assume that you wrote some code that accesses web pages and tried to run it from within U-SQL. If that is correct, it fails because the U-SQL container blocks all external network access. For more details on why that is done, see the previous answer here.
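Since the network call cannot happen inside U-SQL, one option is to run the scraping step somewhere that does have network access and only hand the resulting file to the data lake. The block below is a minimal sketch of that split, not a definitive implementation. It uses the standard library's `html.parser` in place of Beautiful Soup so that it runs without extra installs, and the names `LinkCollector`, `extract_links`, `links_to_csv`, and `SAMPLE_HTML` are hypothetical, chosen only for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return a list of (href, text) tuples."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_to_csv(links):
    """Serialize (href, text) tuples to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["href", "text"])
    writer.writerows(links)
    return buffer.getvalue()


# Hypothetical sample page. A real job would first download the HTML
# (for example with urllib.request.urlopen) outside of U-SQL.
SAMPLE_HTML = '<html><body><a href="/a">First</a> <a href="/b">Second</a></body></html>'

if __name__ == "__main__":
    print(links_to_csv(extract_links(SAMPLE_HTML)))
```

The CSV text produced this way is an ordinary file, so it can be uploaded to the store afterwards and read by a U-SQL job that makes no external calls itself.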

Michael Rys