Questions tagged [azure-data-lake]

Azure Data Lake is a suite of three big data services in Microsoft Azure: HDInsight, Data Lake Store, and Data Lake Analytics. These fully managed services make it easy to get started with, and to scale, big data jobs written in U-SQL, Apache Hive, Pig, Spark, and Storm.

  • HDInsight is a fully managed, monitored and supported Apache Hadoop service, bringing the power of Hadoop clusters to you with a few clicks.
  • Data Lake Store is a cloud-scale service designed to store all data for analytics. It allows petabyte-sized files and unlimited account sizes, surfaced through an HDFS API that enables any Hadoop component to access the data. Additionally, data in Data Lake Store is protected via ACLs that can be tied to an OAuth2-based identity, including those from your on-premises Active Directory.
  • Data Lake Analytics is a distributed service built on Apache YARN that dynamically scales on demand while you only pay for the job that is running. Data Lake Analytics also includes U-SQL, a language designed for big data, keeping the familiar declarative syntax of SQL, easily extended with user code authored in C#.

To learn more, check out: https://azure.microsoft.com/en-us/solutions/data-lake/

1870 questions
24
votes
7 answers

Azure Databricks - Can not create the managed table The associated location already exists

I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table: SomeData_df.write.mode('overwrite').saveAsTable("SomeData") I get the following error: "Can not create the managed table('SomeData').…
22
votes
2 answers

How to choose between Azure data lake analytics and Azure Databricks

Azure Data Lake Analytics and Azure Databricks can both be used for batch processing. Could anyone please help me understand when to choose one over the other?
Pragmatic
21
votes
6 answers

Azure Data Lake Gen 1 vs Gen 2

Recently Azure announced Data Lake Gen 2 preview. As far as I know the main difference between Gen 1 and Gen 2 (in terms of functionality) is the Object Store and File System access over the same data at the same time. Other differences would be…
Shehan Weerasooriya
20
votes
8 answers

Azcopy error "This request is not authorized to perform this operation."

I copied a container to another storage account based on the document linked below. (DataLake Storage Gen2). When trying, I got the following error: this request not authorized to perform this operations using this…
TA Hyouno
18
votes
2 answers

Difference between Azure Data Lake Storage x Azure Blob Storage and Azure File Storage

I have a question about the use cases of the different Azure storage services: Azure Data Lake Storage, Azure Blob Storage, and Azure File Storage. What is the difference between these services, and when should each be used, since they all provide the same…
I.Chorfi
14
votes
2 answers

Writing log with python logging module in databricks to azure datalake not working

I'm trying to write my own log files to Azure Datalake Gen 2 in a Python-Notebook within Databricks. I'm trying to achieve that by using the Python logging module. Unfortunately I can't get it working. No errors are raised, the folders are created…
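A possible workaround, sketched under assumptions: write the log to a local file first and copy the finished file to the lake afterwards, since a mounted lake path may not accept the incremental appends that `logging.FileHandler` performs. The helper names `make_file_logger` and `publish_log` and both paths are hypothetical; on Databricks the target would be a mounted folder, which is not shown here.

```python
import logging
import os
import shutil


def make_file_logger(name, local_path):
    """Create a logger that appends to a file on local disk (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(local_path, mode="a")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger, handler


def publish_log(handler, local_path, target_dir):
    """Flush the handler, then copy the whole log file into target_dir."""
    handler.flush()
    os.makedirs(target_dir, exist_ok=True)
    # shutil.copy returns the path of the newly written copy.
    return shutil.copy(local_path, target_dir)
```

Copying the complete file in one step avoids relying on append support at the destination; the trade-off is that the lake copy is only as fresh as the last call to `publish_log`.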
Dominik Braun
14
votes
5 answers

Azure Data lake VS Azure HDInsight

I was going through the Microsoft documentation: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview I'm new to Azure Data Lake and HDInsight. A statement on that page says that "Azure Data Lake Store can be…
AskMe
12
votes
1 answer

Can't Find Data Lake Store Gen2

I'm trying to locate Azure DataLake Store Gen2 using the Azure portal and for some reason cannot find it: I've been searching the docs and the portal and cannot seem to find it, has anyone else run into this problem? It has been in global GA since…
C.Nivs
12
votes
2 answers

Moving a DocumentDB Collection to Azure Data Lake Storage

I was wondering what the best practice is for moving a DocumentDB to Azure Data Lake Storage. Should I create a file for each document in a collection or move the entire DocumentDB? Also, I didn't find much information on how I can access the…
reachify
11
votes
1 answer

Azure Spark SQL vs U-SQL

I have a lot of data files that will eventually be pushed to and stored on Azure Storage/Data Lake at regular intervals. I want to provide the ability to run analytics on this data, but I see that on Azure there are two approaches:…
Kiran
10
votes
3 answers

List All Files in a Folder Sitting in a Data Lake

I'm trying to get an inventory of all files in a folder, which has a few sub-folders, all of which sit in a data lake. Here is the code that I'm testing. import sys, os import pandas as pd mylist = [] root = "/mnt/rawdata/parent/" path =…
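For a folder tree that is reachable through the local file system, a recursive inventory can be sketched with `os.walk`. This is a minimal sketch, not the asker's code: the helper name `list_files` is hypothetical, and it assumes the data lake folder is exposed as an ordinary directory path.

```python
import os


def list_files(root):
    """Return the sorted paths of all files under root, including sub-folders."""
    found = []
    # os.walk visits root and every nested directory in turn.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The resulting list can be passed to `pandas.DataFrame` if a tabular inventory is needed. A path that does not exist simply yields an empty list, because `os.walk` ignores listing errors by default.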
ASH
10
votes
3 answers

How to change Data Factory in Microsoft Integration Runtime Configuration Manager?

I installed Microsoft Integration Runtime Configuration Manager when I migrated data from an on-premises SQL Server to Azure Data Lake. Now that I'm trying to use it with another Azure Data Factory, I can't find a place to add a new key for the data…
Saranraj K
9
votes
2 answers

What is hierarchical namespace in Microsoft Azure Data Lake storage (Gen2)?

I read Microsoft's documentation on it (https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace) but could not understand it clearly. Can anyone please help me understand it in layman's terms / simple language?…
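One way to picture the idea is a toy model in plain Python. It is only an illustration and does not call any Azure API; the function names and sample data are hypothetical. In a flat namespace a "folder" is just a shared name prefix, so renaming it means rewriting every object name, while in a hierarchical namespace the directory is a real node that can be moved in one step.

```python
def rename_prefix_flat(blobs, old, new):
    """Flat namespace: every object whose name starts with old/ is rewritten."""
    touched = 0
    renamed = {}
    for name, data in blobs.items():
        if name.startswith(old + "/"):
            renamed[new + name[len(old):]] = data
            touched += 1
        else:
            renamed[name] = data
    return renamed, touched


def rename_dir_hierarchical(tree, old, new):
    """Hierarchical namespace: the directory node itself moves, children untouched."""
    tree[new] = tree.pop(old)
    return tree, 1


flat = {"raw/a.csv": 1, "raw/b.csv": 2, "other/c.csv": 3}
tree = {"raw": {"a.csv": 1, "b.csv": 2}, "other": {"c.csv": 3}}
```

With the sample data, renaming `raw` touches two entries in the flat model and one in the hierarchical model; the gap grows with the number of files in the folder.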
nomadSK25
9
votes
4 answers

Visual Studio 2019 MPF 15.0 is missing

VS 2019 Preview 1 was just released, but I am getting this MPF 15.0 error. This happened before with VS 2017 and 2015. As a result, we are not able to update some extensions even if we download them from the Microsoft marketplace. Do you have any suggestions?
fatihyildizhan
9
votes
1 answer

Azure Databricks vs ADLA for processing

Presently, I have all my data files in Azure Data Lake Store. I need to process these files, which are mostly in CSV format. The processing would involve running jobs on these files to extract various information, e.g. data for certain periods of dates…
Jobi