Azure recently announced the Data Lake Gen 2 preview. As far as I know, the main functional difference between Gen 1 and Gen 2 is that Gen 2 offers object store and file system access over the same data at the same time. Other differences would be the price, available locations, etc. Can anyone explain what the other key differences between Gen 1 and Gen 2 are?
Microsoft has announced the retirement of ADLS Gen 1. FAQs, including migration procedures, can be found here: https://learn.microsoft.com/answers/answers/281143/view.html – Shehan Weerasooriya Jun 21 '21 at 04:33
6 Answers
Basically, think of Gen2 as a superset of Gen1 plus all of the best parts of Blob storage: tiers, HDFS and object store APIs, and presumably the ability to efficiently manage more than 35K files, deal with many small files, and handle more trickle-write type operations. Plus, it's cheaper.
I'm trying to get some clarity on a few specifics but not finding much. In the meantime, try these links:
https://azure.microsoft.com/en-us/blog/a-closer-look-at-azure-data-lake-storage-gen2/
https://learn.microsoft.com/en-us/azure/storage/data-lake-storage/introduction
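To make the "HDFS and object store APIs over the same data" point concrete, here is a minimal sketch of how one stored file can be addressed through both interfaces. The account, container, and path names are made up for illustration, and the helper functions are not part of any SDK; they only format the two address styles (the Blob endpoint URL and the `abfss://` URI used by the Hadoop ABFS driver).

```python
# Hypothetical helpers: they only build address strings, they do not call Azure.

def blob_url(account: str, container: str, path: str) -> str:
    """Object store view: address of the data on the Blob endpoint."""
    return f"https://{account}.blob.core.windows.net/{container}/{path.lstrip('/')}"


def abfss_uri(account: str, container: str, path: str) -> str:
    """File system view: address of the same data for HDFS-style clients (ABFS driver)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Assumed example names, not taken from the question.
    account, container, path = "mylake", "raw", "events/2019/a.csv"
    print(blob_url(account, container, path))
    print(abfss_uri(account, container, path))
```

Both strings point at the same stored data; only the endpoint and the protocol differ.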

Azure Data Lake Storage Gen2 is a superset of Azure Data Lake Gen1. Microsoft also calls it a "no-compromise data lake". Gen2 extends Azure Blob storage capabilities and is optimized for analytics workloads. Data is stored once and can be accessed through both the existing Blob storage interface and an HDFS-compliant file system interface, with no programming changes or data copying. It also supports atomic file and folder operations.
At present, it is only available in the West US 2 and West Central US data centers, but according to Microsoft it will be expanded to other data centers in the near future.
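As a rough illustration of why atomic folder operations matter, the toy model below compares renaming a "folder" in a flat key namespace (where a folder is only a shared key prefix) with renaming it in a hierarchical namespace (where a folder is a real entry). This is a conceptual sketch using plain Python dictionaries, not Azure's implementation, and the function names are invented for this example.

```python
def rename_flat(store: dict, old_prefix: str, new_prefix: str) -> int:
    """Flat namespace: every object under the prefix has to be rewritten."""
    moved = 0
    # Snapshot the matching keys first so the dict is not mutated while iterating.
    for key in [k for k in store if k.startswith(old_prefix)]:
        store[new_prefix + key[len(old_prefix):]] = store.pop(key)
        moved += 1
    return moved


def rename_hierarchical(parent: dict, old_name: str, new_name: str) -> int:
    """Hierarchical namespace: the directory entry is re-pointed in one step."""
    parent[new_name] = parent.pop(old_name)
    return 1


if __name__ == "__main__":
    flat = {"logs/2019/a.csv": b"1", "logs/2019/b.csv": b"2", "other/c.csv": b"3"}
    tree = {"logs": {"2019": {"a.csv": b"1", "b.csv": b"2"}}, "other": {"c.csv": b"3"}}
    print(rename_flat(flat, "logs/", "archive/"))          # 2 objects rewritten
    print(rename_hierarchical(tree, "logs", "archive"))    # 1 entry changed
```

In the flat model the cost grows with the number of objects under the prefix, and a failure midway leaves a half-renamed folder; in the hierarchical model the rename is a single metadata change.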

There is a Microsoft doc that talks about the differences. For example:
Data Organization:
Gen1
- Hierarchical namespace, File and folder support.
Gen2
- Hierarchical namespace, container, file and folder support
Geo-redundancy:
Gen1
- LRS.
Gen2
- LRS, ZRS, GRS, RA-GRS.
Ecosystem:
Gen1
- HDInsight (3.6), Azure Databricks (3.1 and above), SQL DW, ADF
Gen2
- HDInsight (3.6, 4.0), Azure Databricks (5.1 and above), SQL DW, ADF

Adding to the differences listed in the other answers: when using ADF to connect to an Azure Data Lake Analytics storage account, choose Gen1 for the Linked Service; for a blob or general storage account, choose Gen2.

The main difference is U-SQL (Gen1) versus T-SQL (Gen2).
The difference between U-SQL and T-SQL is that PolyBase extends T-SQL onto unstructured data (files) via a schematized view that allows writing T-SQL against these files, while U-SQL natively operates on unstructured data and virtualizes access to other SQL data sources via a built-in EXTRACT expression that lets you schematize unstructured data on the fly without having to create a metadata object for it.
In addition, Gen2 supports ZRS, GRS, and RA-GRS along with LRS.

Azure Gen1 interacts with HDFS. It is supported in only a few regions, whereas storage accounts are supported in all regions. Microsoft integrated the two and released a new version called Gen2, which is a combination of Blob storage and Gen1. That means Gen2 is built on top of Azure Blob storage. To create a Gen2 account, create a storage account, go to the Advanced tab, and enable Gen2 (the hierarchical namespace option).
