What is the best practice for moving a DocumentDB database to Azure Data Lake Storage? Should I create one file for each document in a collection, or move the entire DocumentDB at once? Also, I couldn't find much information on how to access DocumentDB using U-SQL.

Input would be appreciated.

chiragMishra-msft
reachify
  • I guess I need to ask why you want to do this. DocDB and ADL are different tools for different purposes. Is this a permanent move, or do you just want to copy the data to ADL for reporting? If so, have you considered using tables within the Data Lake Analytics service? – Paul Andrew May 05 '17 at 08:32
  • To simplify: I have a number of log files that are sent to ADL. I also have a DocumentDB that contains additional information for each file (don't ask me why, but that's how it's set up), so each log file has a matching DocumentDB document. The DocumentDB data is stored in ADL as a JSON file. I can query the JSON file, but it seems I can't store it as a DocumentDB and query that directly (which would be better, as it's indexed). – reachify May 05 '17 at 13:17

2 Answers

You currently cannot use U-SQL to access data in DocumentDB (now called Cosmos DB). There is a feature request here. Please feel free to add your vote.

If you move the data over, how you organize it depends on four things: how you want to manage the data (will you delete all of it, or only parts?), how it is structured (keep similarly structured data together, either in the same file or in the same folder), how you use it (do you always need all of it, or only parts?), and what gives you the best performance when accessing it (larger files are normally better, but if they are JSON, also make sure the extraction process works on them).
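To illustrate the "larger files are normally better" point, here is a minimal sketch that groups exported documents into a few larger newline-delimited JSON files instead of writing one file per document. Everything in it is an assumption for illustration: the names `batch_documents` and `write_batches`, the `docs-NNNNN.json` file naming, the 256 MB default, and the choice of one JSON document per line are not from DocumentDB or Data Lake tooling. It writes to a local folder only; uploading the files to the lake is left out, and you would still need to confirm that your U-SQL extractor can read this layout.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


def batch_documents(docs: Iterable[Dict], max_bytes: int) -> Iterator[List[str]]:
    """Yield lists of JSON lines, each list totalling at most about max_bytes.

    A single document larger than max_bytes still gets a batch of its own.
    """
    batch: List[str] = []
    size = 0
    for doc in docs:
        line = json.dumps(doc, separators=(",", ":"))
        line_size = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if batch and size + line_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    if batch:
        yield batch


def write_batches(docs: Iterable[Dict], out_dir: str,
                  max_bytes: int = 256 * 1024 * 1024) -> List[Path]:
    """Write each batch to its own file (hypothetical naming: docs-00000.json, ...)."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for index, batch in enumerate(batch_documents(docs, max_bytes)):
        path = target / f"docs-{index:05d}.json"
        path.write_text("\n".join(batch) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

If you only ever need parts of the data, the same idea can be applied per folder (for example, one folder per collection or per day), so that related documents stay together and can be deleted together.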

Michael Rys

You can use Azure Data Factory to connect to DocumentDB and copy your data to Data Lake. After that, you can query the data directly from Data Lake using U-SQL.

Jorge Ribeiro