
I want to split a single table into multiple output files in U-SQL, based on the number of rows.

For example, if my table has 500 rows, I need to generate 5 files of 100 rows each.

I followed the post "U-SQL Output in Azure Data Lake".

Arron
  • I'd be happy to help. This may be what you're looking for? https://stackoverflow.com/questions/42636855/u-sql-output-in-azure-data-lake/42676271#42676271 – guyhay_MSFT Jul 24 '17 at 20:55
  • But the post doesn't give the script for "/output/genscript.usql". – Arron Jul 25 '17 at 05:20
  • The script you see in the post generates that script. You will then have to download it and run it. – Michael Rys Aug 18 '17 at 00:48

1 Answer


To generate separate files based on the number of rows, you have to add a ROW_NUMBER() to each row. Then generate a script (for example with U-SQL; see U-SQL Output in Azure Data Lake for an example) that creates an OUTPUT statement for each row range. Note that the script-generation step probably uses an inner join with a SELECT COUNT(*) FROM @data to produce the right number of OUTPUT statements. The first statement in the generated script should be the one that adds the ROW_NUMBER() to the rowset that you then output.

Once you have generated that script, you can download it and submit it.
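
As an illustration only, here is a minimal sketch of what such a generated script might look like for the first two 100-row ranges. The input path, the column names (id, name), the rowset names (@numbered, @part1, @part2) and the output file names are all assumptions made for this example; they are not taken from the linked post.

```usql
// Hypothetical input: path and schema are assumptions for this sketch.
@data =
    EXTRACT id int,
            name string
    FROM "/input/data.csv"
    USING Extractors.Csv();

// First statement of the generated script: number every row.
@numbered =
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn,
           id,
           name
    FROM @data;

// One rowset and one OUTPUT statement per row range.
@part1 =
    SELECT id, name
    FROM @numbered
    WHERE rn >= 1 AND rn <= 100;

OUTPUT @part1
TO "/output/part_1.csv"
USING Outputters.Csv();

@part2 =
    SELECT id, name
    FROM @numbered
    WHERE rn >= 101 AND rn <= 200;

OUTPUT @part2
TO "/output/part_2.csv"
USING Outputters.Csv();

// The same pattern would repeat for each remaining 100-row range.
```

The number of ranges depends on the row count, which is why the statements are produced by a generator script rather than written by hand.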

Michael Rys
  • Wouldn't the more "correct" approach be to create jobs at the same unit of work as one record? That way there's no monkeying with the ROW_NUMBER function, etc. – Alex Gordon Sep 09 '19 at 19:26