0

Is it possible to extract files only for 3 days, without extracting all the files.

 DROP VIEW IF EXISTS dbo.Read;  
 CREATE VIEW IF NOT EXISTS dbo.Read AS


EXTRACT     
    Statements
FROM  
"adl://Test/{date:yyyy}/{date:M}/{date:d}/Testfile.csv"
USING Extractors.Csv(silent:true,quoting : true, nullEscape : "/N");

@res =
SELECT * FROM dbo.Read
WHERE date BETWEEN DateTime.Parse("2015/07/01") AND DateTime.Parse("2015/07/03");

OUTPUT @res
TO "adl://test/Testing/loop.csv"
USING Outputters.Csv();
Bond
  • 101
  • 1
  • 3
  • 15

2 Answers2

0

You can use the pattern identifiers in the fileset specification in parts of the path or even parts of the name (see https://msdn.microsoft.com/en-us/library/azure/mt771650.aspx). You can do lists of files, so if you only have one file in each directory you can do;

EXTRACT ...
FROM "adl://Test/2015/07/1/Testfile.csv"
     , "adl://Test/2015/07/2/Testfile.csv"
USING ...;

If there is more than one file in each directory you can do individual extracts for each day and then union the result. Something like:

@a = EXTRACT .... 
     FROM "adl://Test/2015/07/1/{*}.csv"
     USING ...;
@b = EXTRACT ....          
     FROM "adl://Test/2015/07/2/{*}.csv"
     USING ...;
@fullset = SELECT * FROM @a UNION SELECT * FROM @b;

Unfortunately I believe there is no list of filesets at the moment allowing you to do above case in one EXTRACT statement.

Bond
  • 101
  • 1
  • 3
  • 15
OmidA
  • 91
  • 3
  • i have 400 files and i can not write 400 extract statements. but if use my above code it will fetch all the file and then subset 400 files based on my start date and end date. – Bond Dec 20 '16 at 05:23
0

Partition elimination already ensures for your query that only files matching predicates will actually be read (you can confirm that in the job graph).

See also my previous answer for How to implement Loops in U-SQL

If you have remaining concerns about performance, the job graph can also help you nail down where they originate.

Community
  • 1
  • 1
  • Thanks for the answer. It is working, but if i use "between", it is fetching all the dates(dates with are not lies between start date and end date), and then filtering it for 400 days. – Bond Dec 20 '16 at 05:26
  • Hi Bond. If you use BETWEEN and the job graph shows you all files (instead of only the ones that fit the predicate, can you please send me the link to the job in email (usql at Microsoft), so we can investigate? – Michael Rys Dec 22 '16 at 05:17