58

I have external tables created in AWS Athena to query S3 data, however, the location path has 1000+ files. So I need the corresponding filename of the record to be displayed as a column in the table.

select file_name , col1 from table where file_name = "test20170516"

In short, I need to know INPUT__FILE__NAME(hive) equivalent in AWS Athena Presto or any other ways to achieve the same.

DJo
  • 2,133
  • 4
  • 30
  • 46
Rajeev
  • 1,031
  • 2
  • 13
  • 25

2 Answers2

96

You can do this with the $path pseudo column.

select "$path" from table
jens walter
  • 13,269
  • 2
  • 56
  • 54
  • 1
    Many thanks for the info.Please can you let me know any link pertaining to this as I was searching for a while and couldn't find any. – Rajeev May 17 '17 at 16:27
  • 3
    I found it a while back through the following github issue. https://github.com/prestodb/presto/issues/5486 – jens walter May 17 '17 at 16:50
  • 7
    Thanks Again. Are there any other pseudo columns like this that you are aware of? – Rajeev May 17 '17 at 20:55
  • 2
    See also answer from official AWS docs: "How can I see the Amazon S3 source file for a row in an Athena table": https://aws.amazon.com/en/premiumsupport/knowledge-center/find-s3-source-file-athena-table-row/ – Sma Ma Apr 07 '20 at 12:46
  • FYI, this works in presto-sql version 316 (and probably earlier). So, it's a presto feature not just an Athena feature. Didn't know about it, but it's awesome to have - thanks! – John Humphreys May 13 '20 at 17:13
  • Although one may not run into this complication often as partitioning should be used pretty much all the time, it's worth noting that the $path variable is not available in non-partitioned tables – Harfel Jaquez Jul 13 '20 at 16:38
27

If you need just the filename, you can extract it with regeexp_extract().

To use it in Athena on the "$path" you can do something like this:

SELECT regexp_extract("$path", '[^/]+$') AS filename from table;

If you need the filename without the extension, you can do:

SELECT regexp_extract("$path", '[ \w-]+?(?=\.)') AS filename_without_extension from table;

Here is the documentation on Presto Regular Expression Functions

campeterson
  • 3,591
  • 2
  • 25
  • 26
  • 3
    I was using split_part("$PATH",'/',7) to get the filename,yours is more generic though.Thanks. – Rajeev Sep 19 '17 at 21:53