13

I have a Firehose that stores data in S3 in the default directory structure: YY/MM/DD/HH and a table in Athena with these columns defined as partitions:

year: string, month: string, day: string, hour: string

after running

msck repair table clicks

I only receive:

Partitions not in metastore:    clicks:2017/08/26/10

I can add these partitions manually and everything works however, I was wondering why msck repair does not add these partitions automatically and update the metastore?

Yves M.
  • 29,855
  • 23
  • 108
  • 144
Sam
  • 2,761
  • 3
  • 19
  • 30
  • Possible duplicate of [Hive: Partitioning by part of integer column](https://stackoverflow.com/questions/42950331/hive-partitioning-by-part-of-integer-column) – David דודו Markovitz Aug 26 '17 at 14:24
  • Possible duplicate of https://stackoverflow.com/questions/42785633/create-table-partition-in-hive-for-year-month-and-day/42786102#42786102 – David דודו Markovitz Aug 26 '17 at 14:25
  • @DuduMarkovitz my issue is after running msck repair new partitions are not added automatically, like the posts above show – Sam Aug 26 '17 at 14:27
  • 2
    ... and like my answers show, only a specific directory naming convention, which you are not using, is supported – David דודו Markovitz Aug 26 '17 at 14:30
  • thanks Ill add year, month , day , hour specifically in my directories – Sam Aug 26 '17 at 14:33
  • for to work MSCK your table static partition name should be like 'year=2016/month=10/day=13' then we can use below command and it is working... set hive.msck.path.validation=ignore; MSCK REPAIR TABLE table_name; for more details pls check https://stackoverflow.com/questions/40043986/hdinsight-hive-msck-repair-table-table-name-throwing-error/40100053#40100053 – anand Aug 01 '18 at 11:48

2 Answers2

7

To use Athena MSCK REPAIR with S3 you need to use key-value pairs as path prefix:

clicks/year=2017/month=08/day=26/hour=10/

instead of: clicks/2017/08/26/10/

Alternatively, update the partitions directly in Glue (manually or use a crawler).

Found this here: https://forums.aws.amazon.com/message.jspa?messageID=789078

Yves M.
  • 29,855
  • 23
  • 108
  • 144
Otto
  • 1,787
  • 1
  • 17
  • 25
2

For future reference, aside from the two tips mentioned in this article: https://aws.amazon.com/premiumsupport/knowledge-center/athena-aws-glue-msck-repair-table/

  • Allow glue:BatchCreatePartition in the IAM policy
  • Change the S3 path to flat case

You also need to set the TableType attribute to a non-null value. In my case, it was EXTERNAL_TABLE.

spg
  • 9,309
  • 4
  • 36
  • 41