I'm wondering whether it's possible to use the s3-dist-cp tool to merge Parquet files (Snappy-compressed). I tried the "--groupBy" and "--targetSize" options, and it did merge the small files into bigger files, but afterwards I can't read them in either Spark or AWS Athena. In AWS Athena I got the following error:
HIVE_CURSOR_ERROR: Expected 246379 values in column chunk at s3://my_analytics/parquet/auctions/region=us/year=2017/month=1/day=1/output123 offset 4 but got 247604 values instead over 1 pages ending at file offset 39
This query ran against the "randomlogdatabase" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 4ff77c55-3b69-414d-8fd9-a3d135f5ff2f.
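One possible explanation, assuming that "--groupBy" concatenates matching files byte for byte (as the S3DistCp documentation describes it): a Parquet file begins and ends with the 4-byte magic "PAR1", and readers locate the footer metadata from the last 8 bytes (a 4-byte little-endian footer length followed by "PAR1"). The footer records absolute byte offsets for each column chunk, and the first chunk usually starts at offset 4, right after the leading magic. After concatenation, only the last file's footer is found, but its offsets now point into the first file's data. That would fit the "offset 4 ... Expected N values but got M values" error above. The sketch below uses only the standard library; `toy_parquet` is a hypothetical stand-in that replaces real column chunks and Thrift metadata with opaque bytes.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"  # every Parquet file starts and ends with these 4 bytes


def parquet_trailer(data: bytes) -> Tuple[int, int]:
    """Locate the footer the way a reader does: from the last 8 bytes.

    Tail layout: <footer metadata><4-byte little-endian length>PAR1
    Returns (footer_start, footer_length).
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    return len(data) - 8 - footer_len, footer_len


def toy_parquet(column_data: bytes, footer: bytes) -> bytes:
    """Hypothetical stand-in for a Parquet file (not a real encoder)."""
    return MAGIC + column_data + footer + struct.pack("<I", len(footer)) + MAGIC


a = toy_parquet(b"AAAA", b"meta-a")    # 22 bytes
b = toy_parquet(b"BBBBBB", b"meta-b")  # 24 bytes
merged = a + b                          # byte-wise concatenation

start, length = parquet_trailer(merged)
print(merged[start:start + length])  # only the last file's footer is found
# b's footer says its first column chunk is at absolute offset 4, but in the
# merged file offset 4 holds a's data; b's data actually sits at 22 + 4 = 26.
print(merged[4:10])
print(merged[26:32])
```

If this is the cause, the merged files pass the magic-byte check but are internally inconsistent, so merging would need a Parquet-aware step (for example, reading the small files and rewriting them with Spark) rather than plain concatenation.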
Any help is appreciated.