I contacted AWS Support, and here are the details:
The problem is caused by files that contain only a single record. By default, the Glue crawler uses LazySimpleSerDe to classify CSV files. LazySimpleSerDe needs at least one newline character to identify a file as CSV, which is a known limitation.
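For illustration, this is the kind of file that trips up the crawler (a minimal Python sketch; the record shape is hypothetical, modeled on the Grok pattern further down):

    # The whole file is one record with no trailing newline, so
    # LazySimpleSerDe never sees the newline it needs to classify
    # the file as CSV.
    with open("file1.csv", "w", newline="") as f:
        f.write('"0a1b2c3d",1530731235')  # note: no "\n" at the end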
The right way to solve this issue is to use a custom Grok pattern.
To confirm this, I tested some scenarios on my end with your data and a custom pattern. I created three files: file1.csv with one record, file2.csv with two records, and file3.csv with one record. Also, a proper Grok pattern should account for the end of the line with $, i.e.
%{QUOTEDSTRING:rid:string},%{NUMBER:ts:long}$
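For reference, a Grok pattern like the one above can be registered as a custom classifier through the Glue API; here is a minimal boto3 sketch (the classifier name is a placeholder):

    import boto3

    glue = boto3.client("glue")

    # Register the Grok pattern above as a custom CSV classifier.
    # "single-record-csv" is a placeholder name; pick your own.
    glue.create_classifier(
        GrokClassifier={
            "Name": "single-record-csv",
            "Classification": "csv",
            "GrokPattern": "%{QUOTEDSTRING:rid:string},%{NUMBER:ts:long}$",
        }
    )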
- I ran the crawler without any custom pattern on all the files, and it created multiple tables.
- I edited the crawler to add the custom pattern and re-ran it, but it still created multiple tables.
- I created a new crawler with the Grok pattern and ran it on file1 and file2; it created only one table with the proper columns (see the verification sketch after this list).
- I added file3 and ran the crawler again; it only updated the same table, and no new tables were created.
- I tested scenarios 3 and 4 using partitions in S3 (as you might have partitioned data) and still got one table.
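One way to run that verification is to list the tables the crawler produced in the target database (a boto3 sketch; the database name is a placeholder):

    import boto3

    glue = boto3.client("glue")

    # List the tables in the crawler's target database; after the
    # steps above there should be exactly one.
    resp = glue.get_tables(DatabaseName="new_catalog_db")  # placeholder
    print([t["Name"] for t in resp["TableList"]])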
Based on my observations, it seems the problem might be due to the crawler caching the older classification details, so I would suggest creating a new crawler and pointing it to a new database in the catalog.
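Creating the fresh crawler can also be scripted; here is a minimal boto3 sketch under those assumptions (all names, the role ARN, and the S3 path are placeholders):

    import boto3

    glue = boto3.client("glue")

    # Create a fresh crawler pointed at a new catalog database so no
    # cached classification details carry over, and attach the custom
    # Grok classifier registered earlier.
    glue.create_crawler(
        Name="single-record-csv-crawler",  # placeholder
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
        DatabaseName="new_catalog_db",  # new database, placeholder
        Classifiers=["single-record-csv"],
        Targets={"S3Targets": [{"Path": "s3://your-bucket/your-prefix/"}]},
    )
    glue.start_crawler(Name="single-record-csv-crawler")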