I have read several other posts about this and in particular this question with an answer by greg about how to do it in Hive. I would like to know how to account for DynamoDB tables with variable amounts of columns though?
That is, the original DynamoDB table has rows that were added dynamically with different columns. I have tried to view the exportDynamoDBToS3 script that Amazon uses in their DataPipeLine service but it has code like the following which does not seem to map the columns:
-- Map DynamoDB Table
CREATE EXTERNAL TABLE dynamodb_table (item map<string,string>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "MyTable");
(As an aside, I have also tried using the Datapipe system but found it rather frustrating as I could not figure out from the documentation how to perform simple tasks like run a shell script without everything failing.)