I'm not a Spark pro, so I'm asking for help.
I migrated a DynamoDB table to S3 using the built-in export service. It saves files in *.json format. Below is an example of a row (each row's data is a dict nested under the key "Item").
{
  "Item": {
    "accept_languages": {
      "M": {
        "en": {"N": "0.9"},
        "en-US": {"N": "1"}
      }
    },
    "accept_mimetypes": {
      "M": {
        "*/*": {"N": "0.8"},
        "image/*": {"N": "1"},
        "image/apng": {"N": "1"},
        "image/webp": {"N": "1"}
      }
    },
    "id": {"S": "5cddbd53b870c2619f1083ed"},
    "ip": {"S": "11.11.111.11"},
    "landing_page__type": {"S": "PageMain"},
    "location__city": {"S": "Scituate"},
    "location__country": {"S": "United States"},
    "location__country_code": {"S": "US"},
    "location__region": {"S": "MA"},
    "location__zip": {"S": "02066"},
    "origin_url": {"S": "https://www.bing.com/"},
    "session": {"S": "b4d58fd18"},
    "source": {"S": "bing"},
    "user_agent__browser": {"S": "Chrome"},
    "user_device": {"S": "t"}
  }
}
As you can see, each row's data is nested. I want to produce a *.csv file from it. Any recommendations on how to parse it? I already have a UDF (custom function) that transforms a single dict from the DynamoDB typed format into a plain one. How can I extract the "Item" dict from each row and apply that function to it?
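For context, here is a minimal pure-Python sketch of the kind of transform I mean. The function names `unwrap` and `flatten` are just illustrative (this is not my actual UDF), and it assumes only the S/N/M type tags from the sample above appear:

```python
import json

def unwrap(value):
    # A DynamoDB-typed attribute is a one-key dict: the key is the type tag.
    (tag, inner), = value.items()
    if tag == "S":
        return inner
    if tag == "N":
        # DynamoDB stores numbers as strings; keep integers as ints.
        return float(inner) if "." in inner else int(inner)
    if tag == "M":
        return {k: unwrap(v) for k, v in inner.items()}
    if tag == "L":
        return [unwrap(v) for v in inner]
    return inner  # BOOL, NULL, B, SS, ... left as-is for brevity

def flatten(d, prefix=""):
    # Turn nested dicts into dotted column names suitable for a CSV header.
    out = {}
    for k, v in d.items():
        if isinstance(v, dict):
            out.update(flatten(v, f"{prefix}{k}."))
        else:
            out[f"{prefix}{k}"] = v
    return out

# One exported line, shortened to two attributes for the example.
line = ('{"Item": {"id": {"S": "5cddbd53b870c2619f1083ed"}, '
        '"accept_languages": {"M": {"en": {"N": "0.9"}, "en-US": {"N": "1"}}}}}')
item = json.loads(line)["Item"]
row = flatten({k: unwrap(v) for k, v in item.items()})
print(row)
# {'id': '5cddbd53b870c2619f1083ed', 'accept_languages.en': 0.9, 'accept_languages.en-US': 1}
```

I assume in Spark I would read the export with `spark.read.text(...)`, apply something like this per line (e.g. via an RDD `map` or a UDF), and then write the result out with `df.write.csv(...)`, but I'm not sure that's the right approach.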
Thanks