I have a large RDD, and I want to split it into 4 different RDDs based on a provided list of column headers, then save each one in an Impala table by writing 4 Parquet files.
The data looks like this:
a    b  c  d  e  f  g  h
--------------------------
abc  1  3  4  5  7  9  11
xyz  2  5  7  4  9  4  12
I have a list of columns for each Impala-side table:
table 1 (Impala side): a, b, c
table 2 (Impala side): d, e, f
...
I also need to add a new column to each table for a user-defined primary key, like:
table 1 (Impala side): id, a, b, c
I tried the rdd.map function, but how do I apply it to a specific list of columns?
rdd_1 = rdd.map(lambda x: (x['a'], x['b'], x['c']))
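One way to generalize that map to an arbitrary column list is a small factory around operator.itemgetter. This is a sketch, assuming each record is indexable by column name (a dict or a pyspark.sql.Row, as the x['a'] attempt implies); TABLE_COLUMNS and projector are hypothetical names. Building each function through a factory, rather than writing a lambda inside a loop over tables, avoids Python's late-binding closures, where every lambda would end up seeing the last column list.

```python
from operator import itemgetter

# Hypothetical mapping from Impala table name to the source columns it needs.
TABLE_COLUMNS = {
    "table1": ["a", "b", "c"],
    "table2": ["d", "e", "f"],
}


def projector(columns):
    """Return a function that extracts `columns` from one record as a tuple.

    Works on anything indexable by column name, e.g. a dict or a pyspark.sql.Row.
    """
    get = itemgetter(*columns)

    def project(record):
        values = get(record)
        # itemgetter returns a bare value (not a tuple) when given a single key
        return values if len(columns) > 1 else (values,)

    return project


# Assumed PySpark usage, with `rdd` holding records that have fields a..h:
#   rdds = {t: rdd.map(projector(cols)) for t, cols in TABLE_COLUMNS.items()}

# Plain-Python check on the first sample row from the question
sample = {"a": "abc", "b": 1, "c": 3, "d": 4, "e": 5, "f": 7, "g": 9, "h": 11}
print(projector(TABLE_COLUMNS["table1"])(sample))  # ('abc', 1, 3)
print(projector(TABLE_COLUMNS["table2"])(sample))  # (4, 5, 7)
```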
Also, how do I add the new column with different primary keys?
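For the id column and the write step, a possible sketch (not a verified answer) numbers the rows with RDD.zipWithIndex(), which pairs each element with its 0-based position, and writes each projection with DataFrameWriter.insertInto. It assumes a SparkSession `spark` with Hive support, records indexable by column name, and that the four tables already exist in Impala as Parquet tables with columns ordered (id, col1, col2, ...). TABLE_COLUMNS, make_row_builder, save_tables, and the per-table first_ids offsets are hypothetical names; the offsets are one possible reading of "different primary keys" (non-overlapping id ranges per table).

```python
from operator import itemgetter

# Hypothetical table -> source-column mapping; add table3/table4 the same way.
TABLE_COLUMNS = {
    "table1": ["a", "b", "c"],
    "table2": ["d", "e", "f"],
}


def make_row_builder(columns, first_id=1):
    """Map one (record, index) pair from RDD.zipWithIndex() to (id, col1, ...)."""
    get = itemgetter(*columns)

    def build(pair):
        record, index = pair
        values = get(record) if len(columns) > 1 else (get(record),)
        # zipWithIndex counts from 0; shift so the ids start at first_id
        return (index + first_id,) + tuple(values)

    return build


def save_tables(spark, rdd, table_columns, database="default", first_ids=None):
    """Insert one projection per table into existing (assumed Parquet) tables.

    Assumes `spark` is a SparkSession with Hive support and that each target
    table already exists with columns in the order (id, col1, col2, ...).
    """
    first_ids = first_ids or {}
    # Number the source rows once and cache them, so the input is not
    # recomputed for every table and a source row keeps the same index.
    indexed = rdd.zipWithIndex().cache()
    for table, columns in table_columns.items():
        build = make_row_builder(columns, first_ids.get(table, 1))
        df = spark.createDataFrame(indexed.map(build), ["id"] + list(columns))
        # insertInto matches columns by position, not by name
        df.write.insertInto("{}.{}".format(database, table), overwrite=True)
    indexed.unpersist()
```

A call might look like save_tables(spark, rdd, TABLE_COLUMNS, database="mydb", first_ids={"table2": 1000000}), where "mydb" is a placeholder. With more than one partition, zipWithIndex() runs an extra Spark job to compute partition offsets; zipWithUniqueId() avoids that job, but its ids have gaps. After writing from Spark, Impala usually needs REFRESH mydb.table1 (or INVALIDATE METADATA for a table it has not seen yet) before the new Parquet files show up in queries.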