I have a tuple generated by Spark after a join. It consists of a key, a nested tuple holding two columns, and then the rest of the columns from the second table. I don't necessarily know how many columns the second table has.
So, for example:
(2324234534, (('23213','2013/03/02'), 12.32, 32.4, 45))
When there is only one column after the (zip, date) tuple, I have been able to flatten it like this in PySpark:
x.map(lambda p: (p[0], (p[1][0][0], p[1][0][1], p[1][1])))
and in plain Python:
map(lambda p: (p[0], (p[1][0][0], p[1][0][1], p[1][1])), x)
This produces the output I am looking for:
(2324234534, ('23213','2013/03/02', 12.32))
If I want more than one column after (zip, date), then I have this code:
x.map(lambda p: (p[0], (p[1][0][0], p[1][0][1], p[1][1:])))
However, it produces this output:
(2324234534, ('23213','2013/03/02', (12.32, 32.4, 45)))
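Since `p[1][0]` and `p[1][1:]` are both tuples, plain `+` concatenation should produce the flat shape regardless of how many trailing columns there are. A sketch in plain Python (the same lambda would go inside the Spark `map`; `flatten_pair` is just an illustrative name):

```python
# Sample joined record: (key, ((zip, date), col1, col2, ...))
record = (2324234534, (('23213', '2013/03/02'), 12.32, 32.4, 45))

# p[1][0] is the (zip, date) tuple; p[1][1:] is a tuple of the
# remaining columns, so '+' concatenates them into one flat tuple.
flatten_pair = lambda p: (p[0], p[1][0] + p[1][1:])

print(flatten_pair(record))
# (2324234534, ('23213', '2013/03/02', 12.32, 32.4, 45))
```

This works for any number of trailing columns because slicing a tuple yields a tuple, even an empty one.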
Either way, my current method is hacky and doesn't produce the result I am looking for. More generally, I'd like to learn how to flatten tuples (the other threads I have found on this topic turn tuples into lists, which is not quite what I'm looking for).
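For flattening arbitrarily nested tuples in general, not just this two-level shape, a recursive helper is one option. This is a sketch, with `flatten` as a hypothetical helper name, that keeps the result a tuple rather than a list:

```python
def flatten(t):
    """Recursively flatten nested tuples into one flat tuple."""
    out = []
    for item in t:
        if isinstance(item, tuple):
            out.extend(flatten(item))  # descend into nested tuples
        else:
            out.append(item)
    return tuple(out)

# The value part of the joined record from above:
value = (('23213', '2013/03/02'), 12.32, 32.4, 45)
print(flatten(value))
# ('23213', '2013/03/02', 12.32, 32.4, 45)
```

Applied per record, e.g. `x.map(lambda p: (p[0], flatten(p[1])))`, this would handle any nesting depth without knowing the column count in advance.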