I have a tuple generated by Spark after a join. It has a key, two columns in a tuple and then the rest of the columns from the second table. I don't necessarily know how many columns are in the second table.

So, for example:

(2324234534, (('23213','2013/03/02'), 12.32, 32.4, 45))

I have been able to flatten the tuple when there is only one column after the (zip, date) tuple, like this in PySpark:

x.map(lambda p: (p[0], (p[1][0][0], p[1][0][1], p[1][1])))

Or, in plain Python:

map(lambda p: (p[0], (p[1][0][0], p[1][0][1], p[1][1])), x)

This produces the output I am looking for:

(2324234534, ('23213','2013/03/02', 12.32))

If I want more than one column after the (zip, date) tuple, I have this code:

x.map(lambda p: (p[0], (p[1][0][0], p[1][0][1], p[1][1:])))

However, it produces this output:

(2324234534, ('23213','2013/03/02', (12.32, 32.4, 45)))

Either way, my current method is hacky and doesn't produce the result I am looking for. I would like to learn how to flatten tuples in general (the other threads I have found on this topic turn tuples into lists, which is not quite what I'm looking for).

Michal
  • For the general case, this flattener generator looks good to me: http://stackoverflow.com/a/2158532/2337736 It'll work over any iterable and returns a generator, which you can use to create a tuple: `tuple(flatten(some_iterable))`. As for your specific question: in your example, you drop two values (32.4 and 45). Is that desired? – Peter DeGlopper Feb 18 '15 at 22:32
  • @PeterDeGlopper I am only able to attain my desired structure when I have just the first value in the output. Otherwise, it prints as a tuple. Also, I tried the code you suggested and it doesn't work for my input. – Michal Feb 18 '15 at 22:36
  • You'd have to do something like `x.map(lambda p: (p[0], tuple(flatten(p[1]))))` to keep the single level of nesting you want - that works on your test `p = (2324234534, (('23213','2013/03/02'), 12.32, 32.4, 45))`. Or at least it flattens it to `(2324234534, ('23213', '2013/03/02', 12.32, 32.399999999999999, 45))`, I'm not quite sure what you're looking for. – Peter DeGlopper Feb 18 '15 at 22:46
  • @PeterDeGlopper, the second suggestion worked fine after I imported collections. – Michal Feb 18 '15 at 22:50
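
For readers following the comments above, here is a minimal sketch of a recursive flattener in Python 3. It is not the code from the linked answer: it only descends into tuples and lists (an assumption made here so that strings such as the zip and date stay whole), and the name `flatten` is reused from the comments purely for illustration.

```python
def flatten(items):
    """Yield the leaves of arbitrarily nested tuples/lists, left to right."""
    for item in items:
        if isinstance(item, (tuple, list)):
            # Recurse into nested containers; strings are not containers
            # here, so they are yielded unchanged.
            yield from flatten(item)
        else:
            yield item


# Sample record taken from the question.
p = (2324234534, (('23213', '2013/03/02'), 12.32, 32.4, 45))

# Keep the key separate and flatten only the value part.
result = (p[0], tuple(flatten(p[1])))
print(result)
```

Wrapping the generator in `tuple(...)` keeps the result a tuple rather than a list, which is what the question asks for.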

1 Answer

If I understand your goals correctly, I believe that this does what you want:

In [11]: a = (2324234534, (('23213','2013/03/02'), 12.32, 32.4, 45))

In [12]: a[:1] + (a[1][0] + a[1][1:],)
Out[12]: (2324234534, ('23213', '2013/03/02', 12.32, 32.4, 45))
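
The same concatenation can be wrapped in a small function so that it works for any number of trailing columns. This is only a sketch: the name `flatten_value` and the sample `rows` list are made up for illustration, and it assumes every record has the shape `(key, ((zip, date), col1, col2, ...))` shown in the question.

```python
def flatten_value(pair):
    """Turn (key, ((a, b), c, d, ...)) into (key, (a, b, c, d, ...))."""
    key, value = pair
    # value[0] is the inner (zip, date) tuple; value[1:] is a tuple of the
    # remaining columns, however many there are. Tuple + tuple concatenates.
    return (key, value[0] + value[1:])


# Hypothetical sample records with different numbers of trailing columns.
rows = [
    (2324234534, (('23213', '2013/03/02'), 12.32, 32.4, 45)),
    (2324234535, (('23214', '2013/03/03'), 7.5)),
]

flattened = list(map(flatten_value, rows))
for row in flattened:
    print(row)
```

In PySpark the function could presumably be passed to the RDD from the question in the same way, as `x.map(flatten_value)`.
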
John1024