I have data in a Spark RDD and I want to divide it into two parts with a ratio such as 0.7. For example, if the RDD looks like this:
[1,2,3,4,5,6,7,8,9,10]
I want to divide it into rdd1:
[1,2,3,4,5,6,7]
and rdd2:
[8,9,10]
with the ratio 0.7. rdd1 and rdd2 should be random every time. I tried this way:
import random

scale = 0.7
seed = random.randint(0, 10000)
rdd1 = data.sample(False, scale, seed)
rdd2 = data.subtract(rdd1)
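For reference, this is the behavior I expect, sketched in plain Python without Spark (the helper name split_by_ratio is just for illustration):

```python
import random

def split_by_ratio(items, ratio, seed=None):
    # Randomly partition items into two lists of roughly
    # ratio / (1 - ratio) sizes; shuffle makes the split random.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

part1, part2 = split_by_ratio(range(1, 11), 0.7, seed=42)
# part1 gets 7 elements, part2 gets the remaining 3,
# and together they cover all 10 original elements
```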
and it works sometimes, but when my data contains dicts I run into problems. For example, with data as follows:
[{1:2},{3:1},{5:4,2:6}]
I get
TypeError: unhashable type: 'dict'
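As far as I can tell, the error comes from subtract needing to hash the RDD elements to compute the difference, and Python dicts are not hashable. A minimal reproduction of the same failure outside Spark:

```python
# Plain Python: dicts cannot be hashed, so they cannot be put into a set,
# which is essentially what subtract() has to do with the elements.
data = [{1: 2}, {3: 1}, {5: 4, 2: 6}]

try:
    set(data)  # same failure mode as data.subtract(rdd1)
except TypeError as e:
    print(e)  # unhashable type: 'dict'
```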