from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("Ark API Stats")
sc = SparkContext(conf=conf)
a = sc.parallelize([1,2,3,4,5,6,7,8,9,10])
count = [2,4]
# build one filtered RDD per threshold in count
array = [a.filter(lambda x: x < y) for y in count]
results = sc.union(array).collect()
print(results)
The above code returns [1,2,3,1,2,3], whereas what I want is [1,1,2,3].
It seems that in a.filter(lambda x: x < y), y always ends up being 4, the last number in count.
Any solutions?
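
One workaround I would expect to help (a minimal sketch, assuming the goal is one filtered RDD per threshold in count) is to bind y at the moment each lambda is created, for example via a default argument, so each filter keeps its own copy of the threshold instead of sharing the loop variable:

# y=y evaluates the current threshold when the lambda is defined,
# so the first filter uses 2 and the second uses 4
array = [a.filter(lambda x, y=y: x < y) for y in count]
results = sc.union(array).collect()
print(results)  # should print [1, 1, 2, 3]

A functools.partial or a small helper function that returns the lambda would have the same effect of capturing the value rather than the variable.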