from pyspark import SparkContext, SparkConf


conf = SparkConf().setAppName("Ark API Stats")
sc = SparkContext(conf=conf)


a = sc.parallelize([1,2,3,4,5,6,7,8,9,10])
count = [2,4]
array = [a.filter(lambda x: x < y) for  y in count]

results = sc.union(array).collect()
print(results)

The code above returns [1,2,3,1,2,3], whereas what I want is [1,1,2,3]. It seems that in a.filter(lambda x: x < y), y is always 4, the last number in count, by the time the filters actually run. Any solutions?
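To illustrate the behaviour described above without needing a Spark cluster, here is a minimal sketch in plain Python. It assumes the cause is Python's late-binding closures combined with lazy evaluation: the built-in `filter` stands in for the lazy `a.filter(...)`, and `itertools.chain` stands in for `sc.union(...).collect()`. The names `data`, `late`, and `bound` are illustrative and not part of the original code.

```python
from itertools import chain

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
count = [2, 4]

# Late binding: each lambda looks up y only when it is finally called.
# filter() is lazy, so nothing runs until the results are consumed,
# and by then the comprehension has finished with y == 4.
late = [filter(lambda x: x < y, data) for y in count]
late_results = list(chain.from_iterable(late))

# Early binding: y=y stores the current value of y as a default
# argument, so each lambda keeps its own threshold.
bound = [filter(lambda x, y=y: x < y, data) for y in count]
bound_results = list(chain.from_iterable(bound))

print(late_results)   # [1, 2, 3, 1, 2, 3]
print(bound_results)  # [1, 1, 2, 3]
```

If the same reasoning holds for the RDD version, writing the comprehension as `[a.filter(lambda x, y=y: x < y) for y in count]` should bind each value of `y` when the lambda is created rather than when Spark evaluates it. This is a sketch of one possible fix, not a confirmed answer.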

Xing Shi
