I have a DataFrame with only one row:

df = spark.createDataFrame([(1, 2, 10, 3, 4)], ['a', 'b', 'c', 'd', 'e'])

However, the number of columns is large, about 20,000. Now I want to select the columns whose value is larger than a threshold, e.g. 5. I tried converting the DataFrame to a dict to filter the values, but hit a max heap size error.
Here, the expected output is:
+---+
| c|
+---+
| 10|
+---+
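To illustrate the filtering step I have in mind: since there is only a single row, once it is on the driver (e.g. via df.first().asDict(), an assumption about the collection step), the column selection is plain Python. A minimal sketch with the row shown as a literal dict:

```python
# Stand-in for what df.first().asDict() would return for the single row
# (an assumption; the real row has ~20,000 keys).
row = {'a': 1, 'b': 2, 'c': 10, 'd': 3, 'e': 4}
threshold = 5

# Keep only the column names whose value exceeds the threshold.
kept = [col for col, val in row.items() if val > threshold]
print(kept)  # ['c']
```

The resulting list could then be passed back to Spark as df.select(*kept) to produce the one-column output shown above.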