To do this, group by "id", collect the values from both "col1" and "col2" into lists in an aggregation, concatenate those lists, and then explode the result back into a single column.
To get the unique numbers, drop the duplicates afterwards.
Since your expected result also has the numbers sorted, sort the concatenated list inside the aggregation with sort_array.
The following code:
from pyspark.sql.functions import concat, collect_list, explode, col, sort_array

df = (
    sc.parallelize([
        ('A', 2, 3), ('A', 2, 4), ('A', 4, 6),
        ('B', 1, 2),
    ]).toDF(["id", "col1", "col2"])
)

result = (
    df.groupBy("id")
    .agg(sort_array(concat(collect_list("col1"), collect_list("col2"))).alias("all_numbers"))
    .withColumn("number", explode(col("all_numbers")))
    .dropDuplicates()
    .select("id", "number")
    .orderBy("id", "number")
)
result.show()
will yield:
+---+------+
| id|number|
+---+------+
| A| 2|
| A| 3|
| A| 4|
| A| 6|
| B| 1|
| B| 2|
+---+------+
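For intuition, here is the same group-concatenate-sort-deduplicate logic sketched in plain Python, without Spark. The names (rows, grouped, result) are just illustrative, not part of any API:

```python
from collections import defaultdict

rows = [('A', 2, 3), ('A', 2, 4), ('A', 4, 6), ('B', 1, 2)]

# Group by id, concatenating the col1 and col2 values into one list per id
# (mirrors groupBy + concat(collect_list, collect_list))
grouped = defaultdict(list)
for id_, col1, col2 in rows:
    grouped[id_].extend([col1, col2])

# Sort, deduplicate, and "explode" back into (id, number) rows
# (mirrors sort_array + explode + dropDuplicates)
result = [
    (id_, n)
    for id_ in sorted(grouped)
    for n in sorted(set(grouped[id_]))
]
# result == [('A', 2), ('A', 3), ('A', 4), ('A', 6), ('B', 1), ('B', 2)]
```

The Spark version distributes exactly this computation: collect_list gathers the per-group values, concat merges the two lists, and explode turns each list element back into its own row.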