Let's say your dataframe is ordered by column a.

You could try performing a self-join on a generated column, rn, that reverses the order. Such a column can be generated using row_number, e.g.:

Using the pyspark API:
from pyspark.sql import functions as F
from pyspark.sql import Window

output_df = (
    df.withColumn(
        "rn",
        F.row_number().over(Window.orderBy("a"))
    )
    .alias("df1")
    .join(
        df.withColumn(
            "rn",
            F.row_number().over(Window.orderBy(F.col("a").desc()))
        ).alias("df2"),
        ["rn"],
        "inner"
    )
    .selectExpr("df1.a", "df1.b", "df2.b as c")
)
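To see what the join computes, here is a plain-Python sketch of the same pairing (the column names and sample values are illustrative assumptions): each row, ordered ascending by a, is matched with the b value of the row at the mirrored position, ordered descending by a.

```python
# Sketch of what the self-join on rn computes: pairing the ascending
# ordering with the descending ordering at equal row numbers.
rows = [(1, 1), (2, 2), (3, 3), (4, 4)]  # (a, b) pairs, sample data

asc = sorted(rows, key=lambda r: r[0])                  # row_number() over (order by a)
desc = sorted(rows, key=lambda r: r[0], reverse=True)   # row_number() over (order by a desc)

# Joining on equal rn is equivalent to zipping the two orderings.
result = [(a, b, b2) for (a, b), (_, b2) in zip(asc, desc)]
print(result)  # [(1, 1, 4), (2, 2, 3), (3, 3, 2), (4, 4, 1)]
```

This mirrors the output table below: the first row pairs with the last, the second with the second-to-last, and so on.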
Using Spark SQL:
select
    df1.a,
    df1.b,
    df2.b as c
from (
    select
        *,
        row_number() over (order by a) rn
    from df
) df1
inner join (
    select
        b,
        row_number() over (order by a desc) rn
    from df
) df2 on df1.rn = df2.rn;
a | b | c
--+---+--
1 | 1 | 4
2 | 2 | 3
3 | 3 | 2
4 | 4 | 1