
I have a DataFrame:

+----------+----------+
| longitude|  latitude|
+----------+----------+
|-7.1732833|32.0414966|
|-7.1732844|32.0414406|
|-7.1732833|32.0414966|
|-7.1732833|32.0414966|
|-7.1732833|32.0414966|
|-7.1732833|32.0414966|
+----------+----------+

Expected result:

+----------+----------+----------------+---------------------+---------------+--------------------+
| longitude|  latitude|origin_longitude|destination_longitude|origin_latitude|destination_latitude|
+----------+----------+----------------+---------------------+---------------+--------------------+
|-7.1732833|32.0414966|      -7.1732833|           -7.1732844|     32.0414966|          32.0414406|
|-7.1732844|32.0414406|      -7.1732844|           -7.1732833|     32.0414406|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
|-7.1732833|32.0414966|      -7.1732833|                     |     32.0414966|                    |
+----------+----------+----------------+---------------------+---------------+--------------------+

How can I do that with Scala? I'm new to Scala, please help. Thank you.

  • Does this answer your question? [Append a column to Data Frame in Apache Spark 1.3](https://stackoverflow.com/questions/29483498/append-a-column-to-data-frame-in-apache-spark-1-3) – Jay Kakadiya Feb 27 '20 at 01:21

2 Answers


You can use a window function to get the next (lead) row and build new columns from it. However, lead requires an orderBy, and your DataFrame's order would not be preserved if I ordered by latitude/longitude, so I created a seq column manually to preserve your order. In your real data you should have a column that lets you define the order.
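
If your real data has no ordering column at all, one common workaround (a sketch, not part of the original answer: it assumes df is your input DataFrame and that ordering by monotonically_increasing_id is acceptable, since those IDs only track row order reliably within a single partition) is:

import org.apache.spark.sql.functions.monotonically_increasing_id

// Sketch: derive an ordering column when the source data has none.
// The generated IDs are increasing but not consecutive.
val dfWithSeq = df.withColumn("seq", monotonically_increasing_id())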

%scala
val df = Seq(
  (1, -7.1732833, 32.0414966),
  (2, -7.1732844, 32.0414406),
  (3, -7.1732833, 32.0414966),
  (4, -7.1732833, 32.0414966),
  (5, -7.1732833, 32.0414966),
  (6, -7.1732833, 32.0414966)
).toDF("seq", "longitude", "latitude")

df.show()

+---+----------+----------+
|seq| longitude|  latitude|
+---+----------+----------+
|  1|-7.1732833|32.0414966|
|  2|-7.1732844|32.0414406|
|  3|-7.1732833|32.0414966|
|  4|-7.1732833|32.0414966|
|  5|-7.1732833|32.0414966|
|  6|-7.1732833|32.0414966|
+---+----------+----------+


import org.apache.spark.sql.functions.{col, lead}
import org.apache.spark.sql.expressions.Window

val w = Window.orderBy("seq")

df.withColumn("destination_longitude", lead("longitude",1,0).over(w)).withColumn("destination_latitude", lead("latitude",1,0).over(w)).select(col("longitude").alias("origin_longitude"),col("destination_longitude"),col("latitude").alias("origin_latitude"),col("destination_latitude")).filter(col("destination_longitude")!==0.0).show()

+----------------+---------------------+---------------+--------------------+
|origin_longitude|destination_longitude|origin_latitude|destination_latitude|
+----------------+---------------------+---------------+--------------------+
|      -7.1732833|           -7.1732844|     32.0414966|          32.0414406|
|      -7.1732844|           -7.1732833|     32.0414406|          32.0414966|
|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
|      -7.1732833|           -7.1732833|     32.0414966|          32.0414966|
+----------------+---------------------+---------------+--------------------+
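
If you would rather keep the last row, with null destinations, instead of filtering it out, a small variation on the same code (a sketch reusing w and the imports above; lead without a default value fills the missing rows with null):

df.withColumn("destination_longitude", lead("longitude", 1).over(w))
  .withColumn("destination_latitude", lead("latitude", 1).over(w))
  .select(
    col("longitude").alias("origin_longitude"),
    col("destination_longitude"),
    col("latitude").alias("origin_latitude"),
    col("destination_latitude"))
  .show()
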
– murtihash

You could use df.withColumn("origin_longitude", lit(-7.1732833)); you can chain as many withColumn calls as needed.
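
For example, a minimal sketch (assuming spark.implicits._ is in scope and df is the DataFrame from the question; lit only appends constant values, so this does not compute the per-row destinations shown in the other answer):

import org.apache.spark.sql.functions.lit

// Sketch: chain withColumn calls to append constant-valued columns.
val result = df
  .withColumn("origin_longitude", lit(-7.1732833))
  .withColumn("origin_latitude", lit(32.0414966))
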

– Suhas NM