
I have the following DataFrame:

id   col1   col2  col3   col4
1    1      10    100    A    
1    1      20    101    B
1    1      30    102    C
2    1      10    80     D
2    1      20    90     E
2    1      30    100    F
2    1      40    104    G

So, I want to return a new DataFrame in which the rows that share the same (col1, col2) values appear together on a single row, and also create a new column with some operation over both col3 columns, for example:

    id(1) col1(1) col2(1) col3(1) col4(1) id(2) col1(2) col2(2) col3(2) col4(2) new_column
    1     1       10      100     A       2     1       10      80      D       (100-80)*100
    1     1       20      101     B       2     1       20      90      E       (101-90)*100
    1     1       30      102     C       2     1       30      100     F       (102-100)*100
    -     -       -       -       -       2     1       40      104     G       -
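The expected output amounts to a full outer join of the id 1 rows with the id 2 rows on (col1, col2), with new_column defined only when both sides exist. As a hedged illustration of those semantics, here is a minimal model in plain Scala collections rather than Spark, so it runs without a cluster. The names Rec, Paired and Pairing are hypothetical, and it assumes each (col1, col2) pair occurs at most once per id.

```scala
// Plain-Scala model of the requested reshaping (not Spark API).
case class Rec(id: Int, col1: Int, col2: Int, col3: Int, col4: String)

// One output line: left = the id 1 row, right = the id 2 row; either may be missing
case class Paired(left: Option[Rec], right: Option[Rec]) {
  // new_column = (col3 of left - col3 of right) * 100, only when both rows exist
  def newColumn: Option[Int] =
    for (l <- left; r <- right) yield (l.col3 - r.col3) * 100
}

object Pairing {
  def pair(rows: Seq[Rec], leftId: Int, rightId: Int): Seq[Paired] = {
    // Index each side by its join key (assumes the key is unique per id)
    val left = rows.filter(_.id == leftId).map(r => (r.col1, r.col2) -> r).toMap
    val right = rows.filter(_.id == rightId).map(r => (r.col1, r.col2) -> r).toMap
    // Union of keys from both sides gives full outer join semantics
    val keys = (left.keySet ++ right.keySet).toSeq.sorted
    keys.map(k => Paired(left.get(k), right.get(k)))
  }
}
```

On the sample data, Pairing.pair(rows, 1, 2) yields four lines; the last one has no left row and only the G row on the right, so its newColumn is None. In Spark, the analogous idea would be to filter df to each id, alias the two halves and join them with an outer join on Seq("col1", "col2"); that is an untested outline, not a verified solution.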

I tried ordering and grouping by (col1, col2), but groupBy returns a RelationalGroupedDataset, on which I can't do anything apart from aggregation functions. So I would appreciate any help. I'm using Scala 2.11. Thanks!

Sebastian

1 Answer


What about joining the DataFrame with itself? Something like:

    import spark.implicits._  // $"colName" syntax; assumes a SparkSession named spark

    df.as("left")
      .join(df.as("right"), Seq("col1", "col2"), "outer")
      .where($"left.id" =!= $"right.id")
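On the sample data, this self-join behaves in two ways worth knowing: every row also matches itself, so the outer join produces no unmatched rows and the where clause removes the col2 = 40 row (G) entirely, and every matched key appears twice, once as (1, 2) and once as (2, 1). The plain-Scala model below (not Spark API; Rec and SelfJoin are hypothetical names) reproduces that behaviour so it can be checked without a cluster.

```scala
// Plain-Scala model of the self-join suggested above (not Spark API).
case class Rec(id: Int, col1: Int, col2: Int, col3: Int, col4: String)

object SelfJoin {
  // Pair every row with every row sharing (col1, col2), then drop pairs
  // whose ids are equal, mirroring the where($"left.id" =!= $"right.id") step
  def pairs(rows: Seq[Rec]): Seq[(Rec, Rec)] =
    for {
      l <- rows
      r <- rows
      if l.col1 == r.col1 && l.col2 == r.col2 // join key
      if l.id != r.id                         // the where filter
    } yield (l, r)
}
```

With the seven sample rows this gives six pairs and none containing G. One way around both issues, untested here, is to filter each side before joining, e.g. df.filter($"id" === 1).as("left") joined with df.filter($"id" === 2).as("right") on Seq("col1", "col2") with "outer", and then add new_column via withColumn using ($"left.col3" - $"right.col3") * 100.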
lev