Let's say I have the following Spark DataFrame (df):

enter image description here

As can be seen, there are duplicate values in the "Timestamp" column. I want to get rid of them, keeping only rows where "Timestamp" has unique values.

I tried to remove the duplicates with this line of code:

df.dropDuplicates(['Timestamp'])

It seems dropDuplicates() retains the first row of each group of duplicates, but I need to keep the last row of each group (the ones highlighted in the table). How can this be done?

M. Mate
  • Hello M. Mate. Welcome to StackOverflow. Here it was not really needed but in the future, could you post data samples as text instead of as images? This way, people will easily be able to copy and paste them and reproduce your problem. Have a nice day! – Oli May 10 '19 at 12:10

2 Answers

There is a workaround using groupBy and last. We can make it generic by defining a last aggregator for every column except Timestamp.

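import org.apache.spark.sql.functions.{col, last}

// Build a last() aggregator for every column except Timestamp
val aggs = df.columns
    .filter(_ != "Timestamp")
    .map(c => last(col(c)) as c)
// Group by Timestamp and apply the aggregators
val result = df
    .groupBy("Timestamp")
    .agg(aggs.head, aggs.tail: _*)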
Oli

@Oli suggested a nice solution, which I used as follows (in Python):

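from pyspark.sql.functions import last

# One last() aggregator per column, skipping the grouping column
exprs = [last(x).alias(x) for x in df.columns if x != 'Timestamp']
df0 = df.groupBy("Timestamp").agg(*exprs)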

I hope this helps anyone who runs into a similar problem.

M. Mate
  • It's great that you posted the answer in python you came up with. If you are satisfied with the attention your question received, you may consider accepting an answer (most likely yours) and upvoting the answers that helped you. – Oli May 10 '19 at 15:15
  • I am pretty sure that you can accept an answer. It's the grey button that looks like a "V" right below the vote buttons of each answer. As for the votes, you need to get 15 reputations. It'll come quickly! – Oli May 10 '19 at 17:29
  • @M.Mate if it allows you to ask a question, it also allows you to accept an answer! – abiratsis May 10 '19 at 17:43
  • @Oli thanks, I've now accepted your answer and upvoted (this is my first question, so apologies for not doing the 'right' things properly). Cheers! – M. Mate May 13 '19 at 14:06