
I have a table loaded into a DataFrame, and I tried to use groupBy with the primary keys:

df_remitInsert = spark.sql("""SELECT * FROM trac_analytics.mainremitdata""")
df_remitInsert_filter = df_remitInsert.groupBy("LoanID_Serv", "LoanNumber", "Month").count().filter("count > 1").drop('count')

where "LoanID_Serv", "LoanNumber", and "Month" are my primary keys.

I want to get the entire data from df_remitInsert, deduplicated with respect to the primary keys.


1 Answer


You can use the dropDuplicates method. Your groupBy approach only returns the key columns of groups that have more than one row, whereas dropDuplicates keeps all columns and returns one full row per unique key combination.

df_remitInsert_filter = df_remitInsert.dropDuplicates(['LoanID_Serv', 'LoanNumber', 'Month'])
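
You can verify the result with a quick sanity check (a minimal sketch, reusing the DataFrames from the question):

total = df_remitInsert.count()
deduped = df_remitInsert_filter.count()
print(f"rows before: {total}, rows after dedup: {deduped}")

# After dropDuplicates, every (LoanID_Serv, LoanNumber, Month)
# combination should appear exactly once:
assert (df_remitInsert_filter
        .groupBy("LoanID_Serv", "LoanNumber", "Month")
        .count()
        .filter("count > 1")
        .count()) == 0

Note that dropDuplicates keeps an arbitrary row from each group of duplicates. If you need a specific row per key (for example, the latest record), use a window function with row_number and an explicit ordering instead.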