0

I am wondering whether it is possible to use another dataframe/table as a mapping table. Here is the best was I can describe it.

dfA (Original dataset)

Below is the original dataset. This is the dataset that we want to change based off a lookup

Name TransactionID Alert flag
John Smith 93818 Bad Math
Jane Doe 91982 Invalid Row

dfB (Lookup table)

Below is the table which will have references to the alert flag and what each alert flag should equal.

Alert_flag_OLD Alert_flag_NEW
Bad Math Incorrect Calculations
Invalid Row Invalid Transaction

What I want to know is. Can we perform a look up. Something like

dfA = (
 dfA
 .withColumn('Alert_Flag', LOOKUP on dfB. if (dfA.ALERT_FLAG) in dfB.ALERT_FLAG_OLD then VALUE = ALERT_FLAG_NEW

the if (ALERT_FLAG) is just saying if the alert flag exists in dfB then use find it's new value.

Is something like this possible?

Have not tried anything as of yet as I wouldn't really know how to start it

AminMal
  • 3,070
  • 2
  • 6
  • 15
  • does this responds to your question? https://stackoverflow.com/questions/41275539/lookup-in-spark-dataframes – Young Nov 16 '22 at 13:15

1 Answers1

0

You can do it by first doing left-join and then using simple when-otherwise construct, like this:

from pyspark.sql.functions import col, when

dfA = spark.createDataFrame([("John Smith", 93818, "Bad Math"), ("Jane Doe", 91982, "Invalid Row")], "Name: string, TransactionID: int, Alert_flag: string")
dfB = spark.createDataFrame([("Bad Math", "Incorrect Calculations"), ("Invalid Row", "Invalid Transaction")], "Alert_flag_OLD: string, Alert_flag_NEW: string")

dfC = dfA.join(dfB, col("Alert_flag") == col("Alert_flag_OLD"), "left")
dfC.withColumn("Alert_flag", when(col("Alert_flag_NEW").isNotNull(), col("Alert_flag_NEW"))).select(["Name", "TransactionID", "Alert_flag"]).show()

+----------+-------------+----------------------+
|Name      |TransactionID|Alert_flag            |
+----------+-------------+----------------------+
|John Smith|93818        |Incorrect Calculations|
|Jane Doe  |91982        |Invalid Transaction   |
+----------+-------------+----------------------+
Bartosz Gajda
  • 984
  • 6
  • 14