I'm new to Python and PySpark. I have a PySpark dataframe like the following:
## +---+---+-----+
## | x1| x2|   x3|
## +---+---+-----+
## |  0|  a| 13.0|
## |  2|  B|-33.0|
## |  1|  B|-63.0|
## +---+---+-----+
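For reference, here is a minimal snippet that reproduces this dataframe (assuming an existing SparkSession named spark):

# Build the example dataframe shown above
df = spark.createDataFrame(
    [(0, "a", 13.0), (2, "B", -33.0), (1, "B", -63.0)],
    ["x1", "x2", "x3"],
)
df.show()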
I have an array: arr = [10, 12, 13]
I want to add a column x4 to the dataframe whose values are taken from the list, using each row's x1 value as the index into the list. The final dataframe should look like:
## +---+---+-----+---+
## | x1| x2|   x3| x4|
## +---+---+-----+---+
## |  0|  a| 13.0| 10|
## |  2|  B|-33.0| 13|
## |  1|  B|-63.0| 12|
## +---+---+-----+---+
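In plain Python terms, the per-row lookup I'm after is just list indexing, something like the sketch below (collect() is used only to illustrate the intent, not as the solution I want):

# For each row, x4 should be arr indexed by that row's x1 value
for row in df.collect():
    print(arr[row.x1])  # prints 10, 13, 12 for the rows above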
I have tried the following code to achieve this:

from pyspark.sql.functions import col, lit

df.withColumn("x4", lit(arr[col('x1')])).show()
However, I am getting an error:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Is there any way I can achieve this efficiently?