Is it possible to add a new column that holds the most frequent value (the "maximum" by count) across several existing columns, where those columns contain strings? Consider the following DataFrame:
# assumes an active SparkSession named `spark`
df = spark.createDataFrame(
    [
        ('1', 25000, "black", "black", "white"),
        ('2', 16000, "red", "black", "white"),
    ],
    ['ID', 'cash', 'colour_body', 'colour_head', 'colour_foot']
)
Then the target frame should look like this:
df = spark.createDataFrame(
    [
        ('1', 25000, "black", "black", "white", "black"),
        ('2', 16000, "red", "black", "white", "white"),
    ],
    ['ID', 'cash', 'colour_body', 'colour_head', 'colour_foot', 'max_v']
)
If there is no unique most frequent colour, as in row 2 where all three colours differ, then the last valid colour should be used.
Is there a built-in way to count occurrences across columns in Spark, or does this need a UDF?
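To make the rule concrete, here is a minimal plain-Python sketch of the per-row logic I have in mind. The function name `most_frequent_or_last` is hypothetical, and treating "valid" as "not None" is my assumption. It is only meant to pin down the expected behaviour, not to suggest this is the best way to do it in Spark.

```python
from collections import Counter


def most_frequent_or_last(*colours):
    """Return the unique most frequent colour, else the last valid one."""
    # Assumption: a "valid" colour is any value that is not None.
    valid = [c for c in colours if c is not None]
    if not valid:
        return None
    # Take the two highest counts so a tie for first place can be detected.
    ranked = Counter(valid).most_common(2)
    # A unique maximum exists if there is only one distinct value,
    # or if the top count strictly exceeds the runner-up's count.
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    # No unique maximum: fall back to the last valid colour.
    return valid[-1]


print(most_frequent_or_last("black", "black", "white"))  # black
print(most_frequent_or_last("red", "black", "white"))    # white
```

I assume a function like this could be wrapped with `pyspark.sql.functions.udf` (with a `StringType` return type) and applied to the three colour columns via `withColumn`, but I would prefer a solution built on native Spark functions if one exists.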