I have a data frame which looks like this:
Filename Type
file1.A.txt
file2.A.txt
file3.B.txt
file4.A.txt
file5.B.txt
...
I want to add another column, Type, which will depend on the filename. If there is an A in the filename, add A; if there is a B, add B.
I've seen something vaguely similar to this in "Add column to Data Frame conditionally in Pyspark", but I can't see how to apply it in my case.
I can add a constant column to a Spark DataFrame with df = df.withColumn('NewCol', lit('a')), but how can I alter this expression, using regex, to add one string in some cases and another string in others?
This is similar to the linked question "Spark Equivalent of IF Then ELSE", but Michael West's answer is easier to type out and more specific to the problem. However, I think the approach from the linked question could still solve the problem (though it would be more difficult to read).