I have a column named "tags" in a data frame. I need to extract the values that satisfy a condition: a value must not contain an underscore (_) or a colon (:).
For example:
"tags": "hai, hello, amount_10, amount_90, total:100"
Expected result:
"new_column" : "hai, hello"
For reference, I already extracted all the amount tags with:
import re
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType
collectAmount = udf(lambda s: list(map(lambda amount: amount.split('_')[1] if '_' in amount
else amount, re.findall(r'(amount_\w+)', s))), ArrayType(StringType()))
productsDF = productsDF.withColumn('amount_tag', collectAmount('tags'))
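
To make the condition concrete, here is a minimal plain-Python sketch of the filtering I am after. The function name `keep_plain_tags` is hypothetical (it is not in my code). The sketch assumes the tags are separated by commas and that the value is a non-null string.

```python
def keep_plain_tags(tags):
    """Return the comma-separated tags that contain neither '_' nor ':'."""
    # Split on commas and trim the surrounding whitespace of each tag
    parts = [t.strip() for t in tags.split(',')]
    # Keep only non-empty tags without an underscore or a colon
    kept = [t for t in parts if t and '_' not in t and ':' not in t]
    return ', '.join(kept)


print(keep_plain_tags("hai, hello, amount_10, amount_90, total:100"))  # hai, hello
```

Presumably this could be wrapped the same way as the amount extraction, for example with `udf(keep_plain_tags, StringType())`, and applied through `withColumn('new_column', ...)`, but I have not verified that on my data.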