
I'm trying to implement a UDF that takes an input DataFrame and a column name; every record in that column has to be run through all of the regular-expression patterns. I'm new to PySpark.

import re
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def group_nm_transform(value):
    # Use re.sub(pattern, replacement, string): the sed-style
    # s/pattern/replacement/ syntax is not valid in Python's re module,
    # and re.findall only collects matches, it does not substitute.
    value = re.sub(r' AND ', ' ', value)
    value = re.sub(r' ADVANCED | ADVANCE ', ' ADV ', value)
    value = re.sub(r' ASC | ASSOCI | ASSC | ASSOCIAT | ASSOCIA | ASSO | ASSOCS | AS | ASSOCIATES ', ' ASSOC ', value)
    return value

# Wrap the row-level function so it can be applied to a column
group_nm_transform_udf = udf(group_nm_transform, StringType())
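The row-level logic the question describes can be sketched in plain Python with an ordered list of (pattern, replacement) rules; the sample input and the `normalize` name below are illustrative assumptions, not from the question:

```python
import re

# Ordered (pattern, replacement) rules; longer alternatives are listed
# first because Python's alternation picks the first alternative that
# matches at a position.
RULES = [
    (r' AND ', ' '),
    (r' ADVANCED | ADVANCE ', ' ADV '),
    (r' ASSOCIATES | ASSOCIAT | ASSOCIA | ASSOCI | ASSOCS | ASSOC | ASSC | ASSO | ASC | AS ', ' ASSOC '),
]

def normalize(text):
    # Apply each substitution in turn to the record's value
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text)
    return text

print(normalize("SMITH AND ADVANCED ASSOCIATES LLC"))
# -> SMITH ADV ASSOC LLC
```

A function like this could then be registered with `udf(normalize, StringType())` and applied via `df.withColumn(...)`, though as the comments note, chaining the built-in `pyspark.sql.functions.regexp_replace` would avoid a Python UDF entirely.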
James Z
marjun
  • You don't need to use `udf` for this- you can use [`pyspark.sql.Column.rlike()`](http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.Column.rlike) and [`pyspark.sql.functions.regexp_extract()`](http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.regexp_extract) – pault Apr 30 '18 at 18:16
  • 1
    Please provide some sample input/desired output. More on [how to create good reproducible apache spark dataframe examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). – pault Apr 30 '18 at 18:21
  • 2
    Possible duplicate of [I have an issue with regex extract with multiple matches](https://stackoverflow.com/questions/54597183/i-have-an-issue-with-regex-extract-with-multiple-matches) – Wiktor Stribiżew Sep 24 '19 at 07:23

0 Answers