
I'm trying to implement a UDF that takes an input DataFrame and a column name; every record in that column has to be run through all of the regular-expression patterns. I'm new to PySpark.

import re
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def group_nm_transform(value):
    # Use re.sub(pattern, replacement, string): the sed-style
    # s/pattern/replacement/ syntax is not valid in Python's re module,
    # and re.findall only collects matches, it does not substitute.
    value = re.sub(r' AND ', ' ', value)
    value = re.sub(r' ADVANCED | ADVANCE ', ' ADV ', value)
    value = re.sub(r' ASC | ASSOCI | ASSC | ASSOCIAT | ASSOCIA | ASSO | ASSOCS | AS | ASSOCIATES ', ' ASSOC ', value)
    return value

# Wrap the row-level function so it can be applied to a column
group_nm_transform_udf = udf(group_nm_transform, StringType())
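The row-level logic the question describes can be sketched in plain Python with an ordered list of (pattern, replacement) rules; the sample input and the `normalize` name below are illustrative assumptions, not from the question:

```python
import re

# Ordered (pattern, replacement) rules; longer alternatives are listed
# first because Python's alternation picks the first alternative that
# matches at a position.
RULES = [
    (r' AND ', ' '),
    (r' ADVANCED | ADVANCE ', ' ADV '),
    (r' ASSOCIATES | ASSOCIAT | ASSOCIA | ASSOCI | ASSOCS | ASSOC | ASSC | ASSO | ASC | AS ', ' ASSOC '),
]

def normalize(text):
    # Apply each substitution in turn to the record's value
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text)
    return text

print(normalize("SMITH AND ADVANCED ASSOCIATES LLC"))
# -> SMITH ADV ASSOC LLC
```

A function like this could then be registered with `udf(normalize, StringType())` and applied via `df.withColumn(...)`, though as the comments note, chaining the built-in `pyspark.sql.functions.regexp_replace` would avoid a Python UDF entirely.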
James Z
marjun
  • You don't need to use `udf` for this- you can use [`pyspark.sql.Column.rlike()`](http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.Column.rlike) and [`pyspark.sql.functions.regexp_extract()`](http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.regexp_extract) – pault Apr 30 '18 at 18:16
  • 1
    Please provide some sample input/desired output. More on [how to create good reproducible apache spark dataframe examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-dataframe-examples). – pault Apr 30 '18 at 18:21
  • 2
    Possible duplicate of [I have an issue with regex extract with multiple matches](https://stackoverflow.com/questions/54597183/i-have-an-issue-with-regex-extract-with-multiple-matches) – Wiktor Stribiżew Sep 24 '19 at 07:23

0 Answers