My question: in a PySpark DataFrame, how can I pass the pattern for the "rlike" function row by row from another column of the same DataFrame?
I got this error message when running:
df.withColumn("match_str", df.text1.rlike(df.match)).show(truncate=False)
Py4JError: An error occurred while calling o2165.rlike. Trace:
py4j.Py4JException: Method rlike([class org.apache.spark.sql.Column]) does not exist
Is there a workaround or solution?
Sample DataFrame:
df = spark.createDataFrame([
    (1, 'test1 test1_0|test1 test0', 'This is a test1 test1_0'),
    (2, 'test2 test2_0|test1 test0', None),
    (3, 'Nan', 'Nan'),
    (4, 'test4 test4_0|test1 test0', 'This is a test4 test4_0'),
], ['id', 'match', 'text1'])
+---+-------------------------+-----------------------+
|id |match                    |text1                  |
+---+-------------------------+-----------------------+
|1  |test1 test1_0|test1 test0|This is a test1 test1_0|
|2  |test2 test2_0|test1 test0|null                   |
|3  |Nan                      |Nan                    |
|4  |test4 test4_0|test1 test0|This is a test4 test4_0|
+---+-------------------------+-----------------------+
root
 |-- id: long (nullable = true)
 |-- match: string (nullable = true)
 |-- text1: string (nullable = true)
What works is passing a single string value as the pattern:
df.withColumn("match_str", df.text1.rlike(df.select(df.match).head()["match"])).show(truncate=False)
Note: df.select(df.match).head()["match"] returns the match value of the first row, in this case "test1 test1_0|test1 test0", so that one pattern is applied to all rows. I want the rlike pattern to be taken row by row instead, for example:
- id 1: match 'test1 test1_0|test1 test0' against 'This is a test1 test1_0'
- id 2: match 'test2 test2_0|test1 test0' against None
etc.
+---+-------------------------+-----------------------+---------+
|id |match                    |text1                  |match_str|
+---+-------------------------+-----------------------+---------+
|1  |test1 test1_0|test1 test0|This is a test1 test1_0|true     |
|2  |test2 test2_0|test1 test0|null                   |null     |
|3  |Nan                      |Nan                    |false    |
|4  |test4 test4_0|test1 test0|This is a test4 test4_0|false    |
+---+-------------------------+-----------------------+---------+
Passing the column itself, which is what I actually want, fails:
df.withColumn("match_str", df.text1.rlike(df.match)).show(truncate=False)
Py4JError: An error occurred while calling o2165.rlike. Trace:
py4j.Py4JException: Method rlike([class org.apache.spark.sql.Column]) does not exist
Expected results:
+---+-------------------------+-----------------------+---------+
|id |match                    |text1                  |match_str|
+---+-------------------------+-----------------------+---------+
|1  |test1 test1_0|test1 test0|This is a test1 test1_0|true     |
|2  |test2 test2_0|test1 test0|null                   |false    |
|3  |Nan                      |Nan                    |true     |
|4  |test4 test4_0|test1 test0|This is a test4 test4_0|true     |
+---+-------------------------+-----------------------+---------+