Sample Data:
https://www.test.com/document/101?ref=stringA&qid=e9f7a92b
https://www.test.com/document/102?ref=stringB
Regex: ref=([a-zA-Z0-9]+\.?)/?
Expected matches (capture group 1): stringA and stringB
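For reference, the expected matches can be reproduced outside Spark. This is a minimal sketch using Python's built-in `re` module with the two sample URLs and the pattern above; the variable names are only illustrative.

```python
import re

# Sample URLs copied from the question.
urls = [
    "https://www.test.com/document/101?ref=stringA&qid=e9f7a92b",
    "https://www.test.com/document/102?ref=stringB",
]

# Same pattern as above; the raw string keeps the backslash before the dot.
pattern = re.compile(r"ref=([a-zA-Z0-9]+\.?)/?")

# Capture group 1 is the value that follows "ref=".
matches = [pattern.search(u).group(1) for u in urls]
print(matches)  # ['stringA', 'stringB']
```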
Running the same regex in PySpark:
spark.sql("select regexp_extract(url, 'ref=([a-zA-Z0-9]+\.?)/?', 1) c1 from my_table").show()
Output:
+--------+
|      c1|
+--------+
|stringA&|
|stringB&|
+--------+
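One possible explanation, offered as an assumption rather than a confirmed diagnosis: with default settings, Spark SQL unescapes string literals, so `\.` inside the SQL text may reach the regex engine as a bare `.`, which matches any character. The sketch below uses Python's `re` module (not Spark itself) to compare the intended pattern with the pattern as it would look after the backslash is dropped, on the first sample URL.

```python
import re

url = "https://www.test.com/document/101?ref=stringA&qid=e9f7a92b"

# Pattern as intended: "\." matches only a literal dot.
intended = r"ref=([a-zA-Z0-9]+\.?)/?"

# Assumed pattern after the SQL parser drops the backslash:
# "." now matches any single character, including "&".
unescaped = "ref=([a-zA-Z0-9]+.?)/?"

intended_match = re.search(intended, url).group(1)
unescaped_match = re.search(unescaped, url).group(1)

print(intended_match)   # stringA
print(unescaped_match)  # stringA&
```

If that is the cause, doubling the backslash inside the SQL literal (so the parser leaves `\.` for the regex engine) or enabling `spark.sql.parser.escapedStringLiterals` would be the things to try. Neither has been verified against `my_table` here.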