I have a Spark Scala DataFrame with two columns, text and subtext, where subtext is guaranteed to occur somewhere within text. How can I calculate the position of subtext within the text column?
Input data:
+---------------------------+---------+
| text                      | subtext |
+---------------------------+---------+
| Where is my string?       | is      |
| Hm, this one is different | on      |
+---------------------------+---------+
Expected output:
+---------------------------+---------+----------+
| text                      | subtext | position |
+---------------------------+---------+----------+
| Where is my string?       | is      | 6        |
| Hm, this one is different | on      | 9        |
+---------------------------+---------+----------+
Note: I can do this with a static text/regex without issue, but I have not been able to find any resources on doing it with a row-specific text/regex. I found an answer that works with PySpark ("How to find position of substring column in a another column using PySpark?") and am looking for a similar solution in Scala.
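
For reference, one possible direction is sketched below. It assumes the Spark SQL function instr(str, substr) can be called through expr so that both arguments are columns, and that the expected positions in the table above are 0-based (instr is 1-based, hence the "- 1"). The names SubstringPosition and withPosition are hypothetical and used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object SubstringPosition {

  // Hypothetical helper: adds a 0-based "position" column.
  // instr(text, subtext) is evaluated per row and is 1-based,
  // so 1 is subtracted to match the expected output in the question.
  def withPosition(df: DataFrame): DataFrame =
    df.withColumn("position", expr("instr(text, subtext) - 1"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("substring-position")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the input table in the question.
    val df = Seq(
      ("Where is my string?", "is"),
      ("Hm, this one is different", "on")
    ).toDF("text", "subtext")

    withPosition(df).show(truncate = false)

    spark.stop()
  }
}
```

Since instr returns 0 when the substring is not found, this sketch would yield -1 for such rows; that case should not arise here because subtext is guaranteed to occur in text. The SQL function locate(subtext, text) should behave the same way with the argument order swapped.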