I have a PySpark DataFrame with two columns, text and subtext, where subtext is guaranteed to occur somewhere within text. How would I calculate the position of subtext in the text column?
Input data:
+---------------------------+---------+
| text                      | subtext |
+---------------------------+---------+
| Where is my string?       | is      |
| Hm, this one is different | on      |
+---------------------------+---------+
Expected output:
+---------------------------+---------+----------+
| text                      | subtext | position |
+---------------------------+---------+----------+
| Where is my string?       | is      | 6        |
| Hm, this one is different | on      | 9        |
+---------------------------+---------+----------+
Note: I can do this with a static text/regex without issue, but I have not been able to find any resources on doing it with a row-specific text/regex.
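For reference, here is a minimal sketch of the behaviour I am after. The positions in the expected output are 0-based, which matches Python's str.find. One approach that might work in Spark is to write the lookup as a SQL expression, since the SQL locate function is 1-based and, as far as I can tell, accepts column references for both arguments inside expr. The helper names add_position and expected_position are my own, purely for illustration.

```python
def expected_position(text: str, subtext: str) -> int:
    """Plain-Python reference: 0-based index of the first occurrence."""
    return text.find(subtext)


def add_position(df):
    """Append a 0-based `position` column to a DataFrame that has
    `text` and `subtext` columns (hypothetical helper, not a Spark API).
    """
    # Imported inside the function so this snippet can be loaded
    # even where PySpark is not installed.
    from pyspark.sql import functions as F

    # SQL locate(substr, str) is 1-based and returns 0 when there is
    # no match, so subtract 1 to get the 0-based position shown above.
    return df.withColumn("position", F.expr("locate(subtext, text) - 1"))


# Sanity check of the expected values with the plain-Python reference.
print(expected_position("Where is my string?", "is"))
print(expected_position("Hm, this one is different", "on"))
```

With a DataFrame df built from the input data above, add_position(df).show(truncate=False) should then display the extra column. Note that a row where subtext is missing would come out as -1 with this sketch, although that cannot happen under the guarantee stated above.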