Hi I have dataframe with 2 columns :
+----------------------------------------+----------+
| Text | Key_word |
+----------------------------------------+----------+
| First random text tree cheese cat | tree |
| Second random text apple pie three | text |
| Third random text burger food brain | brain |
| Fourth random text nothing thing chips | random |
+----------------------------------------+----------+
I want to generate a 3rd columns with a word appearing before the key_word from the text.
+----------------------------------------+----------+-------------------+--+
| Text | Key_word | word_bef_key_word | |
+----------------------------------------+----------+-------------------+--+
| First random text tree cheese cat | tree | text | |
| Second random text apple pie three | text | random | |
| Third random text burger food brain | brain | food | |
| Fourth random text nothing thing chips | random | Fourth | |
+----------------------------------------+----------+-------------------+--+
I tried this but it's not working
df2=df1.withColumn('word_bef_key_word',regexp_extract(df1.Text,('\\w+)'df1.key_word,1))
Here is the code to create a example of the dataframe
df = sqlCtx.createDataFrame(
[
('First random text tree cheese cat' , 'tree'),
('Second random text apple pie three', 'text'),
('Third random text burger food brain' , 'brain'),
('Fourth random text nothing thing chips', 'random')
],
('Text', 'Key_word')
)