I have a column of text strings and I want to find the nth score word in each one. I can extract the first and last via idxmin and idxmax, but I don't know how to get the words in between.
My code:
import pandas as pd
import numpy as np
data = {"Text" : ["['one', 'one two', 'four']","['two one', 'three', 'five']"]}
df = pd.DataFrame(data)
df["One"] = df["Text"].str.find("one")
df["Two"] = df["Text"].str.find("two")
df["Three"] = df["Text"].str.find("three")
df["Four"] = df["Text"].str.find("four")
df["Five"] = df["Text"].str.find("five")
score_words = df.loc[:,"One":"Five"]
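score_words_dict = dict(list(score_words.groupby(score_words.index)))
# str.find returns -1 when a word is absent, so keep positions >= 0 (a match at position 0 is valid)
score_words = score_words[score_words >= 0]
df["AllScoreWords"] = ""
for k, v in score_words_dict.items():  # k: index label, v: one-row DataFrame
    # use .loc instead of chained indexing (df[col][k] = ...) so the assignment reaches df
    df.loc[k, "AllScoreWords"] = str(v.columns[(v != -1).any()].to_list())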
df['First_Score'] = score_words.idxmin(axis=1)
df['Last_Score'] = score_words.idxmax(axis=1)
print(df)
print(score_words)
So in the first row I want to extract Two as the second score word, and in the second row I want to extract One as the second score word, and likewise for the third, fourth and so on.
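A minimal sketch of one way to do this, assuming the str.find positions are an acceptable proxy for word order: drop the absent words (NaN), sort each row by position, and take the nth column label. nth_score_word is a hypothetical helper name. Note that str.find only reports the first occurrence of each word and also matches inside longer words (e.g. "one" in "someone").

```python
import pandas as pd

# Rebuild the position table from the question: each cell is str.find's result.
data = {"Text": ["['one', 'one two', 'four']", "['two one', 'three', 'five']"]}
df = pd.DataFrame(data)
for word in ["one", "two", "three", "four", "five"]:
    df[word.capitalize()] = df["Text"].str.find(word)
score_words = df.loc[:, "One":"Five"]
score_words = score_words[score_words >= 0]  # -1 (not found) becomes NaN


def nth_score_word(row: pd.Series, n: int):
    """Return the column name of the n-th word by position (1-based), or None."""
    found = row.dropna().sort_values()  # present words, earliest position first
    return found.index[n - 1] if len(found) >= n else None


# DataFrame.apply forwards the extra keyword argument n to the function.
df["Second_Score"] = score_words.apply(nth_score_word, axis=1, n=2)
print(df[["Text", "Second_Score"]])
```

For the sample data this gives Two for the first row and One for the second; changing n to 3 picks Four and Three.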
In reality I have a keyword, and I want to pull out the score word immediately before or after it, so simply raising the position threshold on score_words doesn't work.
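For the keyword case, a hedged sketch: locate the keyword with str.find, then among the score words present in that row pick the one with the smallest position greater than the keyword's (after) or the largest position smaller than it (before). The keyword "two" is purely a stand-in for the real one, and neighbour is a hypothetical helper; positions are start offsets, so "after" means "starts after the keyword starts".

```python
import pandas as pd

data = {"Text": ["['one', 'one two', 'four']", "['two one', 'three', 'five']"]}
df = pd.DataFrame(data)
for word in ["one", "two", "three", "four", "five"]:
    df[word.capitalize()] = df["Text"].str.find(word)
score_words = df.loc[:, "One":"Five"]
score_words = score_words[score_words >= 0]  # -1 (not found) becomes NaN

# Hypothetical keyword: "two" stands in for the real one here.
keyword = "two"
key_pos = df["Text"].str.find(keyword)  # -1 where the keyword is absent


def neighbour(row: pd.Series, pos: int, after: bool = True):
    """Score word starting nearest after (or before) position pos, else None."""
    if pos < 0:  # keyword not in this row
        return None
    found = row.dropna()
    side = found[found > pos] if after else found[found < pos]
    if side.empty:
        return None
    # idxmin/idxmax return the column label (the score word) of the closest position
    return side.idxmin() if after else side.idxmax()


df["After_Key"] = [neighbour(score_words.loc[i], key_pos.loc[i], True) for i in df.index]
df["Before_Key"] = [neighbour(score_words.loc[i], key_pos.loc[i], False) for i in df.index]
print(df[["Text", "Before_Key", "After_Key"]])
```

Because the comparisons are strict, the keyword's own column is never returned even when the keyword is itself a score word.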
How can I pick out the words I want?