
I have a column of text strings and I want to find the nth score word in each one. I can extract the first and last score words with `idxmin` and `idxmax`, but I don't know how to get the ones in between.

My code:

```python
import pandas as pd
import numpy as np

data = {"Text": ["['one', 'one two', 'four']", "['two one', 'three', 'five']"]}
df = pd.DataFrame(data)

# str.find returns the position of the first match, or -1 if the word is absent
df["One"] = df["Text"].str.find("one")
df["Two"] = df["Text"].str.find("two")
df["Three"] = df["Text"].str.find("three")
df["Four"] = df["Text"].str.find("four")
df["Five"] = df["Text"].str.find("five")

score_words = df.loc[:, "One":"Five"]
# one single-row DataFrame per index label
score_words_dict = dict(list(score_words.groupby(score_words.index)))

# keep only the words that were found (position >= 0); the rest become NaN
score_words = score_words[score_words >= 0]

df["AllScoreWords"] = ""
for k, v in score_words_dict.items():  # k: index label, v: one-row DataFrame
    # .at avoids chained assignment (df["AllScoreWords"][k] = ...)
    df.at[k, "AllScoreWords"] = str(v.columns[(v != -1).any()].to_list())

# smallest / largest position = first / last score word in the text
df["First_Score"] = score_words.idxmin(axis=1)
df["Last_Score"] = score_words.idxmax(axis=1)

print(df)
print(score_words)
```

So in the first row I want to extract Two as the second score word, and in the second row I want to extract One as the second score word, and so on.
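A minimal sketch of one way to get the nth score word, assuming the same sample data: sort each row's `str.find` positions, drop the words that were not found (-1), and take the nth remaining column name. `nth_score_word` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd

data = {"Text": ["['one', 'one two', 'four']", "['two one', 'three', 'five']"]}
df = pd.DataFrame(data)

words = ["One", "Two", "Three", "Four", "Five"]
for w in words:
    # position of the first match, or -1 if the word is absent
    df[w] = df["Text"].str.find(w.lower())

positions = df[words]

def nth_score_word(row: pd.Series, n: int):
    """Hypothetical helper: column name of the n-th (1-based) word found, else NaN."""
    found = row[row >= 0].sort_values(kind="stable")  # order by position in the text
    return found.index[n - 1] if len(found) >= n else np.nan

# extra keyword arguments to apply are forwarded to the function
df["Second_Score"] = positions.apply(nth_score_word, axis=1, n=2)
print(df[["Text", "Second_Score"]])
```

With `n=1` this reproduces `idxmin`, and with `n` equal to the number of found words it reproduces `idxmax`, so the first/last columns become special cases.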

In reality I have a keyword, and I want to pull out the words said immediately after or before it, so simply raising the score-word threshold doesn't work.
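For the keyword case, one option (a swapped-in technique, not the approach in the answer below) is a regular expression with `Series.str.extract` that captures the word next to the keyword. `keyword = "one"` is a hypothetical placeholder, and `\w+` treats any run of letters, digits, or underscores as a word, which is an assumption about what counts as a "word" here.

```python
import re
import pandas as pd

df = pd.DataFrame({"Text": ["['one', 'one two', 'four']", "['two one', 'three', 'five']"]})

keyword = "one"  # hypothetical keyword; substitute the real one
kw = re.escape(keyword)  # escape in case the keyword contains regex characters

# word immediately after the keyword (leftmost match; NaN if there is none)
df["After"] = df["Text"].str.extract(rf"\b{kw}\W+(\w+)", expand=False)
# word immediately before the keyword (leftmost match; NaN if there is none)
df["Before"] = df["Text"].str.extract(rf"(\w+)\W+{kw}\b", expand=False)

print(df[["Text", "Before", "After"]])
```

`expand=False` returns a Series for a single capture group, so each result drops straight into a new column. To get every occurrence rather than the first, `Series.str.findall` with the same pattern returns a list per row.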

How can I pick out the words I want?

James Oliver
  • 547
  • 1
  • 4
  • 17

1 Answer


I found the answer by stripping away the list characters with `str.replace`, like so:

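```python
# regex=False treats "[" as a literal character (older pandas defaulted to regex=True)
scorewords_table["Clean_ScoreWords"] = scorewords_table.AllScoreWords.str.replace("[", "", regex=False)
```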
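The replace above removes only `[`. A sketch of cleaning the whole stringified list in one step, assuming the column holds strings like the ones the question's loop produces (the sample values below are written out by hand to match that loop): a regex character class strips `[`, `]` and `'` together. As an alternative technique, `ast.literal_eval` parses the string back into a real Python list, which avoids string surgery entirely.

```python
import ast
import pandas as pd

# assumed sample values, matching what the question's AllScoreWords loop builds
df = pd.DataFrame({"AllScoreWords": ["['One', 'Two', 'Four']",
                                     "['One', 'Two', 'Three', 'Five']"]})

# Option 1: remove every [ ] and ' character with one regex
df["Clean_ScoreWords"] = df["AllScoreWords"].str.replace(r"[\[\]']", "", regex=True)

# Option 2 (swapped-in alternative): parse each string into a Python list
df["ScoreWordList"] = df["AllScoreWords"].apply(ast.literal_eval)

print(df)
```

`ast.literal_eval` only evaluates Python literals, so it is safe on untrusted text in a way that `eval` is not.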

This answer then showed me how to finish: split the string and pick an element with `.str[n]`:

```python
df['V'] = df['V'].str.split('-').str[0]
```
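Applied to the cleaned score-word column, the same split-then-index pattern gives the nth word. This is a sketch under the assumption that the cleaned strings are comma-separated as in the previous step; `n` is a hypothetical 1-based parameter. One caveat visible in the question's code: `AllScoreWords` lists the found words in column order (One to Five), not in the order they appear in the text, so for the second row this yields Two rather than One. If text order matters, sort by the `str.find` positions first.

```python
import pandas as pd

# assumed cleaned values (brackets and quotes already stripped)
df = pd.DataFrame({"Clean_ScoreWords": ["One, Two, Four", "One, Two, Three, Five"]})

n = 2  # hypothetical: 1-based position of the score word wanted

# str.split gives a list per row; .str[i] indexes into each list, NaN if too short
df["Nth_Score"] = df["Clean_ScoreWords"].str.split(", ").str[n - 1]

print(df)
```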
James Oliver