
I have a pandas DataFrame of chat bot data, and I want to print the user's input in response to a specific chat bot message, e.g.:

(Row 1) Bot: Hi, What are your hobbies?
(Row 2) User1: Cricket, FootBall
(Row 3) Bot: Hi, What is your name?
(Row 4) User2: Alexa
(Row 5) Bot: Hi, What are your hobbies?
(Row 6) User3: Tennis, Baseball

So I have a DataFrame with 6 rows and 1 column, as above, and I want to print only the users' answers to the specific question "Hi, What are your hobbies?".

I tried the following code, which prints the bot's question, but I can't find a way to get the user's answer to that specific question.

for i in Chat_Column:
    if i == "Bot: Hi, What are your hobbies?":
        print(i)

Basically the output I want in this case is:

User1: Cricket, FootBall
User3: Tennis, Baseball
Charuක
Inherited Geek
  • If you're going to work with pandas DataFrames, you should stick with the indexing options native to pandas rather than plain Python iteration. They are much faster and, while maybe not as intuitive as Python (especially if you're not used to them), they are a big time-saver in the long run. – elPastor Dec 23 '16 at 18:28

3 Answers


First, get the index of each row that matches the question from the DataFrame's index. To match part of a row's text rather than the whole string, use str.contains.
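One detail worth knowing before using str.contains here: by default it treats its argument as a regular expression, so the "?" at the end of the question is a quantifier ("optional s"), not a literal question mark. The sketch below shows the difference; the second Series entry is a hypothetical near-miss row made up for illustration.

```python
import pandas as pd

# Hypothetical rows: the second one is NOT the hobbies question, just a near miss.
s = pd.Series(["Bot: Hi, What are your hobbies?",
               "Bot: Hi, What are your hobbie"])

pattern = "Hi, What are your hobbies?"

# Default regex=True: "s?" means "an optional s", so both rows match.
print(s.str.contains(pattern).tolist())

# regex=False: the pattern is matched literally, so only the real question matches.
print(s.str.contains(pattern, regex=False).tolist())
```

Passing regex=False is the safer choice whenever the search text is a fixed phrase that may contain characters such as "?", "(" or ".".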

import pandas as pd

df = pd.DataFrame({'data':
                   ["(Row 1) Bot: Hi, What are your hobbies?",
                    "(Row 2) User1: Cricket, FootBall",
                    "(Row 3) Bot: Hi, What is your name?",
                    "(Row 4) User2: Alexa",
                    "(Row 5) Bot: Hi, What are your hobbies?",
                    "(Row 6) User3: Tennis, Baseball"]
                   })

# Index labels of the rows containing the question (regex=False so "?" is literal)
idx = df[df['data'].str.contains("Hi, What are your hobbies?", regex=False)].index.tolist()
for i in idx:
    # A match in the last row has no following answer, so skip it
    if i < len(df) - 1:
        print(df['data'].iloc[i + 1])

Output:

(Row 2) User1: Cricket, FootBall
(Row 6) User3: Tennis, Baseball

In the code above, idx holds the index labels of the rows that match your query. The loop then prints the row immediately after each of those rows; the i < len(df) - 1 check skips a match in the last row, which has no answer after it.
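The same lookup can also be done with pandas' native boolean indexing instead of a Python loop, in the spirit of the comment on the question. This is a minimal sketch, assuming the same single 'data' column and that each answer sits in the row directly after its question; is_question, is_answer and answers are illustrative names, and Series.shift with fill_value needs pandas 0.24 or later.

```python
import pandas as pd

df = pd.DataFrame({'data': [
    "(Row 1) Bot: Hi, What are your hobbies?",
    "(Row 2) User1: Cricket, FootBall",
    "(Row 3) Bot: Hi, What is your name?",
    "(Row 4) User2: Alexa",
    "(Row 5) Bot: Hi, What are your hobbies?",
    "(Row 6) User3: Tennis, Baseball",
]})

# True on every row that contains the question (literal match, not a regex)
is_question = df['data'].str.contains("Hi, What are your hobbies?", regex=False)

# Shift the mask down one row so it is True on the row *after* each question.
# fill_value=False keeps the mask boolean for the first row, which has no predecessor.
is_answer = is_question.shift(1, fill_value=False)

answers = df.loc[is_answer, 'data']
for text in answers:
    print(text)
```

Because the shift simply drops a question that sits in the last row, no explicit bounds check is needed.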

CentAu
  • AttributeError: 'NoneType' object has no attribute 'values' – Inherited Geek Dec 23 '16 at 16:48
  • @InheritedGeek what Python and pandas versions are you using? Copying and pasting the above code works fine with Python 2.7.11, pandas 0.18.1. Checked with Python 3+; there was a print issue. Now fixed and should work fine. – CentAu Dec 23 '16 at 16:49
  • It's working now; I was using Python 3, hence the error. Thanks! – Inherited Geek Dec 23 '16 at 16:54

Using the DataFrame declaration from @CentAu:

import pandas as pd

Chat_Column = pd.DataFrame({'data':
               ["(Row 1) Bot: Hi, What are your hobbies?",
                "(Row 2) User1: Cricket, FootBall",
                "(Row 3) Bot: Hi, What is your name?",
                "(Row 4) User2: Alexa",
                "(Row 5) Bot: Hi, What are your hobbies?",
                "(Row 6) User3: Tennis, Baseball"]
               })

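for ndx, row in Chat_Column.iterrows():
    # Check the bounds so a question in the last row doesn't raise an IndexError
    if "Bot: Hi, What are your hobbies?" in row["data"] and ndx + 1 < len(Chat_Column):
        print(Chat_Column.iloc[ndx + 1]["data"])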
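If you prefer plain Python iteration, a variant that avoids index arithmetic altogether pairs each message with the one after it using zip; a question in the final row is then simply skipped. This is a sketch rather than the answerer's code, and the names question, messages and replies are illustrative.

```python
import pandas as pd

Chat_Column = pd.DataFrame({'data': [
    "(Row 1) Bot: Hi, What are your hobbies?",
    "(Row 2) User1: Cricket, FootBall",
    "(Row 3) Bot: Hi, What is your name?",
    "(Row 4) User2: Alexa",
    "(Row 5) Bot: Hi, What are your hobbies?",
    "(Row 6) User3: Tennis, Baseball",
]})

question = "Bot: Hi, What are your hobbies?"
messages = Chat_Column["data"].tolist()

# zip(messages, messages[1:]) yields (message, next message) pairs and stops
# at the shorter sequence, so the last message is never paired with anything.
replies = [following
           for current, following in zip(messages, messages[1:])
           if question in current]

for reply in replies:
    print(reply)
```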
somesingsomsing
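import pandas as pd
d = {'a': ['Question1', 'Answer1', 'Question2', 'Answer2', 'Question3', 'Answer3']}
df = pd.DataFrame(d)
# shift(-1) moves each answer up onto its question's row; take the one next to 'Question2'
print(df['a'].shift(-1)[df['a'] == 'Question2'].values[0])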

But I would recommend putting the DataFrame into two columns (Question and Answer).
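A minimal sketch of that two-column layout, assuming the chat strictly alternates bot question and user answer (an assumption: a log with consecutive bot or user messages, or an odd number of rows, would need a different pairing rule). The names msgs, qa and hobbies are illustrative. The slices are converted with to_numpy() so pandas does not try to align them on their different index labels.

```python
import pandas as pd

df = pd.DataFrame({'data': [
    "(Row 1) Bot: Hi, What are your hobbies?",
    "(Row 2) User1: Cricket, FootBall",
    "(Row 3) Bot: Hi, What is your name?",
    "(Row 4) User2: Alexa",
    "(Row 5) Bot: Hi, What are your hobbies?",
    "(Row 6) User3: Tennis, Baseball",
]})

msgs = df['data']
# Even positions hold bot questions, odd positions hold user replies (assumed).
qa = pd.DataFrame({
    'Question': msgs.iloc[0::2].to_numpy(),
    'Answer': msgs.iloc[1::2].to_numpy(),
})

# With one question/answer pair per row, the lookup is a plain boolean filter.
hobbies = qa.loc[qa['Question'].str.contains("Hi, What are your hobbies?", regex=False),
                 'Answer']
print(hobbies.tolist())
```

Once the data is in this shape, other questions can be looked up the same way without any row-offset logic.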

A point of note: using pandas' native indexing will be considerably faster than iterating through a list of strings and testing an if statement on every iteration. That is probably irrelevant for a few hundred or a few thousand lines, but it could become noticeable as the DataFrame grows.

elPastor