I have a dataframe containing tweets that looks like this: [image not included]

What I am trying to do is take the text from the rows where the column 'in_reply_to_user_id' (not in the picture, because the df is too wide to fit) has the same value as a given id, append that text to a list, and put the list in a new column. For example, the text of all the tweets whose 'in_reply_to_user_id' equals the 'id' of the first tweet should be collected in a list and stored in a new column of the dataframe called 'replies'. Here is what I tried:

for i in testb['in_reply_to_user_id']:
   for j in test['user.id']:
       if i == j:
           index=testb.index()
           test['replies'].append(testb['text'].iloc[index])

test is the original dataframe, and testb is a copy of test that I created in order to run the code above.

2 Answers

Assuming that the original DataFrame looks like this:

         text              user_id   reply_to        
0   this is reply to 3       1         3         
1   this is reply to 3       2         3         
2   this is reply to 2       3         2         
3   this is reply to 2       4         2               
4   this is reply to 1       5         1               

Then, by indexing df.loc with a boolean mask, we can select the rows that reply to each user_id and collect their text:

import pandas as pd

data = [['this is reply to 3', 1, 3], ['this is reply to 3', 2, 3],
        ['this is reply to 2', 3, 2], ['this is reply to 2', 4, 2],
        ['this is reply to 1', 5, 1]]

df = pd.DataFrame(data, columns = ['text', 'user_id', 'reply_to']) 

replies = []

for user_id in df.user_id:
    text = df.loc[df['reply_to'] == user_id].text.values
    replies.append(text)

df['replies'] = replies

The resulting DataFrame looks like this:

         text              user_id   reply_to         replies
0   this is reply to 3       1         3         [this is reply to 1]
1   this is reply to 3       2         3         [this is reply to 2, this is reply to 2]
2   this is reply to 2       3         2         [this is reply to 3, this is reply to 3]
3   this is reply to 2       4         2               []
4   this is reply to 1       5         1               []
  • Hello @AmrSherbiny! Please directly paste your dataframe inside your answer, so this will be easier for the community to read or copy/paste it. – toshiro92 May 25 '20 at 14:24

Here's a straightforward solution, looping over all the rows.

import numpy as np
import pandas as pd

# example data
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'text': ['How are you?', 'Fine.', 'Okay.', 'hi'], 
                   'in_reply_to_user_id': [4, 1, 1, 3]})

# initiate new column
df['replies'] = np.repeat(None, len(df))

# assign lists as described in the question
for i in df.index:
    df.at[i, 'replies'] = list(df.text[df.in_reply_to_user_id == df.id[i]])

# show results
df
    id  text            in_reply_to_user_id     replies
0   1   How are you?    4                       [Fine., Okay.]
1   2   Fine.           1                       []
2   3   Okay.           1                       [hi]
3   4   hi              3                       [How are you?]
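
If the row-by-row loop gets slow on a large set of tweets, the same column can probably be built without re-filtering the dataframe for every row. The following is only a sketch on the example data above, not something from the question: it groups the texts by 'in_reply_to_user_id' once and then looks up each 'id' in the result. The name replies_by_id is made up for illustration, and ids that nobody replied to are assumed to get an empty list.

```python
import pandas as pd

# same example data as above
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'text': ['How are you?', 'Fine.', 'Okay.', 'hi'],
                   'in_reply_to_user_id': [4, 1, 1, 3]})

# group the texts once by the id they reply to (hypothetical helper name)
replies_by_id = df.groupby('in_reply_to_user_id')['text'].agg(list).to_dict()

# look up each row's id; ids without any replies get an empty list
df['replies'] = pd.Series([replies_by_id.get(i, []) for i in df['id']],
                          index=df.index)
```

On this example data it should produce the same 'replies' column as the loop above.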
Arne
  • Using the code you provided gives me the error : ‘BlockManager’ object has no attribute t – Luca Marinescu May 25 '20 at 14:29
  • @Luca Marinescu Hmm, can you be more specific? Which line causes the error? What is the traceback? Can you narrow down what part of your data causes the error? – Arne May 25 '20 at 15:13