
I have data from Twitter in JSON format with multiple columns. I am working with one of them, mentionedUsers, and trying to extract the username of each mentioned user into a separate column.

print(tweets_data['mentionedUsers'])

0        [{'username': 'HuntTerrorist', 'displayname': ...
1        [{'username': 'AttorneyCrump', 'displayname': ...
2                                                     None
3        [{'username': 'realDonaldTrump', 'displayname'...
4                                                     None
                               ...                        
19995                                                 None
19996                                                 None
19997                                                 None
19998                                                 None
19999                                                 None
Name: mentionedUsers, Length: 20000, dtype: object

I have tried this code:

mentioned_users = []


for i in range(len(tweets_data)):
    if tweets_data['mentionedUsers'][i]['username'] is not None:
        mentioned_users.append(tweets_data['mentionedUsers'][i]['username'])
    else:
        mentioned_users.append(None)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-64-cc453018d33d> in <module>
      3 
      4 for i in range(len(tweets_data)):
----> 5     if tweets_data['mentionedUsers'][i]['username'] is not None:
      6         mentioned_users.append(tweets_data['mentionedUsers'][i]['username'])
      7     else:

TypeError: list indices must be integers or slices, not str

Could anyone please tell me what's wrong with this? I believe the problem is in the []. If so, how do I extract the data from the list? Thank you for your help!

Anna
  • The error says that list indices must be integers, not str, so check the type of the data you have. Each value in tweets_data['mentionedUsers'] may be a whole string; if that is the case, typecast it. The code seems to be correct. – Tanmay Shrivastava Dec 26 '20 at 21:50
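
To illustrate the error message, here is a minimal sketch. The cell value below is a hypothetical stand-in shaped like the printed output in the question (the 'displayname' value is invented), and it assumes each non-empty cell really is a Python list of dicts rather than a string.

```python
# Hypothetical cell value, shaped like one non-empty row of mentionedUsers
cell = [{'username': 'HuntTerrorist', 'displayname': 'Example'}]

print(type(cell))  # the cell is a list, not a dict

try:
    cell['username']  # indexing a list with a string key
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Index into the list first, then into the dict
first_username = cell[0]['username']
print(first_username)
```

If type(cell) turns out to be str instead, the values would need to be parsed before any indexing like this can work.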

1 Answer


An easier way would be to explode the list and then use df['col_name'].apply(pd.Series).

This assumes you have converted your JSON data and stored it in a DataFrame df.


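import pandas as pd

# One row per mentioned user; rows without a list stay as a single row
exploded_df = df.explode('mentionedUsers')
# Expand each user dict into columns such as 'username' and 'displayname'
user_df = exploded_df['mentionedUsers'].apply(pd.Series)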
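
If the usernames should instead stay on the same row as the tweet, one possible alternative is to apply a small function to each cell. This is only a sketch: the sample DataFrame, the helper name extract_usernames, and the new column name mentioned_usernames are all hypothetical, and it assumes each non-empty cell is a list of dicts with a 'username' key.

```python
import pandas as pd

# Hypothetical sample data shaped like the column in the question
df = pd.DataFrame({
    'mentionedUsers': [
        [{'username': 'HuntTerrorist', 'displayname': 'A'}],
        [{'username': 'AttorneyCrump', 'displayname': 'B'},
         {'username': 'realDonaldTrump', 'displayname': 'C'}],
        None,
    ]
})


def extract_usernames(users):
    """Return the usernames in one cell, or None when the cell is not a list."""
    if not isinstance(users, list):
        return None
    return [user.get('username') for user in users]


# Each row keeps its own list of usernames; rows with no mentions stay None
df['mentioned_usernames'] = df['mentionedUsers'].apply(extract_usernames)
print(df['mentioned_usernames'].tolist())
```

The isinstance check is there because the column mixes lists with None, which is what the loop in the question does not account for.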

Rajat Mishra
  • I only want to get the usernames out of this. When I use explode, it shows the usernames in separate columns. I want them to stay in the same row. – Anna Dec 30 '20 at 12:59