
I want to extract user handles from retweets, i.e. the username in "RT @username:xyzxyzxyz", into a new column. I tried the following:

df = pd.read_csv("string.csv")
for index,row in df.iterrows(): 
    df['Influencers'] = df['Tweet'].str.extract("\(@*?)\:")
df.to_csv('string3.csv', index=False)

It generated the following error:

  File "C:\ANACONDA\lib\re.py", line 251, in _compile
    raise error, v # invalid expression

error: unbalanced parenthesis

Sample DF:

df=pd.DataFrame({"Tweet": ["RT @saikatd: Are editors involved in the transfer of Income Tax officials?","RT @CLManojET: Can't allow L-G's fantasy of running a parallel administration"," Fairplay n equity 2 consumers 2 be ensured"]})
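The error comes from the pattern itself: `\(` matches a literal parenthesis, so the `)` after `@*?` closes a group that was never opened. Even without the backslash, `@*?` only matches "@" characters, not the name after them. The sketch below reproduces the error with `re.compile` and then applies one possible corrected pattern, `@(.*?):`, to the sample DataFrame. That pattern is an illustrative choice, not the only fix. It also drops the `iterrows()` loop, which reassigned the whole column on every pass without using `row`.

```python
import re

import pandas as pd

# The question's pattern: "\(" is a literal "(", so the ")" that follows
# closes a group that was never opened.
bad_pattern = r"\(@*?)\:"
try:
    re.compile(bad_pattern)
    compile_error = None
except re.error as exc:
    compile_error = str(exc)
print("re.error:", compile_error)

df = pd.DataFrame({"Tweet": [
    "RT @saikatd: Are editors involved in the transfer of Income Tax officials?",
    "RT @CLManojET: Can't allow L-G's fantasy of running a parallel administration",
    " Fairplay n equity 2 consumers 2 be ensured",
]})

# One vectorized call replaces the loop: capture the shortest run of
# characters between "@" and the next ":". Rows with no match become NaN.
df["Influencers"] = df["Tweet"].str.extract(r"@(.*?):", expand=False)
print(df["Influencers"].tolist())
```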
MaxU - stand with Ukraine
lightyagami96

2 Answers


Try this:

import pandas as pd

df = pd.read_csv("string.csv")
# raw string: "\s" reaches the regex engine without an invalid-escape warning
df['Influencers'] = df['Tweet'].str.extract(r"RT\s+(@[^:]*)", expand=False)
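`str.extract` returns a DataFrame by default (`expand=True`); with a single capture group and `expand=False` it returns a Series, which is what lets the result be assigned straight to one column. A short sketch on the question's sample data, with the pattern written as a raw string:

```python
import pandas as pd

df = pd.DataFrame({"Tweet": [
    "RT @saikatd: Are editors involved in the transfer of Income Tax officials?",
    "RT @CLManojET: Can't allow L-G's fantasy of running a parallel administration",
    " Fairplay n equity 2 consumers 2 be ensured",
]})

pattern = r"RT\s+(@[^:]*)"  # "@" plus everything up to the next ":"

as_frame = df["Tweet"].str.extract(pattern)                 # default expand=True: DataFrame
as_series = df["Tweet"].str.extract(pattern, expand=False)  # one group, expand=False: Series

df["Influencers"] = as_series
print(type(as_frame).__name__, type(as_series).__name__)
print(df)
```

Rows without a match (the third tweet here) come back as NaN, which is why the update below chains `.fillna('Original')`.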

UPDATE:

In [34]: df
Out[34]:
                        Tweet
0      RT @username:xyzxyzxyz
1         Free text RT @user2
2                 Blah - blah
3  Text @another_user:aaaaaaa

In [35]: df['Influencers'] = df['Tweet'].str.extract(r"RT\s+(@[^:]*)", expand=False).fillna('Original')

In [36]: df
Out[36]:
                        Tweet Influencers
0      RT @username:xyzxyzxyz   @username
1         Free text RT @user2      @user2
2                 Blah - blah    Original
3  Text @another_user:aaaaaaa    Original
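`[^\:]*` captures everything up to the next colon, so a retweet whose handle is not followed by a colon pulls in the rest of the text. Assuming handles consist only of letters, digits and underscores, `@\w+` stops at the end of the handle instead. In the sketch below the last row is a hypothetical tweet added only to show the difference; note that in Python 3 `\w` also matches non-ASCII letters.

```python
import pandas as pd

df = pd.DataFrame({"Tweet": [
    "RT @username:xyzxyzxyz",
    "Free text RT @user2",
    "Blah - blah",
    "Text @another_user:aaaaaaa",
    "RT @user3 no colon here",  # hypothetical row: handle not followed by ":"
]})

# Pattern from the answer: stops only at a colon (or the end of the string).
df["Loose"] = df["Tweet"].str.extract(r"RT\s+(@[^:]*)", expand=False).fillna("Original")

# Assumed handle alphabet: word characters only, so the match ends at the space.
df["Strict"] = df["Tweet"].str.extract(r"RT\s+(@\w+)", expand=False).fillna("Original")

print(df)
```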
MaxU - stand with Ukraine
  • @lightyagami96, please extend your question with a small reproducible data set and desired data set. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. – MaxU - stand with Ukraine Jun 08 '17 at 19:43
  • thanks for suggestion, `df=pd.DataFrame({"Tweet": ["RT @saikatd: Are editors involved in the transfer of Income Tax officials?","RT @CLManojET: Can't allow L-G's fantasy of running a parallel administration"," Fairplay n equity 2 consumers 2 be ensured"]})` – lightyagami96 Jun 08 '17 at 19:52
  • @lightyagami96 feel free to upvote MaxU's answer as well. You now have >= 15 rep – piRSquared Jun 08 '17 at 19:55
  • @lightyagami96, glad I could help :) – MaxU - stand with Ukraine Jun 08 '17 at 20:05

Sorry, I solved this one, but I'm unable to implement the else condition for the above situation:

df = pd.read_csv("string.csv")
for index,row in df.iterrows(): 
    if "RT @" in row["Tweet"]: 
        df['Influencers'] = "@"+df['Tweet'].str.extract("\@(.+?)\:", expand= False)
    else :
        df['Influencers'] = "Original"
df.to_csv('string3.csv', index=False)

It's producing blank rows for the else condition.
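The blank rows come from the loop: each pass assigns to the entire `Influencers` column, so only the branch taken for the last row survives. When that branch is the `str.extract` one, non-matching rows are NaN, and `to_csv` writes NaN as an empty field. A minimal vectorized sketch that keeps the same "RT @" condition, evaluated for all rows at once with `Series.where`, on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({"Tweet": [
    "RT @saikatd: Are editors involved in the transfer of Income Tax officials?",
    "RT @CLManojET: Can't allow L-G's fantasy of running a parallel administration",
    " Fairplay n equity 2 consumers 2 be ensured",
]})

# The loop's if-condition, computed for every row at once (plain substring test).
is_retweet = df["Tweet"].str.contains("RT @", regex=False)

# Keep the "@" inside the capture group instead of prepending it afterwards.
handles = df["Tweet"].str.extract(r"(@.+?):", expand=False)

# where() keeps the handle where is_retweet is True and puts "Original" elsewhere.
df["Influencers"] = handles.where(is_retweet, "Original")
print(df)
```

Writing the result out with `df.to_csv('string3.csv', index=False)` works as before; no loop is needed.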

lightyagami96