I got a set of Twitter status updates and im trying to filter all direct messages, senders and receivers of the latter. My dataframe includes columns for senders and text. Using regular expressions I'm trying filter the receivers out of the text column.
this ist what I got, but its returning some strange results
WD <- getwd()
if (!is.null(WD)) setwd(WD)
load("data.R")
#http://www.unet.univie.ac.at/~a0406222/data.R
dmtext <- grep("^@[a-z0-9_]{1,15}", tweets$text, perl=T, value=TRUE,ignore.case=TRUE)
dm.receiver <- gsub("^@([a-z0-9_]{1,15})[ :,].*$", "\\1", dmtext, perl=T,ignore.case=TRUE)
dm.sender <- as.character(tweets$from_user[grep("^@[a-z0-9_]{1,15}", tweets$text, perl=T,ignore.case=TRUE,value=FALSE)])
dm.df <- data.frame(dm.sender,dm.receiver,dmtext)
dm.df[1:1000,2]
these are some examples of the bad results I get for dm.receiver
@insultaofuturo Apesar da proibição, jovens insistem em acampar no Aterro na Rio+20\nhttp://t.co/dCfFHUWV
@mqtodd Bringing the .green Internet to Rio+20 Summit | DotGreen\nhttp://t.co/pQqYilXp #RioPlus20 #gogreen
@Shyman33 Elinor Ostrom's trailblazing commons research can inspire Rio+20\n http://t.co/m7OTHBtP
@OccupyRio20 @pnud_es @FBuenAbad @rioplussocial #Futurewewant \nALGO DE ESTO SE HA CUMPLIDO? http://t.co/QDlVwT5z
@UNDP_CDG#UNDP#Asia-Pacific#Rio+20E-discussion on National&Local Planning for Sustainable Development. Contribute&mail:aprc.rio20@undp.org
why is it that I get results longer than 15 characters using {1,15}?