
I am new to R. I have parsed the messages from a WhatsApp group chat and am now trying to visualize the average message length (in words) per member.

I am using this code to count the number of words each time "Eddy" sends a message:

for (i in grep("Eddy", chatcsv[,2], fixed=TRUE)) {
  length(which(!is.na(chatcsv[i,4:111])))
}

This does not return any output or any error message.

My intention is to sum these lengths and divide by the number of messages the person sent. Lastly, I plan to collect the averages in a vector and visualize them as a bar graph.

Thank you

  • Please make this question *reproducible*. This includes sample code (including listing non-base R packages), sample data (e.g., `dput(head(x))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Nov 04 '18 at 03:21

2 Answers


Your syntax is wrong. You should use:

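allnames <- chatcsv[,2] # or similar
eddyindexes <- grep("Eddy", allnames, fixed=TRUE) # row indexes of Eddy's chats
eddyschats <- chatcsv[eddyindexes, 4:111] # same word columns as in the question
# apply() is a function call: use parentheses, and MARGIN = 1 to work row by row
eddysavgcharacters <- apply(eddyschats, 1, function(x) mean(nchar(x), na.rm=TRUE)) # average nchar per chat, skipping NA cells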
gaut
  • Thank you. So am I right to say that when using grep, you cannot use a subset as the character vector argument? – Amir Khan Nov 04 '18 at 09:51
  • If the answer is OK, please consider accepting it. Grep returns True or False, so you should use which() to get only the true values. The other problem is that you're doing nothing in your loop other than calling length without assigning the result to any variable. You should assign it, or, if you just want to see that length, use print. – gaut Nov 04 '18 at 10:23

I'm thinking you are coming from a non-functional language. (Not a language that is dysfunctional, but rather one that is not a "functional language".) Your expression `length(which(!is.na(chatcsv[i,4:111])))` does nothing visible, because it is inside a for loop and its value is neither printed nor assigned to a name. It just disappears. You would have needed to create a vector (let's say `res`) with `res <- numeric(0)` before your loop and then, within your loop, done:

 res[i] <- length(which(!is.na(chatcsv[i,4:111])))
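A minimal sketch of that pattern, using a small made-up data frame in place of `chatcsv` (the layout is an assumption: sender in column 2, one word per column from column 4 on). Indexing `res` by the row number `i` leaves NA gaps for rows that are not Eddy's, so this version indexes by position within the matches instead:

```r
# Toy stand-in for chatcsv (hypothetical data): column 2 = sender, columns 4+ = words
chat <- data.frame(
  date   = c("d1", "d2", "d3"),
  sender = c("Eddy", "Amir", "Eddy"),
  time   = c("t1", "t2", "t3"),
  w1     = c("hello", "hi", "see"),
  w2     = c("there", NA, "you"),
  w3     = c(NA, NA, "later"),
  stringsAsFactors = FALSE
)

eddy_rows <- grep("Eddy", chat[, 2], fixed = TRUE)  # integer row indexes
res <- numeric(length(eddy_rows))                   # one slot per matching row

for (k in seq_along(eddy_rows)) {
  i <- eddy_rows[k]
  res[k] <- sum(!is.na(chat[i, 4:6]))               # non-NA cells in that row
}

print(res)   # a for loop does not auto-print, so print explicitly
mean(res)    # average words per message for this sender
```

Here `sum(!is.na(...))` gives the same count as `length(which(!is.na(...)))`.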

The earlier answerer was confusing grep and grepl in his comment. The grep function returns integer values; the grepl function returns logical vectors. They can both be used for indexing.
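A quick illustration of the difference, with a made-up vector of sender names:

```r
senders <- c("Eddy", "Amir", "Eddy K", "Sam")   # hypothetical sender names

idx  <- grep("Eddy", senders, fixed = TRUE)     # integer positions of the matches
mask <- grepl("Eddy", senders, fixed = TRUE)    # logical vector, one value per element

# Both select the same elements when used as an index
senders[idx]
senders[mask]
```

One practical difference shows up when nothing matches: `x[-grep(...)]` then returns an empty vector, whereas `x[!grepl(...)]` returns all of `x`.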

Whether that expression would give you the basis for further efforts is not clear. It would depend on the contents of chatcsv[i,4:111]. If the contents are single words then perhaps it would succeed. If they are sentences then it would not. The length function would just return the number of non-NA values in the row-vector. Only if your prior (undescribed) operations had created a clean set of "words" in that set of columns would you be getting meaningful results.
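If the cells do turn out to hold whole sentences, one possible approach is to split each message on whitespace and count the pieces. This is only a sketch under an assumed layout (one row per message, with made-up `sender` and `text` columns), not your actual `chatcsv`:

```r
# Hypothetical layout: one row per message, whole message text in one column
msgs <- data.frame(
  sender = c("Eddy", "Amir", "Eddy", "Amir"),
  text   = c("hello there", "hi", "see you later", "ok  thanks"),
  stringsAsFactors = FALSE
)

# Split on runs of whitespace and count the pieces in each message
words   <- strsplit(trimws(msgs$text), "\\s+")
n_words <- lengths(words)

# Average words per message, one value per sender
avg_words <- tapply(n_words, msgs$sender, mean)
avg_words
```

Passing `avg_words` to `barplot()` would then draw one bar per member.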

IRTFM