-1

I have a column named subcat_id in which the values are stored as comma separated lists. I need to count the number of values and store the counts in a new column. The lists also have Null values that I want to get rid of.

Example

I would like to store the counts in the n column.

G5W
  • 36,531
  • 10
  • 47
  • 80
Sandy2511
  • 43
  • 2
  • 7
  • 1
    Do not post your data as an image, please learn how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – Jaap Feb 08 '16 at 08:25

2 Answers2

4

We can try

 nchar(gsub('[^,]+', '', gsub(',(?=,)|(^,|,$)', '', 
      gsub('(Null){1,}', '', df1$subcat_id), perl=TRUE)))+1L
 #[1] 6 4

Or

library(stringr)
str_count(df1$subcat_id, '[0-9.]+')
#[1] 6 4

data

 df1 <- data.frame(subcat_id = c('1,2,3,15,16,78', 
        '1,2,3,15,Null,Null'), stringsAsFactors=FALSE)
akrun
  • 874,273
  • 37
  • 540
  • 662
1

You can do

sapply(strsplit(subcat_id,","),FUN=function(x){length(x[x!="Null"])})

strsplit(subcat_id,",") will return a list of each item in subcat_id split on commas. sapply will apply the specified function to each item in this list and return us a vector of the results.

Finally, the function that we apply will take just the non-null entries in each list item and count the resulting sublist.

For example, if we have

subcat_id <- c("1,2,3","23,Null,4")

Then running the above code returns c(3,4) which you can assign to your column.


If running this from a dataframe, it is possible that the character column has been interpreted as a factor, in which case the error non-character argument will be thrown. To fix this, we need to force interpretation as a character vector with the as.character function, changing the command to
sapply(strsplit(as.character(frame$subcat_id),","),FUN=function(x){length(x[x!="Null"])})
Matthew
  • 7,440
  • 1
  • 24
  • 49
  • Error in strsplit(test2$subcat_id, ",") : non-character argument using your commands am getting the error – Sandy2511 Feb 08 '16 at 09:06
  • @Sandy2511 I have edited the answer to address that (which is caused by the dataframe treating the column as a factor). See the note at the end of my answer. – Matthew Feb 08 '16 at 09:31
  • excellent and thanks for the help ,you explanation to the task was wonderfull – Sandy2511 Feb 08 '16 at 12:28
  • @Sandy2511 You're welcome and I'm glad that I could help. If this answer solved your problem, please consider accepting it. – Matthew Feb 08 '16 at 20:49