Assess each row of a factor in R

Question

I have a factor with 1000 rows and 848 levels (i.e. some rows are empty). For each row, I want to count the number of elements (i.e., one element = 1, 2 elements = 2, empty row = 0, etc.). A simpler way to describe it is: I want to convert a factor into a data.frame, but I want to change the data type from factor to numeric and keep the values in each row.

v.m.two <- Output[,1]
v.m.two <- data.frame(v.m.two)
class(v.m.two)
[1] data.frame
class(v.m.two[1,])
[1] factor
dim(v.m.two)
[1] 1000 1
v.m.two[1,]
[1] 848 Levels: 0 1000 1002, 4875, 4082, 1952 1015, 2570, 3524 1017 1020, 1576 ... 983, 4381,
2256, 4361, 4271

Any suggestions?

           v.m.two
1       2633, 4868
2        126, 4860
3                0
4        122, 4762
5             4256
6 2933, 2892, 2389

Basically, I want to count the values in each row (e.g., row 1 is 2, row 2 is 2, row 3 is 0, etc.).

Can you show a few lines of `v.m.two`? You also might want to use `v.m.two <- data.frame(v.m.two, stringsAsFactors = FALSE)` — Rich Scriven, Nov 02 '14 at 07:37
What do you mean by "count the number of elements"? Each row only has a single value. Do you just want `as.numeric(as.character(v.m.two[, 1]))`? — jbaums, Nov 02 '14 at 07:42
Hey Richard, I made the edits in the original post. I've already tried that, but it's not working the way it was intended. I'm unsure why >.> — , Nov 02 '14 at 07:43
Ok, maybe you want `sapply(strsplit(as.character(v.m.two[, 1]), ','), length)` — jbaums, Nov 02 '14 at 07:46
Hey jbaums, since the data type is a factor, you're correct that each row contains a single value. But each row contains a list of elements separated by "," and I want to count these values. — , Nov 02 '14 at 07:46
By the way, if you want people to know that you've mentioned them in a comment, then you need to use, e.g. @jbaums (unless it's their post, in which case they get notified of all comments). — jbaums, Nov 02 '14 at 07:48
@jbaums, thanks for both pieces of advice! Your code almost worked, except that the 0's are counted as numbers instead of as empty rows. — , Nov 02 '14 at 07:53
@user2105555 What is the result you really want? Do you want a data.frame with `ncol` equal to the maximum number of elements in any row of the original dataset `v.m.two`, or just a vector of values? — akrun, Nov 02 '14 at 08:17
@akrun The result I really wanted is the output of a vector of values from running a simulation of 1000 iterations, then use the values to plot onto a graph. haha. But the entire time I was stuck on manipulating the data types to perform the calculations I wanted. I figured everything else out but not the solution to the novice question as per the original post. >. — , Nov 02 '14 at 08:23
@user2105555 Thanks and sorry I misunderstood while reading the post. — akrun, Nov 02 '14 at 08:24

Rich Scriven · Answer 1 · 2014-11-02T10:20:46.443

1

You have erroneous commas, which are causing the column to be a factor. Try scan:

scan(text=with(v.m.two, levels(v.m.two)[v.m.two]), sep=",", what=integer())
# Read 11 items
# [1] 2633 4868  126 4860    0  122 4762 4256 2933 2892 2389

And to count the lengths and convert to numeric, you can also use strsplit

s <- strsplit(as.character(v.m.two[[1]]), ", ")
vapply(s, length, integer(1L)) ## row 3 is actually 1 if there's a zero there
# [1] 2 2 1 2 1 3
as.numeric(do.call(c, s))
# [1] 2633 4868  126 4860    0  122 4762 4256 2933 2892 2389
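As the comment in the code notes, a row holding only "0" is counted as 1, whereas the question treats it as an empty row. A minimal sketch of one way to handle that in base R, assuming a lone "0" always means "empty" (the small data frame below just mirrors the sample rows from the question):

```r
# Toy data mirroring the six sample rows shown in the question (an assumption,
# not the asker's full 1000-row object)
v.m.two <- data.frame(v.m.two = factor(c("2633, 4868", "126, 4860", "0",
                                         "122, 4762", "4256", "2933, 2892, 2389")))

x <- as.character(v.m.two[[1]])            # factor -> character
s <- strsplit(x, ", ")                     # split each row on ", "
counts <- vapply(s, length, integer(1L))   # elements per row
counts[x == "0"] <- 0L                     # treat a lone "0" as an empty row
counts
# [1] 2 2 0 2 1 3
```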

edited Nov 02 '14 at 10:20

answered Nov 02 '14 at 07:52

Rich Scriven


I received this error: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :scan() expected 'an integer', got '2633,' – Nov 02 '14 at 07:56
Because I used spaces and not commas, about to fix – Rich Scriven Nov 02 '14 at 08:00

score 0 · Accepted Answer · edited May 23 '17 at 12:12

1. Converting factor to numeric

If you want to convert the factor column to numeric and have separate columns based on the number of elements in each row, you could use cSplit from splitstackshape:

 library(splitstackshape)
 res <- cSplit(v.m.two, 'v.m.two', sep=",")
 res
 #    v.m.two_1 v.m.two_2 v.m.two_3
 #1:      2633      4868        NA
 #2:       126      4860        NA
 #3:         0        NA        NA
 #4:       122      4762        NA
 #5:      4256        NA        NA
 #6:      2933      2892      2389

  str(res)
  #Classes ‘data.table’ and 'data.frame':   6 obs. of  3 variables:
  # $ v.m.two_1: int  2633 126 0 122 4256 2933
  # $ v.m.two_2: int  4868 4860 NA 4762 NA 2892
  # $ v.m.two_3: int  NA NA NA NA NA 2389

If you need a vector, you could use stri_split from stringi

  library(stringi)
  as.numeric(unlist(stri_split(v.m.two[,1], regex=",")))
  #[1] 2633 4868  126 4860    0  122 4762 4256 2933 2892 2389

2. Counting values in row

To count the values in each row of v.m.two, you can work from either res (above) or v.m.two itself. The first option counts the non-NA values in each row of res and multiplies that count by a logical index recording whether the first column of v.m.two differs from 0. Rows where the value is not 0 (TRUE) keep their count, while rows equal to 0 (FALSE) are coerced to 0, i.e. 0 * count = 0.
```
  (v.m.two[,1]!=0)*(rowSums(!is.na(res)))
  #[1] 2 2 0 2 1 3    
```
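The multiplication relies on R coercing logical values to numbers (TRUE becomes 1, FALSE becomes 0). A small standalone illustration, using made-up vectors named `keep` and `n` (hypothetical names, not from the answer):

```r
keep <- c(TRUE, TRUE, FALSE)   # e.g. the result of a test like v.m.two[,1] != 0
n    <- c(2L, 2L, 1L)          # e.g. the per-row counts before zero handling
masked <- keep * n             # TRUE keeps the count, FALSE turns it into 0
masked
# [1] 2 2 0
```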
You could also use stri_count from stringi, which should be fast (see "counting occurrence of particular letter in vector of words in r"). As above, you can either multiply by the logical index or use ifelse. The regex can match either runs of digits or commas; if you count commas, make sure to add 1.
```
  ifelse(v.m.two[,1]!=0, stri_count(v.m.two[,1], regex="\\d+"), 0)
  # [1] 2 2 0 2 1 3
  #Or

  (v.m.two[,1]!=0) *stri_count(v.m.two[,1], regex="\\d+")
  #[1] 2 2 0 2 1 3
  #Or   
  (v.m.two[,1]!=0) *(stri_count(v.m.two[,1], regex=",") +1)
  #[1] 2 2 0 2 1 3
```

Another option to count would be to use gsub and nchar from base R.

  (v.m.two[,1]!=0) *( nchar(gsub("[^,]", "", v.m.two[,1]))+1)
  #[1] 2 2 0 2 1 3
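The digit-based count can also be done in base R, as a rough counterpart to the stri_count version. This is only a sketch on a small made-up character vector `x` (hypothetical), using gregexpr and regmatches, and it checks that counting digit runs agrees with counting commas plus 1:

```r
x <- c("2633, 4868", "0", "4256", "2933, 2892, 2389")   # toy input

# Count runs of digits in each string
n_digit_runs <- lengths(regmatches(x, gregexpr("[0-9]+", x)))

# Count commas and add 1, as in the gsub/nchar approach
n_commas <- nchar(gsub("[^,]", "", x)) + 1L

stopifnot(identical(n_digit_runs, n_commas))   # both approaches agree here

counts_base <- (x != "0") * n_digit_runs       # a lone "0" counts as empty
counts_base
# [1] 2 0 1 3
```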

data

v.m.two <- structure(list(v.m.two = structure(c(4L, 3L, 1L, 2L, 6L, 5L), 
.Label = c("0", "122, 4762", "126, 4860", "2633, 4868", "2933, 2892, 2389",
 "4256"), class = "factor")), .Names = "v.m.two", row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")
