13

Let's say I have a data frame with 10 numeric variables V1-V10 (columns) and multiple rows (cases).

What I would like R to do is: For each case, give me the number of occurrences of a certain value in a set of variables.

For example the number of occurrences of the numeric value 99 in that single row for V2, V3, V6, which obviously has a minimum of 0 (none of the three have the value 99) and a maximum of 3 (all of the three have the value 99).

I am really looking for an equivalent to the SPSS function COUNT: "COUNT creates a numeric variable that, for each case, counts the occurrences of the same value (or list of values) across a list of variables."

I thought about table() and library plyr's count(), but I cannot really figure it out. Vectorized computation preferred. Thanks a lot!

nilsole
  • 1,663
  • 2
  • 12
  • 28
  • If you want to count the number of occurrences column vice. You can find the answer here: https://stackoverflow.com/questions/69036276/count-occurrences-of-value-in-a-set-of-variables-in-r-per-column#69036276 – INDRAJITH EKANAYAKE Sep 02 '21 at 20:15

6 Answers6

11

If you need to count any particular word/letter in the row.

#Let df be a data frame with four variables (V1-V4)
df <- data.frame(
  V1=c(1,1,2,1,"L"),
  V2=c(1,"L",2,2,"L"), 
  V3=c(1,2,2,1,"L"), 
  V4=c("L","L",1,2,"L"))

For counting number of L in each row just use

#This is how to compute a new variable counting occurences of "L" in V1-V4.      
df$count.L <- apply(df, 1, function(x) length(which(x=="L")))

The result will appear like this

> df
  V1 V2 V3 V4 count.L
1  1  1  1 L       1
2  1  L  2 L       2
3  2  2  2  1      0
4  1  2  1  2      0
Susie Derkins
  • 2,506
  • 2
  • 13
  • 21
Adi.sr
  • 161
  • 1
  • 9
7

Try

apply(df,MARGIN=1,table)

Where df is your data.frame. This will return a list of the same length of the amount of rows in your data.frame. Each item of the list corresponds to a row of the data.frame (in the same order), and it is a table where the content is the number of occurrences and the names are the corresponding values.

For instance:

df=data.frame(V1=c(10,20,10,20),V2=c(20,30,20,30),V3=c(20,10,20,10))
#create a data.frame containing some data
df #show the data.frame
  V1 V2 V3
1 10 20 20
2 20 30 10
3 10 20 20
4 20 30 10
apply(df,MARGIN=1,table) #apply the function table on each row (MARGIN=1)
[[1]]

10 20 
 1  2 

[[2]]

10 20 30 
 1  1  1 

[[3]]

10 20 
 1  2 

[[4]]

10 20 30 
 1  1  1 

#desired result
vrajs5
  • 4,066
  • 1
  • 27
  • 44
WoDoSc
  • 2,598
  • 1
  • 13
  • 26
6

I think that there ought to be a simpler way to do this, but the best way that I can think of to get a table of counts is to loop (implicitly using sapply) over the unique values in the dataframe.

#Some example data
df <- data.frame(a=c(1,1,2,2,3,9),b=c(1,2,3,2,3,1))
df
#  a b
#1 1 1
#2 1 2
#3 2 3
#4 2 2
#5 3 3
#6 9 1

levels=unique(do.call(c,df)) #all unique values in df
out <- sapply(levels,function(x)rowSums(df==x)) #count occurrences of x in each row
colnames(out) <- levels
out
#     1 2 3 9
#[1,] 2 0 0 0
#[2,] 1 1 0 0
#[3,] 0 1 1 0
#[4,] 0 2 0 0
#[5,] 0 0 2 0
#[6,] 1 0 0 1
Miff
  • 7,486
  • 20
  • 20
5

Here is another straightforward solution that comes closest to what the COUNT command in SPSS does — creating a new variable that, for each case (i.e., row) counts the occurrences of a given value or list of values across a list of variables.

#Let df be a data frame with four variables (V1-V4)
df <- data.frame(V1=c(1,1,2,1,NA),V2=c(1,NA,2,2,NA),
       V3=c(1,2,2,1,NA), V4=c(NA, NA, 1,2, NA))

 #This is how to compute a new variable counting occurences of value "1" in V1-V4.      
    df$count.1 <- apply(df, 1, function(x) length(which(x==1)))

The updated data frame contains the new variable count.1 exactly as the SPSS COUNT command would do.

 > df
      V1 V2 V3 V4 count.1
    1  1  1  1 NA       3
    2  1 NA  2 NA       1
    3  2  2  2  1       1
    4  1  2  1  2       2
    5 NA NA NA NA       0

You can do the same to count how many time the value "2" occurs per row in V1-V4. Note that you need to select the columns (variables) in df to which the function is applied.

df$count.2 <- apply(df[1:4], 1, function(x) length(which(x==2)))

You can also apply a similar logic to count the number of missing values in V1-V4.

df$count.na <- apply(df[1:4], 1, function(x) sum(is.na(x)))

The final result should be exactly what you wanted:

 > df
      V1 V2 V3 V4 count.1 count.2 count.na
    1  1  1  1 NA       3       0        1
    2  1 NA  2 NA       1       1        2
    3  2  2  2  1       1       3        0
    4  1  2  1  2       2       2        0
    5 NA NA NA NA       0       0        4

This solution can easily be generalized to a range of values. Suppose we want to count how many times a value of 1 or 2 occurs in V1-V4 per row:

df$count.1or2 <- apply(df[1:4], 1, function(x) sum(x %in% c(1,2)))
LechAttack
  • 51
  • 1
  • 2
2

A solution with functions from the dplyr package would be the following:

Using the example data set from LechAttacks answer:

df <- data.frame(V1=c(1,1,2,1,NA),V2=c(1,NA,2,2,NA),
       V3=c(1,2,2,1,NA), V4=c(NA, NA, 1,2, NA))

Count the appearances of "1" and "2" each and both combined:

df %>%
  rowwise() %>%
  mutate(count_1 = sum(c_across(V1:V4) == 1, na.rm = TRUE),
         count_2 = sum(c_across(V1:V4) == 2, na.rm = TRUE),
         count_12 = sum(c_across(V1:V4) %in% 1:2, na.rm = TRUE)) %>%
  ungroup()

which gives the table:

     V1    V2    V3    V4 count_1 count_2 count_12
1     1     1     1    NA       3       0        3
2     1    NA     2    NA       1       1        2
3     2     2     2     1       1       3        4
4     1     2     1     2       2       2        4
5    NA    NA    NA    NA       0       0        0
mirirai
  • 1,365
  • 9
  • 25
-1

In my effort to find something similar to Count from SPSS in R is as follows:

`df <- data.frame(a=c(1,1,NA,2,3,9),b=c(1,2,3,2,NA,1))` #Dummy data with NAs 

`df %>% 
  dplyr::mutate(count = rowSums( #this allows calculate sum across rows
    dplyr::select(., #Slicing on .  
                  dplyr::one_of( #within select use one_of by clarifying which columns your want
                    c('a','b'))), na.rm = T)) #once the columns are specified, that's all you need, na.rm is cherry on top

That's how the output looks like

>df
   a  b count
1  1  1     2
2  1  2     3
3 NA  3     3
4  2  2     4
5  3 NA     3
6  9  1    10

Hope it helps :-)

Prradep
  • 5,506
  • 5
  • 43
  • 84
  • **This is not what the count function from SPSS does.** It does not sum up the Values from multiple variables but counts the occurance of specific value, like "1" for instance. In that case the number of counted "1" would be: 2,1,0,0,0,1 – mirirai May 31 '22 at 09:37