I'm having trouble replicating Excel's COUNTIF in R. I have a data frame with a large number of rows. I'm trying to take two variables (x and z) and count how many rows in the data frame have the same x and z values as a given row. So far I've worked out:

sum(mydataframe$x == mydataframe$x[1] & mydataframe$z == mydataframe$z[1])

This gives me the correct count of the x and z combination across the whole data set for the first row ([1]). The problem is that I have to hard-code that [1]. I've tried using with(...), but then I can no longer access the whole column.

I'd like to compute the count of the x and z combination for every row in the data frame, through to the last row, and get the result as a new vector that I can add as another column.

Hopefully this is pretty simple. I figure some combination of with(...), apply, or something similar will do it, but I'm just too new.

I want the total count for every row, not a running sequential count.

Jason

2 Answers

It seems that you are asking for a way to create a new column containing, for each row, the number of rows in the entire data frame whose x and z values equal that row's x and z values.

With a bit of sample data:

(dat <- data.frame(x=c(1, 1, 2), z=c(3, 3, 3)))
#   x z
# 1 1 3
# 2 1 3
# 3 2 3

One simple approach would be grouping with dplyr's group_by function and then creating a new column with the number of elements in that group:

library(dplyr)
dat %>% group_by(x, z) %>% mutate(n=n())
#       x     z     n
#   (dbl) (dbl) (int)
# 1     1     3     2
# 2     1     3     2
# 3     2     3     1

A base R solution would probably involve ave:

dat$n <- ave(rep(NA, nrow(dat)), dat$x, dat$z, FUN=length)
dat
#   x z n
# 1 1 3 2
# 2 1 3 2
# 3 2 3 1
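
For comparison, here is a minimal sketch of the row-by-row version of the expression from the question, with the hard-coded [1] replaced by a loop index. The name row_counts is only illustrative, and the sample data is the same small data frame used above. It compares every row against every other row, so on a data frame with many rows the grouped approaches are likely to be much faster.

```r
# Same sample data as above
dat <- data.frame(x = c(1, 1, 2), z = c(3, 3, 3))

# For each row i, count the rows whose x and z both match row i
# (the question's sum(...) expression with [1] replaced by [i])
row_counts <- vapply(
  seq_len(nrow(dat)),
  function(i) sum(dat$x == dat$x[i] & dat$z == dat$z[i]),
  integer(1)
)

# Grouped count via ave(), for comparison
ave_counts <- ave(rep(NA, nrow(dat)), dat$x, dat$z, FUN = length)

row_counts
# [1] 2 2 1
all(row_counts == ave_counts)
# [1] TRUE
```

Either vector can then be attached as a column, for example dat$n <- row_counts.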
josliber

An option using data.table is to convert the data.frame to a data.table by reference with setDT(dat), group by 'x' and 'z', and assign 'n' as the number of rows in each group (.N).

library(data.table)
setDT(dat)[, n:= .N, by = .(x,z)]
dat
#   x z n
#1: 1 3 2
#2: 1 3 2
#3: 2 3 1
akrun