
I have a data frame in R like this:

  ID   REGION  FACTOR  
  01    north    1
  02    north    1
  03    north    0
  04    south    1
  05    south    1
  06    south    1
  07    south    0
  08    south    0

I want to add a column that gives, for each region, the number of rows where FACTOR == 1.

I know how to compute the values, but I could not find the functions that produce this output:

  ID   REGION  FACTOR  COUNT
  01    north     1      2
  02    north     1      2
  03    north     0      2
  04    south     1      3
  05    south     1      3
  06    south     1      3
  07    south     0      3 
  08    south     0      3

Could someone help me?

3 Answers

We can use add_count, which adds the number of rows in each REGION:

library(dplyr)
df1 %>%
    add_count(REGION)

If the goal is instead to sum FACTOR within each REGION:

df1 %>%
   group_by(REGION) %>%
   mutate(COUNT = sum(FACTOR))
   #or use
   # mutate(COUNT = sum(FACTOR != 0))
# A tibble: 8 x 4
# Groups:   REGION [2]
#     ID REGION FACTOR COUNT
#  <int> <chr>   <int> <int>
#1     1 north       1     2
#2     2 north       1     2
#3     3 north       0     2
#4     4 south       1     3
#5     5 south       1     3
#6     6 south       1     3
#7     7 south       0     3
#8     8 south       0     3

Or using data.table:

library(data.table)
setDT(df1)[, COUNT := sum(FACTOR), by = REGION]

data

df1 <- structure(list(ID = 1:8, REGION = c("north", "north", "north", 
"south", "south", "south", "south", "south"), FACTOR = c(1L, 
1L, 0L, 1L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-8L))
akrun

One base R solution uses ave:

dfout <- within(df, COUNT <- ave(FACTOR,REGION, FUN = sum))

which gives

> dfout
  ID REGION FACTOR COUNT
1  1  north      1     2
2  2  north      1     2
3  3  north      0     2
4  4  south      1     3
5  5  south      1     3
6  6  south      1     3
7  7  south      0     3
8  8  south      0     3

DATA

df <- structure(list(ID = 1:8, REGION = c("north", "north", "north", 
"south", "south", "south", "south", "south"), FACTOR = c(1L, 
1L, 0L, 1L, 1L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-8L))
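
Summing FACTOR counts the matching rows only because FACTOR is coded 0/1. If it could take other values, the condition can be tested explicitly before summing. The following is a minimal base R sketch under that assumption; it rebuilds the same df with data.frame() so it runs on its own, and the comparison FACTOR == 1 comes from the question.

```r
# Same data as above, rebuilt so the example is self-contained
df <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north", "south",
             "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# Turn the condition into 0/1, then let ave() sum it within each REGION.
# ave() returns a vector aligned with the original rows.
df$COUNT <- ave(as.integer(df$FACTOR == 1), df$REGION, FUN = sum)
df
```

Changing the condition (for example FACTOR > 0) only requires editing the expression inside as.integer().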
ThomasIsCoding

Use group_by on the region, then mutate to create a new column, count, holding the number of observations per group, n():

library(tidyverse)

group_by(df, REGION) %>%
  mutate(count = n()) %>%  # n() counts every row; use sum(FACTOR == 1) to count only rows where FACTOR is 1
  ungroup()

You want to ungroup() at the end so that future calculations do not happen at the grouped level.
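
To see why ungroup() matters, here is a small sketch, assuming dplyr is installed and using the same data as the question. The object name grouped and the summary column total are illustrative only.

```r
library(dplyr)

df <- data.frame(
  ID = 1:8,
  REGION = c("north", "north", "north", "south",
             "south", "south", "south", "south"),
  FACTOR = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# The result of mutate() on grouped data is still grouped by REGION
grouped <- df %>%
  group_by(REGION) %>%
  mutate(COUNT = sum(FACTOR == 1))

# Still grouped: summarise() returns one row per REGION
by_region <- grouped %>% summarise(total = sum(FACTOR))

# After ungroup(): summarise() collapses everything into a single row
overall <- grouped %>% ungroup() %>% summarise(total = sum(FACTOR))
```

Without the ungroup() call, a later summarise() or mutate() would silently keep working per REGION, which is easy to miss.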

Rich Pauloo