New data frame with unique values and counts

Question

I'd like to create a new data table from my old one that includes a count of all the "article_id" that occur for each date (i.e. there are three article_id's listed for the date 2001-10-01, so I'd like one column with the date and one column that has the article count, "3").

Here is the output of the data table:

            date       article_id  N
   1: 2001-09-01 FAS_200109_11104  3
   2: 2001-10-01 FAS_200110_11126  6
   3: 2001-10-01 FAS_200110_11157 21
   4: 2001-10-01 FAS_200110_11160  5
   5: 2001-11-01 FAS_200111_11220 26
  ---                               
7359: 2019-08-01  FAZ_201908_2958  7
7360: 2019-09-01  FAZ_201909_3316  8
7361: 2019-09-01  FAZ_201909_3515 13
7362: 2000-12-01 FAZ_200012_92981  3
7363: 2001-08-01 FAZ_200108_86041 14

So I'll have to move over the unique date values to a new data frame (so that each date is only shown once), as well as a count of article_id's shown for each date.

I've been trying to figure this out but haven't found exactly the right answer regarding how to count the occurrence of a character vector (the article_id) by group (date). I think this is something pretty simple in R, but I'm new to the program and don't have much support so I would very much appreciate your suggestions - thank you so much!

Do you want the sum of `N` for each 'date' i.e. `dt[, .(N = sum(N)), date]` — akrun, Sep 11 '21 at 19:41

score 1 · Answer 1 · answered Sep 11 '21 at 19:44

We could group and then summarise:

library(dplyr)
df %>% 
    group_by(date) %>% 
    summarise(n = n())

 date           n
  <chr>      <int>
1 2000-12-01     1
2 2001-08-01     1
3 2001-09-01     1
4 2001-10-01     3
5 2001-11-01     1
6 2019-08-01     1
7 2019-09-01     2

score 1 · Accepted Answer · answered Sep 11 '21 at 19:44

1

The expected output is not clear. Some assumptions of expected output

Sum of 'N' by 'date'

library(data.table)
dt[, .(N = sum(N, na.rm = TRUE)), by = date]

Count of unique 'article_id' for each date

dt1[, .(N = uniqueN(article_id)), by = date]

Get the first count by 'date'

dt1[, .(N = first(N)), by = date]

answered Sep 11 '21 at 19:44

akrun

874,273
37
540
662

1

Hi akrun, thank you so much for the support! This did the trick. – Emily Frank Sep 13 '21 at 10:14

score 0 · Answer 3 · answered Sep 11 '21 at 19:55

Here 2 tidyverse solutions:

Libraries

library(tidyverse)

Example Data

df <- 
  tibble(
    date = ymd(c("2001-09-01","2001-10-01","2001-10-01")),
    article_id = c("FAS_200109_11104","FAS_200110_11126","FAS_200110_11157"),
    N = c(3,6,21)
  )

Solution

Solution 1

df %>% 
  group_by(date) %>% 
  summarise(N = sum(N,na.rm = TRUE))

Solution 2

df %>% 
  count(date,wt = N)

Result

# A tibble: 2 x 2
  date           n
  <date>     <dbl>
1 2001-09-01     3
2 2001-10-01    27