2

I'd like to create a new data table from my old one that includes a count of all the "article_id" that occur for each date (i.e. there are three article_id's listed for the date 2001-10-01, so I'd like one column with the date and one column that has the article count, "3").

Here is the output of the data table:

            date       article_id  N
   1: 2001-09-01 FAS_200109_11104  3
   2: 2001-10-01 FAS_200110_11126  6
   3: 2001-10-01 FAS_200110_11157 21
   4: 2001-10-01 FAS_200110_11160  5
   5: 2001-11-01 FAS_200111_11220 26
  ---                               
7359: 2019-08-01  FAZ_201908_2958  7
7360: 2019-09-01  FAZ_201909_3316  8
7361: 2019-09-01  FAZ_201909_3515 13
7362: 2000-12-01 FAZ_200012_92981  3
7363: 2001-08-01 FAZ_200108_86041 14 

So I'll have to move over the unique date values to a new data frame (so that each date is only shown once), as well as a count of article_id's shown for each date.

I've been trying to figure this out but haven't found exactly the right answer regarding how to count the occurrence of a character vector (the article_id) by group (date). I think this is something pretty simple in R, but I'm new to the program and don't have much support so I would very much appreciate your suggestions - thank you so much!

3 Answers3

1

We could group and then summarise:

library(dplyr)
df %>% 
    group_by(date) %>% 
    summarise(n = n())
 date           n
  <chr>      <int>
1 2000-12-01     1
2 2001-08-01     1
3 2001-09-01     1
4 2001-10-01     3
5 2001-11-01     1
6 2019-08-01     1
7 2019-09-01     2
TarJae
  • 72,363
  • 6
  • 19
  • 66
1

The expected output is not clear. Some assumptions of expected output

  1. Sum of 'N' by 'date'
library(data.table)
dt[, .(N = sum(N, na.rm = TRUE)), by = date]
  1. Count of unique 'article_id' for each date
dt1[, .(N = uniqueN(article_id)), by = date]
  1. Get the first count by 'date'
dt1[, .(N = first(N)), by = date]
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Here 2 tidyverse solutions:

Libraries

library(tidyverse)

Example Data

df <- 
  tibble(
    date = ymd(c("2001-09-01","2001-10-01","2001-10-01")),
    article_id = c("FAS_200109_11104","FAS_200110_11126","FAS_200110_11157"),
    N = c(3,6,21)
  )

Solution

Solution 1

df %>% 
  group_by(date) %>% 
  summarise(N = sum(N,na.rm = TRUE))

Solution 2

df %>% 
  count(date,wt = N)

Result

# A tibble: 2 x 2
  date           n
  <date>     <dbl>
1 2001-09-01     3
2 2001-10-01    27
Vinícius Félix
  • 8,448
  • 6
  • 16
  • 32