library(pdfsearch)
Characters <- c("Ben", "John")
keyword_search('location of file', 
               keyword = Characters,
               path = TRUE)


  keyword page_num
1     Ben        1
2     Ben        1
3    John        1
4    John        2

How can I make R count how many times each keyword appears on each page_num, creating a data frame like:

      name   page  count
1      Ben    1      2
2     John    1      1
3     John    2      1

I know I can use the nrow() function for one keyword and page at a time, but is there a faster way?

nrow(dataframe[dataframe$keyword == "Ben" & dataframe$page_num == 1, ])
  • Maybe try `df2 <- as.data.frame(table(df))` to get frequencies, and then `df2[df2$Freq != 0, ]` if you want to remove those with zero counts... – Ben Nov 16 '20 at 14:40

1 Answer

Base R supports a wide variety of ways to perform grouped operations (probably too many, as it makes choosing the appropriate method harder):

my_data <- data.frame(name = c("Ben", "Ben", "John", "John"), page_num = c(1,1,1,2))

> my_data
  name page_num
1  Ben        1
2  Ben        1
3 John        1
4 John        2


# table()

> table(my_data)
      page_num
name   1 2
  Ben  2 0
  John 1 1

> as.data.frame(table(my_data))
  name page_num Freq
1  Ben        1    2
2 John        1    1
3  Ben        2    0
4 John        2    1

# xtabs()

> xtabs(~ name + page_num, data = my_data)

      page_num
name   1 2
  Ben  2 0
  John 1 1

> as.data.frame(xtabs(~ name + page_num, data = my_data))
  name page_num Freq
1  Ben        1    2
2 John        1    1
3  Ben        2    0
4 John        2    1
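
Both conversions above keep the combination that never occurs (Ben on page 2, with Freq 0) and name the count column Freq. A minimal sketch of getting from there to the layout asked for in the question, assuming the same my_data as above; the column names page and count come from the question, and the variable name keyword_counts is only illustrative:

```r
my_data <- data.frame(name = c("Ben", "Ben", "John", "John"),
                      page_num = c(1, 1, 1, 2))

# Cross-tabulate, then convert to long format.
# stringsAsFactors = FALSE keeps name and page_num as character columns.
keyword_counts <- as.data.frame(table(my_data), stringsAsFactors = FALSE)

# Drop the name/page combinations that never occur.
keyword_counts <- keyword_counts[keyword_counts$Freq != 0, ]

# Rename the columns to match the layout in the question.
names(keyword_counts) <- c("name", "page", "count")

# table() stores page numbers as labels, so convert them back to integers.
keyword_counts$page <- as.integer(keyword_counts$page)

# Renumber the rows after the filtering step.
rownames(keyword_counts) <- NULL

keyword_counts
```

The filtering step is the same idea as the `df2[df2$Freq != 0, ]` suggestion in the comment under the question.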

Other functions for performing grouped operations include by(), tapply(), ave() and more.
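
As one illustration of the "and more", here is a rough sketch using aggregate(), which is not named above but is also part of base R. It assumes the same my_data; the helper column count and the variable names are illustrative choices, not anything the question requires:

```r
my_data <- data.frame(name = c("Ben", "Ben", "John", "John"),
                      page_num = c(1, 1, 1, 2))

# Add a helper column of ones (the name `count` is an illustrative choice),
# then sum it within each name/page_num combination.
with_ones <- transform(my_data, count = 1L)
page_counts <- aggregate(count ~ name + page_num, data = with_ones, FUN = sum)

page_counts
```

Unlike table(), this only returns combinations that actually occur in the data, so there is no zero-count row to filter out afterwards.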

The popular dplyr package also has a syntax for performing grouped operations directly on data.frame objects, without converting them to a table first:

library(dplyr)

# `group_by()`, `summarise()`, `%>%`, and `n()` are exports from `dplyr`
my_data %>%
  group_by(name, page_num) %>%
  # summarise() returns one row per group; mutate() would keep all four rows
  summarise(count = n(), .groups = "drop")
  # n() returns the number of rows in the current group
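
If only the per-group counts are needed, dplyr's count() combines the grouping and counting into a single call. A short sketch, assuming dplyr is installed and the same my_data as above; the output column name "count" is taken from the question and the variable name keyword_counts is only illustrative:

```r
library(dplyr)

my_data <- data.frame(name = c("Ben", "Ben", "John", "John"),
                      page_num = c(1, 1, 1, 2))

# count() groups by the listed columns and returns one row per combination.
# The name argument sets the name of the output column (the default is "n").
keyword_counts <- count(my_data, name, page_num, name = "count")

keyword_counts
```

Combinations that never occur, such as Ben on page 2, are simply absent from the result.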
bcarlsen