2

I have a data frame and I want to repeat the content of a single cell n times, where n is given by the next cell in the same row, and display the result in a new cell.

My dataframe looks like this:

data <- data.frame(c(1,1,2,3,4,4,4), c("A","B","A","C","D","E","A"), c(2,1,1,3,2,1,3))
colnames(data) <- c("document number", "term", "count")
data

This is my desired result:

datanew <- data.frame(c(1,2,3,4), c("A A B", "A", "C C C", "D D E A A A"))
colnames(datanew) <- c("document number", "term")


#   document number        term
# 1               1       A A B
# 2               2           A
# 3               3       C C C
# 4               4 D D E A A A

So basically, I would like to repeat the value of each term cell as many times as the corresponding count cell says, and combine the results per document number. Does anyone have an idea how to code this in R?

OTStats
Oliver

2 Answers

3

We can use rep() to repeat each term count times, then paste() the repeated values together within each document number.

library(dplyr)

data %>%
  group_by(`document number`) %>%
  summarise(new = paste(rep(term, count), collapse = " "))

# A tibble: 4 x 2
#  `document number` new        
#              <dbl> <chr>      
#1                 1 A A B      
#2                 2 A          
#3                 3 C C C      
#4                 4 D D E A A A
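
To see why this works, here is a minimal sketch of what rep() and paste() do for a single group. The vectors term and count below are just the two rows of document 1 typed in by hand for illustration, not pulled from the data frame.

```r
# Rows of document 1, written out by hand for illustration
term  <- c("A", "B")
count <- c(2, 1)

# When `times` is a vector of the same length as x, rep() repeats
# each element of x by the matching entry of `times`
expanded <- rep(term, count)
expanded
# [1] "A" "A" "B"

# paste() with `collapse` joins the repeated values into one string
joined <- paste(expanded, collapse = " ")
joined
# [1] "A A B"
```

Inside summarise(), the same two calls run once per document number, which is what produces one string per group.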

Similarly with data.table:

library(data.table)
# .() is needed so that the result column is actually named new
setDT(data)[, .(new = paste(rep(term, count), collapse = " ")),
               by = `document number`]
Ronak Shah
0

We can do this with tidyverse methods

library(dplyr)
library(tidyr)
library(stringr)
data %>%
   uncount(count) %>%
   group_by(`document number`) %>% 
   summarise(term = str_c(term, collapse=' '))
# A tibble: 4 x 2
#  `document number` term       
#              <dbl> <chr>      
#1                 1 A A B      
#2                 2 A          
#3                 3 C C C      
#4                 4 D D E A A A

Or with data.table

library(data.table)
setDT(data)[rep(seq_len(.N), count)][, .(term = 
        paste(term, collapse=' ')), `document number`]

Or using base R: repeat each row count times, then aggregate by document number

aggregate(term ~ `document number`, data[rep(seq_len(nrow(data)), 
           data$count),], FUN = paste, collapse= ' ')
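
As a rough, self-contained way to check any of these approaches against the desired output, the sketch below rebuilds the example data and compares the collapsed strings with the expected ones. It uses tapply() rather than aggregate(), and the names doc_ids, collapsed, result and expected are illustrative only; check.names = FALSE is used just to create the column with a space in its name in one step.

```r
# Rebuild the example data; check.names = FALSE keeps "document number" as is
data <- data.frame(
  `document number` = c(1, 1, 2, 3, 4, 4, 4),
  term  = c("A", "B", "A", "C", "D", "E", "A"),
  count = c(2, 1, 1, 3, 2, 1, 3),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Repeat both the terms and their document ids by count
expanded <- rep(data$term, data$count)
doc_ids  <- rep(data$`document number`, data$count)

# Collapse the repeated terms within each document id
collapsed <- tapply(expanded, doc_ids, paste, collapse = " ")

result <- data.frame(
  `document number` = as.numeric(names(collapsed)),
  term = as.vector(collapsed),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Compare with the strings from the desired result in the question
expected <- c("A A B", "A", "C C C", "D D E A A A")
identical(result$term, expected)
# [1] TRUE
```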
akrun