
I did not expect this to be so hard, but I am probably just missing an easy solution. I have a data frame with 4 variables: url, title, date, text. text is a very long character string. I want to combine the text of all rows that share the same date into one string per date; I don't need the other columns. I tried group_by, but it doesn't seem to change anything.

This is what my data frame looks like:

url,title,date,text
www.xxx,xxx,16.06.2020,xxx
www.xxx,xxx,16.06.2020,xxx
www.xxx,xxx,16.06.2020,xxx
www.xxx,xxx,15.06.2020,xxx
www.xxx,xxx,15.06.2020,xxx
www.xxx,xxx,15.06.2020,xxx

And this is what I want:

16.06.2020, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
15.06.2020, xxxxxxxxxxxxxxxxxxxxxxxxx

Thanks for your help!

Nico Saameli

2 Answers


dplyr

library(dplyr)
dat %>%
  group_by(date) %>%
  summarize(text = paste0(text, collapse = ""))
# A tibble: 2 x 2
#   date       text     
#   <chr>      <chr>    
# 1 15.06.2020 xxxxxxxxx
# 2 16.06.2020 xxxxxxxxx
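
As for why group_by on its own appeared to do nothing: it only records the grouping on the data frame, and the rows are collapsed by the summarise step that follows. A minimal sketch, assuming a made-up three-row data frame (the values aaa/bbb/ccc are placeholders, not from the question):

```r
library(dplyr)

# Hypothetical toy data; only the two relevant columns are kept.
dat <- data.frame(
  date = c("16.06.2020", "16.06.2020", "15.06.2020"),
  text = c("aaa", "bbb", "ccc"),
  stringsAsFactors = FALSE
)

# group_by only attaches grouping metadata; the rows are unchanged.
grouped <- group_by(dat, date)
n_grouped <- nrow(grouped)        # still 3 rows
grouping  <- group_vars(grouped)  # "date"

# summarise is the step that reduces each group to a single row.
collapsed <- summarise(grouped, text = paste0(text, collapse = ""))
n_collapsed <- nrow(collapsed)    # 2 rows, one per date
```

Printing grouped shows the same rows as dat plus a "Groups:" header, which is easy to overlook.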

data.table

library(data.table)
as.data.table(dat)[, .(text = paste0(text, collapse = "")), by = .(date)]
#          date      text
# 1: 16.06.2020 xxxxxxxxx
# 2: 15.06.2020 xxxxxxxxx
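
Note the row order here differs from the dplyr result: by returns groups in order of first appearance, while keyby sorts the result by the grouping column. A small sketch on made-up data (the values are placeholders) showing the difference; the sort is on the date strings, not on calendar dates:

```r
library(data.table)

# Hypothetical toy data, with the later date appearing first.
dt <- data.table(
  date = c("16.06.2020", "16.06.2020", "15.06.2020"),
  text = c("aaa", "bbb", "ccc")
)

# by: groups come back in order of first appearance.
by_first <- dt[, .(text = paste0(text, collapse = "")), by = date]

# keyby: same aggregation, but the result is sorted (and keyed) by date.
by_sorted <- dt[, .(text = paste0(text, collapse = "")), keyby = date]

by_first$date   # "16.06.2020" "15.06.2020"
by_sorted$date  # "15.06.2020" "16.06.2020"
```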

base R

aggregate(text~date, dat, paste0, collapse = '')
#         date      text
# 1 15.06.2020 xxxxxxxxx
# 2 16.06.2020 xxxxxxxxx
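
One caveat, assuming the date column is stored as character in dd.mm.yyyy form as in the sample: the groups are then sorted as strings, so "01.07.2020" would come before "15.06.2020". If chronological order matters, converting to Date first may help. A sketch with made-up rows (the values a/b/c/d are placeholders):

```r
# Hypothetical toy data spanning two months.
dat <- data.frame(
  date = c("16.06.2020", "01.07.2020", "15.06.2020", "16.06.2020"),
  text = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Parse day.month.year strings into Date objects.
dat$date <- as.Date(dat$date, format = "%d.%m.%Y")

# Same aggregate call as above; groups are now ordered chronologically.
res <- aggregate(text ~ date, dat, paste0, collapse = "")
res
```

Within each date the texts are joined in their original row order.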

Data:

dat <- read.csv(text="url,title,date,text
www.xxx,xxx,16.06.2020,xxx
www.xxx,xxx,16.06.2020,xxx
www.xxx,xxx,16.06.2020,xxx
www.xxx,xxx,15.06.2020,xxx
www.xxx,xxx,15.06.2020,xxx
www.xxx,xxx,15.06.2020,xxx")
r2evans
  • In Base R, `aggregate` seems to be a go-to function `aggregate(text~date, dat, paste0, collapse = '')` – Ronak Shah Jun 16 '20 at 00:35
  • Thanks RonakShah ... for some reason, I *know* about `aggregate` but it keeps blurring in my head with `ave` and others ... where that blurring doesn't make sense. – r2evans Jun 16 '20 at 01:50

An option with str_c (from stringr) and dplyr:

library(dplyr)
library(stringr)
dat %>%
     group_by(date) %>%
     summarise(text = str_c(text, collapse=""))
# A tibble: 2 x 2
#  date       text     
#  <chr>      <chr>    
#1 15.06.2020 xxxxxxxxx
#2 16.06.2020 xxxxxxxxx
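
One practical difference from paste0, in case some text entries are missing: paste0 turns NA into the literal string "NA", whereas str_c returns NA for the whole result. A short sketch on a made-up vector (not from the question), including one way to drop the missing entries and join with a space:

```r
library(stringr)

# Hypothetical text values with one missing entry.
text <- c("abc", NA, "def")

with_paste <- paste0(text, collapse = "")  # "abcNAdef": NA becomes the text "NA"
with_str_c <- str_c(text, collapse = "")   # NA: a missing piece makes the result missing

# Drop missing entries first, and separate the pieces with a space.
cleaned <- str_c(text[!is.na(text)], collapse = " ")  # "abc def"
```

Whether to use "" or " " as the collapse separator depends on whether the individual texts should run together or stay separated.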

data

dat <- structure(list(url = c("www.xxx", "www.xxx", "www.xxx", "www.xxx", 
"www.xxx", "www.xxx"), title = c("xxx", "xxx", "xxx", "xxx", 
"xxx", "xxx"), date = c("16.06.2020", "16.06.2020", "16.06.2020", 
"15.06.2020", "15.06.2020", "15.06.2020"), text = c("xxx", "xxx", 
"xxx", "xxx", "xxx", "xxx")), class = "data.frame", row.names = c(NA, 
-6L))
akrun