
I have a dataframe as follows:

data.frame(title="Title", bk=c("Book 1", "Book 1", "Book 3"), ch=c("Chapter 1", "Chapter 2", "Chapter 1"))

  title     bk        ch
1 Title Book 1 Chapter 1
2 Title Book 1 Chapter 2
3 Title Book 3 Chapter 1

How do I repeat each observation according to the cumsum index below,

id=c(1,1,1,2,2,3,3,3,3)

so that the data frame is expanded to accommodate the source vector that generated the index?

  title     bk        ch   source_vector
1 Title Book 1 Chapter 1   ...
1 Title Book 1 Chapter 1   
1 Title Book 1 Chapter 1   
2 Title Book 1 Chapter 2   
2 Title Book 1 Chapter 2   
3 Title Book 3 Chapter 1   
3 Title Book 3 Chapter 1   
3 Title Book 3 Chapter 1   
3 Title Book 3 Chapter 1   
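Here is a minimal base R sketch of the indexing itself. It assumes the per-row counts are 3, 2 and 4, read off the id vector above; `n`, `ends` and `expanded` are illustrative names, not from the original code. Repeating each row number by its count reproduces id, cumsum(n) gives the position of the last copy of each row, and indexing the data frame with id yields the expanded rows:

```r
df <- data.frame(title = "Title",
                 bk = c("Book 1", "Book 1", "Book 3"),
                 ch = c("Chapter 1", "Chapter 2", "Chapter 1"))

n  <- c(3, 2, 4)                          # assumed number of pieces per row
id <- rep(seq_len(nrow(df)), times = n)   # 1 1 1 2 2 3 3 3 3
ends <- cumsum(n)                         # 3 5 9: last position of each row's copies

expanded <- df[id, ]                      # repeat each row by indexing with id
rownames(expanded) <- NULL                # drop the 1.1, 1.2, ... row names
expanded
```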

4 Answers


An option would be to use separate_rows from tidyr:

library(tidyverse)
df1 %>%
    separate_rows(content)
#  title     bk        ch content
#1 Title Book 1 Chapter 1    This
#2 Title Book 1 Chapter 1      is
#3 Title Book 1 Chapter 1     the
#4 Title Book 1 Chapter 2 content
#5 Title Book 1 Chapter 2      of
#6 Title Book 3 Chapter 1    each
#7 Title Book 3 Chapter 1 chapter
#8 Title Book 3 Chapter 1      in
#9 Title Book 3 Chapter 1   books
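The default sep of separate_rows() is a regular expression matching any run of characters other than letters, digits and periods. As a result, punctuation such as an apostrophe would also split a word. To split on spaces only, as the question's strsplit does, sep can be set explicitly. This sketch assumes df1 is the question's data frame plus the content column shown in the other answers:

```r
library(tidyr)

df1 <- data.frame(title = "Title",
                  bk = c("Book 1", "Book 1", "Book 3"),
                  ch = c("Chapter 1", "Chapter 2", "Chapter 1"),
                  content = c("This is the", "content of", "each chapter in books"))

# split 'content' on single spaces only, one piece per output row
out <- separate_rows(df1, content, sep = " ")
out
```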

If we need the original rows replicated, use uncount with the word count as the weights:

df1 %>% 
    uncount(str_count(content, "\\w+")) %>%
    as_tibble
# A tibble: 9 x 4
#  title bk     ch        content              
#  <fct> <fct>  <fct>     <fct>                
#1 Title Book 1 Chapter 1 This is the          
#2 Title Book 1 Chapter 1 This is the          
#3 Title Book 1 Chapter 1 This is the          
#4 Title Book 1 Chapter 2 content of           
#5 Title Book 1 Chapter 2 content of           
#6 Title Book 3 Chapter 1 each chapter in books
#7 Title Book 3 Chapter 1 each chapter in books
#8 Title Book 3 Chapter 1 each chapter in books
#9 Title Book 3 Chapter 1 each chapter in books
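uncount() can also record which copy of the row each line is, which here is the position of the word within its content, through its .id argument. This sketch reuses the same df1; the column name word_id is arbitrary:

```r
library(dplyr)
library(stringr)
library(tidyr)

df1 <- data.frame(title = "Title",
                  bk = c("Book 1", "Book 1", "Book 3"),
                  ch = c("Chapter 1", "Chapter 2", "Chapter 1"),
                  content = c("This is the", "content of", "each chapter in books"))

out <- df1 %>%
  # repeat each row once per word; word_id counts 1, 2, ... within each original row
  uncount(str_count(content, "\\w+"), .id = "word_id")
out
```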
  • So how do you handle the `per id` part here? Because if this is the solution, then we agree that it is a dupe – Sotos Jul 22 '19 at 14:10
  • @Sotos If the OP comes up with a giant `for` loop and wants to fix something, wouldn't it be fair to show an easier solution without a for loop? My comment on your tagging was based on the intention of the OP's post, but the output he/she gets is the same – akrun Jul 22 '19 at 14:11
  • Sure. But I don't get your point. The example works because they are the same length as each group. Maybe I don't understand the question – Sotos Jul 22 '19 at 14:13
  • @Sotos Here, the OP comes up with a `strsplit`, created some 'id's and then wants to get the expected output in a roundabout way – akrun Jul 22 '19 at 14:13
  • @Sotos If you look at the OP's code, he is splitting by space in the 'content' column – akrun Jul 22 '19 at 14:14
  • Ahhh, OK. I see what you mean now. Then yes, you should have shown the best way, as you did. But in the same sense it should also be duped with the simpler one :) – Sotos Jul 22 '19 at 14:14
  • *But, that doesn't happen while others are posting*... I see you are steering away from friendly discussion, so I will take my leave. Have a good one Arun! – Sotos Jul 22 '19 at 14:18

In base R you can strsplit the content of each row, cbind the pieces to the remaining columns, and combine the results with do.call(rbind, ...):

x <- data.frame(title="Title", bk=c("Book 1", "Book 1", "Book 3"), ch=c("Chapter 1", "Chapter 2", "Chapter 1"), content=c("This is the", "content of", "each chapter in books"))
# For each row r: drop the last column ('content'), split it on spaces and cbind the pieces (the other columns are recycled)
do.call("rbind", by(x, seq_len(nrow(x)), function(r) {cbind(r[-ncol(r)], str_split_content=strsplit(as.character(r$content[1]), " ")[[1]])}))
#    title     bk        ch str_split_content
#1.1 Title Book 1 Chapter 1              This
#1.2 Title Book 1 Chapter 1                is
#1.3 Title Book 1 Chapter 1               the
#2.1 Title Book 1 Chapter 2           content
#2.2 Title Book 1 Chapter 2                of
#3.1 Title Book 3 Chapter 1              each
#3.2 Title Book 3 Chapter 1           chapter
#3.3 Title Book 3 Chapter 1                in
#3.4 Title Book 3 Chapter 1             books
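The same result can be built without by() by splitting all contents at once and repeating the row indices by the number of pieces. This is a vectorized base R sketch; x is redefined so the snippet runs on its own, and words and out are illustrative names:

```r
x <- data.frame(title = "Title",
                bk = c("Book 1", "Book 1", "Book 3"),
                ch = c("Chapter 1", "Chapter 2", "Chapter 1"),
                content = c("This is the", "content of", "each chapter in books"))

words <- strsplit(as.character(x$content), " ")          # list: one character vector per row
out <- x[rep(seq_len(nrow(x)), lengths(words)),          # repeat each row once per piece
         names(x) != "content"]                          # keep every column except 'content'
out$str_split_content <- unlist(words)                   # pieces are in the same row order
rownames(out) <- NULL
out
```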

If you simply want to expand the rows based on the number of words in content, here is one way to do it with expandRows from splitstackshape:

library(splitstackshape)
expandRows(ddf, lengths(gregexpr("\\W+", ddf$content)) + 1, count.is.col = FALSE)

#    title     bk        ch               content
#1   Title Book 1 Chapter 1           This is the
#1.1 Title Book 1 Chapter 1           This is the
#1.2 Title Book 1 Chapter 1           This is the
#2   Title Book 1 Chapter 2            content of
#2.1 Title Book 1 Chapter 2            content of
#3   Title Book 3 Chapter 1 each chapter in books
#3.1 Title Book 3 Chapter 1 each chapter in books
#3.2 Title Book 3 Chapter 1 each chapter in books
#3.3 Title Book 3 Chapter 1 each chapter in books
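One caveat with counting non-word runs: gregexpr() returns -1 (a vector of length 1) when there is no match, so a one-word content would be counted as 2. The following base R sketch compares that count with counting the word runs directly. The extra "Preface" element is a hypothetical single-word content, added only to show the difference:

```r
content <- c("This is the", "content of", "each chapter in books", "Preface")

# non-word runs + 1; for "Preface" gregexpr() gives -1, so lengths() still returns 1
n_gaps <- lengths(gregexpr("\\W+", content)) + 1

# count the word runs themselves; no match gives character(0), i.e. length 0
n_words <- lengths(regmatches(content, gregexpr("\\w+", content)))

n_gaps   # 3 2 4 2
n_words  # 3 2 4 1
```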
  • @akrun I know, but based on our discussion with the OP, I thought that maybe all they needed to find out was how to expand... answering on assumptions until the OP clarifies, I guess – Sotos Jul 22 '19 at 14:37
  • What does that have to do with this answer? And Yes I know you don't downvote. I disagree... – Sotos Jul 22 '19 at 14:41
  • Yes, that plus the reopening/noise, etc...but I don't understand why we are discussing this... – Sotos Jul 22 '19 at 14:42

This is closer to what I was looking for:

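library(tidyverse)
df %>%
  mutate(str_split_content = str_split(content, " ")) %>%
  unnest(str_split_content)   # name the list-column; tidyr >= 1.0 warns when cols is omitted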

Someone posted this earlier, then revised or removed it.

The original str_split was actually by punctuation, so it is not purely a split by the number of words.
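If the split should happen on punctuation as well as on spaces, the pattern passed to str_split() can include both, with empty pieces (for example from trailing punctuation) filtered out afterwards. This sketch assumes df is the question's data frame with the content column. The pattern is one possible choice and not necessarily the one from the removed answer:

```r
library(dplyr)
library(stringr)
library(tidyr)

df <- data.frame(title = "Title",
                 bk = c("Book 1", "Book 1", "Book 3"),
                 ch = c("Chapter 1", "Chapter 2", "Chapter 1"),
                 content = c("This is the", "content of", "each chapter in books"))

out <- df %>%
  # split on any run of whitespace or punctuation
  mutate(str_split_content = str_split(content, "[[:space:][:punct:]]+")) %>%
  unnest(str_split_content) %>%
  # trailing punctuation would leave an empty string; drop those pieces
  filter(str_split_content != "")
out
```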

  • df %>% unnest(str_split_content = str_split(content, " ")): just read the docs, and unnest allows for that :) – Pablo Rod Jul 22 '19 at 22:55