
I have the first dataset below, and I want to create the desired dataset by splitting the text in its Name column. How could I do this?

Basically, the text should be split into two new variables right after "XYZ-1" or "AAA-2". I appreciate any help. Thanks!

1st dataset:

Name <- c("A B XYZ-1 Where","C AAA-2 When","ABC R SS XYZ-1 Where")
x <- data.frame(Name)

desired dataset:

Name <- c("A B XYZ-1 Where","C AAA-2 When","ABC R SS XYZ-1 Where")
Study <- c("A B XYZ-1","C AAA-2","ABC R SS XYZ-1")
Question <- c("Where","When","Where")
x <- data.frame(Name,Study,Question)

Name                      Study             Question
A B XYZ-1 Where           A B XYZ-1         Where
C AAA-2 When              C AAA-2           When
ABC R SS XYZ-1 Where      ABC R SS XYZ-1    Where
Bruh

2 Answers


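Use separate() and pass a regex with lookarounds to sep. It matches one or more whitespace characters (\\s+) that are preceded by three uppercase letters, a hyphen and a digit ((?<=[A-Z]{3}-\\d)) and followed by an uppercase letter ((?=[A-Z])). Because lookarounds do not consume characters, only the whitespace is used as the separator, and remove = FALSE keeps the original Name column: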

library(tidyr)
separate(x, Name, into = c("Study", "Question"), 
     sep = "(?<=[A-Z]{3}-\\d)\\s+(?=[A-Z])", remove = FALSE)

Output:

                  Name          Study Question
1      A B XYZ-1 Where      A B XYZ-1    Where
2         C AAA-2 When        C AAA-2     When
3 ABC R SS XYZ-1 Where ABC R SS XYZ-1    Where
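To unpack the pattern asked about in the comments, here is a minimal base R sketch (not part of the original answer) that runs the same sep regex through regexpr() with perl = TRUE and cuts each string around the match. It assumes every Name contains exactly one matching separator; rows without a match would need separate handling.

```r
# Sample data from the question
x <- data.frame(Name = c("A B XYZ-1 Where", "C AAA-2 When", "ABC R SS XYZ-1 Where"),
                stringsAsFactors = FALSE)

# The separator used with separate(), in PCRE syntax:
#   (?<=[A-Z]{3}-\\d)  lookbehind: preceded by 3 uppercase letters, "-", a digit
#   \\s+               one or more whitespace characters (the only part consumed)
#   (?=[A-Z])          lookahead: followed by an uppercase letter
sep_pattern <- "(?<=[A-Z]{3}-\\d)\\s+(?=[A-Z])"

# regexpr() gives the start position and length of the first match in each string;
# the lookarounds are zero-width, so the match covers only the whitespace
m   <- regexpr(sep_pattern, x$Name, perl = TRUE)
pos <- as.integer(m)
len <- attr(m, "match.length")

# Text before and after the matched whitespace (assumes a match in every row)
Study    <- substr(x$Name, 1, pos - 1)
Question <- substr(x$Name, pos + len, nchar(x$Name))
data.frame(Name = x$Name, Study, Question)
```

Every match has length 1 here because each string has a single space at the split point; the lookbehind is what rules out the earlier spaces, such as the one after "ABC".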
akrun
    Thank you! Could you kindly explain this part "(?<=[A-Z]{3}-\\d)\\s+(?=[A-Z])" please? – Bruh Jan 09 '23 at 20:29
    @Bruh - you may want to see this Stack Overflow question: [Learning Regular Expressions](https://stackoverflow.com/questions/4736/learning-regular-expressions) – jpsmith Jan 09 '23 at 20:51

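Here is a base R solution using strsplit with a lookahead regex. The pattern ' (?=[^ ]+$)' matches only the space that is followed by the final word, so each string is split at its last space: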

# split each Name at its last space; base R only, so no %>% pipe
parts <- strsplit(as.character(x$Name), " (?=[^ ]+$)", perl = TRUE)
df <- data.frame(do.call(rbind, parts))
colnames(df) <- c("Study", "Question")
cbind(x[1], df)
                  Name          Study Question
1      A B XYZ-1 Where      A B XYZ-1    Where
2         C AAA-2 When        C AAA-2     When
3 ABC R SS XYZ-1 Where ABC R SS XYZ-1    Where
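Another base R option is a single regex with two capture groups passed to sub(). This is a sketch, assuming the code is always three uppercase letters, a hyphen and one digit; unlike the strsplit() approach above, it splits right after that code rather than at the last space, and a row with no match is returned unchanged in both new columns.

```r
# Sample data from the question
x <- data.frame(Name = c("A B XYZ-1 Where", "C AAA-2 When", "ABC R SS XYZ-1 Where"),
                stringsAsFactors = FALSE)

# Group 1: everything up to and including a code such as "XYZ-1"
# Group 2: everything after the whitespace that follows the code
pattern <- "^(.*[A-Z]{3}-[0-9])\\s+(.*)$"

x$Study    <- sub(pattern, "\\1", x$Name, perl = TRUE)
x$Question <- sub(pattern, "\\2", x$Name, perl = TRUE)
x
```

Because .* is greedy, a string containing several such codes would be split after the last one.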
TarJae