0

I am beginning the analysis in RStudio of an interview I conducted. As usual, the interview consists of the interviewer's questions and the subject's answers.

text<- "Interviewer: Hello, how are you?
Subject: I am fine, thanks.

Interviewer: What is your name?
Subject: My name is Gerard."

I would like to remove all the interviewer's questions so that I can analyze only the subject's answers. I do not know how to proceed in R; in fact, I do not even know what exactly to Google.

I would appreciate all the help I can get. Thank you in advance.

DaveArmstrong

3 Answers

1

base R:

text<- "Interviewer: Hello, how are you?
Subject: I am fine, thanks.

Interviewer: What is your name?
Subject: My name is Gerard."

This gives you:

text
[1] "Interviewer: Hello, how are you?\nSubject: I am fine, thanks.\n\nInterviewer: What is your name?\nSubject: My name is Gerard."

where the \n are newline characters, which you can split on with strsplit():

strsplit(text, '\n')[[1]] # strsplit returns a list
[1] "Interviewer: Hello, how are you?" "Subject: I am fine, thanks."     
[3] ""                                 "Interviewer: What is your name?" 
[5] "Subject: My name is Gerard."
text2 <- strsplit(text, '\n')[[1]] # keep the character vector, not the list

text2[c(2,5)]
[1] "Subject: I am fine, thanks." "Subject: My name is Gerard."
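
The indices 2 and 5 above are specific to this example. As a minimal sketch, assuming every answer sits on a single line that starts with the label Subject:, the same lines can be selected by pattern instead of by position (the names lines and answers are only illustrative):

```r
text <- "Interviewer: Hello, how are you?
Subject: I am fine, thanks.

Interviewer: What is your name?
Subject: My name is Gerard."

# Split the single string into one element per line
lines <- strsplit(text, "\n", fixed = TRUE)[[1]]

# Keep only the lines that start with the "Subject:" label
answers <- lines[grepl("^Subject:", lines)]
answers
```

If an answer runs over several lines, the continuation lines do not start with Subject: and would be dropped by this filter, so it only fits transcripts with one line per turn.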
Chris
0

If your data is stored in a character vector text, as indicated in the question, you can try this:

With as_tibble we transform the vector into a tibble (roughly equivalent to a data frame), then we separate the rows on \n, and finally we filter out the interviewer's lines and the empty rows:

library(dplyr)
library(tidyr)

text <- as_tibble(text) %>% 
  separate_rows(value, sep="\n") %>% 
  filter(!grepl("Interviewer", value) & value!="") %>% 
  pull(value)
text
[1] "Subject: I am fine, thanks." "Subject: My name is Gerard."
TarJae
  • Thank you for your quick response. I have been a bit imprecise, though. I will be importing interviews on txt Word files. Does the text still count as a vector? Pardon my coding expression illiteracy. – Janez Gorenc Jan 04 '23 at 17:15
  • 2
    `dput(my_word_text_example)`, and copy `structure(...)` above into your question as data. – Chris Jan 04 '23 at 17:18
  • 1
    For future questions please have a look here: – TarJae Jan 04 '23 at 17:21
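
Regarding the question about imported files: a rough sketch, assuming the interview is saved as a plain-text .txt file with one turn per line. The file written below is a hypothetical stand-in created in a temporary location so the example is self-contained; in practice path would point to your own file. A Word .docx file is not plain text and would first need a dedicated package to extract its text.

```r
# Hypothetical stand-in for an interview file on disk
path <- tempfile(fileext = ".txt")
writeLines(c("Interviewer: Hello, how are you?",
             "Subject: I am fine, thanks.",
             "",
             "Interviewer: What is your name?",
             "Subject: My name is Gerard."),
           path)

# readLines() returns a character vector with one element per line
interview <- readLines(path)

# Keep only the subject's lines, then strip the "Subject: " label
answers <- interview[grepl("^Subject:", interview)]
answers_text <- sub("^Subject: *", "", answers)
answers_text
```

Because readLines() already yields one element per line, no strsplit() step is needed here.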
0

An approach using strsplit and sub/gsub.

text_new <- gsub("\n", "", sub(".*(Subject: )", "\\1", 
              unlist(strsplit(text, "Interviewer: "))))
text_new[nchar(text_new) > 0]
[1] "Subject: I am fine, thanks." "Subject: My name is Gerard."
  • First split the string using Interviewer:.
  • Since each piece still begins with the question text, remove everything up to Subject: with sub.
  • Remove existing newlines with gsub.
  • Finally select non-empty strings.
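
The steps above can also be written out one at a time, which makes the intermediate values easy to inspect. This is only an unrolled sketch of the same calls; the names pieces, trimmed and flat are illustrative and not part of the original answer.

```r
text <- "Interviewer: Hello, how are you?
Subject: I am fine, thanks.

Interviewer: What is your name?
Subject: My name is Gerard."

# 1. Split on the interviewer label; the first piece is an empty string
pieces <- unlist(strsplit(text, "Interviewer: "))

# 2. Drop the question text that precedes "Subject: " in each piece
trimmed <- sub(".*(Subject: )", "\\1", pieces)

# 3. Remove the remaining newline characters
flat <- gsub("\n", "", trimmed)

# 4. Keep only the non-empty strings
text_new <- flat[nchar(flat) > 0]
text_new
```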
Andre Wildberg