I would like to extract data from the text file below. Each record mixes tab-delimited fields with an embedded XML block. I want to pull out the XML and parse it separately for my analysis.
```text
Text1 Text2 text3 text4 text4 <Assessment>
<Questions>
<Question>
<Id>1</Id>
<Key>Instructions</Key>
<QuestionText>Your Age</QuestionText>
<QuestionType>Label</QuestionType>
<Answer>16-30</Answer>
</Question>
</Questions>
</Assessment> text5
Text1 Text2 text3 text4 text4 <Assessment>
<Questions>
<Question>
<Id>1</Id>
<Key>Instructions</Key>
<QuestionText>Your Age</QuestionText>
<QuestionType>Label</QuestionType>
<Answer>31-49</Answer>
</Question>
</Questions>
</Assessment> text5
```
I read the file into `tst` with `readLines()` and then did the following:
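```r
# Strip leading whitespace from every line
tst <- gsub("^\\s+", "", tst)
# Reduce each line that opens a record to the bare tag (drops the leading fields);
# fixed = TRUE matches the tag literally, so no regex quantifiers like "+" are needed
idx <- grepl("<Assessment>", tst, fixed = TRUE)
tst[idx] <- "<Assessment>"
# Same for the closing tag (drops the trailing field, e.g. text5)
idx <- grepl("</Assessment>", tst, fixed = TRUE)
tst[idx] <- "</Assessment>"
```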
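If the leading tab-delimited fields are needed later (the tag replacement above discards them), one option is to work on the whole text at once with regular expressions. This is a minimal base-R sketch, not a full parser: it assumes the fields before `<Assessment>` are tab-separated and that `<Assessment>` elements never nest. `record_lines()` is a hypothetical helper that only builds sample input in the shape of the data above.

```r
# Hypothetical helper: builds one record as readLines() might return it,
# assuming the leading fields are separated by tabs.
record_lines <- function(answer) c(
  "Text1\tText2\ttext3\ttext4\ttext4\t<Assessment>",
  "<Questions>", "<Question>", "<Id>1</Id>", "<Key>Instructions</Key>",
  "<QuestionText>Your Age</QuestionText>",
  "<QuestionType>Label</QuestionType>",
  paste0("<Answer>", answer, "</Answer>"),
  "</Question>", "</Questions>", "</Assessment>\ttext5")
tst <- c(record_lines("16-30"), record_lines("31-49"))

# Work on the whole file as one string so a record can span many lines.
blob <- paste(tst, collapse = "\n")

# (?s) lets "." match newlines; ".*?" is non-greedy, so each match stops at
# the first closing tag instead of swallowing the following record.
xml_blocks <- regmatches(
  blob, gregexpr("(?s)<Assessment>.*?</Assessment>", blob, perl = TRUE))[[1]]

# The text in front of each <Assessment> holds the tab-delimited fields.
# (?m) makes "^" match at every line start; the lookahead keeps the tag out.
prefixes <- regmatches(
  blob, gregexpr("(?m)^[^<\n]*(?=<Assessment>)", blob, perl = TRUE))[[1]]
fields <- strsplit(prefixes, "\t", fixed = TRUE)

length(xml_blocks)  # one XML string per record
fields[[1]]         # the leading fields of the first record
```

Each element of `xml_blocks` is a self-contained XML string, so records can be parsed one at a time (for example with `XML::xmlParse(xml_blocks[i], asText = TRUE)`), and its position lines up with the matching entry in `fields`.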
I still haven't figured out how to parse the result as XML.
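A minimal sketch of the parsing step with the XML package, assuming `tst` has been cleaned as above so that every line is pure XML. Because the file holds several `<Assessment>` elements one after another, it is not a single well-formed document, so the sketch wraps them in an extra root element; the name `<Assessments>` is an arbitrary choice. `assessment_lines()` is a hypothetical helper that only builds sample input.

```r
library(XML)

# Hypothetical helper: sample input in the shape of the cleaned `tst`,
# where each record is one <Assessment> element.
assessment_lines <- function(answer) c(
  "<Assessment>", "<Questions>", "<Question>", "<Id>1</Id>",
  "<Key>Instructions</Key>", "<QuestionText>Your Age</QuestionText>",
  "<QuestionType>Label</QuestionType>",
  paste0("<Answer>", answer, "</Answer>"),
  "</Question>", "</Questions>", "</Assessment>")
tst <- c(assessment_lines("16-30"), assessment_lines("31-49"))

# Wrap all records in one extra root element so the text is well-formed XML.
doc <- xmlParse(paste(c("<Assessments>", tst, "</Assessments>"), collapse = "\n"),
                asText = TRUE)

# One row per <Question>, one column per child element (Id, Key, ..., Answer).
questions <- xmlToDataFrame(nodes = getNodeSet(doc, "//Question"),
                            stringsAsFactors = FALSE)

# Or pull out a single field directly with an XPath query.
answers <- xpathSApply(doc, "//Answer", xmlValue)
```

`xmlToDataFrame()` is convenient when every `<Question>` has the same child elements; `xpathSApply()` is handier when only one field, such as `Answer`, is needed.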