I import a txt document into R using readLines, but the document is turned into a character vector in which every element is one line of the file, so I cannot use a regular expression to match data that spans multiple lines. How can I solve this problem?
Example document, test.txt:
ID cel-let-7 standard; RNA; CEL; 99 BP.
XX
AC MI0000001;
XX
DE Caenorhabditis elegans let-7 stem-loop
XX
RN [1]
RX PUBMED; 11679671.
RA Lau NC, Lim LP, Weinstein EG, Bartel DP;
RT "An abundant class of tiny RNAs with probable regulatory roles in
RT Caenorhabditis elegans";
RL Science. 294:858-862(2001).
I need the data between ID and DE, but the code below doesn't work, because a pattern cannot match across several elements of the vector.
text <- readLines("test.txt")
pattern <- 'ID.+\nXX\nAC.+\nXX'
m <- gregexpr(pattern, text, perl = TRUE)
Perhaps there is another method, but I would like to solve this using only regular expressions.
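For reference, here is a minimal sketch of the direction that seems most likely to work: joining the lines back into a single string, so that "\n" in the pattern has something to match. The vector `lines` below is a hypothetical stand-in for the result of readLines("test.txt"), shortened to the first six lines of the example, and `hits` is just an illustrative name.

```r
# Stand-in for readLines("test.txt"); only the first lines of the example
lines <- c(
  "ID cel-let-7 standard; RNA; CEL; 99 BP.",
  "XX",
  "AC MI0000001;",
  "XX",
  "DE Caenorhabditis elegans let-7 stem-loop",
  "XX"
)

# Collapse the character vector into one string with embedded newlines
text <- paste(lines, collapse = "\n")

# Same pattern as above; with perl = TRUE, "." does not match a newline,
# so each ".+" stays within its own line
pattern <- "ID.+\nXX\nAC.+\nXX"
m <- gregexpr(pattern, text, perl = TRUE)

# Extract the matched substrings (one string here, containing four lines)
hits <- regmatches(text, m)[[1]]
cat(hits, sep = "\n")
```

With the real file, `lines` would presumably be replaced by readLines("test.txt"); the rest should stay the same, assuming each record follows the ID / XX / AC / XX layout shown in the example.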