I have a list of more than 90 PDF files that I read and cleaned in R. From each file I extracted two fields: Number and Date. My current data frame has a single column in which each Number row is followed by the Date row(s) that correspond to that Number. I am trying to move the Date that corresponds to each Number into its own column, so that each Number and its Date end up on the same row. I am having a lot of trouble figuring this out and would appreciate any help. In the section "Example of my current data frame" I manually deleted part of the strings in each row. Please see the dput output for what the actual data frame looks like.
This is the code that produces my current data frame:
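library(pdftools)  # pdf_text()
library(magrittr)  # %>%
# Return the text of one PDF, one string per page
PDFreader <- function(x) {
  pdf_text(x)
}
# pt is defined elsewhere (presumably the vector of PDF file paths)
op2 <- lapply(pt, PDFreader)
# Split every page into its individual lines
op2.1 <- sapply(op2, strsplit, split = "\n")
# Keep only the lines that contain "Number:" or "Date:"
op3 <- rapply(op2.1, grep, pattern = "Number:|Date:",
              value = TRUE) %>%
  unique()
df_all <- as.data.frame(op3) %>%
  unique()
# Make sure the column is character rather than factor
df_all$op3 <- as.character(df_all$op3)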
dput(head(df_all))
structure(list(op3 = c("Number: 11", "Date: 01/03/2018 Last Revised Review: AM #17",
"Date: 01/03/2018 Last Revised Review: AM #17",
"Date: 01/03/2018 Last Revised Review: AM #17",
"Date: 01/03/2018 Last Revised Review: AM #17",
" Date: 09/10/2018 Last Revised Review: AM# 39"
)), .Names = "op3", row.names = c(NA, 6L), class = "data.frame")
Example of my current data frame:
op3 --> COLUMN NAME
Number: 11
Date: 01/03/2018 .. some text
Date: 01/03/2018.. some text
Date: 01/03/2018 .. some text
Date: 01/03/2018 .. some text
Date: 09/10/2018 .. some text
Number: 12
Date: 12/06/2016 .. some text
Date: 12/06/2016 .. some text
Date: 12/06/2016 .. some text
Number: 13
Date: 10/29/2018 .. some text
Date: 10/29/2018 .. some text
Date: 10/29/2018 .. some text
Date: 10/29/2018.. some text
Desired data frame:
op3 op4
Number:11 Date:01/03/2018
Number:12 Date:12/06/2016
Number:13 Date:10/29/2018
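
One possible way to get from the current layout to the desired one is sketched below in base R. It is only a sketch under some assumptions: the toy `df_all` is made-up data that mimics the structure shown above, the first Date row after each Number is taken as the one to keep (this matches the desired output for Number 11), and the names `is_number`, `group`, `date_group` and `result` are introduced here purely for illustration.

```r
# Toy data mimicking the structure of df_all (values are illustrative)
df_all <- data.frame(
  op3 = c("Number: 11",
          "Date: 01/03/2018 Last Revised Review: AM #17",
          " Date: 09/10/2018 Last Revised Review: AM# 39",
          "Number: 12",
          "Date: 12/06/2016 some text",
          "Number: 13",
          "Date: 10/29/2018 some text"),
  stringsAsFactors = FALSE
)

# Remove leading/trailing whitespace so the prefixes can be matched reliably
x <- trimws(df_all$op3)

is_number <- grepl("^Number:", x)
is_date   <- grepl("^Date:", x)

# Every "Number:" row starts a new group; Date rows inherit the group id
group <- cumsum(is_number)

numbers    <- x[is_number]
dates      <- x[is_date]
date_group <- group[is_date]

# match() returns the position of the FIRST Date row in each group
# (NA if a Number has no Date row at all)
first_date <- dates[match(seq_along(numbers), date_group)]

result <- data.frame(
  # Drop the space after the colon: "Number: 11" -> "Number:11"
  op3 = gsub("[[:space:]]+", "", numbers),
  # Keep only the mm/dd/yyyy part of the Date row
  op4 = sub("^Date:[[:space:]]*([0-9]{2}/[0-9]{2}/[0-9]{4}).*$",
            "Date:\\1", first_date),
  stringsAsFactors = FALSE
)

print(result)
```

If a different Date per Number is wanted (for example the latest one), the `match()` step is the place to change; everything else would stay the same.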