I have a list of more than 90 PDF files that I read and cleaned in R. I extracted two fields from each file: Number and Date. My current data frame has a single column in which each Number row is followed by the Date row(s) that correspond to that Number. I am trying to move the date that corresponds to each Number into a separate column. I am having a lot of trouble figuring this out and would appreciate any help. In the section "Example of my current data frame" I manually deleted part of each row's string; please see the dput output for what the actual data frame looks like.

This is the code that produces my current data frame:

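library(pdftools)  # pdf_text()
library(magrittr)  # %>%

# pt is assumed to be a character vector of PDF file paths (not shown in this post)
PDFreader <- function(x) {
  pdf_text(x)  # returns one string per page
}

op2 <- lapply(pt, PDFreader)

# Split each page into individual lines
op2.1 <- sapply(op2, strsplit, split = "\n")

# Keep only the lines that contain "Number:" or "Date:"
op3 <- rapply(op2.1, grep, pattern = "Number:|Date:", value = TRUE) %>%
  unique()

df_all <- as.data.frame(op3) %>%
  unique()
df_all$op3 <- as.character(df_all$op3)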



dput(head(df_all))
 structure(list(op3 = c("Number: 11", "Date: 01/03/2018  Last Revised Review: AM #17", 
 "Date: 01/03/2018                      Last Revised Review: AM #17", 
 "Date: 01/03/2018                        Last Revised Review: AM #17", 
 "Date: 01/03/2018                     Last Revised Review: AM #17", 
  " Date: 09/10/2018               Last Revised Review: AM# 39"
 )), .Names = "op3", row.names = c(NA, 6L), class = "data.frame")

Example of my current data frame:

      op3       --> COLUMN NAME 


 Number: 11

 Date: 01/03/2018 .. some text
 Date: 01/03/2018.. some text
 Date: 01/03/2018 .. some text
 Date: 01/03/2018 .. some text
 Date: 09/10/2018 .. some text


 Number: 12

 Date: 12/06/2016 .. some text 
 Date: 12/06/2016  .. some text
 Date: 12/06/2016 .. some text

 Number: 13

 Date: 10/29/2018 .. some text 
 Date: 10/29/2018 .. some text
 Date: 10/29/2018 .. some text 
 Date: 10/29/2018.. some text

Desired data frame:

  op3               op4 
Number:11      Date:01/03/2018
Number:12      Date:12/06/2016
Number:13      Date:10/29/2018
Andrea Ovalle
  • Can you make a reproducible example? Perhaps using `dput(head(yourdata))`? See https://stackoverflow.com/help/mcve, https://stackoverflow.com/questions/5963269 – Evan Friedland Nov 07 '18 at 02:16
  • Thank you, I have added the dput(head(df_all)) output to my post. I am still unsure whether this is reproducible enough, and I welcome any suggestions. – Andrea Ovalle Nov 07 '18 at 13:25
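
For reference, the reshaping described in the question could be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not a tested answer for the real PDFs: the toy `df_all` below is built from the values shown in the post, every block is assumed to start with a "Number:" row, dates are assumed to be in mm/dd/yyyy form, and when a Number has several Date rows the first one is kept (which matches the desired output for Number 11). The names `x`, `is_number`, `group`, `date_only`, `first_date`, and `result` are made up for this sketch.

```r
# Toy data mimicking the question's df_all (values copied from the post's example)
df_all <- data.frame(
  op3 = c(
    "Number: 11",
    "Date: 01/03/2018  Last Revised Review: AM #17",
    "Date: 01/03/2018                      Last Revised Review: AM #17",
    " Date: 09/10/2018               Last Revised Review: AM# 39",
    "Number: 12",
    "Date: 12/06/2016 .. some text",
    "Number: 13",
    "Date: 10/29/2018 .. some text"
  ),
  stringsAsFactors = FALSE
)

x <- trimws(df_all$op3)            # drop leading/trailing whitespace
is_number <- grepl("^Number:", x)  # TRUE on rows that start a new block

# Each "Number:" row opens a new group; the Date rows below it share its id
group <- cumsum(is_number)

# Keep only the mm/dd/yyyy part of each Date row (NA on Number rows)
date_only <- ifelse(
  is_number,
  NA_character_,
  sub("^Date:[[:space:]]*([0-9]{2}/[0-9]{2}/[0-9]{4}).*$", "\\1", x)
)

# First non-missing date within each group (assumption: the first one is wanted)
first_date <- vapply(
  split(date_only, group),
  function(d) d[!is.na(d)][1],
  character(1)
)

# Look up the date for groups 1..n, in the same order as the Number rows
d <- unname(first_date[as.character(seq_len(sum(is_number)))])

result <- data.frame(
  op3 = gsub("[[:space:]]+", "", x[is_number]),               # "Number:11"
  op4 = ifelse(is.na(d), NA_character_, paste0("Date:", d)),  # "Date:01/03/2018"
  stringsAsFactors = FALSE
)

print(result)
```

One caveat about the posted code: `unique()` is applied across all files at once, so a Date line that is character-for-character identical in two PDFs would survive only under the first Number. If that can happen in the real data, it may be safer to extract Number and Date per file before combining the results.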
