I extracted data from PDF tables, but the result is a character vector with one string per table row. I would like to turn it into a matrix.

For example,

[1] "XX/R011680/2   Fun          9-10   XX/R008108/2     No fun     *N/A"
[2] "XX/X002103/2   Fun         8-8.9   XX/S00257X/2     No fun     *N/A"
[3] "XX/X011443/2   Fun         8-8.9"
[4] "XX/X008728/2   No fun      7-7.9" 

Is it possible to split each string on the runs of spaces, so that it becomes a matrix like this?

     [,1]           [,2]     [,3]    [,4]           [,5]     [,6]
[1,] "XX/R011680/2" "Fun"    "9-10"  "XX/R008108/2" "No fun" "*N/A"
[2,] "XX/X002103/2" "Fun"    "8-8.9" "XX/S00257X/2" "No fun" "*N/A"
[3,] "XX/X011443/2" "Fun"    "8-8.9" NA             NA       NA
[4,] "XX/X008728/2" "No fun" "7-7.9" NA             NA       NA

Or like this, if that is easier? The order of the rows does not matter, as I can sort them later.

     [,1]           [,2]     [,3]
[1,] "XX/R011680/2" "Fun"    "9-10"
[2,] "XX/R008108/2" "No fun" "*N/A"
[3,] "XX/X002103/2" "Fun"    "8-8.9"
[4,] "XX/S00257X/2" "No fun" "*N/A"
[5,] "XX/X011443/2" "Fun"    "8-8.9"
[6,] "XX/X008728/2" "No fun" "7-7.9"
Sam

1 Answer

Assuming the input L shown reproducibly in the Note below, remove the double quotes, replace each run of two or more spaces with a comma, and then read the result with read.table:

L2 <- gsub('"', '', gsub('  +', ',', L))
read.table(text = L2, as.is = TRUE, sep = ",", fill = TRUE)
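
The read.table call above returns a data frame, and the short rows are padded with empty strings rather than NA. A minimal sketch of getting from there to the six-column matrix in the question, assuming every field is separated by at least two spaces and no field contains a double space itself; the names DF and m are only illustrative:

```r
# Input copied from the Note (each string carries literal double quotes).
L <- c("\"XX/R011680/2   Fun          9-10   XX/R008108/2     No fun     *N/A\"",
       "\"XX/X002103/2   Fun         8-8.9   XX/S00257X/2     No fun     *N/A\"",
       "\"XX/X011443/2   Fun         8-8.9\"",
       "\"XX/X008728/2   No fun      7-7.9\"")

# Same cleaning step as above: runs of 2+ spaces become commas, quotes are dropped.
L2 <- gsub('"', '', gsub('  +', ',', L))

# na.strings = "" turns the padded empty fields into NA;
# colClasses = "character" keeps every column as text.
DF <- read.table(text = L2, sep = ",", fill = TRUE,
                 na.strings = "", colClasses = "character")

# Convert to a character matrix and drop the V1..V6 column names.
m <- unname(as.matrix(DF))
m
```

Note that read.table infers the number of columns from the first five lines only, so on a longer input whose widest row comes later it may be safer to pass col.names explicitly.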

Note

L <- 
c("\"XX/R011680/2   Fun          9-10   XX/R008108/2     No fun     *N/A\"", 
"\"XX/X002103/2   Fun         8-8.9   XX/S00257X/2     No fun     *N/A\"", 
"\"XX/X011443/2   Fun         8-8.9\"", 
"\"XX/X008728/2   No fun      7-7.9\""
)
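
For the three-column layout in the question, one possible alternative (not part of the approach above) is to split every string on runs of spaces with strsplit and refill the pieces row by row. This is only a sketch: it assumes every record consists of exactly three fields, so that the total number of fields is a multiple of three, and the names fields and m3 are illustrative.

```r
# Input copied from the Note (each string carries literal double quotes).
L <- c("\"XX/R011680/2   Fun          9-10   XX/R008108/2     No fun     *N/A\"",
       "\"XX/X002103/2   Fun         8-8.9   XX/S00257X/2     No fun     *N/A\"",
       "\"XX/X011443/2   Fun         8-8.9\"",
       "\"XX/X008728/2   No fun      7-7.9\"")

# Drop the quotes, then split on runs of 2+ spaces; single spaces
# (as in "No fun") are left alone. unlist() flattens all fields in order.
fields <- unlist(strsplit(gsub('"', '', L), "  +"))

# Refill three fields per row, so each record gets its own row.
m3 <- matrix(fields, ncol = 3, byrow = TRUE)
m3
```

Because the fields are taken in reading order, the two records on a long line end up in consecutive rows.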
G. Grothendieck