Your code returns the pdf as text. The table is in there, you just have to get it out.
library(pdftools)
file <- "https://www.usbr.gov/mp/cvo/vungvari/milfln.pdf"
rawtext <- pdf_text(file)
start <- "\r\n 1"
end <- "\r\n TOTALS"
df <- read.table(text=substring(rawtext, regexpr(start, rawtext), regexpr(end, rawtext)))
> head(df)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
1 1 41,633 18,956 86,010 100,117 121,958 24,176 40,473 3,739 437,062 +1,443 +728 6,904
2 2 41,867 20,547 86,241 100,249 122,481 24,282 40,635 3,715 440,017 +2,955 +1,490 6,874
3 3 42,212 22,371 86,148 100,249 122,882 23,753 40,841 3,739 442,195 +2,178 +1,098 8,201
4 4 43,002 25,223 86,460 100,396 123,395 23,734 41,059 3,722 446,991 +4,796 +2,418 9,363
5 5 44,225 29,253 86,923 100,601 123,804 23,602 41,356 3,680 453,444 +6,453 +3,253 11,137
6 6 45,978 34,036 87,529 101,234 123,954 23,548 41,642 3,643 461,564 +8,120 +4,094 12,288
V14 V15
1 7,632 1,554.8
2 8,364 1,571.4
3 9,299 1,589.8
4 11,781 1,613.2
5 14,390 1,641.7
6 16,382 1,674.2
regexpr
returns the index in the string where start
and end
are. substring
uses those indices to limit the text to just the table. Then read.table
formats the text as a data.frame
. You'll have to add column headers yourself. You'll probably also want to remove commas and change the columns to numeric too.