I have this PDF file from the European Parliament, which you can download here. I have downloaded it and read it into R. It contains lists of names of Members of the European Parliament (MEPs) recorded after a voting session.
I want to extract only parts of these lists. Specifically, I want to extract the names located between "AVGIVNA RÖSTER" and "0" and put them in a table; see the text highlighted in this screenshot.
Similar series of names repeat throughout the PDF, each one referring to a specific vote. I want them all in a table. The MEPs' names change but the structure stays the same: they are always located between "AVGIVNA RÖSTER" and "0".
I thought of using a startsWith function and a for loop, but I am struggling to write it.
Here is what I did so far:
library(pdftools)
library(tidyverse)

# Read the text of each page, then split it into individual lines
votetext <- pdftools::pdf_text("MEP.pdf") %>%
  readr::read_lines()
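
To make the goal more concrete, here is a rough sketch of the kind of extraction I have in mind, run on a made-up vector rather than the real PDF. The sample names, the helper name extract_blocks and the output columns vote and name are all hypothetical. It also assumes that "AVGIVNA RÖSTER" starts a line of its own and that the closing "0" sits alone on its line, which may not hold for the actual file.

```r
# Toy stand-in for the lines read from the PDF (made-up names, not real data)
votetext <- c(
  "Header line",
  "AVGIVNA RÖSTER",
  "Alice Example",
  "Bob Example",
  "0",
  "Some other text",
  "AVGIVNA RÖSTER",
  "Carla Example",
  "0"
)

# Hypothetical helper: collect the lines that sit between each start marker
# and the first end marker that follows it
extract_blocks <- function(lines, start = "AVGIVNA RÖSTER", end = "0") {
  lines  <- trimws(lines)                      # drop padding left by the PDF layout
  starts <- which(startsWith(lines, start))    # lines that begin a block
  ends   <- which(lines == end)                # lines that are exactly the end marker

  rows <- lapply(seq_along(starts), function(i) {
    s <- starts[i]
    e <- ends[ends > s][1]                     # first end marker after this start
    # Skip blocks with no closing marker or with nothing in between
    if (is.na(e) || e - s < 2) return(NULL)
    data.frame(vote = i, name = lines[(s + 1):(e - 1)], stringsAsFactors = FALSE)
  })

  do.call(rbind, rows)                         # one row per name, tagged by block
}

mep_table <- extract_blocks(votetext)
print(mep_table)
```

The vote column is just the running number of the block, so names from different votes can be told apart. If several names share a line in the real PDF, the name column would still need to be split further.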