I have a lengthy series of text files from a project I did about 20 years ago (I had to import them from floppy disks!). The original software was written in FORTRAN and could read the files directly, but I would like to do more efficient manipulation in R. When I read one of the files into R, I get something along the lines of what you would get by creating the following data frame:
dataset <-
as.data.frame(c("R4 8561 200 365801HARLAN 16161616116616166116",
"R5 8533 100 472801WHITE 11611111111111111111",
"R4 8573 100 485101MCKENNA 11611161161111611161",
"R6 8513 200 489801HOLMES 66116111611161111161",
"R4 8522 200 492201DAY 11111611111111116111",
"R6 8548 100 500901LURTON 11116111911161111111",
"R5 8547 100 507322HUGHES 16611111111161116611",
"R4 85 3 100 518001VANDEVANTER99999911111111111111",
"R5 8553 100 521301LAMAR 99999911111111111111",
1910))
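(For what it's worth, the raw files are just plain text, one fixed-width line per record, so I pull them in with something along these lines; the file name here is made up:)

# hypothetical file name; each record is one fixed-width line of text
raw_lines <- readLines("votes_1910.txt")
dataset <- as.data.frame(raw_lines)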
The sample dataset above should start out as a 10 x 1 data frame. I am pulling my hair out trying to do the following (a rough sketch of what I have in mind follows the list):
(1) Drop the last row of the dataset, regardless of how many rows it has. When I do something like dataset <- dataset[-nrow(dataset), ], the result collapses into a factor for some reason; then
(2) Drop everything in each cell before the names. The names always begin at character 21;
(3) Once I have that, I would like to separate the names (which are always 11 characters long, padded with spaces if needed) from the numbers (which represent a series of votes);
(4) Finally, slice the vote string into individual cells, one digit per cell (each digit is always 1, 6, or 9). The number of votes varies from file to file.
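Here is a rough sketch of the kind of thing I think I am after, using base substr() and strsplit(). The offsets just follow my description above (20 leading characters, then an 11-character name, then the votes), so please correct them, or the whole approach, if there is a better idiom:

# rough sketch only; the offsets assume the name field starts at character 21
# and is 11 characters wide, as described above. Adjust if I have miscounted.
names(dataset) <- "raw"
dataset$raw <- as.character(dataset$raw)           # undo any factor conversion

# (1) drop the last row but keep the object as a data frame
dataset <- dataset[-nrow(dataset), , drop = FALSE]

# (2) throw away the 20 leading characters so each cell starts with the name
rest <- substr(dataset$raw, 21, nchar(dataset$raw))

# (3) split the 11-character name field from the vote string
justice <- trimws(substr(rest, 1, 11))
votes <- substr(rest, 12, nchar(rest))

# (4) one column per vote digit (each digit is 1, 6, or 9); the number of
#     columns adapts to however many votes a given file contains
vote_cols <- do.call(rbind, strsplit(votes, ""))
result <- data.frame(justice, vote_cols, stringsAsFactors = FALSE)

I have no idea whether drop = FALSE is the right cure for the factor problem in (1), or whether substr()/strsplit() is the sensible route for (2) through (4), so corrections on any part of this are welcome.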
Any help is greatly appreciated.