I have a text file where strings are separated by whitespace. I can easily extract these into R as a data frame, by first using the scan command and then relying on each record having 15 strings in it.
So data[1:15] is one row, data[16:30] is the next row and so on. In each of these records, the name is composed of two strings, say FOO and BAR. But some records have names such as FOO BOR BAR or even FOO BOR BOO BAR. This obviously messes with my 15 string theory. How can I easily extract the data into a data frame?
My data is in a file called results.txt in my working directory.
I use this to scan my data:
mech <- scan("results.txt", "")
Then I can make the data frames like this:
d1 <- t(data.frame(mech[1:15]))
d2 <- t(data.frame(mech[16:30]))
d3 <- t(data.frame(mech[31:45]))
My plan was to iterate this in a for loop and rbind the data into one consolidated data frame.
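For reference, the loop I have in mind would look roughly like the sketch below. It assumes every record has exactly 15 strings, which is precisely the assumption that breaks later. The inline sample text stands in for results.txt, and the names n_fields, rows and consolidated are just placeholders I made up.

```r
# Sample tokens standing in for scan("results.txt", ""); two 15-field records.
mech <- scan(text = "1 FOO BAR 2K12/ME/01 96 86 86 92 73 86 72 168 82 30 84.93
2 FOO2 BAR2 2K12/ME/02 72 83 61 75 44 88 75 165 91 30 72.60",
             what = "", quiet = TRUE)

n_fields <- 15                        # assumed fixed record width
n_rows <- length(mech) %/% n_fields   # number of complete records

# Build each record as a one-row data frame, then stack them with rbind.
rows <- vector("list", n_rows)
for (i in seq_len(n_rows)) {
  idx <- ((i - 1) * n_fields + 1):(i * n_fields)
  rows[[i]] <- as.data.frame(t(mech[idx]), stringsAsFactors = FALSE)
}
consolidated <- do.call(rbind, rows)
```

Under the same fixed-width assumption, matrix(mech, ncol = 15, byrow = TRUE) should give the same layout without a loop.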
d1 results in something like
1 FOO BAR 2K12/ME/01 96 86 86 92 73 86 72 168 82 30 84.93
d2 results in
2 FOO2 BAR2 2K12/ME/02 72 83 61 75 44 88 75 165 91 30 72.60
Here, FOO and BAR are first and last names, respectively. Most records are like this. But d3 results in:
3 FOO3 BOR BAR3 2K12/ME/03 72 83 61 75 44 88 75 165 91 30
Because of the extra middle name, I lose the final string of the record, the part right after 30. This then spills over into the next record. So mech[46:60], instead of starting with 4, begins with the value omitted from the previous record.
How can I extract the data by treating the names as a single string?
EDIT: Stupid of me for not providing the data frame itself. Here is a sample.
1 FOO BAR 2K12/ME/01 96 86 86 92 73 86 72 168 82 30 84.93
2 FOO2 BAR2 2K12/ME/02 72 83 61 75 44 88 75 165 91 30 72.60
3 FOO3 BOR BAR3 2K12/ME/03 63 84 62 62 50 79 74 157 85 30 69.13
4 FOO4 BOR BAR4 2K12/ME/04 89 88 74 79 77 83 68 182 82 30 81.93
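One possible direction, shown only as a rough sketch: read the file line by line instead of as one flat vector, and count fields from the end of each line. This assumes each record sits on its own line and always ends with the same 12 fields (the 2K12/ME/.. ID followed by 11 numbers), so whatever lies between the serial number and that tail is the name. The helper parse_record and the constant n_tail are hypothetical names, and the sample lines stand in for readLines("results.txt").

```r
# Sample lines standing in for readLines("results.txt").
lines <- c(
  "1 FOO BAR 2K12/ME/01 96 86 86 92 73 86 72 168 82 30 84.93",
  "2 FOO2 BAR2 2K12/ME/02 72 83 61 75 44 88 75 165 91 30 72.60",
  "3 FOO3 BOR BAR3 2K12/ME/03 63 84 62 62 50 79 74 157 85 30 69.13",
  "4 FOO4 BOR BAR4 2K12/ME/04 89 88 74 79 77 83 68 182 82 30 81.93"
)

# Hypothetical helper: split one line and glue the name parts together.
parse_record <- function(line) {
  tok <- strsplit(trimws(line), "\\s+")[[1]]
  n <- length(tok)
  n_tail <- 12  # assumption: the ID plus 11 numeric fields always end the line
  stopifnot(n >= n_tail + 2)  # need a serial number and at least one name part
  name <- paste(tok[2:(n - n_tail)], collapse = " ")
  c(tok[1], name, tok[(n - n_tail + 1):n])
}

# Every parsed record now has 14 fields, so the rows line up.
records <- do.call(rbind, lapply(lines, parse_record))
result <- as.data.frame(records, stringsAsFactors = FALSE)
```

With this layout the name lands in the second column as a single string (for example "FOO3 BOR BAR3"), and the remaining columns are still character and would need converting to numeric where appropriate.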