I have a data frame that I made by scraping some data with rvest and then splitting it with str_split_fixed.
It looks something like this:
a   b     c     d
48  08    7     10
52  03    6     05
47  05    3     05
48  05    6+11  00
    7.5   0548  14
    6     0550  06
41  05    2.5   08
1   0251  6     10
Because of the way the data is stored on this website, I end up with some rows where the values land in the wrong column: some columns are blank while others contain two variables.
Currently, for the above example, I'm trying to "correct" rows 5 and 6, because they are formatted in the same incorrect way. If I can figure out how to get this if/else to work, I will be able to write 1 or 2 more to correct the other rows that come into the data frame incorrectly formatted (in this example, for instance, rows 4 and 8 still need work).
I'm trying to correct this using an if statement that has multiple conditions and multiple actions.
This is what I tried most recently:
if(nchar(df$a) < 2 && nchar(df$b) < 5) {
  df$c <- df$b
  df$d <- substr(df$c, 0, 2)
  df$b <- df$d
  df$a <- substr(df$c, 3, 10)
}
else {
  df <- df
}
The code runs, but the data frame that comes out is identical to the one going in. I expected rows 5 and 6 of the output to be:
48 14 7.5 05
50 06 6 05
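To make the intended correction concrete, here is a minimal sketch of the same fix written with a logical row index instead of a single `if`, so that only the matching rows are changed. It is a sketch under assumptions, not a verified solution: the example data frame is rebuilt by hand with character columns, the blank `a` cells are assumed to be empty strings, and the condition (`a` empty and `c` exactly four characters long) is a hypothetical stand-in chosen so that row 8 is not caught by mistake. All values are read from the original `df` so that one assignment does not overwrite a value another one still needs.

```r
# Example data rebuilt by hand; blank cells are assumed to be empty strings
df <- data.frame(
  a = c("48", "52", "47", "48", "", "", "41", "1"),
  b = c("08", "03", "05", "05", "7.5", "6", "05", "0251"),
  c = c("7", "6", "3", "6+11", "0548", "0550", "2.5", "6"),
  d = c("10", "05", "05", "00", "14", "06", "08", "10"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row (note the single `&`, which is vectorised):
# rows shaped like rows 5 and 6 have an empty `a` and a 4-character `c`
bad <- df$a == "" & nchar(df$c) == 4

# Work on a copy and always read from the untouched original
fixed <- df
fixed$a[bad] <- substr(df$c[bad], 3, 4)  # last two characters of old c
fixed$b[bad] <- df$d[bad]                # old d
fixed$c[bad] <- df$b[bad]                # old b
fixed$d[bad] <- substr(df$c[bad], 1, 2)  # first two characters of old c

print(fixed[bad, ])
```

The other malformed layouts (rows 4 and 8) would each need their own condition and their own set of assignments in the same style.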
I tried searching first and there were certainly a lot of questions regarding multiple conditions or multiple actions, but I had trouble finding one where both were in play or in a way that was similar enough for me to be able to apply the solution.
Edit: Here is some of the data before I ran str_split_fixed:
"52u-08-3½ -03" "47o-09-2½ -17" "-7½ -0548u-14" "-1½ -0840u-06"
The desired output from those 4 would be:
a b c d
52 08 3.5 03
47 09 2.5 17
48 14 7.5 05
40 06 1.5 08
Perhaps I should just be looking for a more sophisticated and surgical way of splitting the data up to begin with, based on how that chunk of it is formatted. I'm pretty unskilled, so when I'm trying new stuff my code usually ends up looking like Frankenstein's monster.
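As an illustration of what such a "surgical" split might look like, here is a rough base-R sketch that pulls the four values straight out of the raw strings with two regular expressions, regardless of which of the two layouts a string uses. The patterns are guesses based only on the four sample strings: `a` and `b` are assumed to be exactly two digits each on either side of `u-` or `o-`, `c` is assumed to be a whole number optionally followed by ½, and `d` is assumed to be the two digits after the following ` -`. The helper name `capture` is made up for this example, and it assumes every string matches.

```r
# Raw strings from the edit above; "\u00bd" is the one-half character
raw <- c("52u-08-3\u00bd -03", "47o-09-2\u00bd -17",
         "-7\u00bd -0548u-14", "-1\u00bd -0840u-06")

# Hypothetical helper: capture groups of the first match in each string,
# returned as a character matrix with one row per string
capture <- function(pattern, x) {
  m <- regmatches(x, regexec(pattern, x))
  do.call(rbind, m)[, -1, drop = FALSE]
}

# a and b: two digits, then "u" or "o", a dash, and two more digits
ab <- capture("([0-9]{2})[uo]-([0-9]{2})", raw)

# c and d: dash, digits, optional one-half sign, " -", then two digits
cd <- capture("-([0-9]+)(\u00bd?) -([0-9]{2})", raw)

parsed <- data.frame(
  a = ab[, 1],
  b = ab[, 2],
  c = as.numeric(cd[, 1]) + ifelse(cd[, 2] == "", 0, 0.5),
  d = cd[, 3],
  stringsAsFactors = FALSE
)

print(parsed)
```

On the four sample strings this should give the desired table shown above, but values such as `6+11` or a one-digit `a` would need extra patterns.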