I have a dataframe I made by scraping some data with rvest and using str_split_fixed.

It looks something like this

a     b       c       d
48    08      7       10
52    03      6       05
47    05      3       05
48    05      6+11    00
      7.5     0548    14
      6       0550    06
41    05      2.5     08
1     0251    6       10

Because of the way the data is stored on this website, I end up with some rows where values land in the wrong column: some columns are blank while others contain two variables.

Currently, for the above example, I'm trying to "correct" rows 5 and 6, because they are malformed in the same way. If I can figure out how to get this if/else to work, I will be able to write 1 or 2 more to correct the other rows that come into the dataframe incorrectly formatted (in this example, rows 4 and 8 still need work).

I'm trying to correct this using an if statement that has multiple conditions and multiple actions.

This is what I tried most recently:

if (nchar(df$a) < 2 && nchar(df$b) < 5) {
  df$c <- df$b
  df$d <- substr(df$c, 0, 2)
  df$b <- df$d
  df$a <- substr(df$c, 3, 10)
} else {
  df <- df
}

The code runs, but the dataframe that comes out is identical to the one that went in. I expected rows 5 and 6 of the output to be:

48   14   7.5    05
50   06    6     05
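One possible explanation, offered as a sketch rather than a definitive answer: `if` expects a single TRUE/FALSE, and `&&` only compares single values (older R versions silently used just the first element of each vector, newer ones raise an error). Since `nchar(df$a)[1]` is 2 for the first row, the condition is FALSE and the `else` branch runs. A row-wise version could use the element-wise `&` to build a logical index instead. The toy data frame below is typed in by hand from the example, and the row test (`a` is an empty string and `c` has four characters) is an assumption, chosen so that row 8 is not picked up by accident.

```r
# Toy copy of the example table; blank cells are assumed to be empty strings.
df <- data.frame(
  a = c("48", "52", "47", "48", "", "", "41", "1"),
  b = c("08", "03", "05", "05", "7.5", "6", "05", "0251"),
  c = c("7", "6", "3", "6+11", "0548", "0550", "2.5", "6"),
  d = c("10", "05", "05", "00", "14", "06", "08", "10"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: `&` is element-wise, unlike `&&`.
# (Assumed test: a is blank and c holds two fused 2-digit values.)
bad <- df$a == "" & nchar(df$c) == 4

# Save the original values of the affected rows before overwriting anything,
# so later assignments do not read already-modified columns.
old_b <- df$b[bad]
old_c <- df$c[bad]
old_d <- df$d[bad]

df$a[bad] <- substr(old_c, 3, 4)  # last two characters of c
df$b[bad] <- old_d                # d moves to b
df$c[bad] <- old_b                # b moves to c
df$d[bad] <- substr(old_c, 1, 2)  # first two characters of c

df[bad, ]
```

Copying the old values first matters here: in the original attempt, `df$c` is overwritten before `substr(df$c, ...)` is called, so the substrings would be taken from the wrong column even if the condition were TRUE.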

I tried searching first and there were certainly a lot of questions regarding multiple conditions or multiple actions, but I had trouble finding one where both were in play or in a way that was similar enough for me to be able to apply the solution.

Edit: Here is some of the data before I did str_split_fixed

"52u-08-3½ -03" "47o-09-2½ -17" "-7½ -0548u-14"  "-1½ -0840u-06"

The desired output from those 4 would be:

a    b    c    d
52   08   3.5  03
47   09   2.5  17
48   14   7.5  05
40   06   1.5  08

Perhaps I should just be looking for a more sophisticated and surgical way of splitting the data up to begin with, based on how that chunk of it is formatted. I'm pretty unskilled, so when I'm trying new stuff my code usually ends up looking like Frankenstein's monster.
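As a rough illustration of what parsing the raw strings directly might look like, here is a minimal base R sketch. It assumes only the two layouts visible in the four sample strings exist, that `b` and `d` are always two digits, and that "½" is the only fraction glyph. The names `pat1`, `pat2`, and `parse_one` are hypothetical helpers made up for this example.

```r
# The four sample strings; "\u00bd" is the "½" character.
raw <- c("52u-08-3\u00bd -03", "47o-09-2\u00bd -17",
         "-7\u00bd -0548u-14", "-1\u00bd -0840u-06")

# Replace the "½" glyph with ".5" so everything else is plain ASCII.
x <- gsub("\u00bd", ".5", raw, fixed = TRUE)

# Assumed layout 1: a, "u" or "o", -b, -c, space, -d
pat1 <- "^([0-9]+)[uo]-([0-9]{2})-([0-9.]+) -([0-9]{2})$"
# Assumed layout 2: -c, space, -d immediately followed by a, "u" or "o", -b
pat2 <- "^-([0-9.]+) -([0-9]{2})([0-9]+)[uo]-([0-9]{2})$"

parse_one <- function(s) {
  # regexec returns the full match plus one entry per capture group.
  m <- regmatches(s, regexec(pat1, s))[[1]]
  if (length(m) == 5) {
    return(c(a = m[2], b = m[3], c = m[4], d = m[5]))
  }
  m <- regmatches(s, regexec(pat2, s))[[1]]
  if (length(m) == 5) {
    return(c(a = m[4], b = m[5], c = m[2], d = m[3]))
  }
  # Unrecognised layout: return NAs rather than guessing.
  c(a = NA_character_, b = NA_character_, c = NA_character_, d = NA_character_)
}

# One row per input string, columns a, b, c, d (all still character).
parsed <- as.data.frame(t(sapply(x, parse_one, USE.NAMES = FALSE)),
                        stringsAsFactors = FALSE)
parsed
```

Strings that match neither pattern come back as a row of `NA`, which makes any further layouts on the site easy to spot and handle with an additional pattern.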

  • garbage in garbage out.. I would say to be more careful in capturing the data and parsing it... maybe `str_split_fixed` is not the best way to parse it? Can you please show us what the data looks like before that step? – Amit Kohli Nov 17 '16 at 09:32
  • @MattMurdock, maybe read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?noredirect=1&lq=1) to make it easier for us to help. – InspectorSands Nov 17 '16 at 09:36
  • You're probably right that str_split_fixed wasn't my best option. I definitely felt like I was going to be making code like a frankenstein monster, but I'm sufficiently unskilled to have to do that sometimes until I learn enough to do things in a more clean way. – MattMurdock Nov 17 '16 at 09:38
