There are several things wrong with your code. The reason you are getting NAs is that you are passing NULLs all over the place. Did you ever look at r_d$Account? When you see problems in your code, you should start by going through things piecemeal, step by step; in this case you'll see that r_d$Account gives you NULL. Why? Because you did not rename the columns correctly. colnames(r_d) will be revealing.
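As a minimal sketch of that check, assuming a made-up stand-in for all_data whose columns still carry the default names V1, V2 and V3 (the values are invented, not taken from your data):

```r
# Hypothetical stand-in for the real data; values are invented.
all_data <- data.frame(V1 = 1:3,
                       V2 = c("2016-01-01", "2016-01-02", "2016-01-03"),
                       V3 = c(42301, 20205, 98010))

# See which names the columns really have.
colnames(all_data)

# `$` on a data.frame column that does not exist returns NULL,
# with no error and no warning.
account <- all_data$Account
is.null(account)

# Comparing NULL to a number yields a zero-length logical vector.
length(account >= 42301)
```

Nothing here fails loudly, which is why looking at each intermediate value is worth the time.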
First, rename uses non-standard evaluation with unquoted arguments, whereas rename_ takes character=character pairs. These might work (I can't know for certain, since I'm not about to transcribe your image of data; please provide copyable output from dput next time!):
# non-standard evaluation
rename(all_data, Number=V1, Dates=V2, Account=V3)
# standard-evaluation #1:
rename_(all_data, Number="V1", Dates="V2", Account="V3")
# standard-evaluation #2
rename_(all_data, .dots = c("Number"="V1", "Dates"="V2", "Account"="V3"))
From there, if you step through your code, you should see that r_d$Account is no longer NULL.
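In more recent dplyr releases the underscored verbs such as rename_ are deprecated. Here is a sketch of the same renaming without them, again on invented data; the all_of() form assumes dplyr 1.0 or later:

```r
library(dplyr)

# Hypothetical stand-in for the real data; values are invented.
all_data <- data.frame(V1 = 1:3,
                       V2 = c("2016-01-01", "2016-01-02", "2016-01-03"),
                       V3 = c(42301, 20205, 98010))

# Unquoted new = old pairs.
r_d <- rename(all_data, Number = V1, Dates = V2, Account = V3)

# Names held in a character vector (new = "old"), wrapped in all_of().
lookup <- c(Number = "V1", Dates = "V2", Account = "V3")
r_d2 <- rename(all_data, all_of(lookup))

colnames(r_d)
is.null(r_d$Account)
```

Either form should leave you with a real Account column, so r_d$Account returns a vector.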
Second, is there a reason you create r_d but still reference all_data? There are definitely times when you need to do this kind of thing; this is not one of them, as it is too prone to problems (e.g., if the row order or dimensions of one of them change).
Third, because you convert $Account to character, it is really inappropriate to use inequality comparisons. Though it is certainly legal to do so ("1" < "2" is TRUE), it will run into problems: "11" < "2" is also TRUE, and "3" < "22" is FALSE. I'm not saying that you should avoid conversion to string; I think it is appropriate. What perplexes me is your use of account ranges: an account number should be categorical, not ordinal, so selecting a range of account numbers is illogical.
Fourth, even assuming that account numbers should be ordinal and ranges make sense, your use of filter can be improved, but only if you either (a) accept the quirks of comparing stringified numbers ("3" > "22" is TRUE), or (b) keep them as integers. First, you should not be referencing r_d$ within an NSE dplyr function. (Edit: you also need to group your logic with parentheses.) This is a literal translation from your code:
f_d <- filter(r_d, (Account >= 42301 & Account <= 42315) |
                   (Account >= 20202 & Account <= 20210) |
                   Account == 98010 | Account == 98015)
You can make this perhaps more readable with:
f_d <- filter(r_d,
  Account %in% c(98010, 98015) |
  between(Account, 42301, 42315) |
  between(Account, 20202, 20210)
)
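To convince yourself that the two forms select the same rows, here is a small check on invented data, assuming Account is kept as an integer (option (b)); between() is inclusive at both ends:

```r
library(dplyr)

# Hypothetical data; Account kept as integer so that ranges behave numerically.
r_d <- data.frame(Account = c(42301L, 42315L, 42316L, 20201L,
                              20210L, 98010L, 98015L, 11L))

# Literal translation with explicit parentheses.
literal <- filter(r_d, (Account >= 42301 & Account <= 42315) |
                       (Account >= 20202 & Account <= 20210) |
                       Account == 98010 | Account == 98015)

# Same logic expressed with %in% and between().
readable <- filter(r_d,
                   Account %in% c(98010, 98015) |
                     between(Account, 42301, 42315) |
                     between(Account, 20202, 20210))

identical(literal$Account, readable$Account)
literal$Account
```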
Perhaps a better way to do it, assuming $Account is character, would be to determine which accounts are appropriate based on some other criteria (open date, order date, something else from a different column), and once you have a vector of account numbers, do:
filter(r_d,
  Account %in% vector_of_interesting_account_numbers)
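As one possible sketch of building that vector: the account_info table, its OpenDate column and the cutoff date below are all hypothetical, invented only to illustrate selecting accounts by a criterion other than their number.

```r
library(dplyr)

# Hypothetical transaction data with Account stored as character.
r_d <- data.frame(Account = c("42301", "20205", "98010", "11", "20205"),
                  Amount  = c(10, 20, 30, 40, 50),
                  stringsAsFactors = FALSE)

# Hypothetical lookup table describing each account.
account_info <- data.frame(
  Account  = c("42301", "20205", "98010", "11"),
  OpenDate = as.Date(c("2015-03-01", "2016-07-15", "2016-01-10", "2014-11-30")),
  stringsAsFactors = FALSE
)

# Choose accounts by open date instead of by a numeric range.
vector_of_interesting_account_numbers <- account_info %>%
  filter(OpenDate >= as.Date("2016-01-01")) %>%
  pull(Account)

f_d <- filter(r_d, Account %in% vector_of_interesting_account_numbers)
f_d$Account
```

Because %in% only tests membership, it behaves the same whether the account numbers are stored as character or as integer.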