0

I have a dataset where I am trying to filter based on 3 different columns.

I have the 2 columns that have character values figured out by doing: filter(TRANSACTION_TYPE != "ABC", CUSTOMER_CODE == "123") however, I have a "VERSION" column where there will be multiple versions for each customer which will then duplicate my $ amount. I want to filter on only the VERSION that contains ".000" as decimal since the .000 represents the final and most accurate version. For example, VERSION can = 20220901.000 and 20220901.002 (enter image description here ), 20220901.003, etc. However the numbers before the decimal will always change so I can't filter on it to equal this 20220901 as it will change by day.

I hope I was clear enough, thank you!

r2evans
  • 141,215
  • 6
  • 77
  • 149
Dar
  • 3
  • 2
  • Is `VERSION` class `character` or `numeric`? If the former, then `filter(grepl("\\.000$", VERSION))` should work. – r2evans Nov 28 '22 at 15:09
  • Welcome to SO, Dar! Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), sample representative data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically), perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans Nov 28 '22 at 15:09
  • Please do not post (only) an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. – r2evans Nov 28 '22 at 15:09

1 Answers1

0

Sample data:

quux <- data.frame(VERS_chr = c("20220901.000","20220901.002","20220901.000","20220901.002"),
                   VERS_num = c(20220901.000,20220901.002,20220901.000,20220901.002))

If is.character(quux$VERSION) is true in your data, then

dplyr::filter(quux, grepl("\\.000$", VERS_chr))
#       VERS_chr VERS_num
# 1 20220901.000 20220901
# 2 20220901.000 20220901

Explanation:

  • "\\.000$" matches the literal period . (it needs to be escaped since it's a regex reserved symbol) followed by three literal zeroes 000, at the end of string ($). See https://stackoverflow.com/a/22944075/3358272 for more info on regex.

If it is false (and it is not a factor), then

dplyr::filter(quux, abs(VERS_num %% 1) < 1e-3)
#       VERS_chr VERS_num
# 1 20220901.000 20220901
# 2 20220901.000 20220901

Explanation:

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • 1
    I should've mentioned that it was a character value. The first suggestion and explanation worked. Thank you so much! I also made a note in regards to the images and posting the code and such for future reference / questions. I appreciate your help and feedback on how to be better myself! – Dar Nov 28 '22 at 15:32