I am new to R and need help with the following task. The table below shows a dummy example of the data. I am struggling to write a script that changes a price to match the others when, for a particular ppcode, only one price differs from the rest (in this example, the 4th) and that price differs from the majority in only one character. In this example, 1.42 should be changed to 1.45. The same applies if, instead of 1.42, the price were 1.55 or 2.45: it should also be changed to 1.45 (that is, in every case where only one digit of the price differs).
Thanks in advance for any suggestions.
2

Bambeil
- Can you post sample data in `dput` format? Please edit **the question** with the output of `dput(df)`. Or, if it is too big, with the output of `dput(head(df, 20))`. (`df` is a placeholder for the name of your dataset.) – Rui Barradas Oct 19 '21 at 18:38
- Are you looking for a majority algorithm by groups of `PPCODE`? – Rui Barradas Oct 19 '21 at 18:38
- @akrun I mean any number in which only one sign differs. These two are only examples; it could also be 1.43, etc. – Bambeil Oct 19 '21 at 18:43
- It is not clear from your description or from the image shown. When you say sign, do you mean `+/-`? – akrun Oct 19 '21 at 18:45
- @Rui Barradas Actually yes, but to begin with I want to try it on only one ppcode (shown in the question) and later adapt it to all of them. – Bambeil Oct 19 '21 at 18:45
- Do you think you should also check for transposed characters, e.g., 1.45 vs. 1.54? – jblood94 Oct 19 '21 at 19:33
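
For reference, the dummy data can be rebuilt from details given elsewhere in this thread: the identifiers come from the output printed in one of the answers, and the fourth price is the odd 1.42 from the question. The sketch below also counts how many character positions differ between two prices. `n_diff_chars()` is a hypothetical helper name, not part of any package, and formatting prices to two decimals is an assumption about the data. Under this position-by-position count, a transposition such as 1.45 vs. 1.54 differs in two positions, so it would not qualify as a one-digit typo.

```r
# Dummy data rebuilt from the values mentioned in the question and answers
df1 <- data.frame(
  OUTLETID = c("8900NS2871", "8900NX2201", "8900NK2202", "8900NV1594"),
  CAT      = "AIR",
  PPCODE   = 46239679L,
  PRICE    = c(1.45, 1.45, 1.45, 1.42)
)

# Hypothetical helper: number of character positions in which two prices differ.
# Assumption: two decimals are enough to represent every price.
n_diff_chars <- function(a, b) {
  a <- strsplit(sprintf("%.2f", a), "")[[1]]
  b <- strsplit(sprintf("%.2f", b), "")[[1]]
  if (length(a) != length(b)) return(Inf)  # different widths: not a one-character typo
  sum(a != b)
}

n_diff_chars(1.42, 1.45)  # 1
n_diff_chars(2.45, 1.45)  # 1
n_diff_chars(1.54, 1.45)  # 2 (transposed digits)
n_diff_chars(1.52, 1.45)  # 2
```
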
3 Answers
2
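If we need the `Mode` value, an option with `dplyr` is

    # Most frequent value of x (the first one seen wins ties)
    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }

    library(dplyr)

    df1 <- df1 %>%
      # group prices that share a PPCODE and the same value formatted to one decimal
      group_by(PPCODE, grp = sprintf('%.1f', PRICE)) %>%
      mutate(PRICE = Mode(PRICE)) %>%
      ungroup() %>%
      select(-grp)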

akrun
- Thanks, but as I mentioned in the question, it should be applied a) only if exactly one price differs within the group of the same ppcode, and b) only if exactly one digit differs in that price, as in the example of 1.42 and 1.45 (the last decimal differs). If two digits were different, the price shouldn't be changed (for example, if the one different price were 1.52 and the others 1.45, it shouldn't be changed to the mode) @akrun. So my main struggle is how to handle the case where only one digit in the decimal number differs (because in such cases it is treated as a typo in the data and changed). – Bambeil Oct 19 '21 at 18:57
- This update is helpful, but it seems to consider only the second digit after the dot. For example, it made no change for 2.45, but in that case it should. The same goes for 1.75 (only the second of the three digits differs). But if two digits in the number differ, no change should be made (for example, 2.35 or 1.32 should stay as they are). – Bambeil Oct 20 '21 at 09:01
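
The two conditions described in these comments (exactly one price in the group disagrees with the majority, and it differs from the majority in exactly one character) could be combined along the following lines. This is only a sketch under stated assumptions, not akrun's method: `one_char_off()` and `fix_typos()` are hypothetical names, prices are assumed to have at most two decimals, and a group is assumed to need at least three rows before a "majority" is meaningful.

```r
library(dplyr)

# Most frequent value of x (the first one seen wins ties)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: TRUE where a price differs from `ref` in exactly one character.
# Assumption: prices have at most two decimals.
one_char_off <- function(x, ref) {
  xs <- strsplit(sprintf("%.2f", x), "")
  rs <- strsplit(sprintf("%.2f", ref), "")[[1]]
  vapply(xs, function(s) length(s) == length(rs) && sum(s != rs) == 1, logical(1))
}

# Hypothetical wrapper: fix a price only when it is the single outlier in its
# PPCODE group and is one character away from the majority price
fix_typos <- function(df) {
  df %>%
    group_by(PPCODE) %>%
    mutate(
      mode_price = Mode(PRICE),
      n_odd      = sum(PRICE != mode_price),  # rows that disagree with the majority
      PRICE      = if_else(
        n() > 2 & n_odd == 1 & one_char_off(PRICE, mode_price[1]),
        mode_price, PRICE
      )
    ) %>%
    ungroup() %>%
    select(-mode_price, -n_odd)
}

# Try the cases mentioned in the comments: three prices of 1.45 plus one odd price
check <- function(odd) {
  fix_typos(data.frame(PPCODE = 1L, PRICE = c(1.45, 1.45, 1.45, odd)))$PRICE[4]
}
check(1.42)  # 1.45
check(2.45)  # 1.45
check(1.75)  # 1.45
check(2.35)  # 2.35 (two digits differ, left alone)
check(1.32)  # 1.32 (two digits differ, left alone)
```

The comparison is done on formatted strings rather than on numeric differences, because "one digit differs" is a statement about how the number is written, not about how far apart the values are.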
1
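Here is a base R way with `ave`.

    # Most frequent PRICE per PPCODE. table() counts the sorted unique values,
    # so index into sort(unique(x)) rather than into x itself.
    with(df1, ave(PRICE, PPCODE, FUN = \(x) sort(unique(x))[which.max(table(x))]))
    #[1] 1.45 1.45 1.45 1.45

And just assign the result back to `PRICE`.

    df1$PRICE <- with(df1, ave(PRICE, PPCODE, FUN = \(x) sort(unique(x))[which.max(table(x))]))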
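
The same `ave` pattern can carry the stricter rule from the question, where a price is replaced only if it is the single outlier in its group and differs from the majority in one character. The following is a rough base R sketch: `fix_group()` is a hypothetical function, the `toy` data frame uses made-up `PPCODE` values purely for illustration, and prices are assumed to have at most two decimals.

```r
# Hypothetical per-group fixer for use with ave(); assumes two-decimal prices
fix_group <- function(x) {
  ux  <- unique(x)
  m   <- ux[which.max(tabulate(match(x, ux)))]       # majority price
  odd <- which(x != m)                               # positions that disagree
  if (length(x) < 3 || length(odd) != 1) return(x)   # need exactly one outlier
  a <- strsplit(sprintf("%.2f", x[odd]), "")[[1]]
  b <- strsplit(sprintf("%.2f", m), "")[[1]]
  # replace only when exactly one character position differs
  if (length(a) == length(b) && sum(a != b) == 1) x[odd] <- m
  x
}

# Illustrative data: group 1 has a one-digit typo, group 2 a two-digit difference
toy <- data.frame(
  PPCODE = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  PRICE  = c(1.45, 1.45, 1.45, 1.42, 1.45, 2.15, 1.45, 1.45)
)
toy$PRICE <- with(toy, ave(PRICE, PPCODE, FUN = fix_group))
toy$PRICE
# [1] 1.45 1.45 1.45 1.45 1.45 2.15 1.45 1.45
```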

Rui Barradas
1
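Here is a `dplyr` way:

    library(dplyr)

    df %>%
      group_by(PPCODE, PRICE) %>%
      mutate(helper = n()) %>%   # how often each price occurs within its PPCODE
      group_by(PPCODE) %>%
      # replace a price that occurs only once with the group's most frequent price
      mutate(PRICE = ifelse(helper == 1, PRICE[which.max(helper)], PRICE), .keep = "unused") %>%
      ungroup()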
output:

      OUTLETID   CAT     PPCODE PRICE
      <chr>      <chr>    <int> <dbl>
    1 8900NS2871 AIR   46239679  1.45
    2 8900NX2201 AIR   46239679  1.45
    3 8900NK2202 AIR   46239679  1.45
    4 8900NV1594 AIR   46239679  1.45

TarJae
- It's helpful, but here you only considered that one value differs. The value should be changed only if just one sign/digit in the number differs. For example, 1.42 should be changed to 1.45, as should 2.45 (because only the first digit differs) and 1.75; but 2.15 (two digits differ from the majority) shouldn't be changed, because in that case it isn't considered a typo. Your code still changes it in such cases. So maybe you have an idea how to take this condition into account too. @TarJae – Bambeil Oct 20 '21 at 08:46
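
One way to express the "only one sign/digit differs" condition from this comment is edit distance. The sketch below assumes that `utils::adist()` (Levenshtein distance, shipped with R) applied to prices formatted with two decimals is an acceptable definition of a one-digit typo; `is_typo()` is a hypothetical helper name. Unlike a position-by-position comparison, edit distance also treats a single inserted or dropped character (e.g. 11.45 vs. 1.45) as distance 1, which may or may not be wanted.

```r
# Hypothetical helper: TRUE where a price is exactly one edit away from `ref`.
# Assumption: two decimals are enough to represent every price.
is_typo <- function(x, ref) {
  d <- utils::adist(sprintf("%.2f", x), sprintf("%.2f", ref))
  as.vector(d) == 1
}

is_typo(c(1.42, 2.45, 1.75, 2.15, 1.45), 1.45)
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```

A test like this could serve as the extra condition inside a grouped `mutate()` or `ave()` call, alongside the check that only one price in the group differs from the majority.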