
I am trying to tidy some data. I have random strings that I am trying to classify as numbers.

table <- "x count
1   k
2   2k
3   k1e
4   k2e
5   2k1e
6   k2
7   ke
8   k3e
9   1ek
10  ek
11  x2k
12  xk2e
13  xk1e
14  k2e1m
15  xk
16  1mk1e
17  k1m
18  k1e1m
19  1ek1m
20  k2m
21  1mk
22  xk1m
23  k1e2
24  1e
25  x
26  x1m
27  f1m
28  fk
29  f
30  fk1m
31  ff
32  ff2m
33  f2m
34  2m
35  1m
36  2s
37  f2k
38  ffk1m
39  f2k1m
40  kx
41  xxk
42  2k2e
43  k1m2
44  kf1m
45  1k1e
46  1k2e
47  1k1e1m
48  1k1m
49  1k2m
50  1k1d
51  2k1m
52  3k
53  kk
54  2k2m
55  2kk
56  kk1e
57  2kx
58  xk2k
59  x2k2e
60  k1e2m
61  k1mk
62  k3m
63  k2x
64  k1me
65  xk2m
66  1mfl
67  3m
68  fk2m
69  fk1e
70  fk1e1m
71  ffkk
72  xkf1m
73  ffk"


#Create a dataframe with the above table
df <- read.table(text=table, header = TRUE)
df

Here are the nested ifelse statements:

df$new <- ifelse(grepl("k", df$old, ignore.case = T), "1",
                         ifelse(grepl("2k", df$old, ignore.case = T), "2",
                                ifelse(grepl("k1e", df$old, ignore.case = T), "1",
                                       ifelse(grepl("k2e", df$old, ignore.case = T), "1",
                                              ifelse(grepl("2k1e", df$old, ignore.case = T), "2",
                                                     ifelse(grepl("k2", df$old, ignore.case = T), "2",
                                                            ifelse(grepl("ke", df$old, ignore.case = T), "1",
                                                                   ifelse(grepl("k3e", df$old, ignore.case = T), "1",
                                                                          ifelse(grepl("1ek", df$old, ignore.case = T), "1",
                                                                                 ifelse(grepl("ek", df$old, ignore.case = T), "1",
                                                                                        ifelse(grepl("x2k", df$old, ignore.case = T), "2",
                                                                                               ifelse(grepl("xk2e", df$old, ignore.case = T), "1",
                                                                                                      ifelse(grepl("xk1e", df$old, ignore.case = T), "3",
                                                                                                             ifelse(grepl("k2e1m", df$old, ignore.case = T), "1",
                                                                                                                    ifelse(grepl("xk", df$old, ignore.case = T), "2",
                                                                                                                           ifelse(grepl("1mk1e", df$old, ignore.case = T), "1",
                                                                                                                                  ifelse(grepl("k1m", df$old, ignore.case = T), "2",
                                                                                                                                         ifelse(grepl("k1e1m", df$old, ignore.case = T), "1",
                                                                                                                                                ifelse(grepl("1ek1m", df$old, ignore.case = T), "2",
                                                                                                                                                       ifelse(grepl("k2m", df$old, ignore.case = T), "2",
                                                                                                                                                              ifelse(grepl("1mk", df$old, ignore.case = T), "1",
                                                                                                                                                                     ifelse(grepl("xk1m", df$old, ignore.case = T), "2",
                                                                                                                                                                            ifelse(grepl("k1e2", df$old, ignore.case = T), "1",
                                                                                                                                                                                   ifelse(grepl("1e", df$old, ignore.case = T), "4",
                                                                                                                                                                                          ifelse(grepl("x", df$old, ignore.case = T), "1",
                                                                                                                                                                                                 ifelse(grepl("x1m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                        ifelse(grepl("f1m", df$old, ignore.case = T), "1",
                                                                                                                                                                                                               ifelse(grepl("fk", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                      ifelse(grepl("f", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                             ifelse(grepl("fk1m", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                    ifelse(grepl("ff", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                           ifelse(grepl("ff2m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                  ifelse(grepl("f2m", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                         ifelse(grepl("2m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                ifelse(grepl("1m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                       ifelse(grepl("2s", df$old, ignore.case = T), "6",
                                                                                                                                                                                                                                                                              ifelse(grepl("f2k", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                     ifelse(grepl("ffk1m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                            ifelse(grepl("f2k1m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                   ifelse(grepl("kx", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                                                                          ifelse(grepl("xxk", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                 ifelse(grepl("2k2e", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                        ifelse(grepl("k1m2", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                                                                                               ifelse(grepl("kf1m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                      ifelse(grepl("1k1e", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                             ifelse(grepl("1k2e", df$old, ignore.case = T), "5",
                                                                                                                                                                                                                                                                                                                                                    ifelse(grepl("1k1e1m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                           ifelse(grepl("1k1m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                  ifelse(grepl("1k2m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                         ifelse(grepl("1k1d", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                                                                                                                                                ifelse(grepl("2k1m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                                       ifelse(grepl("3k", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                                              ifelse(grepl("kk", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                                                     ifelse(grepl("2k2m", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                                                                                                                                                                            ifelse(grepl("2kk", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                                                                   ifelse(grepl("kk1e", df$old, ignore.case = T), "2", 
                                                                                                                                                                                                                                                                                                                                                                                                                          ifelse(grepl("2kx", df$old, ignore.case = T), "4",
                                                                                                                                                                                                                                                                                                                                                                                                                                 ifelse(grepl("xk2k", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                                                                                        ifelse(grepl("x2k2e", df$old, ignore.case = T), "2",  
                                                                                                                                                                                                                                                                                                                                                                                                                                               ifelse(grepl("k1e2m", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                                                                                                                                                                                                                      ifelse(grepl("k1mk", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                                                                                                             ifelse(grepl("k3m", df$old, ignore.case = T), "2",  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                    ifelse(grepl("k2x", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                           ifelse(grepl("k1me", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  ifelse(grepl("xk2m", df$old, ignore.case = T), "2",  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         ifelse(grepl("1m(fkllenfromnestkbove)", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                ifelse(grepl("3m", df$old, ignore.case = T), "4",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       ifelse(grepl("fk2m", df$old, ignore.case = T), "2",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              ifelse(grepl("fk1e", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     ifelse(grepl("fk1e1m", df$old, ignore.case = T), "8",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            ifelse(grepl("ffkk", df$old, ignore.case = T), "2",  
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   ifelse(grepl("xkf1m", df$old, ignore.case = T), "1",
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          ifelse(grepl("ffk", df$old, ignore.case = T), "2", "Other"
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))

This results in the error:

Error: unexpected ')' in "

The output would look similar to this but with more rows:

table <- "x count number
1   k 1
2   2k 2
3   k1e 1
4   k2e 1
5   2k1e 2
6   k2 2
7   ke 1
8   k3e 1
9   1ek 1
10  ek 1"

No matter how many parentheses I add or remove, this error remains. I've found some threads on other sites where people had this issue, but no solutions to resolve it.

Any suggestions would be appreciated.

cgxytf
  • Or `case_when` if the ordering is important. – Gregor Thomas Jan 06 '22 at 19:05
  • can you write a vector with the values you want to change? ie `c( 'k' = 1, '2k' = 2,...)` and post that? – Onyambu Jan 06 '22 at 19:06
  • This whole approach is extremely error-prone and unnecessarily complicated. I suggest you instead create a lookup list against which you match your data and replace the values. It would be helpful if you provided a reproducible example that we can work with. – deschen Jan 06 '22 at 19:06
  • There is a [limit to the number of nested ifelse() calls you can make](https://stackoverflow.com/questions/25063354/is-there-a-limit-for-the-possible-number-of-nested-ifelse-statements). It would be better to use a different function here. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify alternative solutions. – MrFlick Jan 06 '22 at 19:09
  • I would note that your first condition checks if there is a `k` and makes the result `"1"`. All the subsequent conditions will only be checked if there is not a `k`. But many of the subsequent conditions include a `k`, so any subsequent condition that includes a `k` will not ever be `TRUE`, because they are pre-empted by the first `grepl("k")`. – Gregor Thomas Jan 06 '22 at 19:10
  • Hi everyone, the desired output would be the conditions in each line of the nested ifelse. I was under the impression that it was labelling all cases that were exactly "k", not any that contained "k", so I understand how this would be an error now. I've edited my question to show how the data should look. – cgxytf Jan 06 '22 at 19:16
  • How do you decide as to whether a string takes the value 1,2 or 3? – Onyambu Jan 06 '22 at 19:19
  • @jl748795 `grepl` is for looking *inside* strings. If you want to test if a string `x` contains `k` anywhere in it, you use `grepl("k", x)`. If you want to test if a string is exactly `"k"`, you use `x == "k"` or `"k" %in% x`. (Though I guess you get that now, but here are the alternatives.) – Gregor Thomas Jan 06 '22 at 19:29
  • Each string will receive a 0, 1, 2, or 3. But for the purpose of this reproducible example, I altered my data, so for the solution, it can be re-classified randomly. Sorry, the original data and methods can't be shared, so I made up strings and the re-classified numbers can be random. – cgxytf Jan 06 '22 at 19:29
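
The point made in the comments about substring versus exact matching can be illustrated with a small sketch. The vector `x` and the variable names below are made up for illustration; only `grepl`, `==` and `ifelse` come from the thread.

```r
# Four sample strings taken from the question's table
x <- c("k", "2k", "k1e", "ek")

# grepl() looks inside each string, so every element containing "k" matches
contains_k <- grepl("k", x)

# == compares whole strings, so only the element that is exactly "k" matches
is_exactly_k <- x == "k"

# An anchored regular expression behaves like the exact comparison
anchored_k <- grepl("^k$", x)

# Because the first test already matches every string with a "k",
# the inner "2k" branch is never reached
nested <- ifelse(grepl("k", x), "1",
                 ifelse(grepl("2k", x), "2", "Other"))
```

Here `contains_k` is `TRUE` for all four strings while `is_exactly_k` is `TRUE` only for the first, and `nested` is `"1"` everywhere, which is why the order of conditions matters in a nested `ifelse`.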

1 Answer


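Build a named vector (`refer`) whose names are your strings and whose values are the numbers you want, then index it with your data. Strings that are not in `refer` come back as `NA`.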

refer <- setNames(c(1:10), c("k","2k","k1e","k2e","2k1e","k2","ke","k3e","1ek","ek"))
refer
   k   2k  k1e  k2e 2k1e   k2   ke  k3e  1ek   ek 
   1    2    3    4    5    6    7    8    9   10

# your data
da <- data.frame(data=c("2k","2k","k1e","k1e","2k1e","ke","ke","k3e","k3e","k1e","ed","dqw"))

cbind(da, ids=refer[da$data])
   data ids
1    2k   2
2    2k   2
3   k1e   3
4   k1e   3
5  2k1e   5
6    ke   7
7    ke   7
8   k3e   8
9   k3e   8
10  k1e   3
11   ed  NA
12  dqw  NA
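
As a rough sketch of how the same lookup idea might be applied to the question's data: the sample table's column is named `count` (the question's code refers to `df$old`), and the values below are the ten shown in the desired output. The names `lookup` and `number`, the two extra test strings, and the choice to label unmatched strings `"Other"` are assumptions for illustration.

```r
# A small stand-in for the question's data, plus two strings that are
# not in the lookup ("zz") or differ only in case ("K")
df <- data.frame(
  x = 1:12,
  count = c("k", "2k", "k1e", "k2e", "2k1e", "k2", "ke", "k3e", "1ek", "ek",
            "zz", "K"),
  stringsAsFactors = FALSE
)

# Named lookup vector: names are the exact strings, values are the classes
lookup <- setNames(
  c("1", "2", "1", "1", "2", "2", "1", "1", "1", "1"),
  c("k", "2k", "k1e", "k2e", "2k1e", "k2", "ke", "k3e", "1ek", "ek")
)

# as.character() guards against factor columns (indexing by a factor would
# use its integer codes); tolower() mimics ignore.case = TRUE
keys <- tolower(as.character(df$count))
df$number <- unname(lookup[keys])

# Anything without an entry in the lookup falls through to "Other"
df$number[is.na(df$number)] <- "Other"
```

Each string is matched as a whole, so `"2k"` can never be captured by the rule for `"k"`, and adding a new class only means adding one more name/value pair to `lookup`.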
Andre Wildberg