Kia ora data science community, I'm struggling to get an ifelse statement to work when trying to revise the contents of a data frame factor. I'm working with 5 different Trap Types, but two of them aren't being summarized correctly. Here's the summary table of the trap types and the number of observations associated with each type:

 DOC150 Double (Fiordland)        DOC150 Single (ATBT) 
                     107748                       20260 
      DOC150 Single (ATBT)  DOC200 Double (Run Through) 
                        456                        2324 
     DOC200 Double (Takaka)         DOC200 Double (ZIP) 
                      23748                        2472 
     DOC200 Single (Takaka)     DOC200 Single (Takaka)  
                      11258                       23668

I need the DOC150 Single (ATBT) traps to be recognized as the same type and summarized as such, and likewise for DOC200 Single (Takaka). For whatever reason, these trap types are being split into separate categories; I suspect that something went wrong with the spacing of the names when the information was pulled from the larger dataset.

I've tried using the following code to reclassify one of the errant Trap Types, but to no avail: the categories remain, but the code changes all of the Trap Types from a character factor into a numeric factor and the final tally for each category remains unchanged.

Records2$TrapName<- as.character(ifelse(grepl("Single (Takaka)", Records2$TrapTypeTe), "DOC200 Single (Takaka)", Records2$TrapTypeTe))

Here's the resulting summary table:

     1      2      3      4      5      6      7      8 
107748  20260    456   2324  23748   2472  11258  23668

I thought I finally understood how to use grepl in ifelse statements, but now I'm stuck. I know how to do this in SAS, but R has thrown me for a loop. Any help would be greatly appreciated. Kia pai to ra, Doug

Doug Robinson
  • See if there is some whitespace that differentiates them. Try `Records2$TrapName <- trimws(Records2$TrapTypeTe)` – Ronak Shah Jul 15 '20 at 14:05
  • What is **Kia ora data science community**?? :) – Sotos Jul 15 '20 at 14:06
  • Kia ora = Hello in Maori...I'll guess that you can figure out the 'data science community' bit... ;-) – Doug Robinson Jul 15 '20 at 14:07
  • This is one of the times when using factors makes things much easier, since you immediately know what typos/capitalization issues you have. If `TrapTypeTe` is a factor, what are the levels? It is easy to change a label, and that automatically takes care of the problem. – dcarlson Jul 15 '20 at 14:07
  • Ronak Shah, trimws didn't help; categories still summarized the same way. – Doug Robinson Jul 15 '20 at 14:08
  • Can you try to give a reproducible example then? – Ronak Shah Jul 15 '20 at 14:09
  • dcarlson, Sorry, I don't quite get your comment. Would my levels be the names of the different traps? I apologize for my ignorance. I'm doing my best. – Doug Robinson Jul 15 '20 at 14:10
  • If I run trimws, is there a summary or output I should view to see whether the spaces have been removed? I ran trimws(Records2$TrapTypeTe) and got the same table results as in OP. Should I do something differently? – Doug Robinson Jul 15 '20 at 14:13
  • I've added an example below. – dcarlson Jul 15 '20 at 14:17

2 Answers

Here is an approach using factors. Suppose we accidentally include some lower-case letters in our codes:

x <- c("D", "B", "E", "e", "A", "a", "E", "E", "E", "D", "E", "D", 
"d", "A", "A", "b", "D", "D", "B", "C", "e", "b", "D", "d", "D")
table(x)
# x
# a A b B C d D e E 
# 1 3 2 2 1 2 7 2 5 
x <- factor(x)
levels(x)
# [1] "a" "A" "b" "B" "C" "d" "D" "e" "E"
levels(x) <- c("A", "A", "B", "B", "C", "D", "D", "E", "E")
table(x)
# x
# A B C D E 
# 4 4 1 9 7 
levels(x)
# [1] "A" "B" "C" "D" "E"
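
The same idea can be sketched for the trap names in the question. This is only an illustration: the vector `trap` below is a made-up stand-in for `Records2$TrapTypeTe`, not the real data, and it assumes the stray characters are ordinary trailing spaces.

```r
# Hypothetical stand-in for Records2$TrapTypeTe; the values are invented.
trap <- factor(c("DOC200 Single (Takaka)", "DOC200 Single (Takaka) ",
                 "DOC150 Single (ATBT)", "DOC150 Single (ATBT) ",
                 "DOC200 Double (ZIP)"))

# Trailing spaces make otherwise identical names distinct levels.
n_before <- nlevels(trap)

# Assigning trimmed labels back to levels() merges the duplicates.
levels(trap) <- trimws(levels(trap))
n_after <- nlevels(trap)

table(trap)
```

Because only the level labels are edited, the column stays a factor and the counts of the merged levels are added together.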
dcarlson
  • dcarlson...I get it. I ran levels(Records2$TrapTypeTe) and I see that the two badly behaving levels have extra spaces at the end of their values. Ronak Shah, I ran your revised code (creating the new factor) and that solved the issue. Heaps of thanks, mates! Much appreciated. – Doug Robinson Jul 15 '20 at 14:28
As mentioned in the comments, the issue was caused by extra whitespace in the column values. You can remove it with trimws; ifelse and grepl are not required. Note that the trimmed result has to be assigned back to the column.

Records2$TrapTypeTe <- trimws(Records2$TrapTypeTe)
#Check
table(Records2$TrapTypeTe)
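
For completeness, here is a small sketch of why the original ifelse/grepl attempt likely produced the categories 1 to 8. The vector `raw` is a made-up stand-in for the real column, so treat this as an illustration under that assumption rather than a diagnosis of the actual data.

```r
# Hypothetical stand-in for Records2$TrapTypeTe; the values are invented.
raw <- factor(c("DOC200 Single (Takaka)", "DOC200 Single (Takaka) ",
                "DOC200 Double (ZIP)"))

# Unescaped parentheses form a regex group, so this pattern looks for
# the text "Single Takaka" and does not match the names above.
regex_hit <- grepl("Single (Takaka)", raw)

# fixed = TRUE treats the pattern as a literal string instead.
literal_hit <- grepl("Single (Takaka)", raw, fixed = TRUE)

# With a factor in the "no" branch, ifelse() hands back the integer
# level codes, which is where the numeric categories come from.
codes <- ifelse(regex_hit, "DOC200 Single (Takaka)", raw)

# Converting the factor to character first keeps the labels.
cleaned <- ifelse(literal_hit, "DOC200 Single (Takaka)", as.character(raw))
table(cleaned)
```

This variant works, but trimws is simpler and handles every affected level at once.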
Ronak Shah