0

I'll start by saying I'm trying to learn r but it doesn't come easy to me. Similar to this post here I am trying to match values in multiple columns from one data frame (df) and then replace those values based on the corresponding columns from the other data frame (df.key). Here is my example df:

name  type place ttotal t01 t02 t03 t04 t05 t06 t07 t08 t09
joe   cat  SE        7    3   2   2   3   2  5   2   0  1  
john  cat  SE        2    0   0   4   0   3  1   3   1  7
sue   cat  SE        1    2   0   5   0   4  1   4   3  0     
jack  cat  SE        6    3   4   2   2   4  0   2   1  5    

Below is my df.key to be used to match the values above in columns df$ttotal to t09 with df.key$class and replace with the values in df.key$mid accordingly:

lo  hi class mid 
0    0    0  0.0
0    1    1  0.5
1    2    2  3.0    
5   10    3  7.5   
10  20    4 15.0 
20  30    5 25.0 
30  40    6 35.0 
40  50    7 45.0 

so the first row should be:

name  type place ttotal t01  t02  t03 t04 t05 t06  t07 t08 t09
 joe   cat  SE   45.0   7.5  3.0  3.0 7.5 3.0 25.0 3.0 0.0 0.5

Here is just one match loop I tried but it populates the came value across the row:

for(i in 1:dim(df)[1]){
  for(j in df$4:13) {
    df[i,j] <- df.key$mid[match(i, df.key$class)]
  }
}

Thanks for the help. I'd like to try to get a solution somewhat similar to this in hopes I can understand it.

KNN
  • 459
  • 4
  • 19

2 Answers2

0

Could do:

library(tidyverse)

df %>%
  gather(key, val, ttotal:t09) %>%
  left_join(df.key %>% select(3:4), by = c("val" = "class")) %>%
  spread(key, mid) %>%
  group_by(name) %>%
  summarise_all(funs(first(na.omit(.)))) %>%
  select(-val)

Output:

# A tibble: 4 x 13
  name  type  place   t01   t02   t03   t04   t05   t06   t07   t08   t09 ttotal
  <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 jack  cat   SE      7.5    15     3   3    15     0     3     0.5  25     35  
2 joe   cat   SE      7.5     3     3   7.5   3    25     3     0     0.5   45  
3 john  cat   SE      0       0    15   0     7.5   0.5   7.5   0.5  45      3  
4 sue   cat   SE      3       0    25   0    15     0.5  15     7.5   0      0.5
arg0naut91
  • 14,574
  • 2
  • 17
  • 38
  • How can I modify this to drop cells with NA? I had to convert the df using as.numeric (some numbers were characters) but found empty cells. – KNN Feb 10 '19 at 19:12
  • Not sure what you mean, after applying the code below all relevant cells should be numeric? Meaning `t01` until `ttotal`. – arg0naut91 Feb 10 '19 at 19:19
  • 1
    I figured out the problem - I had missing data. Thanks – KNN Feb 11 '19 at 03:15
0

You can simply map your keys into your data:


library(tidyverse)

mutate_at(dat, vars(ttotal:t09), funs(map_dbl(., ~ keys$mid[keys$class == .x])))

Which outputs:

  name type place ttotal t01 t02 t03 t04  t05  t06  t07 t08  t09
1  joe  cat    SE   45.0 7.5   3   3 7.5  3.0 25.0  3.0 0.0  0.5
2 john  cat    SE    3.0 0.0   0  15 0.0  7.5  0.5  7.5 0.5 45.0
3  sue  cat    SE    0.5 3.0   0  25 0.0 15.0  0.5 15.0 7.5  0.0
4 jack  cat    SE   35.0 7.5  15   3 3.0 15.0  0.0  3.0 0.5 25.0

Explanation:

With dplyr::mutate_at() you can change the values of variables you select with vars(ttotal:t09), applying the function funs(...) to each of selected variables. For each variable map_dbl(., ~ keys$mid[keys$class == .x]) compares it with keys$class element-wise (key$class == .x), and subsets keys$mid by resulting Boolean vector.


Your data:

dat <-
  structure(
    list(
      name = c("joe", "john", "sue", "jack"),
      type = c("cat",
               "cat", "cat", "cat"),
      place = c("SE", "SE", "SE", "SE"),
      ttotal = c(7L,
                 2L, 1L, 6L),
      t01 = c(3L, 0L, 2L, 3L),
      t02 = c(2L, 0L, 0L, 4L),
      t03 = c(2L, 4L, 5L, 2L),
      t04 = c(3L, 0L, 0L, 2L),
      t05 = c(2L,
              3L, 4L, 4L),
      t06 = c(5L, 1L, 1L, 0L),
      t07 = c(2L, 3L, 4L,
              2L),
      t08 = c(0L, 1L, 3L, 1L),
      t09 = c(1L, 7L, 0L, 5L)
    ),
    class = "data.frame",
    row.names = c(NA,-4L)
  )

keys <-
  structure(
    list(
      lo = c(0L, 0L, 1L, 5L, 10L, 20L, 30L, 40L),
      hi = c(0L,
             1L, 2L, 10L, 20L, 30L, 40L, 50L),
      class = 0:7,
      mid = c(0, 0.5,
              3, 7.5, 15, 25, 35, 45)
    ),
    class = "data.frame",
    row.names = c(NA,-8L)
  )
utubun
  • 4,400
  • 1
  • 14
  • 17