0

Apologies if this has been answered elsewhere, I did look, but couldn't find an example I could replicate.

If I had the following data frame called DF where 1-14 are items with a score of 1, 0 or 3

Id        Date         1   2   3   4   5   6   7   8   9   10   11   12   13   14
             
1        01/01/01      1   0   3   3   1   0   1   3   1   0    3    0    1    1  
2        01/02/01      0   3   1   1   0   1   1   1   1   3    1    1    1    3

How would I create a column which averages items 1-7 for each ID excluding 3 or 0 scores (so just 1 values), and then another column doing the same for for 8-14?

So I would have this:

Id        Date         1   2   3   4   5   6   7   8   9   10   11   12   13   14   av1-7   av8-14
             
1        01/01/01      1   0   3   3   1   0   1   3   1   0    3    0    1    1    0.428   0.428
2        01/02/01      0   3   1   1   0   1   1   1   1   3    1    1    1    3    0.57    0.71

If anyone can help, it would be most appreciated.

  • A.) it will be easier for us to help, if you provide a reproducible example. This avoids us to paste and copy things and fiddle around your your data set. `dput()` is a magic functio for this. But read up on minimal reproducible examples. B.) you have laid out your programming routine. Try to code it. If you bring your data into a vertical format, you can easily create a new column by `df$new_col <- my operation` or with `mutate(newcol = my operation). You can construct the condition you want or just sum the 1 values, then average. – Ray Jun 09 '21 at 18:07

2 Answers2

1

Here's an approach with dplyr:

data %>%
  rowwise() %>%
  mutate(`av1-7` = mean(recode(c_across(`1`:`7`),`1`= 1, .default = 0)),
         `av8-14` = mean(recode(c_across(`8`:`14`),`1`= 1, .default = 0)))
# Rowwise: 
     Id Date       `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`  `12`  `13`  `14` `av1-7` `av8-14`
  <int> <chr>    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>   <dbl>    <dbl>
1     1 01/01/01     1     0     3     3     1     0     1     3     1     0     3     0     1     1   0.429    0.429
2     2 01/02/01     0     3     1     1     0     1     1     1     1     3     1     1     1     3   0.571    0.714

In general, it's not a great idea to have column names as numbers or to contain -. So it's probably better to rename those columns.

Data:

data <- structure(list(Id = 1:2, Date = c("01/01/01", "01/02/01"), `1` = 1:0, 
    `2` = c(0L, 3L), `3` = c(3L, 1L), `4` = c(3L, 1L), `5` = 1:0, 
    `6` = 0:1, `7` = c(1L, 1L), `8` = c(3L, 1L), `9` = c(1L, 
    1L), `10` = c(0L, 3L), `11` = c(3L, 1L), `12` = 0:1, `13` = c(1L, 
    1L), `14` = c(1L, 3L)), class = "data.frame", row.names = c(NA, 
-2L))
Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
  • If I am following correctly, your answer creates the column av1-7 by recoding the values in 1:7. How does the code choose only values of 1? Is it through the `1`= 1, .default = 0 part? – user15950867 Jun 09 '21 at 18:16
  • Yes, you are correct. 1 is recoded as 1 and everything else gets 0. – Ian Campbell Jun 09 '21 at 18:18
  • 1
    I added data<-data %>% to add them as new columns in the DF. It says not to offer thanks in the comments, but I am very grateful for your help. – user15950867 Jun 09 '21 at 18:32
  • In the second part of your code (renaming), what are `1` = 1:0, `2` = c(0L, 3L), `3` = c(3L, 1L) doing? I understand it is renaming the columns, but what is the difference between the three operations for 1, 2 and 3? – user15950867 Jun 09 '21 at 18:38
  • Okay, I meant in the part where you have renamed the columns, what do 1 = 1:0, 2 = c(0L, 3L), 3 = c(3L, 1L) mean? – user15950867 Jun 09 '21 at 18:47
  • Ah, sorry. I included that as reproducible data for other people trying my approach. I used a customized function to read your data from your post. I then did the reverse using the base R function `dput`. That is the output of `dput`. In the future, when you ask questions, you should post the output of `dput(data)` or `dput(head(data))` if your data is very large to make it easier to answer. See [How to make a great R reproducible example](https://stackoverflow.com/a/5963610/) for more. – Ian Campbell Jun 09 '21 at 18:50
1

We can use rowMeans by selecting the columns of interest

df1 <- df1 %>%
    mutate(across(`1`:`14`, ~ replace(., . != 1, 0))) %>%
   transmute(`av1-7` = rowMeans(select(cur_data(), `1`:`7`), na.rm = TRUE),
           `av8-14`= rowMeans(select(cur_data(), `8`:`14`), na.rm = TRUE)) %>%
    bind_cols(df1, .) %>%
    as_tibble

-output

df1
# A tibble: 2 x 18
     Id Date       `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`  `12`  `13`  `14` `av1-7` `av8-14`
  <int> <chr>    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>   <dbl>    <dbl>
1     1 01/01/01     1     0     3     3     1     0     1     3     1     0     3     0     1     1   0.429    0.429
2     2 01/02/01     0     3     1     1     0     1     1     1     1     3     1     1     1     3   0.571    0.714

data

df1 <- structure(list(Id = 1:2, Date = c("01/01/01", "01/02/01"), `1` = 1:0, 
    `2` = c(0L, 3L), `3` = c(3L, 1L), `4` = c(3L, 1L), `5` = 1:0, 
    `6` = 0:1, `7` = c(1L, 1L), `8` = c(3L, 1L), `9` = c(1L, 
    1L), `10` = c(0L, 3L), `11` = c(3L, 1L), `12` = 0:1, `13` = c(1L, 
    1L), `14` = c(1L, 3L)), class = "data.frame", row.names = c(NA, 
-2L))
akrun
  • 874,273
  • 37
  • 540
  • 662