In my dataset, the duration of an activity is given either in hours (column duration_hours) or in minutes (column duration_minutes). If it is given in hours, the duration_minutes column is empty (NA), and vice versa.
I now want to convert the values given in minutes into hours by dividing them by 60 (minutes).

To do so I tried this command:

df <- df %>% mutate(duration_recoded = replace(duration_minutes, !is.na(duration_minutes), duration_minutes / 60))

However, the command produces incorrect results and this warning message is shown:

Warning message:
In x[list] <- values :
  number of items to replace is not a multiple of replacement length

Can anybody tell me where my mistake is?

Here's some sample data:

df <- structure(list(duration_hours = c(1, NA, 2, NA, 1), duration_minutes = c(NA, 25, NA, 30, NA)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
mpds
  • Please make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – NelsonGon Feb 17 '20 at 13:56
  • Likely a duplicate of https://stackoverflow.com/questions/47271093/merge-multiple-variables-in-r, but not totally sure without sample data – camille Feb 17 '20 at 14:09
  • I added sample data to my original post. – mpds Feb 20 '20 at 14:13

2 Answers

2

We can make use of the coalesce() function from the dplyr package here:

library(dplyr)
df <- df %>% mutate(duration_recoded = coalesce(duration_hours, duration_minutes / 60))

This should work because if duration_hours is not NA, coalesce() simply takes it and assigns it to duration_recoded. If duration_hours is NA, coalesce() skips it and takes duration_minutes divided by 60 instead.
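As a quick check, here is a minimal sketch that applies the same coalesce() call to the sample data from the question. It rebuilds the data with data.frame() instead of the original structure() call purely for readability; that substitution is an assumption of this sketch, not part of the answer.

```r
library(dplyr)

# Sample data from the question, rebuilt as a plain data frame
df <- data.frame(
  duration_hours   = c(1, NA, 2, NA, 1),
  duration_minutes = c(NA, 25, NA, 30, NA)
)

# Take the hours where present, otherwise the minutes converted to hours
df <- df %>%
  mutate(duration_recoded = coalesce(duration_hours, duration_minutes / 60))

df$duration_recoded
# 1.0000000 0.4166667 2.0000000 0.5000000 1.0000000
```

Rows 2 and 4 have no hours value, so they pick up 25 / 60 and 30 / 60 respectively.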

Tim Biegeleisen
  • This is a very elegant solution. Thanks a lot! However, it would really help me to also understand what was the issue with my approach. I feel it's no different to what was suggested e.g. here https://stackoverflow.com/a/28013895/12912548 – mpds Feb 17 '20 at 14:28
  • @mpds The problem with your current approach is that you only want to replace with the duration minutes divided by 60 if the duration hours field is empty. – Tim Biegeleisen Feb 17 '20 at 16:21
  • But isn't this basically the same as in the example, where the value is changed when the value is 4? Asked differently, is the issue with the `!is.na(duration_minutes)` or with the `duration_minutes / 60` part of the command? – mpds Feb 18 '20 at 07:26
  • No, your usage/understanding of `replace()` is incorrect, and the logic must be wrong, because you need to check for the presence of the `duration_hours` before deciding what is the replacement. – Tim Biegeleisen Feb 18 '20 at 07:34
  • Isn't this what my code does (or at least what I'm trying to make it do)? First it checks whether `duration_hours` is not empty, and only then does it replace it with its own value divided by 60. – mpds Feb 18 '20 at 15:13
  • `replace(duration_minutes, !is.na(duration_minutes), duration_minutes / 60)` ... I don't see `duration_hours` mentioned anywhere. – Tim Biegeleisen Feb 18 '20 at 15:14
  • Oh right, that was a mistake in my previous comment. The command does indeed not include `duration_hours`. However, there is no need for it, because only one of `duration_hours` (if the duration is more than 60 minutes) or `duration_minutes` (if less than 60 minutes) has a value. The other one will always be `NA`. With my command I only want to recode the value of `duration_minutes`. – mpds Feb 20 '20 at 14:12
0

The problem in your code is that duration_minutes is a vector, so dividing it by 60 is a vector operation that returns one value per row. Let's use an example df:

# A tibble: 7 x 1
  duration_minutes
             <dbl>
1               10
2               20
3               30
4               NA
5               50
6               NA
7               60

In this case, df$duration_minutes / 60 results in:

0.1666667 0.3333333 0.5000000        NA 0.8333333        NA 1.0000000

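That means that you are trying to fill only the non-NA positions with a vector that has one value for every row, so the number of positions to replace and the number of replacement values do not match. That is why your warning message says number of items to replace is not a multiple of replacement length.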

You either have to use some function that aggregates multiple values to a single value (such as sum(), mean(), first(), etc.) or you have to select a single value to act as a replacement. The coalesce() function is just finding the first non-missing element.
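The length mismatch can be reproduced in base R with the duration_minutes values from the question's sample data. This is a minimal sketch; the names x, idx, bad and good are illustrative only, and subsetting the replacement with the same index is one possible fix rather than the only one.

```r
# duration_minutes column from the question's sample data
x <- c(NA, 25, NA, 30, NA)
idx <- !is.na(x)  # TRUE at positions 2 and 4

# Original pattern: two slots to fill, but five replacement values supplied.
# R uses only the first two values (NA and 25 / 60) and emits the warning,
# which is suppressed here so the result can be inspected.
bad <- suppressWarnings(replace(x, idx, x / 60))
bad
# NA NA NA 0.4166667 NA

# Subsetting the replacement with the same index makes the lengths match
good <- replace(x, idx, x[idx] / 60)
good
# NA 0.4166667 NA 0.5000000 NA
```

Since NA / 60 is still NA, plain x / 60 gives the same result as good here, so replace() is not strictly needed for this recoding.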

Adam Sampson
  • If my code produced the values given in your post, everything would be fine. However, if you try my sample data, there are two rows that have values for `duration_minutes` and should therefore be recoded. The first non-NA element (25) is ignored by the code and the second one (30) is recoded incorrectly (value of 0.416 instead of 0.5). – mpds Feb 18 '20 at 07:39