
I have a dataset with a column that records how often someone drinks fruit juice, based on a survey. Respondents can report how many times they drink juice daily, weekly, or monthly.

The column is stored as a three-digit integer, where the first digit indicates whether they chose daily/weekly/monthly, and the remaining two digits are how many times they drank juice within that period. So 104 would mean they drink juice 4 times per day, 209 would mean 9 times per week, and so on.

This is the structure:

juice <- c(101,204,310)

I want to create a new column which standardizes the data so that it's all a "per week" figure. If the integer begins with 1 (daily), it should drop the leading "1" and multiply the remaining two digits (read as a single number, e.g. 04 = 4 times) by 7. If it begins with 2 (weekly), just remove the first digit. If it begins with 3 (monthly), remove the first digit, then divide by 30 and multiply by 7.

I am new to R and have no idea how to approach this - any help would be greatly appreciated!

realsimont
  • Please provide a reproducible example to work with – Jilber Urbina May 23 '19 at 23:06
  • Please see [How to create a great reproducible example in R](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and update your question. What you describe is not hard to do, but it's a lot easier to help if we don't have to guess what your data looks like and/or take the time to create something that *may* resemble the actual data set. – Mako212 May 23 '19 at 23:12
  • Is it just something as simple as `x <- c(101,204,310)` ? That's all we need to know. – thelatemail May 23 '19 at 23:14
  • Thanks for the feedback - yes the structure is: `x <- c(101, 204, 310)` – realsimont May 23 '19 at 23:56

3 Answers


Do it with some vectorised indexing. I've nicked the example data from @divibisan:

df <- data.frame(juice = c(104, 106, 204, 209, 302, 332, 111))

c(7,1,7/30)[df$juice %/% 100]  * df$juice %% 100
#[1] 28.0000000 42.0000000  4.0000000  9.0000000  0.4666667  7.4666667 77.0000000
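
If the one-liner is hard to read, the same lookup can be wrapped in a small function. This is only a sketch: the name `to_weekly` and the choice to return `NA` for any period code other than 1, 2 or 3 are my own assumptions, not something the question specifies.

```r
# Hypothetical helper: convert 3-digit survey codes to times per week
to_weekly <- function(code) {
  period <- code %/% 100   # first digit: 1 = daily, 2 = weekly, 3 = monthly
  times  <- code %% 100    # last two digits: number of times in that period
  multiplier <- c(7, 1, 7 / 30)       # per-week factor, indexed by period code
  period[!period %in% 1:3] <- NA      # assumption: unknown codes become NA
  multiplier[period] * times
}

juice <- c(101, 204, 310)
to_weekly(juice)
#[1] 7.000000 4.000000 2.333333
```

Setting invalid period codes to `NA` first matters because indexing with `0` would silently drop an element and misalign the result.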
thelatemail
  • I did not see that you had posted this – Onyambu May 24 '19 at 00:39
  • I love this solution! Definitely not the most readable, but probably significantly more efficient if you're in a situation where that matters. Also, I'm a sucker for anything that combines modulo with integer division. – divibisan May 24 '19 at 15:56
dat_3digits <- data.frame(drinks = c(104, 209, 301))

library(tidyverse)
dat_3digits %>% 
    mutate(freq  = sub("\\d{2}$", "", drinks) %>% as.numeric,  # first digit: period code
           times = sub("^\\d", "", drinks) %>% as.numeric,     # last two digits: count
           new_drinks = if_else(freq == 1, times * 7,
                                if_else(freq == 3, (times/30)*7, times)))  # weekly: keep the count
  drinks freq times new_drinks
1    104    1     4 28.0000000
2    209    2     9  9.0000000
3    301    3     1  0.2333333

Using base R and `substr` instead of `sub`:

transform(transform(dat_3digits, 
                    freq = as.numeric(substr(drinks, start=1, stop=1)),     # period code
                    drinks2 = as.numeric(substr(drinks, start=2, stop=3))), # count
          new_drinks = ifelse(freq == 1, drinks2 * 7,
                              ifelse(freq == 3, (drinks2/30)*7, drinks2)))
Jilber Urbina

We can do this in tidyverse by splitting the juice code with `separate` and then using `case_when` to multiply the count by the appropriate factor:

library(tidyverse)
df <- data.frame('juice' = c(104, 106, 204, 209, 302, 332, 111))

df %>%
    separate(juice, into = c('period', 'drinks'), sep = 1) %>% # split after 1st character
    mutate(
        drinks = as.numeric(drinks), # convert number of drinks to numeric
        dpw = case_when(             # drinks per week: scale by the period code
            period == 1 ~ drinks * 7,
            period == 2 ~ drinks,
            period == 3 ~ (drinks / 30) * 7 ))

  period drinks        dpw
1      1      4 28.0000000
2      1      6 42.0000000
3      2      4  4.0000000
4      2      9  9.0000000
5      3      2  0.4666667
6      3     32  7.4666667
7      1     11 77.0000000
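
One thing to be aware of: `case_when` returns `NA` for any row that matches none of the conditions, so an unexpected period code will quietly come out as missing. Here is a sketch that makes that fallback explicit. The `499` row is a made-up invalid code for illustration, and `out` is just a name I picked for the result.

```r
library(dplyr)
library(tidyr)

# 499 is a made-up invalid code (period 4 does not exist in the survey)
df <- data.frame(juice = c(104, 209, 302, 499))

out <- df %>%
    separate(juice, into = c("period", "drinks"), sep = 1) %>%  # split after 1st character
    mutate(
        drinks = as.numeric(drinks),
        dpw = case_when(
            period == "1" ~ drinks * 7,
            period == "2" ~ drinks,
            period == "3" ~ (drinks / 30) * 7,
            TRUE          ~ NA_real_   # unknown period code: flag as missing
        ))
out
```

Checking `sum(is.na(out$dpw))` afterwards is a cheap way to spot codes that did not fit the expected scheme.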
divibisan