Group data into a new column value based on a condition

Question

I have data like below:

Caller  Date    Duration    Status
304 2/1/2016    756 ANSWERED
304 2/1/2016    61  ANSWERED
304 2/4/2016    60  ANSWERED
304 2/10/2016   61  ANSWERED
304 2/17/2016   60  ANSWERED
304 2/19/2016   30  ANSWERED
304 2/24/2016   27  ANSWERED
304 2/28/2016   55  ANSWERED
304 2/28/2016   63  ANSWERED

I want to group the data in R by week, i.e. if the date lies between 2/1/2016 and 2/7/2016, I add a new column called "week" and set its value to Week 1 for those rows. Similarly for all other weeks in the month.

The output would look like this:

Caller  Date    Duration    Status Week
304 2/1/2016    756 ANSWERED   Week 1
304 2/1/2016    61  ANSWERED   Week 1
304 2/4/2016    60  ANSWERED   Week 1
304 2/10/2016   61  ANSWERED   Week 2
304 2/17/2016   60  ANSWERED   Week 2
304 2/19/2016   30  ANSWERED   Week 3
304 2/24/2016   27  ANSWERED   Week 4
304 2/28/2016   55  ANSWERED   Week 4
304 2/28/2016   63  ANSWERED   Week 4

Please suggest a method in R. Thanks.

`dput(df)` outputs a plain text representation of your R object `df`. It is a good practice to include the output of `dput` in the questions so that it would be easy for us to take your code and work with it. Check this: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and `?dput` — Sumedh, Jul 07 '16 at 20:05
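As a rough illustration of the comment above, the sample data from the question could be shared like this. The data frame name `dat` and the column types are assumptions, since the question does not show how the data was read in.

```r
# Hypothetical reconstruction of the question's data; column types are assumed
dat <- data.frame(
  Caller   = rep(304L, 9),
  Date     = c("2/1/2016", "2/1/2016", "2/4/2016", "2/10/2016", "2/17/2016",
               "2/19/2016", "2/24/2016", "2/28/2016", "2/28/2016"),
  Duration = c(756L, 61L, 60L, 61L, 60L, 30L, 27L, 55L, 63L),
  Status   = rep("ANSWERED", 9),
  stringsAsFactors = FALSE
)

# Print a plain-text representation that others can paste into their session
dput(dat)

# Parsing that text back in rebuilds the object
restored <- eval(parse(text = capture.output(dput(dat))))
```

Anyone answering can then paste the `dput` output into their own session instead of retyping the table.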

score 1 · Answer 1 · answered Jul 07 '16 at 20:09

One way to do this would be to use lubridate and dplyr.

Suppose your data is in a data frame called dat:

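library(lubridate)
library(dplyr)
# Parse the month/day/year strings into Date objects
dat$Date <- mdy(dat$Date)
# Use the first date as the start of Week 1
t0 <- dat$Date[1]
# Whole weeks elapsed since t0, plus 1 so numbering starts at Week 1
dat %>% mutate(Week = paste('Week', as.integer(Date - t0) %/% 7 + 1))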

Result:

  Caller       Date Duration   Status   Week
1    304 2016-02-01      756 ANSWERED Week 1
2    304 2016-02-01       61 ANSWERED Week 1
3    304 2016-02-04       60 ANSWERED Week 1
4    304 2016-02-10       61 ANSWERED Week 2
5    304 2016-02-17       60 ANSWERED Week 3
6    304 2016-02-19       30 ANSWERED Week 3
7    304 2016-02-24       27 ANSWERED Week 4
8    304 2016-02-28       55 ANSWERED Week 4
9    304 2016-02-28       63 ANSWERED Week 4
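For readers without lubridate or dplyr installed, the same anchored-week idea can be sketched in base R. This is a swapped-in equivalent rather than the answer's own code, and it assumes the earliest date in the data marks the start of Week 1.

```r
# Dates copied from the question; only the Date column is needed here
dates <- as.Date(c("2/1/2016", "2/1/2016", "2/4/2016", "2/10/2016", "2/17/2016",
                   "2/19/2016", "2/24/2016", "2/28/2016", "2/28/2016"),
                 format = "%m/%d/%Y")

# Assumption: the earliest date marks the start of Week 1
t0 <- min(dates)

# Whole 7-day blocks elapsed since t0, plus 1 so numbering starts at 1
week_no <- as.integer(dates - t0) %/% 7L + 1L
week <- paste("Week", week_no)
```

For the sample dates this gives weeks 1, 1, 1, 2, 3, 3, 4, 4, 4, matching the result shown above.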

You can also use lubridate's `week` or `isoweek` to calculate the weeks: `df %>% mutate(Date = lubridate::mdy(Date), Week = lubridate::isoweek(Date), Week = paste('Week', Week - min(Week) + 1))` — alistaire, Jul 07 '16 at 20:30

score 1 · Answer 2 · answered Jul 07 '16 at 20:13

You can pull the week of the year directly with

format(as.Date("2016-07-01"), format = "Week %U")

See the help for strptime for more details on the formatting. Note, for example, that it only gives the week of the year, so 2017-01-01 will sort before nearly everything in 2016. You could write a wrapper similar to @ManishGoel's answer that would set your starting point as week 1.
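One possible shape for such a wrapper is sketched below. The function name `relative_week` is hypothetical, and the sketch assumes all dates fall within a single calendar year, since `%U` restarts each January.

```r
# Hypothetical helper (not from the answer): number calendar weeks relative
# to the earliest week present in the data
relative_week <- function(dates) {
  wk <- as.integer(format(dates, "%U"))  # week of the year, weeks start on Sunday
  paste("Week", wk - min(wk) + 1L)
}

dates <- as.Date(c("2/1/2016", "2/1/2016", "2/4/2016", "2/10/2016", "2/17/2016",
                   "2/19/2016", "2/24/2016", "2/28/2016", "2/28/2016"),
                 format = "%m/%d/%Y")
weeks <- relative_week(dates)
```

Because `%U` counts weeks from Sunday, the two rows dated 2/28/2016 (a Sunday) land in Week 5 here rather than Week 4. Whether that is acceptable depends on whether weeks should follow the calendar or start from the first date in the data.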

A more generic solution is to use cut:

mycuts <- seq(as.Date("2016-01-01"), as.Date("2017-12-30"), 7 )
cut(as.Date("2016-07-01"), mycuts, labels = 1:(length(mycuts)-1))

That may be easier to scale for your needs, and applies more broadly to other classes of problems. If you really need the "Week" in there, you can do that directly too:

cut(as.Date("2016-07-01"), mycuts, labels = paste("Week", 1:(length(mycuts)-1)))
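Applied to the question's dates, the `cut` approach might look like the sketch below. It assumes the breaks should start at the earliest date in the data rather than at a fixed calendar date; the variable names are illustrative.

```r
dates <- as.Date(c("2/1/2016", "2/1/2016", "2/4/2016", "2/10/2016", "2/17/2016",
                   "2/19/2016", "2/24/2016", "2/28/2016", "2/28/2016"),
                 format = "%m/%d/%Y")

# Assumption: Week 1 starts at the earliest date. The sequence runs one step
# past the latest date so that every date falls inside some interval.
breaks <- seq(min(dates), max(dates) + 7, by = "7 days")

# cut() on Dates uses left-closed intervals [break_i, break_i+1) by default
week <- cut(dates, breaks, labels = paste("Week", seq_len(length(breaks) - 1)))
```

The result is a factor, which is convenient for grouping. Wrap it in `as.character()` if plain strings are preferred.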

score 0 · Answer 3 · answered Jul 07 '16 at 20:05

You can extract the day using strsplit and then calculate the week from the date.

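Week <- sapply(df$Date, FUN = function(x) {
  # Dates are month/day/year strings, so the second piece is the day of the month
  day <- as.numeric(strsplit(as.character(x), "/")[[1]][2])
  # Days 1-7 map to week 1, days 8-14 to week 2, and so on
  (day - 1) %/% 7 + 1
})
df$Week <- Week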

However, you need to give more information about how the dates are distributed, because the calculation of the week number depends on that.
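The dependence on how the dates are distributed can be seen in a small vectorized sketch of the day-of-month idea. The helper name `week_of_month` is hypothetical. It maps days 1 to 7 to week 1, in line with the question's definition, and it only makes sense when all dates fall within one month.

```r
# Hypothetical helper: week of the month derived from the day of the month
week_of_month <- function(dates) {
  day <- as.integer(format(dates, "%d"))
  (day - 1L) %/% 7L + 1L  # days 1-7 -> 1, days 8-14 -> 2, ...
}

# Within a single month the numbering behaves as expected
feb <- as.Date(c("2/1/2016", "2/7/2016", "2/8/2016", "2/28/2016"),
               format = "%m/%d/%Y")
feb_weeks <- week_of_month(feb)            # 1 1 2 4

# Across a month boundary the count restarts at 1
spanning <- as.Date(c("2/28/2016", "3/1/2016"), format = "%m/%d/%Y")
spanning_weeks <- week_of_month(spanning)  # 4 1
```

If the data can span several months, an approach anchored to a start date avoids the restart.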

Manish Goel

Can't you directly split the date column itself based on a condition? – Sai Pardhu, Jul 07 '16 at 20:11
