-1

Sample

 df <- data.frame(
  Birth_Date = c("1952-03-21", "1963-12-20", "1956-02-25", "1974-08-04", "1963-06-13", "1956-11-20", "1974-03-07", "1963-10-23", "1952-11-24", "1974-12-16"),
  Items_Amount = c(68,189,69,19,299,79,149,149,29,189)
  )
df

I'm trying to analyse a data-set, which has column Item_Amount(in $) and customer's birth-date spread across 90 years. Goal is to compare the sales percentage based on suitable age groups.

The main data frame contains date "BirthDate" column from "1902-02-13" to "1991-12-11" as dates not string

'data.frame':   350241 obs. of  1 variable:
$ BirthDate: Date, format: "1964-06-08" "1964-06-08" "1964-06-08" "1964-06-08" ...


>  min(Trans_Cust$Birth_Date)
[1] "1902-02-13"

> difftime(max(Trans_Cust$Birth_Date),min(Trans_Cust$Birth_Date),units = "auto")
Time difference of 32808 days

> max(Trans_Cust$Birth_Date)
[1] "1991-12-11"

How Do I find the present ages based on "Birth_Date" column, store it to new column "Present_ages" and then proceed with calculating sum(Items_Amount) grouped by present_ages.

d.b
  • 32,245
  • 6
  • 36
  • 77
Ashish25
  • 1,965
  • 1
  • 15
  • 16
  • 1
    Make sure you provide [reproducible examples](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) when asking for help. What exactly is the desired output here? What do you want to do with these decades? – MrFlick Mar 06 '17 at 21:15
  • @MrFlick Details added – Ashish25 Mar 06 '17 at 21:28
  • Your modified question refers to agegroup01 - agegroup09 but your description of the data says that your data spans 10 decades. – G5W Mar 06 '17 at 21:40
  • I assuming agegroup01 as 1st quantile of decade 1 from "1902-02-13" to "1912-02-12" so that "1902-02-13" to "1991-12-11" will have somewhat 9 quantiles – Ashish25 Mar 06 '17 at 21:44
  • sorry for being ambiguous. – Ashish25 Mar 06 '17 at 21:47
  • 1
    Don't post pictures of data. See how to create a reproducible dataset in the question itself at the link i previously provided. – MrFlick Mar 06 '17 at 21:53

2 Answers2

0

I am assuming that your birth dates are just strings, so you need to convert them to some form of date. I am using POSIXct. Once converted, you can just set up the decade boundaries and use cut to break the dates into groups.

BirthDate = c("1964-06-08", "1964-06-08", "1964-06-08", "1964-06-08",
         "1902-02-13", "1991-12-11", "1944-06-06", "1929-10-24")
StartDecade = seq(as.POSIXct("1900-01-01"), as.POSIXct("2000-01-01"), by="10 years")
cut(as.POSIXct(BirthDate), breaks=StartDecade)
[1] 1960-01-01 1960-01-01 1960-01-01 1960-01-01 1900-01-01 1990-01-01 1940-01-01 1920-01-01

It may be prettier to simplify the names

as.numeric(cut(as.POSIXct(BirthDate), breaks=StartDecade)) - 1
[1] 6 6 6 6 0 9 4 2
G5W
  • 36,531
  • 10
  • 47
  • 80
0

This will return a numeric value "rounded" to the decade:

BirthDate = as.Date(c("1964-06-08", "1964-06-08", "1964-06-08", "1964-06-08", "1902-02-13", "1991-12-11", "1944-06-06", "1929-10-24"))

 BDdecade <- round( as.numeric( format(BirthDate, "%Y"))-5, -1)
 BDdecade
#[1] 1960 1960 1960 1960 1900 1990 1940 1920

Needed to extract the year, convert to numeric and subtract 5, since the floor function does not have the same capacity for rounding to tens and hundreds as does round.

It wasn't clear what your desired starting point for the "decades was supposed to be. This would split on the basis of the minimum date.

> BDdecade2 <- cut(BirthDate, breaks= seq( min(BirthDate), max(BirthDate), by= "10 years"))
> BDdecade2
[1] 1962-02-13 1962-02-13 1962-02-13 1962-02-13 1902-02-13 <NA>       1942-02-13
[8] 1922-02-13
8 Levels: 1902-02-13 1912-02-13 1922-02-13 1932-02-13 1942-02-13 ... 1972-02-13

The NA suggest you might need to add +365 (or perhaps even more) to the max date.

IRTFM
  • 258,963
  • 21
  • 364
  • 487