I'm new to R and I'm having some difficulty calculating the returns for every Identification in my dataset.

I have a very large dataset of monthly observations grouped like so:

Code   Subset   Identification   Names   Times    Value   %
100    1001     10011            .....   201012   10      40
100    1001     10012            .....   201012   11      60
100    1002     10021            .....   201012    7      30
100    1002     10022            .....   201012   13      70
.....
100    1001     10011            .....   201301   11      45
100    1001     10012            .....   201301   15      55
100    1002     10021            .....   201301    9      33
100    1002     10022            .....   201301   17      67

I need to write a function that calculates the monthly rate of return for every Identification. Then I need to aggregate those returns up to the "Subset" level, using a mean weighted by the "%" column.

I've converted the Times vector to year-month format, i.e. "%Y-%m", like this:

as.yearmon(as.character(Data$Times), format = "%Y%m")

and I've tried to calculate the returns for every Identification using split and sapply, like this:

xm <- split(Data, Identification)
Retxm <- sapply(1:length(xm), function(x) returns(Value))

The output I got from the code above looks like this:

        [,1]          [,2]          [,3]          [,4]          
[1,]            NA            NA            NA            NA        
[2,]  1.605198e-03  1.605198e-03  1.605198e-03  1.605198e-03 
[3,] -1.190902e-02 -1.190902e-02 -1.190902e-02 -1.190902e-02 
[4,]  3.318032e-03  3.318032e-03  3.318032e-03  3.318032e-03 

The output is not very clear; I would like to have the Times on the rows and the Identification in the column headers.

Thank you so much!

  • 3
    http://stackoverflow.com/questions/11562656/averaging-column-values-for-specific-sections-of-data-corresponding-to-other-col/11562850#11562850 – Ari B. Friedman Oct 14 '13 at 12:50
  • 5
    Stack Overflow is not a code generator. Please show what you have tried. – Roland Oct 14 '13 at 12:50
  • 1
    @Roland idea alert: Random R code generator! Lorem ipsum for R if you will. – Roman Luštrik Oct 14 '13 at 13:56
  • @RomanLuštrik But it has to generate code that doesn't return an error. ;) – Roland Oct 14 '13 at 14:19
  • [This post](http://stackoverflow.com/questions/16657512/apply-function-conditionally/16657546#16657546) also offers a starting point. – Jilber Urbina Oct 14 '13 at 14:37
  • 1
    I think questioners should demonstrate that they have done some searching before we go out and attempt to augment their efforts. – IRTFM Oct 14 '13 at 16:29
  • I'm sorry, but I have two preliminary problems in writing the R code. Here is what I tried: 1) I'm having trouble setting up the Times vector in the format "%Y-%m": as.character(format(attr(dati, "Times"), "%Y-%m")); 2) I don't know how to write the function to calculate the returns, because I need to calculate them for every "Identification". The only way I've found to set up the problem is with split, like this: split(c(dati$Value),as.factor(dati$Identification)) I hope I've explained the problem better. Thanks to everyone! – Refrattale Oct 14 '13 at 20:22

1 Answer

Here's a minimal dataset which is similar:

set.seed(1)
df1 <- data.frame(id=sample(c("10011", "10012", "10013"), 6, replace=TRUE),
                  d1=rep(c(201012, 201101), each=3),
                  v1=ceiling(20*runif(6))
                  )

As to your first question: base R's Date class needs a day as well as a month and year, so a value like 201012 can't be converted to a Date as it stands. To handle dates which are specified only by month and year you could use:

library(zoo)
df1$d1 <- as.yearmon(as.character(df1$d1), format="%Y%m")
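
If adding a package is not an option, the base R route implied above is to append an arbitrary day before parsing. This is only a minimal sketch; `ym`, `dates` and `labels` are hypothetical names, and the choice of "01" as the day is an assumption made purely so that `as.Date()` has a complete date to work with:

```r
# Hypothetical year-month codes in the same format as the Times column
ym <- c(201012, 201101, 201301)

# Append a dummy day ("01") so that as.Date() receives a full date
dates <- as.Date(paste0(ym, "01"), format = "%Y%m%d")

# Format back to "year-month" text for display
labels <- format(dates, "%Y-%m")
print(labels)
```

The `dates` vector stays a true Date, so it sorts chronologically; `labels` is plain text and should only be used for printing.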

As to the second part of the question, it's unclear to me what sort of calculation you're trying to perform. Following your approach, you can indeed split the data.frame and do something with each element, e.g. get the sum of the v1 column:

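# split the data.frame into a list with one element per id
l1 <- split(df1, df1$id)
# sum the v1 column within each element of the list
sapply(seq_along(l1), function(i) sum(l1[[i]]$v1))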
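
The same split-and-apply step can probably be written more compactly with `tapply()`, which also labels each result with its id. The sketch below uses a hypothetical data frame `df` with fixed values rather than the randomly generated `df1`, so that it can be run on its own:

```r
# Hypothetical stand-in for df1, with fixed values so the result is reproducible
df <- data.frame(
  id = c("10011", "10012", "10011", "10013", "10012", "10011"),
  v1 = c(6, 8, 12, 19, 5, 18)
)

# Apply sum() to v1 within each id; the result is named by id
totals <- tapply(df$v1, df$id, sum)
print(totals)
```

Because the result carries the id values as names, individual groups can be looked up directly, e.g. `totals[["10012"]]`.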

Edit: My Java's not working, so I can't add a comment. It's still not clear what you're trying to do. It would be better if you could spell it out with a working example; try editing your original question if you are able to.

dardisco
  • Thanks for all the advice! The calculation I'm trying to do is this: (Value(t) - Value(t-1)) / Value(t-1), to get the monthly returns for every Identification. After this I need to aggregate the monthly returns for every Subset. The returns of a Subset are calculated by weighting the Identification returns by the "%" column. – Refrattale Oct 15 '13 at 07:43
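
The calculation described in this last comment could be sketched in base R roughly as follows. Everything here is illustrative: the data frame is a small made-up sample in the layout of the question, `Pct` is a hypothetical name standing in for the "%" column, and `simple_return` is a helper written for this example. It also assumes that every Identification has one row per month with no gaps, and that each return is weighted by the "%" value of the same month; both assumptions would need checking against the real data.

```r
# Hypothetical sample in the layout of the question (two consecutive months)
Data <- data.frame(
  Subset         = c(1001, 1001, 1002, 1002, 1001, 1001, 1002, 1002),
  Identification = c(10011, 10012, 10021, 10022, 10011, 10012, 10021, 10022),
  Times          = c(201012, 201012, 201012, 201012,
                     201101, 201101, 201101, 201101),
  Value          = c(10, 11, 7, 13, 11, 15, 9, 17),
  Pct            = c(40, 60, 30, 70, 45, 55, 33, 67)
)

# Sort so that each Identification's rows are in time order
Data <- Data[order(Data$Identification, Data$Times), ]

# Simple return (V_t - V_{t-1}) / V_{t-1}; the first month has no predecessor
simple_return <- function(v) c(NA, diff(v) / head(v, -1))

# Apply the helper within each Identification, keeping the original row layout
Data$Return <- ave(Data$Value, Data$Identification, FUN = simple_return)

# Wide view: Times on the rows, Identification on the columns
wide <- tapply(Data$Return, list(Data$Times, Data$Identification),
               function(r) r[1])
print(wide)

# Weighted mean of the returns per Subset and month, weights taken from Pct
ok <- !is.na(Data$Return)
groups <- split(Data[ok, ], list(Data$Subset[ok], Data$Times[ok]), drop = TRUE)
subset_ret <- sapply(groups, function(g) weighted.mean(g$Return, g$Pct))
print(subset_ret)
```

The `wide` matrix gives the layout asked for in the question, and `subset_ret` is named by Subset and month joined with a dot (e.g. `1001.201101`). If the weights should instead come from the previous month, the `Pct` column would have to be lagged within each Identification before the last step.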