0

I have two panel DF the first one is Boolean and have a yearly frequece

Criteria

structure(list(Name = c("ff", "fd", "fe", "fr", "fz", "fa", "kl","ml", "az","er", "ff", "fd", "fe", "fr", "fz", "fa", "kl", "ml","az", "er"), Date =c("01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992","01/31/1992", "01/31/1992", "01/31/1993", "01/31/1993", "01/31/1993","01/31/1993", "01/31/1993","01/31/1993", "01/31/1993", "01/31/1993", "01/31/1993", "01/31/1993"), Value = c("FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), .Names = c("Name", "Date", "Value"))

Year: change every year in the real example until 2016

the Second df are a panel table also:

       Names.       Date   Value
1      ff 01/31/1992  0.2000
2      fd 01/31/1992  0.4300
3      fe 01/31/1992      NA
4      fr 01/31/1992  0.5200
5      fz 01/31/1992 -0.0020
6      fa 01/31/1992      NA
7      kl 01/31/1992  0.0010
8      ml 01/31/1992      NA
9      az 01/31/1992  0.2200
10     er 01/31/1992  0.3200
11     ff 05/31/1992  0.0000
12     fd 05/31/1992  0.0010
13     fe 05/31/1992  0.0320
14     fr 05/31/1992  0.9123
15     fz 05/31/1992  1.0000
16     fa 05/31/1992  0.3200
17     kl 05/31/1992  0.4300
18     ml 05/31/1992  0.0312
19     az 05/31/1992  0.0312
20     er 05/31/1992  0.4300
21     ff 03/31/1993  0.5300
22     fd 03/31/1993  0.8400
23     fe 03/31/1993  0.0010
24     fr 03/31/1993 -0.0123
25     fz 03/31/1993  0.4300
26     fa 03/31/1993  0.1340
27     kl 03/31/1993  0.7400
28     ml 01/31/1993  0.0312
29     az 01/31/1993  0.9324
30     er 01/31/1993  0.0600

variable change every month until year = 2016

which Name represent the same companies Name in the "criteria" , Variable: a monthly ferequence durinig the same years in the first df , value: numeric values present the return of companies.

I would like to create a df3

Date          Portfolio1
01/31/1992     (the monthly mean of comapnies have criteria True this year) 
05/31/1992     (the monthly mean of comapnies have criteria True this year)
03/31/1993     (the monthly mean of comapnies have criteria True this year)

which Date have the same dates as df2 (monthly frequence) , ptf1 = mean monthly value from df2 of companies have the criteria True in df1 in that year.

I fund that the aggregate data.frame are very useful

  • Your code for df1 and criteria is not valid R code. – Pierre Lapointe Mar 26 '17 at 16:33
  • sorry the dput is not efficient enough for my example cause my data is too long , I attached a picture for df1 and df2 – Filius Tanit Mar 26 '17 at 16:41
  • Please make a small reproducible example with only the relevant columns. You can use `data.frame` instead of `dput`. See this post:http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Pierre Lapointe Mar 26 '17 at 16:44
  • I change it (P Lapointe) hope now the example is more reproducible – Filius Tanit Mar 26 '17 at 17:02
  • Sorry, but we can't work with your code. 1- There is no example of TRUE in df1 and 2-all values are NA in df2, we can't calculate mean. Build a small example of data and show the desired output. It doesn't have to be exactly like your real data. Please read the link above. – Pierre Lapointe Mar 26 '17 at 17:19
  • I hope it work this time . Thankyou (P Lapointe) – Filius Tanit Mar 27 '17 at 13:44

1 Answers1

0

You can do that with dplyr using left_join and summarise. I first join by Date and Name. Then I remove all rows that are not TRUE using filter. I then group_by date before calculating the mean with summarise.

library(dplyr)
Criteria <- structure(list(Name = c("ff", "fd", "fe", "fr", "fz", "fa", "kl","ml", "az","er", "ff", "fd", "fe", "fr", "fz", "fa", "kl", "ml","az", "er"),
Date =c("01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992", "01/31/1992","01/31/1992", "01/31/1992", "01/31/1993",
"01/31/1993", "01/31/1993","01/31/1993", "01/31/1993","01/31/1993", "01/31/1993", "01/31/1993", "01/31/1993", "01/31/1993"), Value = c("FALSE", "TRUE", "FALSE",
"FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE", "TRUE")),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), .Names = c("Name", "Date", "Value"))

df2 <- read.table(text="Name    Date    Value
ff  01/31/1992  0.2
fd  01/31/1992  0.43
fe  01/31/1992  NA
fr  01/31/1992  0.52
fz  01/31/1992  -0.002
fa  01/31/1992  NA
kl  01/31/1992  0.001
ml  01/31/1992  NA
az  01/31/1992  0.22
er  01/31/1992  0.32
ff  05/31/1992  0
fd  05/31/1992  0.001
fe  05/31/1992  0.032
fr  05/31/1992  0.9123
fz  05/31/1992  1
fa  05/31/1992  0.32
kl  05/31/1992  0.43
ml  05/31/1992  0.0312
az  05/31/1992  0.0312
er  05/31/1992  0.43
ff  03/31/1993  0.53
fd  03/31/1993  0.84
fe  03/31/1993  0.001
fr  03/31/1993  -0.0123
fz  03/31/1993  0.43
fa  03/31/1993  0.134
kl  03/31/1993  0.74
ml  01/31/1993  0.0312
az  01/31/1993  0.9324
er  01/31/1993  0.06",header=TRUE, stringsAsFactors=FALSE)

left_join(df2,Criteria, by=c("Name"="Name","Date"="Date")) %>%
dplyr::filter(Value.y==TRUE) %>%
group_by(Date)%>%
summarise(Portfolio1=mean(Value.x, na.rm = TRUE))

# A tibble: 2 x 2
        Date Portfolio1
       <chr>      <dbl>
1 01/31/1992      0.217
2 01/31/1993      0.060
Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56