1

I have a dataframe with company quarterly data, and I have two questions:

1: How can I keep records for only those companies with four quarters of data (as companies sometimes appear with 1, 2 or 3 quarters of data but I need 4 quarters for each company across the entire dataframe)

2: Because I have quarterly data, I would like to take the annual average or sum (based on variable type) across all four quarters given 2 conditions: year and company.

For instance, Company i in 1984 would have an average inventory value and total revenue, say I1984 and REV1984 respectively, based on four distinct quarterly values. I am currently using these lines of code - for the mean and sum - but R keeps returning "NA" and I've searched and searched for alternatives but nothing seems to work:

Company i_ I1984 <- with(R, mean(I [FY == "1984" & Co == "AAR CORP" ]))
Company i_ REV1984 <- with(R, sum(REVQ [FY == "1984" & Co == "AAR CORP" ]))

R is my dataframe I <- quarterly inventory REVQ <- quarterly revenue

Clearly, the values in quotes will be made dynamic as I find each new average/sum value and place in a new data.frame.

Any help would be highly appreciated. Thank you

I've included an example code below:

company<-c("xray", "xray", "xray",  "xray", "foxrot", "foxrot", "delta",     "kilo", "kilo" )  
qtr <-c("1","2","3","4", "1", "2","4", "2","3")

IQ <- rnorm(9,0,10)  
REVQ <- rnorm(9,0,10)  
AssetQ <- rnorm(9,0,10)  
CashQ  <- rnorm(9,0,10)  

#Modified dataframe  
data<-data.frame( company, qtr, IQ, REVQ, AssetQ, CashQ )

In this example 'xray' should be the only company for which we take a mean/sum.

  • 2
    You have two different questions; you should ask them separately. You should also have a look at [how to make a great reproducible example](http://stackoverflow.com/q/5963269/903061) - snapshots of data are almost useless, but a small representative few rows/columns shared with `dput()` is priceless – Gregor Thomas Jun 29 '16 at 23:11
  • 2
    So edit this question down to just one question, ask a new question for the other question, and share some data reproducibly in both! (That said, the `NA` is probably due to missing values in your data, so when the first question is answered you may not need the second question at all.) – Gregor Thomas Jun 29 '16 at 23:14
  • I'm re-posting as separate questions. Thanks – Mohamed A. Desoky Jun 30 '16 at 03:42
  • I have re-posted the first part of this question here: stackoverflow.com/q/38113824/4318393 – thank you! – Mohamed A. Desoky Jun 30 '16 at 04:28

1 Answers1

0

For your first question (with your df structure from the comments below):

company<-c("xray", "xray", "xray", "xray", "foxrot", "foxrot", "delta", "kilo", "kilo" )  
qtr <-c("1","2","3","4", "1", "2","4", "2","3")  
IQ <- rnorm(9,0,10)  
REVQ <- rnorm(9,0,10)  
AssetQ <- rnorm(9,0,10)  
CashQ <- rnorm(9,0,10)  
#Modified dataframe  
data<-data.frame(company,qtr, IQ, REVQ, AssetQ, CashQ )


#Using the dplyr package:  
data.complete<-data.frame(data %>% group_by(company) %>% filter(n() == 4))

For your second question

#Get your sum and means (note that the 'by' command will separate the sums based on the company factor when you have more than 1 company with complete data)  
aggregate(data.complete[,3:6], by=list(data.complete$company), sum)
aggregate(data.complete[,3:6], by=list(data.complete$company), mean)
doc_owl
  • 37
  • 5
  • We're close but your example dataframe is slightly different than mine, I've modified your example code just so we're on the same page with what I'm seeing/working with: company<-c("xray", "xray", "xray", "xray", "foxrot", "foxrot", "delta", "kilo", "kilo" ) qtr <-c("1","2","3","4", "1", "2","4", "2","3") IQ <- rnorm(9,0,10) REVQ <- rnorm(9,0,10) AssetQ <- rnorm(9,0,10) CashQ <- rnorm(9,0,10) #Modified dataframe data<-data.frame(company,qtr, IQ, REVQ, AssetQ, CashQ ) – Mohamed A. Desoky Jun 30 '16 at 02:27
  • Notice how there is a non-uniform sequence of "1-2-3-4" quarters for each company (ie. xray is the only company with all four quarters) and that this dataframe has no "NA" entries (I've already removed those at this stage of data clean up) – Mohamed A. Desoky Jun 30 '16 at 02:32
  • With this new dataframe, "xray" would then be the only company that remains for which I would run, for example, an Inventory mean and Revenue sum across the four complete quarters. I really appreciate the help and do apologize for not providing a better example first – Mohamed A. Desoky Jun 30 '16 at 02:43
  • I have re-posted the first part of this question here: http://stackoverflow.com/q/38113824/4318393 – Mohamed A. Desoky Jun 30 '16 at 04:28
  • I think I have a working solution for you now based on your updated data frame structure. – doc_owl Jun 30 '16 at 16:56