
Hey everyone, I know I have seen posts like this before, but none of the advice I have tried has worked. What I am trying to do is take the dates from a variable named "Production.Period.End.Date", which is formatted as dd/mm/yyyy, and split each date into its separate parts (day, month, year) to analyze. I am doing this so I can take the annual average kilowatt-hour production, labeled "Period_kWh_Production", and track how it changes over time. I have pasted the code I have so far below, in case that helps.

setwd("C:/Users/fredd/Dropbox/Grad_Life/Spring_2017/AFM/Final_Paper/")

KWTProd.df = read.csv("Merge1//Kwht_Production_07-15.csv", header=T)

##Did this to verify "Production.Period.End.Date"

names(KWTProd.df)

##
names(KWTProd.df)
[1] "Application.Number"                     
[2] "Program.Administrator"                  
[3] "Program"                                
[4] "Total.Cost"                             
[5] "System.Owner.Sector"                    
[6] "Host.Customer.Sector"                   
[7] "Host.Customer.Physical.Address.City"    
[8] "Host.Customer.Physical.Address.County"  
[9] "Host.Customer.Physical.Address.Zip.Code"
[10] "PBI.Payment.."                          
[11] "Production.Period.End.Date"             
[12] "Period_kWh_Production" <-IT EXISTS ##
##

##Did this to plot changes of Period_kWh_Production over time##

plot(Period_kWh_Production ~ Production.Period.End.Date, data = KWTProd.df)

##Tried to do this to aggregate data in average##

aggregate(Period_kWh_Production~Production.Period.End.Date,KWTProd.df,mean)

##Still too noisy and can't find the mean by year :C##

as.date(Production.Period.End.Date, data = KWTProd.df)

##Says "Production.Period.End.Date" Not found BUT IT EXISTS##

##Tried this to group and summarise by year but it says:
##Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "function"##

summary <- df %>%
  mutate(dates = dmy(Production.Period.End.Date),
         year  = year(Production.Period.End.Date)) %>%
  group_by(year) %>%
  summarise(mean = mean(x, na.rm = TRUE),
            sd   = sd(x, na.rm = TRUE))

##Trying this but have no clue how I am supposed to use this##

regexpr("<dd>")
Fred Ditzian
  • Don't know much about the code, but the regex is `\d{2}/\d{2}/\d{4}` –  Apr 22 '17 at 22:12

1 Answer


This code depends on the dplyr and lubridate packages. You haven't provided sample data, so this is not tested.

library(lubridate)
library(dplyr)

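# KWTProd.df is the data frame from the question; a bare `df` is the
# stats::df function, which causes the 'mutate_' error on class "function".
summary <- KWTProd.df %>%
  # dmy() assumes day/month/year; use mdy() if the file is month/day/year
  mutate(end_date = dmy(Production.Period.End.Date),
         production_year  = year(end_date)) %>%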
  group_by(production_year) %>%
  summarise(mean_kwH = mean(Period_kWh_Production, na.rm = TRUE),
            sd_kwH = sd(Period_kWh_Production, na.rm = TRUE))
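
For reference, the same yearly summary can be sketched in base R on a few made-up rows, with no extra packages. The data frame `toy` and its values are hypothetical, and the `%m/%d/%Y` format is an assumption based on sample values such as "1/13/2009" posted in the comments; swap in `%d/%m/%Y` if the file really is day-first.

```r
# Toy data standing in for KWTProd.df (values are made up for illustration).
toy <- data.frame(
  Production.Period.End.Date = c("1/13/2009", "6/30/2009", "1/12/2010", "12/1/2010"),
  Period_kWh_Production      = c(100, 300, 200, 400),
  stringsAsFactors = FALSE
)

# Assumption: dates are month/day/4-digit-year.
toy$end_date <- as.Date(toy$Production.Period.End.Date, format = "%m/%d/%Y")

# Pull the year out of the parsed date.
toy$production_year <- as.integer(format(toy$end_date, "%Y"))

# Mean and standard deviation of production per year.
yearly_mean <- aggregate(Period_kWh_Production ~ production_year, data = toy, FUN = mean)
yearly_sd   <- aggregate(Period_kWh_Production ~ production_year, data = toy, FUN = sd)

print(yearly_mean)
print(yearly_sd)
```

Plotting `yearly_mean` instead of the raw rows should give one point per year, which is much less noisy than plotting every period end date.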
Andrew Lavers
  • I tried that, but I keep getting this error: `Error: unexpected ')' in: " summarise(mean_kwH = mean(Period_kWh_Production, na.rm = TRUE), sd_kwH = sd(Period_kWh_Production), na.rm = TRUE))"` followed by `no applicable method for 'mutate_' applied to an object of class "function"` – Fred Ditzian Apr 22 '17 at 22:36
  • If you add data to your question, we can help. Generally, use the function `dput` and paste the results. I suggest you review http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example . I edited the answer to remove an extra `)`. – Andrew Lavers Apr 22 '17 at 22:40
  • Sorry that I am making this harder than it has to be, but dput seems to make my console explode with numbers because it is a large data set. I am not sure if this helps in any way, but based on the comments in the link you sent me I used Pastebin to reduce the number of results that show up. However, I still got this: – Fred Ditzian Apr 22 '17 at 22:56
  • 1/2008", "1/1/2009", "1/1/2010", "1/1/2011", "1/1/2012", "1/1/2013", "1/1/2014", "1/1/2015", "1/1/2016", "1/1/2017", "1/10/2010", "1/10/2011", "1/10/2012", "1/10/2013", "1/10/2016", "1/11/2009", "1/11/2010", "1/11/2011", "1/11/2012", "1/11/2013", "1/11/2015", "1/11/2016", "1/12/2009", "1/12/2010", "1/12/2011", "1/12/2012", "1/12/2014", "1/12/2015", "1/12/2016", "1/13/2009", "1/13/2010", "1/13/2011", also here is the link if that is more coherent https://pastebin.com/raw/mX4u7xqN – Fred Ditzian Apr 22 '17 at 22:56
  • It looks like your data frame is named KWTProd.df, so something like dput(KWTProd.df[1:10,]) would print a statement that can recreate ten rows. It's best to then edit the question to include it. Alternatively, editing your question to include a few rows from your csv file would work as well. – Andrew Lavers Apr 22 '17 at 23:31
  • For some reason, even when I try [1,] I still get an absurd number of values; I exceeded the character limit by almost 70,000. Sorry to keep stacking the issues, but I also keep getting the error "no applicable method for 'mutate_' applied to an object of class "function"" even when I am not doing anything involving the mutate function. One last issue I keep running into is that it keeps reporting that "Production.Period.End.Date" can't be found, even though it shows up as a variable when I run "names(KWTProd.df)". – Fred Ditzian Apr 22 '17 at 23:36
  • Edited my answer to avoid conflict in names of function `year` defined in lubridate. Try `as.date(KWTProd.df$Production.Period.End.Date)` – Andrew Lavers Apr 23 '17 at 00:14
  • as.Date(KWTProd.df$Production.Period.End.Date, "%m/%d/%Y") – Andrew Lavers Apr 23 '17 at 00:21
  • So would I add that to the original dput code to read as: dput(KWTProd.df[10,]as.date(KWTProd.df$Production.Period.End.Date)), or is this for some other part of the code? The reason I ask is that I once again got an excessive number of values after running that command. – Fred Ditzian Apr 23 '17 at 00:44
  • No, I was correcting your statement `as.date(Production.Period.End.Date, data = KWTProd.df)` – Andrew Lavers Apr 23 '17 at 00:48
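
The three problems raised in this thread (the 'mutate_' error on class "function", the column that "can't be found", and the oversized `dput` output) can be reproduced on a small scale. The sketch below uses a hypothetical data frame `toy` with made-up values, and it assumes the date column was read in as a factor, which was the `read.csv()` default before R 4.0; that assumption would explain the long `dput` output but is not confirmed by the thread.

```r
# With no data frame named `df`, the name refers to the F-distribution density
# function in the stats package, so piping it into mutate() fails.
class(df)

# Hypothetical stand-in for KWTProd.df with the date column stored as a factor.
toy <- data.frame(
  Production.Period.End.Date = factor(c("1/13/2009", "6/30/2009", "1/12/2010")),
  Period_kWh_Production      = c(100, 300, 200)
)

# A bare column name is not visible outside the data frame, so this call fails
# with "object not found" even though names(toy) lists the column.
bare_lookup <- try(as.Date(Production.Period.End.Date, format = "%m/%d/%Y"),
                   silent = TRUE)
inherits(bare_lookup, "try-error")

# Referring to the column through the data frame works.
parsed <- as.Date(toy$Production.Period.End.Date, format = "%m/%d/%Y")
print(parsed)

# dput() on a few rows of a factor still prints every level of the full column;
# droplevels() keeps only the levels present in the subset.
small <- droplevels(toy[1:2, ])
dput(small)
```

On the real data, `dput(droplevels(KWTProd.df[1:10, ]))` should produce output short enough to paste into the question.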