I am using the code below to compute cumulative rainfall by year and plot it as one line per year, but I get the following error when I run the ddply function from the plyr package.

Error: 'names' attribute [9] must be the same length as the vector [3]

library(plyr)

# Setting work directory

setwd("d:\\ClimData")

# Reading and reformatting raw data downloaded from NCDC

dat<-read.table("CDO2812586929956.txt",header=F,skip=1)

colnames(dat)<-c("stn","wban","yearmoda","temp","tempc","dewp","dewpc","slp","slpc","stp","stpc","visib","visibc","wdsp","wdspc","mxspd","gust","maxtemp","mintemp","prcp","sndp","frshtt")

dat$yearmoda <- strptime(dat$yearmoda,format="%Y%m%d")

dat$prcp <- as.character(dat$prcp)
dat$prcp1 <-as.numeric(substr(dat$prcp,1,4))
dat$prcpflag <- substr(dat$prcp,5,5)

dat$rain  <- dat$prcp1*25.4

dat$rain[dat$rain > 1000 ] <- NA

dat$year <- as.numeric(format(dat$yearmoda,"%Y"))
dat$month <- as.numeric(format(dat$yearmoda,"%m"))
dat$day <- as.numeric(format(dat$yearmoda,"%d"))

# Getting cumulative sum of rain/year

dat <- ddply (dat,.(year), transform, cumRainfall = cumsum (rain))

Hopefully someone can point out where I went wrong.

The input file is at the link below.

https://dl.dropboxusercontent.com/u/81632971/CDO2812586929956.txt

Patrick Hofman
Funkeh-Monkeh
  • Use dim(dat) before assigning the column names to verify that the data frame has the same number of columns as the names you are assigning. – Chitrasen Apr 02 '14 at 04:11
  • I have no problems assigning names to the columns. The error only appears when I carry out the 'plyr' function at the end of the code to get the cumulative sum of rainfall. – Funkeh-Monkeh Apr 02 '14 at 05:04
  • When converting to `as.numeric(dat$prcp)`, is the original `dat$prcp` actual number values all the way through? – Rich Scriven Apr 02 '14 at 05:16
  • Try to provide a _minimal_ reproducible example. Please check these links for general ideas, and how to do it in R: [**here**](http://stackoverflow.com/help/mcve), [**here**](http://www.sscce.org/), and [**here**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). Do we really need 17800 rows to reproduce the error? Only two of the 28 variables are used in `ddply`. – Henrik Apr 02 '14 at 05:34
  • The original dat$prcp is a character vector, but it is converted to numeric when it becomes dat$prcp1 and subsequently dat$rain. Henrik, I will go through the links you posted to get an idea about providing a minimal reproducible example. Thanks. – Funkeh-Monkeh Apr 02 '14 at 06:00

2 Answers

I don't know whether this solves your problem, since I currently cannot run your example, but if you want to try something other than plyr, the doBy package provides a function summaryBy.

library(doBy)

rain <- summaryBy(rain ~ year, data = dat, FUN = sum)

From there plotting should be straightforward.
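
Note that a per-year sum gives one total per year, whereas the question asks for a running total within each year. A minimal base R sketch of the difference, using a made-up data frame (`toy` and its values are illustrative assumptions, not the NCDC data):

```r
# Hypothetical toy data: two rainfall readings in each of two years
toy <- data.frame(
  year = c(2012, 2012, 2013, 2013),
  rain = c(1.0, 2.5, 0.5, 4.0)
)

# One total per year (comparable to summing by group)
annual <- aggregate(rain ~ year, data = toy, FUN = sum)

# Running total within each year, one value per original row
toy$cumRainfall <- ave(toy$rain, toy$year, FUN = cumsum)
```

Here `annual` has one row per year, while `toy$cumRainfall` keeps one value per observation, which is what a line plot per year would need. If `rain` contains NA values, `cumsum` will propagate them to all later rows of that year.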

iraserd

I don't really know what's wrong with your data frame because if I create a new data frame as:

> dd = data.frame(year = as.factor(dat$year), rain = dat$rain)

ddply works. I am quite sure that there is a sort of bug in plyr when the data frame has a POSIXlt/ct date column; in fact, if you remove the yearmoda column, the error disappears.

UPDATE It's not a "sort of" bug, it's a real bug: https://github.com/hadley/plyr/issues/159
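
One possible workaround, sketched here under assumptions rather than tested against the original file, is to keep the date column but store it as a Date instead of the POSIXlt object that strptime() returns. A POSIXlt object is stored internally as a list of date-time components rather than one value per row, which is consistent with the length mismatch in the error message. The toy data frame below is hypothetical and only mimics the yearmoda and rain columns:

```r
# Hypothetical stand-in for the NCDC data (values are made up)
dat <- data.frame(
  yearmoda = c("20120101", "20120102", "20130101", "20130102"),
  rain     = c(1.0, 2.5, 0.5, 4.0),
  stringsAsFactors = FALSE
)

# as.Date() yields an atomic Date vector; strptime() would yield POSIXlt
dat$yearmoda <- as.Date(dat$yearmoda, format = "%Y%m%d")
dat$year <- as.numeric(format(dat$yearmoda, "%Y"))

# Base R running total per year, used here to avoid a package dependency.
# With an atomic date column, the original call
#   ddply(dat, .(year), transform, cumRainfall = cumsum(rain))
# would be expected to run as well (assumption, not verified here).
dat$cumRainfall <- ave(dat$rain, dat$year, FUN = cumsum)
```

Wrapping the existing conversion as as.POSIXct(strptime(...)) would be another way to get an atomic column if the time of day mattered; for daily data a Date is usually enough.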

Matteo De Felice