I am pretty new to R and have a couple of questions about a loop I am attempting to execute. I will try to explain as best I can what I want the loop to do.

for(i in (1988:1999,2000:2006)){
    yearerrors=NULL
    binding=do.call("rbind.fill",x[grep(names(x), pattern ="1988.* 4._ data=")])
    cmeans=lapply(binding[,2:ncol(binding)],mean)
    datcmeans=as.data.frame(cmeans)
    finvec=datcmeans[1,]
    kk=0
    result=RMSE2(yields[(kk+1):(kk+ncol(binding))],finvec)
    kk=kk+ncol(binding)
    yearerrors=c(result)
}

yearerrors
  1. First, I want the loop to iterate over the file names of my data, specifically over the years 1988-2006, with each year taking the place where 1988 currently appears in the binding statement. x is a list of data files read into R, and the year is part of each file name, so I have file names starting with 1988, 1989, ..., 2006.

  2. yields is a numeric vector and I would like to input the indices of the vector into the function RMSE2 as indicated in the loop. For example, over the first iteration I wish for the indices 1 to the number of columns in binding to be used. Then for the next iteration I want the first index to be 1 more than what the previous iteration ended with and continue to a number equal to the number of columns in the next binding statement. I just don't know if what I have written will accomplish this.

  3. Finally, I wish to store each of these results in the vector yearerrors and then access this vector afterwards.

Thanks so much in advance!

user1836894
  • You may save a lot of work by putting your data in their own folder, setting the working directory there, and then looping with `for(i in list.files()){}`...just a thought on part 1. – Seth Feb 26 '13 at 22:29
  • Hi, you might want to take a look at `?as.character` and `paste0`. Also, if you want to combine two separate ranges, make sure to use `c()` as in `c(1800:1850, 2003:212)` – Ricardo Saporta Feb 26 '13 at 22:32
  • `(1988:1999, 2000:2006)` is not valid, it needs to be `c(1988:1999, 2000:2006)` – Chase Feb 26 '13 at 22:33
  • So you want to rbind the list of files, and then work on the combined dataframe? – alexwhan Feb 26 '13 at 22:35
  • @Ricardo & Chase: Thank you I did change the ranges and I will look into those commands Ricardo. – user1836894 Feb 26 '13 at 22:41
  • @alex: yes, I wish to rbind and then take the column means of that combined dataframe – user1836894 Feb 26 '13 at 22:42
  • There are several problems here, many of which are answered in other questions on Stack Overflow. It would be helpful if you showed how your data is structured (i.e. `str(1988)`) or, much better still, created a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – alexwhan Feb 26 '13 at 22:46
  • It is quite difficult for me to create a reproducible example as I have 1300 data files loaded into R. Each set of data is a 41 by 5000 set of numerical data. My biggest concern is how to loop through the years in the names of my data. I can see paste helping me overcome this somehow. – user1836894 Feb 26 '13 at 23:25

1 Answer

OK, there's a heck of a lot of guesswork here, because the structure of your data is extremely unclear and I have no idea what the RMSE2 function does (you've given no detail). Based on your question the other day, I'm going to assume that your data is in .csv files, and I'm going to have a stab at your problem.

I would start by building the combined dataframe while reading the files in, not doing one then the other. Like so:

library(plyr) #provides mdply

#Set your working directory to the folder containing the .csv files
#I'm assuming they're all in the form "YEAR.something.csv" based on your pattern matching

filenames <- list.files(".", pattern="\\.csv$") #pattern is a regular expression; to match only a specific year, add it to the pattern
years <- gsub("([0-9]+).*", "\\1", filenames) #keep the leading digits of each file name

df <- mdply(filenames, read.csv)
df$year <- as.numeric(years[df$X1]) #Adds the year
#Your column mean dataframe didn't work for me
cmeans <- as.data.frame(t(colMeans(df[,2:ncol(df)])))
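
If the data is instead already loaded into a named list, as in the question, a per-year pattern can be built with paste0 inside the loop. The following is only a sketch under assumptions: the toy list `x`, its element names, and the `year_means` container are made up for illustration, and base `rbind` is used in place of `rbind.fill`, so every element must have the same columns.

```r
# Toy stand-in for the question's list `x`; names are assumed to start with the year
x <- list(
  "1988.a" = data.frame(id = 1:2, v1 = c(1, 3), v2 = c(10, 30)),
  "1988.b" = data.frame(id = 3:4, v1 = c(5, 7), v2 = c(50, 70)),
  "1989.a" = data.frame(id = 1:2, v1 = c(2, 4), v2 = c(20, 40))
)

years <- 1988:1989
year_means <- vector("list", length(years))  # one slot per year (hypothetical container)
names(year_means) <- years

for (k in seq_along(years)) {
  # Build the pattern for this year, e.g. "^1988"
  pat <- paste0("^", years[k])
  # Row-bind every list element whose name starts with that year
  binding <- do.call(rbind, x[grepl(pat, names(x))])
  # Column means of everything except the first column, as in the question
  year_means[[k]] <- colMeans(binding[, 2:ncol(binding), drop = FALSE])
}

year_means[["1988"]]
```

The pattern here only anchors on the leading year; a stricter pattern like the one in the question could be pasted together the same way.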

It then gets difficult to know what you're trying to achieve. Since your datcmeans is a one-row data.frame, datcmeans[1,] doesn't change anything. So if a single row from a dataframe (or a numeric vector) is the argument your RMSE2 function requires, you can just pass it datcmeans (cmeans in my example).
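
As a small illustration of that point, here is a sketch on made-up data (the `binding` values below are invented, not the asker's) showing that taking row 1 of a one-row data frame leaves the values unchanged, and that `colMeans` returns the same numbers as a named numeric vector:

```r
# Toy data frame standing in for `binding` (values are made up)
binding <- data.frame(id = 1:3, a = c(1, 2, 3), b = c(4, 6, 8))

# The question's approach: a one-row data frame of column means
datcmeans <- as.data.frame(lapply(binding[, 2:ncol(binding)], mean))
finvec <- datcmeans[1, ]  # selecting row 1 of a one-row data frame changes no values

# colMeans() gives the same numbers as a named numeric vector
cm <- colMeans(binding[, 2:ncol(binding)])

nrow(datcmeans)
all.equal(unlist(finvec), cm)
```

Whether RMSE2 wants the data frame or the plain vector is unknown here; `unlist()` converts the former to the latter if needed.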

Your code from then on is pretty much indecipherable to me. Without knowing what yields looks like, or how RMSE2 works, it's pretty much impossible to help more.

If you're going to do a loop here, I'll say that setting kk=kk+ncol(binding) at the end of an iteration is not going to help you: because kk=0 is set inside the loop, kk is reset to 0 at the start of every iteration, so the indices always start again from 1, which is, I'm guessing, not what you want. Here's my guess at what you need here (assuming looping is required).

yearerrors <- vector("numeric", ncol(df)) #Create empty vector ahead of loop

for(i in seq_len(ncol(df))) {
  #finvec is the row of column means from your code (cmeans in my example)
  yearerrors[i] <- RMSE2(yields[i:ncol(df)], finvec)
}
yearerrors

I honestly can't imagine a function that would work like this, but it seems the most logical adaptation of your code.
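
For the running index described in part 2 of the question, one possible reading is sketched below. Everything in it is an assumption made for illustration: `rmse` is a stand-in for the undisclosed RMSE2, and `means_by_year` and `yields` hold invented numbers. The point is only that the offset and the result vector are created once, before the loop, so they carry over between iterations.

```r
# Stand-in for the asker's RMSE2, whose definition is not shown (assumption)
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Made-up inputs: one vector of column means per year, plus one long `yields` vector
means_by_year <- list("1988" = c(1, 2, 3), "1989" = c(4, 5))
yields <- c(1, 2, 5, 4, 7)

yearerrors <- numeric(length(means_by_year))  # preallocate the result vector
names(yearerrors) <- names(means_by_year)
kk <- 0                                       # initialise the offset once, before the loop

for (k in seq_along(means_by_year)) {
  n <- length(means_by_year[[k]])
  idx <- (kk + 1):(kk + n)                    # the next n indices of `yields`
  yearerrors[k] <- rmse(yields[idx], means_by_year[[k]])
  kk <- kk + n                                # carry the offset into the next iteration
}

yearerrors
```

Assigning into `yearerrors[k]` rather than overwriting `yearerrors` keeps one result per year available after the loop finishes.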

alexwhan
  • Thank you for your patience, alex, and for trying to construct an answer for me. I eventually ended up doing what I needed with the help of paste0. Sorry about the explanation; as I said, I'm new to R and maybe cannot explain exactly what I need just yet. Thanks again though. – user1836894 Feb 27 '13 at 23:22