I am relatively new to R and am trying to process a lot of weather data stored in a data frame. My goal is to loop through the data and compare the Element column of my data frame to several values, doing something different depending on the value, much like a traditional switch statement. Element can take one of 24 different values, for example TMAX or TMIN, and the column is currently a factor with 24 levels. What is the fastest way to loop through this data, running a different set of code depending on the value of Element? How do I check whether a value in Element is equal to TMAX, for example? Efficiency is very important. Any suggestions are appreciated.

Example:

library("sqldf")
library("RPostgreSQL")

width = c(11, 4, 2, 4)
label = c("ID", "Year", "Month", "Element")
for (i in 1:31)
{
  width = c(width, 5, 1, 1, 1)
  label = c(label, paste("Value ",i), paste("MFLAG ",i), paste("QFLAG ",i), paste("SFLAG", i))
}
test = read.fwf("ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/all/USW00093734.dly", header = FALSE, fill = TRUE, widths = width, col.names = label)

#You need to set up your database for this to work
for(i in 1:length(test$ID))
{
  # Compare the Element of row i only, not the whole column
  element = as.character(test$Element[i])
  if(element == "TMAX")
  {
    for(j in 1:31)
    {
      # read.fwf turns the label "Value  j" into the column name "Value..j"
      insert_statement = paste0("INSERT into noaa_temperature (id, max_temp, date) values ('", test$ID[i], "', ", test[i, paste0("Value..", j)], ", '", toString(as.Date(ISOdate(test$Year[i], test$Month[i], j))), "')")
      sqldf(insert_statement)
    }
  }
  if(element == "WT01" || element == "WT03")
  {
    for(j in 1:31)
    {
      # isTRUE() guards against an NA value breaking the if()
      if(isTRUE(test[i, paste0("Value..", j)] == 1))
      {
        insert_statement = paste0("INSERT into noaa_weather_type (id, code, date) values ('", test$ID[i], "', '", element, "', '", toString(as.Date(ISOdate(test$Year[i], test$Month[i], j))), "')")
        sqldf(insert_statement)
      }
    }
  }
}

I would like to execute a different section of code on each row depending on the value of the Element column in test.
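
To make the comparison part of the question concrete, here is a minimal sketch on a made-up data frame (`toy`, `handle_row` and the values are hypothetical and stand in for `test`). It shows that comparing a factor to a string works on the labels, and that `switch` needs the factor converted to character first. It is an illustration of the question, not a recommended solution.

```r
# Toy stand-in for `test`; the values are made up for illustration.
toy <- data.frame(
  ID       = "USW00093734",
  Element  = factor(c("TMAX", "TMIN", "WT01", "TMAX")),
  Value..1 = c(250, 120, 1, 261)
)

# Comparing a factor to a string compares against its labels, not its integer codes.
is_tmax <- toy$Element == "TMAX"               # logical vector, one entry per row
is_wt   <- toy$Element %in% c("WT01", "WT03")  # match against several values at once

# Row-at-a-time dispatch: switch() expects a character string, so convert first.
handle_row <- function(element) {
  switch(as.character(element),
         TMAX = ,                     # empty branch falls through to TMIN
         TMIN = "noaa_temperature",
         WT01 = ,                     # empty branch falls through to WT03
         WT03 = "noaa_weather_type",
         NA_character_)               # default for elements with no handler
}
targets <- vapply(as.character(toy$Element), handle_row, character(1),
                  USE.NAMES = FALSE)
```

The vectorized forms (`==` and `%in%`) test the whole column in one call, which is usually much cheaper than testing one row at a time inside a loop.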

  • Don't loop, write a function with a `switch` (or maybe `ifelse`) statement and use `data.table`. (Or maybe neither switch nor ifelse, depends...) Currently your question is too vague to provide any more specific guidance. If you want specific help, work on making a [minimal, reproducible example](http://stackoverflow.com/q/5963269/903061). – Gregor Thomas Jun 18 '15 at 16:29
  • Could you give me an example of the most efficient way to switch on the value of the factor? – klib Jun 18 '15 at 16:59
  • Let's work on making this example *minimal* by using, e.g., `subset(test, Element %in% c("TMAX", "TMIN"))`, and then let's work on your actual problem by sharing what you actually want to do for those two values of `Element`. My guess is we can do this group-at-a-time rather than row-at-a-time, which will be **much** more efficient. Can't tell though unless you tell us what you're doing. – Gregor Thomas Jun 18 '15 at 17:08
  • Also, based on your code so far, if you're concerned with efficiency I'd highly recommend reading [The R Inferno](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf), especially Circles 2 and 3. – Gregor Thomas Jun 18 '15 at 17:09
  • Thanks for the reading recommendation. I am writing the data to a database for further analysis later. Depending on the value of Element the insert statement will differ because it may go to a different table. TMAX and TMIN actually go to the same table but there are other fields that do not. – klib Jun 18 '15 at 17:21
  • This illustrates why *you* need to make a minimal example (my guess didn't work for your case) and why you need to make your goals clear. It sounds more and more like you don't need anything fancy at all, just perhaps a table with values of `Element` in one column and a destination table in another column. – Gregor Thomas Jun 18 '15 at 17:26
  • Part of my problem is I don't know how to code what I want to do which makes creating a nice example very hard. I added my attempt as an example although it will only work if you set up a db connection and it is definitely not efficient. – klib Jun 18 '15 at 18:01
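
The group-at-a-time idea and the lookup table suggested in the comments could be sketched roughly as follows. This is only an illustration under assumptions: `toy` is made-up data in the shape of `test` (cut down to two day columns), `routes` is a hypothetical routing table, and nothing is written to a database.

```r
# Toy wide-format data in the shape of `test`, cut down to 2 days; values are made up.
toy <- data.frame(
  ID = "USW00093734", Year = 2015, Month = 6,
  Element  = c("TMAX", "TMIN", "WT01", "PRCP"),
  Value..1 = c(250, 120, 1, 0),
  Value..2 = c(261, 133, -9999, 5),
  stringsAsFactors = FALSE
)

# Hypothetical routing table: the destination table for each Element.
routes <- data.frame(
  Element = c("TMAX", "TMIN", "WT01", "WT03"),
  dest    = c("noaa_temperature", "noaa_temperature",
              "noaa_weather_type", "noaa_weather_type"),
  stringsAsFactors = FALSE
)

# Wide -> long: one row per (ID, Element, day) instead of one column per day.
value_cols <- grep("^Value", names(toy), value = TRUE)
n_days <- length(value_cols)
long <- data.frame(
  toy[rep(seq_len(nrow(toy)), times = n_days), c("ID", "Year", "Month", "Element")],
  day   = rep(seq_len(n_days), each = nrow(toy)),
  value = unlist(toy[value_cols], use.names = FALSE)
)

# Build dates in one vectorized call; impossible dates (e.g. June 31) become NA.
long$date <- as.Date(ISOdate(long$Year, long$Month, long$day))

# Drop impossible dates and the GHCN-Daily missing-value marker (-9999).
long <- long[!is.na(long$date) & long$value != -9999, ]

# Attach the destination table; elements without a route (PRCP here) are dropped.
long <- merge(long, routes, by = "Element")

# One data frame per destination table, ready for a bulk write.
by_dest <- split(long, long$dest)

# Each piece could then be appended in a single call instead of one INSERT per
# value, e.g. DBI::dbWriteTable(con, "noaa_temperature", df, append = TRUE),
# assuming an open connection `con` and columns renamed to match the table.
```

The design choice here is that the decision about where a row goes lives in data (`routes`) rather than in `if` branches, so supporting another `Element` means adding a row to the table instead of another block of code.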

0 Answers