
I have some Python and NumPy experience but have never used R before. I'm trying to help my wife with her R project: although she has a much better grasp of statistics, she has little programming experience. I'm finding R's syntax and documentation very confusing.

What we originally wanted to do was loop through a large data.frame, do a bunch of spatial calculations involving prior and subsequent records, a little trig, and some quality checks on the data, and generate a new object with the results. We will then get this new data into GIS.


EDIT: Just to be clear, the calculations in this example are just placeholders and are nothing like the actual calculations I needed to do.


Initially I tried something like this:

> a_value = 11  # value assumed; the printed param3 column implies 11
> result = list()
> for (i in 1:5) {
+   #Calculate some dummy data. The actual calculations are much more involved
+   param1 = i * 1.1
+   param2 = i * 5.3
+   param3 = i + a_value
+   # Now append these calculated values to some sort of object
+   sample = list(param1=param1,param2=param2,param3=param3)
+   result <- rbind(result,sample)
+ }
> print(result)
       param1 param2 param3
sample 1.1    5.3    12    
sample 2.2    10.6   13    
sample 3.3    15.9   14    
sample 4.4    21.2   15    
sample 5.5    26.5   16

The "sample" column seems unnecessary, but oh well, it looks good. Now to reference a single column...

> result$param2
NULL

??? I tried getting rid of 'sample' by changing the append line to:

+   result <- rbind(result,list(param1=param1,param2=param2,param3=param3))
>
     param1 param2 param3
[1,] 1.1    5.3    12    
[2,] 2.2    10.6   13    
[3,] 3.3    15.9   14    
[4,] 4.4    21.2   15    
[5,] 5.5    26.5   16 
> result$param2
NULL
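A minimal sketch of why `result$param2` comes back `NULL` in the list-based attempts. `a_value` is not defined in the question, so it is assumed to be 11 here (the printed `param3` column implies it). The point it illustrates: `rbind()` on plain lists builds a matrix whose cells are list elements, not a data frame, and `$` looks at `names()`, which such a matrix does not have.

```r
# a_value is not defined in the question; 11 is assumed because the
# printed param3 column (12, 13, ...) implies it.
a_value <- 11
result <- list()
for (i in 1:5) {
  sample <- list(param1 = i * 1.1, param2 = i * 5.3, param3 = i + a_value)
  result <- rbind(result, sample)
}

print(class(result))    # a matrix, not a data.frame
print(is.list(result))  # TRUE: its cells are list elements
print(names(result))    # NULL: column names live in dimnames(), not names()

# `$` looks up names(), so result$param2 finds nothing and returns NULL.
# Indexing the matrix by column name does work; unlist() flattens the cells.
param2 <- unlist(result[, "param2"])
print(param2)
```

So the values are all there; they are just stored in a structure that `$` cannot reach.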

Perhaps this data frame thing will work. I changed the first line to:

result = data.frame()
>
   param1 param2 param3
2     1.1    5.3     12
21    2.2   10.6     13
3     3.3   15.9     14
4     4.4   21.2     15
5     5.5   26.5     16
> result$param2 # One column
[1]  5.3 10.6 15.9 21.2 26.5
> result[2,] #One row
   param1 param2 param3
21    2.2   10.6     13
> result[3,]$param3 # Single value
[1] 14

So it's working, but I'm not sure what the 21 (a row number?) is all about. With more rows, the 21st row is labeled '211'.

Could someone tell me why the first case didn't work, what the '21' is all about, and whether there is a better way to do this? Much of what I've read suggests that loops in R are a sign you don't know what you are doing, but the learning curve on the alternatives seems steep. The looping is also why the script takes an amazingly long time to run, even on a fast machine.

RyanN
  • A few things to help you to think in "R". 1) Think in vectors. Most of the time you do not need an explicit loop to do things like what you have above. For instance, your simple addition can be accomplished like this: `> 1:5 + 1.1 [1] 2.1 3.1 4.1 5.1 6.1`. Here's a post on vectorization: http://shape-of-code.coding-guidelines.com/2010/09/04/thinking-in-r-vectors/. 2) Search for "preallocate": http://stackoverflow.com/questions/4034059/iteratively-constructed-dataframe-in-rx. The odd row numbers you're seeing are a goofy byproduct of the `rbind()`ing you've got going on. Not necessary! – Chase Mar 13 '12 at 03:05
  • The row numbers are the least of your concerns. They are simply the result of R trying to make sense of being asked to do something slightly absurd. Focus instead on Chase's advice: _Don't grow objects!_ Preallocate. And vectorize everything you can. – joran Mar 13 '12 at 03:27
  • The calculations I put in were just placeholders; the actual calculations were a bit more complex. Every record had a latitude/longitude. I needed to calculate the bearing between each point and the next. I also had to work out the lat/long coordinates of the corners of a rectangle centered on each point but aligned with the bearing. I then had to convert these rectangles to polygons so I could export them to ArcGIS. – RyanN Mar 13 '12 at 18:27
  • If you need to work with spatial data, there are a bunch of packages for that: `rgdal` for reading and writing formats, `rgeos` for doing funky things with geometry, `spdep` for a bunch of spatial dependence models, `sp` for basic data types and plots, and many, many more. – fmark May 07 '12 at 07:21
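As a sketch of Chase's "think in vectors" advice applied to the bearing calculation RyanN describes, the "next record" can be handled by shifting a column instead of looping. The coordinates, the `lat`/`lon` column names, and the helper `initial_bearing()` are hypothetical; the formula is the standard initial great-circle bearing, and the real project may need a different one (or one of the spatial packages fmark lists).

```r
# Hypothetical track of points in decimal degrees; the column names
# lat/lon are assumptions, not taken from the original data.
track <- data.frame(
  lat = c(45.00, 45.01, 45.02, 45.02),
  lon = c(-75.00, -75.00, -74.99, -74.98)
)

# Initial great-circle bearing (degrees clockwise from north) from
# point 1 to point 2. Every argument may be a whole vector.
initial_bearing <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dlambda <- (lon2 - lon1) * to_rad
  y <- sin(dlambda) * cos(phi2)
  x <- cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlambda)
  (atan2(y, x) / to_rad + 360) %% 360  # map (-180, 180] onto [0, 360)
}

# "Next record" without a loop: drop the first element and pad with NA,
# so row k is paired with row k + 1 and the last row gets NA.
next_lat <- c(track$lat[-1], NA)
next_lon <- c(track$lon[-1], NA)
track$bearing <- initial_bearing(track$lat, track$lon, next_lat, next_lon)
print(track)
```

Because each step works on whole columns, the same code handles four rows or four million without an explicit loop; the "prior record" case works the same way with `c(NA, x[-length(x)])`.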

2 Answers


The problem is that R works very differently from other programming languages. Explicit loops are generally not very fast in R, especially ones that grow an object on every iteration. Instead, use vectorization, which makes R easy to work with (though different from other languages). So for your problem I'd probably do:

i=1:5
data.frame(param1 = i * 1.1, param2 = i * 5.3, param3 = i*2+9)

Also check out apply, lapply, sapply, ifelse, etc. Note too that many functions are vectorized: they operate element-wise on whole vectors without an explicit loop.
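When the per-record calculation is too involved to vectorize, one common alternative to growing `result` inside a loop is to compute each record with `lapply()` and bind everything once with `do.call(rbind, ...)`. A minimal sketch, where `calc_record()` is a hypothetical stand-in that reuses this answer's placeholder formulas:

```r
# Hypothetical per-record calculation standing in for the "much more
# involved" real one; it returns a one-row data frame.
calc_record <- function(i) {
  data.frame(param1 = i * 1.1, param2 = i * 5.3, param3 = 2 * i + 9)
}

# Build a list of one-row data frames, then bind them in a single call
# instead of calling rbind() once per iteration.
pieces <- lapply(1:5, calc_record)
result <- do.call(rbind, pieces)
print(result)
```

The body of `calc_record()` can be as long as it needs to be; only the final assembly step changes.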

If you really want to fix up what you have, you could use the following:

 result = data.frame()
 for (i in 1:5) {
   #Calculate some dummy data. The actual calculations are much more involved
   param1 = i * 1.1
   param2 = i * 5.3
   param3 = 2*i+9
   # Append the values as a one-row data frame, so rbind() dispatches to
   # rbind.data.frame and the result stays a data frame
   sample = data.frame(param1=param1,param2=param2,param3=param3)
   result <- rbind(result,sample)
 }
 # Reset the row names once, after the loop, to plain 1..n
 rownames(result) <- 1:nrow(result)
 print(result)
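Following the "Don't grow objects! Preallocate." advice from the comments, the same loop can allocate its output up front. A minimal sketch, keeping the placeholder formulas: the vectors are created once at full length, the loop only assigns into them, and the data frame is assembled a single time at the end.

```r
n <- 5
# numeric(n) creates a zero-filled vector of length n, allocated once
param1 <- numeric(n)
param2 <- numeric(n)
param3 <- numeric(n)

for (i in seq_len(n)) {
  # Placeholder calculations, as in the loop above
  param1[i] <- i * 1.1
  param2[i] <- i * 5.3
  param3[i] <- 2 * i + 9
}

# Build the data frame once; column names come from the variable names
result <- data.frame(param1, param2, param3)
print(result)
```

Because nothing is appended, the row names stay the default 1..n and no renaming step is needed.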
Tyler Rinker

Note the results of the following:

row.names(result) <- 1:nrow(result)
result

i <- 1:5
i * 5.3
i

As you can see, writing R is not like writing Python the way you're using it, although it can be like NumPy. As in NumPy, arithmetic on a vector is automatically applied to every element. It's also like NumPy in that this doesn't work for everything.
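One illustration of the "doesn't work for everything" point (chosen here as an example, not necessarily what the answerer had in mind): arithmetic and comparisons are vectorized, but `if` expects a single TRUE/FALSE, so per-element decisions such as data quality checks need the vectorized `ifelse()` instead. The labels "big"/"small" are hypothetical.

```r
i <- 1:5

# Arithmetic is applied element-wise, much like a NumPy array
scaled <- i * 5.3
print(scaled)

# Comparisons are vectorized too and return one TRUE/FALSE per element
big <- i > 2
print(big)

# `if (i > 2)` would NOT test each element: `if` needs a single TRUE/FALSE,
# and since R 4.2.0 a longer condition is an error. ifelse() is the
# element-wise counterpart.
size <- ifelse(big, "big", "small")
print(size)
```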

John