
I need to write a beginner's tutorial on R's *apply functions (without using the reshape or plyr packages at first).

I am trying to use lapply (because I read that apply is not good for data frames) to apply a simple function to this data frame, and I want to access the data through named columns:

fDist <- function(x1,x2,y1,y2) {
  return (0.1*((x1 - x2)^2 + (y1-y2)^2)^0.5)  
}

data <- read.table(textConnection("X1 Y1 X2 Y2
 1 3.5 2.1 4.1 2.9
 2 3.1 1.2 0.8 4.3
 "))

data$dist <- lapply(data,function(df) {fDist(df$X1 , df$X2 , df$Y1 , df$Y2)})

I get the error "$ operator is invalid for atomic vectors". Is it because lapply modifies the data frame? Is there a better way to do this while still accessing columns by name with $?

I resolved my first question with @DWin's answer. But I have another problem (or misunderstanding) with a mixed data frame (numeric + character columns):

In my new use case, I use two functions to compute the distance, because my goal is to compute the distance from one point to every other point.

data2 <- read.table(textConnection("X1 Y1 X2 Y2
     1 3.5 2.1 4.1 2.9
     2 3.1 1.2 0.8 4.3
     "))

data2$char <- c("a","b")

fDist <- function(x1,y1,x2,y2) {
 return (0.1*((x1 - x2)^2 + (y1-y2)^2)^0.5) 
}

fDist2 <- function(fixedX,fixedY,vec) { 
 fDist(fixedX,fixedY,vec[['X2']],vec[['Y2']])
}

# works with data (dataframe without character), but not with data2 (dataframe with character)
#ok
data$f_dist <- apply(data, 1, function(df) {fDist2(data[1,]$X1,data[1,]$Y1,df)})
#not ok
data2$f_dist <- apply(data2, 1, function(df) {fDist2(data2[1,]$X1,data2[1,]$Y1,df)})
reyman64
    If you are looping over columns of a dataframe, which is what `lapply` does, the internal function will only see one column at a time. – IRTFM Mar 08 '12 at 21:20
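A minimal sketch of what that comment describes, rebuilding the question's values with data.frame() (the names dat, col_classes and err_msg are only for illustration): lapply hands each column to the function as a plain numeric vector, so $ fails inside it.

```r
# Same values as the question's data, built with data.frame() for clarity
dat <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                  X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))

# lapply() iterates over columns; each element it passes is an atomic vector
col_classes <- sapply(dat, class)
print(col_classes)

# Using $ on one of those atomic vectors reproduces the question's error
err_msg <- tryCatch(
  lapply(dat, function(col) col$X1),
  error = function(e) conditionMessage(e)
)
print(err_msg)
```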

3 Answers


In this case apply is what you need. All of the data columns are of the same type, so you don't have to worry about losing attributes, which is where apply causes problems. You will need to write your function differently so that it takes a single vector of length 4:

 fDist <- function(vec) {
   return (0.1*((vec[1] - vec[2])^2 + (vec[3]-vec[4])^2)^0.5)
 }
 data$f_dist <- apply(data, 1, fDist)
 data
   X1  Y1  X2  Y2    f_dist
1 3.5 2.1 4.1 2.9 0.1843909
2 3.1 1.2 0.8 4.3 0.3982462

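If you want to use the column names in 'data', they need to be spelled correctly. (Positionally the columns are X1, Y1, X2, Y2, so the version above actually subtracts Y1 from X1 and Y2 from X2, which is why its results differ from this one.)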

 fDist <- function(vec) {
   return (0.1*((vec['X1'] - vec['X2'])^2 + (vec['Y1']-vec['Y2'])^2)^0.5)
 }
 data$f_dist <- apply(data, 1, fDist)
 data
#--------
   X1  Y1  X2  Y2    f_dist
1 3.5 2.1 4.1 2.9 0.1000000
2 3.1 1.2 0.8 4.3 0.3860052
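As a quick check (a sketch that rebuilds the question's values with data.frame()), the by-name apply() version gives the same numbers as evaluating the formula once on whole columns, because R arithmetic is elementwise:

```r
dat <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                  X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))

# Row-wise: apply() passes each row as a named numeric vector
fDist <- function(vec) {
  0.1 * ((vec['X1'] - vec['X2'])^2 + (vec['Y1'] - vec['Y2'])^2)^0.5
}
by_row <- unname(apply(dat, 1, fDist))

# Column-wise: the same formula evaluated on whole columns at once
by_col <- with(dat, 0.1 * ((X1 - X2)^2 + (Y1 - Y2)^2)^0.5)

print(by_row)
print(all.equal(by_row, by_col))
```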

Your updated (and very different) question is easy to resolve. When you use apply on a data frame, it is first converted to a matrix, which coerces everything to the lowest common mode, in this case 'character'. You have two choices: either 1) add as.numeric to all of your arguments inside the functions, or 2) only send the columns that are needed, which I will illustrate:

data2$f_dist <- apply(data2[ , c("X2", "Y2") ], 1, function(coords) 
                                       {fDist2(data2[1,]$X1,data2[1,]$Y1, coords)} )

I really do not like how you are passing parameters to this function. Using "[" and "$" within the argument list "just looks wrong." And you should know that "df" will not be a data frame, but rather a vector. Because it's not a data frame (or a list), you should alter the function so that it uses "[" rather than "[[". Since you only want two of the coordinates, only pass the two (numeric) ones that you will be using.
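A small sketch of the coercion described above, assuming data2 is rebuilt with data.frame() instead of read.table(): one character column makes as.matrix(), and therefore apply(), work in character mode, while passing only the numeric columns keeps each row numeric.

```r
data2 <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                    X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))
data2$char <- c("a", "b")

# apply() converts the data frame with as.matrix(); one character column
# makes the whole matrix character
mat_mode <- mode(as.matrix(data2))
print(mat_mode)

# Inside apply(), each row therefore arrives as a character vector
row_modes <- apply(data2, 1, mode)
print(row_modes)

# Option 2 from the answer: pass only the numeric columns, so rows stay numeric
fDist <- function(x1, y1, x2, y2) 0.1 * ((x1 - x2)^2 + (y1 - y2)^2)^0.5
fDist2 <- function(fixedX, fixedY, vec) fDist(fixedX, fixedY, vec[["X2"]], vec[["Y2"]])
data2$f_dist <- apply(data2[, c("X2", "Y2")], 1, function(coords)
  fDist2(data2$X1[1], data2$Y1[1], coords))
print(data2)
```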

IRTFM
  • I have a problem with type conversion in fDist and don't understand why: `fDist2 <- function(X1,X2,columnVector) {fDist(X1,X2,as.numeric(columnVector[["X"]]),as.numeric(columnVector[["Y"]]))}` and `apply(data99_07,1, function(df) { fDist2 (data99_07[data99_07$CODCOM==75101,]$X,data99_07[data99_07$CODCOM==75101,]$Y,df)})`. I need the conversion because the anonymous function receives a character vector :/ – reyman64 Mar 09 '12 at 16:24
  • If a column, vec, is of class "factor", then the approved method is to convert it to numeric with `as.numeric(as.character(vec))`. You cannot just use `as.numeric(vec)` and get interpretable results. – IRTFM Mar 09 '12 at 17:43
  • Before the anonymous function, columnVector is numeric, and inside it is a character vector, so I need to convert it to numeric to do the calculation. Is it possible that apply or the anonymous function performs an implicit conversion of the vector? – reyman64 Mar 09 '12 at 21:00
  • If a vector is of class "character", then just using `as.numeric(colVec)` will supply numeric values to any function. But if it's a factor (and you MUST check), you need the "double-function-wrapping" method. The double wrapping is safer if you don't know how to check, i.e., with `class(colVec)`. – IRTFM Mar 09 '12 at 23:12
  • Thanks for the answer, but I think my question was not clear; I have updated my post to make it easier to understand. – reyman64 Mar 12 '12 at 09:24
  • It wasn't so much unclear as it was overly simplified. In R the mode of arguments really matters a lot and you didn't provide a "mixed" mode example. – IRTFM Mar 12 '12 at 14:52
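To illustrate the factor pitfall from these comments (a sketch with made-up values): as.numeric() on a factor returns the internal level codes, while going through as.character() first returns the original numbers.

```r
# A numeric-looking column stored as a factor (e.g. read with stringsAsFactors = TRUE)
f <- factor(c("10", "5", "10"))

# as.numeric() on a factor returns the level codes; levels sort as strings ("10", "5")
codes <- as.numeric(f)
# Converting to character first recovers the values themselves
values <- as.numeric(as.character(f))
# Equivalent: convert the levels once, then index them by the factor codes
via_levels <- as.numeric(levels(f))[f]

print(class(f))
print(codes)
print(values)
```

R FAQ 7.10 describes the same problem and also suggests the as.numeric(levels(f))[f] form.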

As a side note, it's generally best to avoid using data as a variable name, since data() is already a function in R:

dat <- read.table(textConnection("X1 Y1 X2 Y2
 1 3.5 2.1 4.1 2.9
 2 3.1 1.2 0.8 4.3
 "))

lapply feeds a single column of the data.frame to the function.

lapply(dat, function(df) print(df))

Instead, you want apply. But it feeds each row to the function as a vector, which doesn't support the $ operator. You can index it directly instead:

apply(dat, 1, function(vec) {fDist(vec[1] , vec[3] , vec[2] , vec[4])})

Or rewrite the function to take the column positions as additional arguments, which apply passes through to it:

fDist <- function(vec, pos1, pos2, pos3, pos4) {
    return (0.1*((vec[pos1] - vec[pos2])^2 + (vec[pos3]-vec[pos4])^2)^0.5)
}

apply(dat, 1, fDist, pos1=1, pos2=3, pos3 = 2, pos4=4)
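The same pass-through via apply()'s ... argument also works with column names instead of positions. A sketch, where fDistByName and its x1/x2/y1/y2 parameters are hypothetical names chosen for this example:

```r
dat <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                  X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))

# Hypothetical variant: pass column *names* instead of positions.
# apply() forwards the extra named arguments to the function through its `...`
fDistByName <- function(vec, x1, x2, y1, y2) {
  0.1 * ((vec[[x1]] - vec[[x2]])^2 + (vec[[y1]] - vec[[y2]])^2)^0.5
}

res <- apply(dat, 1, fDistByName, x1 = "X1", x2 = "X2", y1 = "Y1", y2 = "Y2")
print(res)
```

Names are less fragile than positions if columns are later reordered, but the row still goes through as.matrix(), so the mixed-type caveat from the accepted answer still applies.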

However, the best solution would be to vectorize your function completely, so that it operates on whole columns at once:

fDist <- function(df) {
   return (0.1*((df$X1 - df$X2)^2 + (df$Y1-df$Y2)^2)^0.5)
}
dat$dist <- fDist(dat)
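Note that the question's original fDist(x1, x2, y1, y2) is already vectorized, since it only uses elementwise arithmetic. A sketch of two ways to use it while referring to columns by name: with() for a single vectorized call, and mapply() for a row-by-row *apply call that keeps each column's own type (the extra character column is only there to show it does no harm):

```r
dat <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                  X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))
dat$char <- c("a", "b")  # a character column does not get in the way here

# The question's original function: arithmetic on vectors is elementwise
fDist <- function(x1, x2, y1, y2) {
  0.1 * ((x1 - x2)^2 + (y1 - y2)^2)^0.5
}

# Vectorized: evaluate once on whole columns, referring to them by name
dat$dist <- with(dat, fDist(X1, X2, Y1, Y2))

# Row by row with mapply(): each column is passed with its own type
dat$dist_mapply <- mapply(fDist, dat$X1, dat$X2, dat$Y1, dat$Y2)

print(dat)
```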
Justin

For anyone who comes across this topic later: the vec['X1'] method suggested in the accepted answer does work, but when the data frame has mixed types, apply converts each row to a character vector, so X1 loses its numeric type. A better way to access columns by name while keeping their data types is to use lapply(), like this:

lapply(1, function(i, df) {fDist2(df[1,]$X1,df[1,]$Y1,df)}, df=data2)[[1]]

Here, lapply() requires the i argument; then just pass your data frame data2 as the additional parameter df, and you can reference any column with df$any_column_you_want inside function(){}.
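Since lapply(1, ...) above runs its function exactly once on the whole data frame, here is a sketch of two equivalent alternatives, assuming data2 is rebuilt with data.frame() and using the question's fDist/fDist2: call fDist2 directly (it only does arithmetic on whole columns), or loop over row indices with sapply(), where data2[i, ] is a one-row data frame that keeps each column's type.

```r
data2 <- data.frame(X1 = c(3.5, 3.1), Y1 = c(2.1, 1.2),
                    X2 = c(4.1, 0.8), Y2 = c(2.9, 4.3))
data2$char <- c("a", "b")

fDist <- function(x1, y1, x2, y2) 0.1 * ((x1 - x2)^2 + (y1 - y2)^2)^0.5
fDist2 <- function(fixedX, fixedY, vec) fDist(fixedX, fixedY, vec[["X2"]], vec[["Y2"]])

# fDist2 only does arithmetic, so it also accepts the whole data frame:
# vec[["X2"]] is then the full numeric X2 column
direct <- fDist2(data2$X1[1], data2$Y1[1], data2)

# Row-wise alternative: loop over row indices; data2[i, ] is a one-row data
# frame, so columns keep their own types (numeric stays numeric)
row_wise <- sapply(seq_len(nrow(data2)), function(i)
  fDist2(data2$X1[1], data2$Y1[1], data2[i, ]))

print(direct)
print(row_wise)
```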

Jin