0

Hello, I want to have in output a vector when the entry is vector without using a function of apply's family. How should I write my function? Thanks. I used this code where I was forced to use two functions.

f1=function(l){
  y= B1 # vector of length N
  li= position # a vector  of length N 
  h=10
  a=(li-l)/h 
  Knorm=dnorm(a)
  b=Knorm*y
  num=sum(b)
  den=sum(Knorm)
  num/den
}
########## Forme vectorielle 
f2 = function(l){
  sapply(l,f1)
} 
L=seq(10000000,11000000,by=1)
f2(L)

If I compute f1(L) I will get one value. That's why I was forced to write a second function to apply my first function to each element of vector L. The purpose is to write it in one function.

Matt
  • 7,255
  • 2
  • 12
  • 34
Child79
  • 55
  • 4
  • Your code doesn't work, there are no `B1` nor `position`. To write it in one function you can put all of `f1` inside `f2`. – Rui Barradas Feb 01 '22 at 20:45

1 Answers1

1

Use outer and colSums to allow the function to take l as a vector:

f <- function(l){
  y <- B1 # vector of length N
  li <- position # a vector  of length N 
  h <- 10
  a <- outer(li, l, "-")/h
  Knorm <- dnorm(a)
  b <- Knorm*y
  num <- colSums(b)
  den <- colSums(Knorm)
  num/den
}

And here is a simpler equivalent function:

f <- function(l){
  Knorm <- dnorm(outer(position, l, "-")/10)
  colSums(Knorm*B1)/colSums(Knorm)
}

Compare to OP's function:

f1=function(l){
  y= B1 # vector of length N
  li= position # a vector  of length N 
  h=10
  a=(li-l)/h 
  Knorm=dnorm(a)
  b=Knorm*y
  num=sum(b)
  den=sum(Knorm)
  num/den
}

position <- 10:1
B1 <- 1:10

sapply(8:12, f1)
#> [1] 5.300480 5.220937 5.141656 5.062713 4.984177
f(8:12)
#> [1] 5.300480 5.220937 5.141656 5.062713 4.984177

UPDATE

Based on the comments, something like this may work best for the large vectors involved:

library(parallel)

f1 <- function(l) {
  dkAll <- abs(outer(position, l, "-"))
  Knorm <- dnorm(outer(position, l, "-")/pmax(dkAll[order(col(dkAll), dkAll)[seq(70, by = length(position), length.out = length(l))]], 1000))
  colSums(Knorm*y)/colSums(Knorm)
}

y <- seq(1, 100, length.out = 23710)
position <- seq(10351673, 12422082, length.out=23710)
l <- seq(11190000, 11460000, by=10)
# ysmoothed <- f1(l) # memory allocation error
cl <- makeCluster(detectCores())
clusterExport(cl, list("y", "position", "l", 'f1'))
system.time(ysmoothed <- parLapply(cl, l, f1))
#>    user  system elapsed 
#>    0.02    0.00   20.13

Created on 2022-02-02 by the reprex package (v2.0.1)
jblood94
  • 10,340
  • 1
  • 10
  • 15
  • Thanks for your comments. It works well. – Child79 Feb 02 '22 at 06:43
  • Thanks for your comments. It works well. What I want to do really is more complicated than that. I only explained part of the problem, your answer was very useful to me. Here is the detail of the problem, it would be a pleasure if you took a little of your time to answer. h = max{dk(l), hmin}, where dk(l) is the distance from l to the 70th nearest li and hmin=1000. Also y and li are vectors of length N=23000 and l is a vector of length 27000. – Child79 Feb 02 '22 at 06:57
  • Do you know an R function that sort each column of a matrix without using apply like this "apply(matrix,2,sort)? Do you also know a R function that return a vector of max of each column of a matrix? Thanks – Child79 Feb 02 '22 at 07:08
  • `dkAll <- abs(outer(li, l, "-")); h <- pmin(dkAll[order(col(dkAll), dkAll)[seq(70, by = length(li), length.out = length(l))]], 1000)` – jblood94 Feb 02 '22 at 12:41
  • That gets `h` from all distances, so it skips over `dk(l)`. To get `dk(l)`, it would be the first argument passed to `pmin`. See this answer for fast matrix sorting: https://stackoverflow.com/a/58559219/9463489 – jblood94 Feb 02 '22 at 12:44
  • ` f1=function(l){ y= B1 li= position k=70 hmin=1000 eca= abs(outer(li,l,"-")) dkl= apply(eca,2,sort)[70,] ## distance from l to the kth nearest li h=apply(rbind(dkl,rep(hmin,length(l)) ),2,max) a <- (outer(li, l, "-"))/(outer(rep(1,length(li)),h,"*")) Knorm <- dnorm(a) colSums(Knorm*y)/colSums(Knorm) } L= seq(11190000,11460000,by=10) ysmoothed =f1(L) ` – Child79 Feb 02 '22 at 20:26
  • I used this code but I got ***Error: cannot allocate vector of size 4.8 Gb***. I will test your code and get back to you. For an example position=seq(10351673,12422082,by=23710) and y is a vector of length 23710 with values ranging between 0 and 100. – Child79 Feb 02 '22 at 20:33
  • ` f1=function(l){ y= B1; li= position; k=70; hmin=1000 ;eca= abs(outer(li,l,"-")); dkl= apply(eca,2,sort)[70,] ## distance from l to the kth nearest li ;h=apply(rbind(dkl,rep(hmin,length(l)) ),2,max) ;a <- (outer(li, l, "-"))/(outer(rep(1,length(li)),h,"")) ;Knorm <- dnorm(a) ;colSums(Knormy)/colSums(Knorm) } L= seq(11190000,11460000,by=10) ;ysmoothed =f1(L) ` – Child79 Feb 02 '22 at 20:37
  • The `pmin` in my earlier comment should have been `pmax`. – jblood94 Feb 02 '22 at 21:49
  • I don't see `y` in the function you posted. I'll assume `Knormy` is supposed to be `Knorm*y`. – jblood94 Feb 02 '22 at 21:50
  • `position=seq(10351673,12422082,by=23710)` returns a vector of length 88, which isn't the same length as `y`. Do you mean `position=seq(10351673,12422082,length.out=23710)`? – jblood94 Feb 02 '22 at 22:00
  • I was able to replicate the memory allocation error, but this is really hard to follow in the comments. Please start a new question with the representative data and what you have for your function so far. I would also suggest also including a smaller dataset and the corresponding output. – jblood94 Feb 02 '22 at 22:04
  • See the update. – jblood94 Feb 02 '22 at 22:20
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/241670/discussion-between-childeric-and-jblood94). – Child79 Feb 03 '22 at 00:12
  • You understand correcty my problem. Your dataset are very good for examples. Your update function work very well in few minutes. That is what I am looking for. What I did myself with two functions took 30 minutes to calculate ysmoothed. Is it possible to have position and y as input in order to give the function another position and y? I tried this `f1 <- function(l, position=BC1, y=mBC1 )` but I get an error `Error in get(name, envir = envir) : object 'y' not found`. Assume `BC1= seq(10351673, 12422082, length.out=23710); mBC1= seq(1, 100, length.out = 23710)`. Thank you. – Child79 Feb 03 '22 at 02:07