I have a data frame x with 8 integer columns (and about 1000 rows of data). I have created a UDF 'test' that takes 8 integer parameters and returns a single value. I have tested the UDF by passing it arbitrary integer values, and it does return a single value, so I know it works. I would now like to pass it the 8 integer columns, row by row, and have it return the value as a new column for each row in the data frame. I have tried x$NewColumn = test(x$Col1, x$Col2 .... x$Col8), but the function returns an error that suggests the data is not being passed through correctly. Can someone tell me what I'm doing wrong?

Alpha
zgall1
  • Welcome to Stack Overflow! Please add a reproducible example so that people here can help you. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – CHP Apr 30 '13 at 16:06

3 Answers

You could use mapply, which calls test once per row, taking the i-th element of each column as its arguments:

mapply(test, x$Col1, x$Col2 .... x$Col8)
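
As a minimal sketch of how this might look end to end, assuming a hypothetical eight-argument `test` (the real UDF is not shown in the question) and made-up data in columns `Col1` to `Col8`:

```r
# Hypothetical stand-in for the OP's UDF: takes 8 scalars, returns one value.
# The `if` makes it scalar-only, so it cannot simply be called on whole columns.
test <- function(p1, p2, p3, p4, p5, p6, p7, p8) {
  if (p1 > p2) p1 + p3 + p4 + p5 + p6 + p7 + p8 else p2 - p1
}

# Small example data frame with 8 integer columns (illustrative data only).
x <- data.frame(
  Col1 = c(1L, 5L, 3L), Col2 = c(2L, 4L, 3L),
  Col3 = c(1L, 1L, 1L), Col4 = c(2L, 2L, 2L),
  Col5 = c(3L, 3L, 3L), Col6 = c(4L, 4L, 4L),
  Col7 = c(5L, 5L, 5L), Col8 = c(6L, 6L, 6L)
)

# mapply calls test() once per row, pairing up the i-th element of each column.
x$NewColumn <- mapply(test, x$Col1, x$Col2, x$Col3, x$Col4,
                      x$Col5, x$Col6, x$Col7, x$Col8)

# Equivalent without typing every column: pass the columns as a list.
# unname() makes the columns match test()'s arguments by position.
x$NewColumn2 <- do.call(mapply, c(list(FUN = test), unname(as.list(x[1:8]))))
```

The `do.call` form is only a convenience; it relies on the first eight columns being in the same order as the function's arguments.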
CHP
  • Wouldn't you think `apply` is more convenient notation in this case? The data is already in a `data.frame`. `mapply` is one of *many* methods, but the OP *should* use whatever is best suited. I disagree that they *should* use `mapply` because it seems less convenient, but they *could*. – Simon O'Hanlon Apr 30 '13 at 16:20
  • `apply` converts the `data.frame` to `matrix` first, which may not be desired. – CHP Apr 30 '13 at 16:28
  • The OP has already stated they have columns of integer values. – Simon O'Hanlon Apr 30 '13 at 16:32
  • @geektrader I ended up using this approach as it was the simplest one to understand with my limited background. Thanks for the help. – zgall1 May 07 '13 at 15:31
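
The point raised in the comments above, that `apply` converts the data frame to a matrix first, mostly matters when the columns have mixed types. A small sketch with made-up data (the names `ints`, `mixed`, `row_sums` and `row_types` are only for illustration):

```r
# All-integer columns: the intermediate matrix stays integer, so apply() is safe.
ints <- data.frame(a = 1:3, b = 4:6)
row_sums <- apply(ints, 1, function(r) r[1] + r[2])

# Mixed columns: the intermediate matrix is character, so each row arrives
# as a vector of strings and arithmetic on it would fail.
mixed <- data.frame(a = 1:3, label = c("x", "y", "z"))
row_types <- apply(mixed, 1, function(r) class(r))
```

With eight integer columns, as in the question, the conversion should be harmless; `mapply` avoids the issue altogether because it works on the columns directly.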
# Create an example data frame with 8 numeric columns
df = data.frame(matrix(runif(80), ncol = 8))

# Write your function; it receives one row at a time as a vector x
my.function = function(x) { return(mean(x)) }

# Then use apply; the 1 means "apply over rows"
new.column = apply(df, 1, my.function)

# Attach the result as a new column
df$new.column = new.column
Remi.b

Try using the apply function to run across the rows of your data.frame:

## Create some data
df <- as.data.frame( matrix(runif(40),10) )

## Now we can use 'apply'. The '1' in the second argument means we apply across rows; if it were 2 we would apply across columns.
## The function we are applying to each row is to sum all the values in that row
df$Total <- apply( df , 1 , sum )


## We can also pass 'anonymous' functions. In this instance our function takes a single vector, 'x'
## 'x' is all the values of that row, and we can use them like so to do the same thing as 'sum' in the previous example
df$Function <- apply( df , 1 , function(x) x[1] + x[2] + x[3] + x[4] )

## And if we see what is in df, 'df$Total' and 'df$Function' should have the same values
df
#         V1        V2         V3        V4    Total Function
#1  0.6615353 0.5900620 0.02655674 0.1036002 1.381754 1.381754
#2  0.8471900 0.8927228 0.77014101 0.6379024 3.147956 3.147956
#3  0.8783624 0.6769206 0.09598907 0.6681616 2.319434 2.319434
#4  0.7845933 0.8992605 0.13271067 0.3691835 2.185748 2.185748
#5  0.9753706 0.1374564 0.12631014 0.3693808 1.608518 1.608518
#6  0.4229039 0.7590963 0.79936058 0.2674258 2.248787 2.248787
#7  0.2635403 0.6454591 0.98748926 0.5888263 2.485315 2.485315
#8  0.7008617 0.7505975 0.39355439 0.5943362 2.439350 2.439350
#9  0.1169755 0.1961099 0.88216054 0.3383819 1.533628 1.533628
#10 0.3298974 0.0110522 0.88460835 0.3700531 1.595611 1.595611
Simon O'Hanlon
  • You can also use `rowSums(df)`. – Jilber Urbina Apr 30 '13 at 16:09
  • @Jilber in this example yes, but the OP has an undefined 'function' that takes all the values and spits out one value. It is unclear if that function is a simple sum, hence my second example. – Simon O'Hanlon Apr 30 '13 at 16:11
  • I'm not sure I fully get what you are saying. I understand the basics of the apply function but I am not sure of the syntax that I would use in my situation. From what I can see in your example, you have created a function(x) that takes a single parameter that is the sum of the 4 columns. How do I translate that to my case where I have a pre-defined UDF and need to pass it multiple parameters? – zgall1 Apr 30 '13 at 16:11
  • @SimonO101 My function is not a simple sum – zgall1 Apr 30 '13 at 16:12
  • @user2336618 I understand it's not a simple sum. The function takes a single argument, not a single parameter. The argument is `x`, and `x` is all the values for that row. In the first example we use the pre-defined function `sum` on all the values in the row, but you can make your own function. So in your case `x[1]` will be the value in the first column, `x[2]` the second value, up to `x[8]` for the value in the 8th column. Does that make sense? – Simon O'Hanlon Apr 30 '13 at 16:14
  • @user2336618 in your dataframe (let's pretend it's called `df`) if you do `df[1,]` R returns all the values for the first row, right? There are 8 of them? Imagine `apply` is conveniently going through each row and assigning `x <- df[1,]` then `x <- df[2,]` all the way through to the last row. So you get a vector of values which you can use in your function however you wish. – Simon O'Hanlon Apr 30 '13 at 16:18
  • @user2336618 did any of these answers help you? – Simon O'Hanlon May 01 '13 at 16:14
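
To connect the comments above to a UDF that takes eight separate arguments, one option is a small wrapper that unpacks the row vector by position. A sketch, assuming a hypothetical `test` and an all-integer data frame (neither is shown in the question):

```r
# Hypothetical 8-argument UDF standing in for the OP's function.
test <- function(p1, p2, p3, p4, p5, p6, p7, p8) {
  p1 * p2 * p3 + p4 + p5 + p6 + p7 - p8
}

# Illustrative data: 8 integer columns (V1..V8), 3 rows.
df <- as.data.frame(matrix(1:24, nrow = 3))

# apply() hands each row to the wrapper as a vector r; unpack it by position.
by_index <- apply(df, 1, function(r) test(r[1], r[2], r[3], r[4],
                                          r[5], r[6], r[7], r[8]))

# Same idea without writing out the indices: do.call() spreads the row over
# test()'s arguments; unname() keeps the matching positional.
by_docall <- apply(df, 1, function(r) do.call(test, as.list(unname(r))))

df$NewColumn <- by_index
```

Both forms call `test` once per row; which one reads better is a matter of taste, and the `mapply` answer achieves the same result without the matrix conversion.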