Need help implementing a function in R

Question

I'm trying to script this, as I regularly get output in this format and copying and pasting it into Excel one by one is a real chore. However, I'm stuck when it comes to implementing the functions.

So, my data is in the following form:

Condition Sample1 Sample2 .... Sample n
T1        6.99    5.80    ....  n_1      
T2        2.05    3.04    ....  n_1      
T3        4.50    4.69    ....  n_1      
T4        4.71    5.22    ....  n_1      
T5        5.66    3.65    ....  n_1      
T6        9.76    2.89    ....  n_1

I need to apply the following equation: [equation image not preserved], where x is the individual entry and n is a coefficient, such that the full equation looks something like this:

[example image not preserved]

Basically, per column, I need to consider each element in sequence and multiply it by a sequential coefficient (odd numbers starting from 1, one per Condition) to get the answer S for each Sample. The size of my dataset will not change: it will always be T1:T6. What will change is the number of samples (Sample 1...n). Ideally the value of S will be appended at the bottom of the column, or saved in a separate dataset with reference to the sample that it belongs to.

I've tried a number of solutions, including transposing, but can't seem to wrap my head around it.

My current attempt at implementing a simpler function on a portion of the dataset yielded no success.

for (i in 2:8) {
  dT[7, i] <- (1 * dT[1, i]) + (3 * dT[2, i]) + (5 * dT[3, i]) +
    (7 * dT[4, i]) + (9 * dT[5, i]) + (11 * dT[6, i])
}

I think the right solution involves some kind of *apply but I'm completely clueless as to how to use those properly.

EDIT: Adding a reproducible example:

N   Condition   Sample A        Sample B        Sample C       Sample D
1   T1          91.323          78.758          70.298         66.765
3   T2          -3.737          -1.5            -7.744         -9.247
5   T3          5.205           4.533           2.284          2.178
7   T4          -0.486          -0.068          -1.386         -0.927
9   T5          0.337           -0.139          0.087          0.055
    S           -0.046296296    -0.123654391    0.394039047    0.445258425

It's easier to help you if you provide a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be copy/pasted into R and the desired output for that input, so possible solutions can be tested and verified. – MrFlick, Nov 06 '17 at 15:12
Some data for a reproducible example from a past iteration in Excel (using T1:T5, not T6, but same principle applies): Multiplier Condition Sample A Sample B Sample C Sample D 1 T1 91.323 78.758 70.298 66.765 3 T2 -3.737 -1.5 -7.744 -9.247 5 T3 5.205 4.533 2.284 2.178 7 T4 -0.486 -0.068 -1.386 -0.927 9 T5 0.337 -0.139 0.087 0.055 S -0.046296296 -0.123654391 0.394039047 0.445258425 — zirconium, Nov 06 '17 at 16:28

score 0 · Answer 1 · answered Nov 06 '17 at 15:22

Does this do what you want?

per_row <- function(row){
  l <- length(row)
  exp <- 2*(1:l)-1  # all the exponents
  each <- row*(-1)^exp  # compute all of these at once
  return(sum(each)) # return sum
}

#some sample data
datafr <- data.frame(a = sample(1:6), b = 1:6)

#apply per column
apply(datafr, 2, per_row)
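
One caveat worth sketching: `apply()` converts a data frame to a matrix first, so a single label column such as `Condition` turns every value into text, and arithmetic then fails with "non-numeric argument to binary operator". The example below is only a sketch under assumed data. `dT_demo` and `odd_weighted_sum` are hypothetical names, and the function uses the simple sum of n * x from the question's loop, not the full equation. It restricts `apply()` to the numeric columns.

```r
# Hypothetical data with a label column, mirroring the question's layout
dT_demo <- data.frame(
  Condition = c("T1", "T2", "T3"),
  Sample1 = c(6.99, 2.05, 4.50),
  Sample2 = c(5.80, 3.04, 4.69),
  stringsAsFactors = FALSE
)

# Simple form from the question's loop: 1*x1 + 3*x2 + 5*x3 + ...
odd_weighted_sum <- function(x) {
  n <- 2 * seq_along(x) - 1  # odd coefficients 1, 3, 5, ...
  sum(n * x)
}

# With the label column included, the matrix that apply() sees is character
whole_table_is_text <- is.character(as.matrix(dT_demo))

# Keep only the numeric columns before applying the function per column
num_cols <- vapply(dT_demo, is.numeric, logical(1))
res <- apply(dT_demo[, num_cols, drop = FALSE], 2, odd_weighted_sum)
```

Selecting columns by type rather than by position (such as `2:5`) keeps the call working when the number of samples changes.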

Bernhard

I get: Error in row * (-1)^exp : non-numeric argument to binary operator. For column 1, the answer is -1.62; for column 2, 0.645. Not sure what I'm doing wrong. [edit2: this answer is for the simple operation in the form of sum[n(x)] for n_odd = 1->11, x being the entry] – zirconium Nov 06 '17 at 15:29
This problem would be very easy to solve in MATLAB, come to think of it: it's a column vector operation. Ideally I'd solve it in R though, as MATLAB is not easily available at the moment. – zirconium Nov 06 '17 at 15:33
First: it was unwise of me to call the vector of n_odd `exp`, as this is already the name of a function. Second: I do not get that error, or any error at all, when I copy the above code into a vanilla `R` session. But in your `dT` the first column is non-numeric, so `apply(dT[, 2:5], 2, per_row)` should do the job. I have not yet checked the results. – Bernhard Nov 07 '17 at 07:13

score 0 · Answer 2 · answered Nov 06 '17 at 15:32

What about this? It should be agnostic to the number of SampleN columns you have. Note that it is specifically designed for your 6-condition case with only 6 odd-number multipliers, but you said that would not change, so it should be all right.

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(tidyr))

samples <- tribble(
  ~Sample1, ~Sample2, ~Sample3,
  6.99, 5.80, 2.5,  
  2.05, 3.04, 3.4,    
  4.50, 4.69, 8.7,     
  4.71, 5.22, 8.6,     
  5.66, 3.65, 3.4,     
  9.76, 2.89, 5.6 
)

samples 
#> # A tibble: 6 x 3
#>   Sample1 Sample2 Sample3
#>     <dbl>   <dbl>   <dbl>
#> 1    6.99    5.80    2.50
#> 2    2.05    3.04    3.40
#> 3    4.50    4.69    8.70
#> 4    4.71    5.22    8.60
#> 5    5.66    3.65    3.40
#> 6    9.76    2.89    5.60

samples_modified <- samples %>%
  # Add the multipliers as a column
  mutate(multiplier = c(1,3,5,7,9,11)) %>%

  # Gather all the samples. Make it 'tidy'
  gather(key = "sample", value = "x", -multiplier) %>%

  # Perform the multiplication on each element, we will sum later
  mutate(x_modified = x * (-1) ^ ((multiplier - 1) / 2))

# Now we want to sum the x_modified column for each sample group
samples_modified %>%
  group_by(sample) %>%
  summarise(S = sum(x_modified))
#> # A tibble: 3 x 2
#>   sample       S
#>   <chr>    <dbl>
#> 1 Sample1  0.630
#> 2 Sample2  2.99 
#> 3 Sample3 -3.00

The function works perfectly (thanks a lot already!) but somehow only if I enter the data manually. However, I have a bit of trouble in coercing the data - if I enter the data as a data.table eg: samples <- data.table(openxlsx::read.xlsx("samples.xlsx", sheet = 3)) it then keeps throwing: >Error in mutate_impl(.data, dots) : Evaluation error: non-numeric argument to binary operator. Trying to coerce it with as.numeric throws: Error: (list) object cannot be coerced to type 'double'. It is also possible that some of my columns contain NA values, would this "break" it? — zirconium, Nov 06 '17 at 15:52
I replaced one of the values in the tribble above with an `NA` and it still ran, it just gave an `NA` value for `S` for that sample. I also coerced the tibble (a modified data.frame) to a data.table and ran it again and everything seems to work fine. We'd have to see a full data set to really diagnose the problem. You definitely don't want to coerce the data.table to numeric though. That won't work. Is it possible that some of your columns are being imported as characters and not numeric? What does `sum(sapply(NAMEOFDATATABLE, is.character))` return? (should be 0) — Davis Vaughan, Nov 06 '17 at 17:56
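
The diagnostic suggested in the comment above can be sketched as follows. This is a minimal illustration on made-up data, not the asker's actual file: `imported` is a hypothetical plain data.frame standing in for the result of the Excel import, with one column that arrived as character. It counts the character columns and then converts them one at a time, since calling `as.numeric()` on the whole table is what raises the "(list) object cannot be coerced" error.

```r
# Hypothetical import result: one column arrived as character
imported <- data.frame(
  Sample1 = c("6.99", "2.05", "4.50"),
  Sample2 = c(5.80, 3.04, 4.69),
  stringsAsFactors = FALSE
)

# Count character columns (0 would mean every column is already numeric)
n_char <- sum(sapply(imported, is.character))

# Convert column by column instead of coercing the whole table at once
is_chr <- vapply(imported, is.character, logical(1))
imported[is_chr] <- lapply(imported[is_chr], as.numeric)
```

Entries that cannot be parsed as numbers become `NA` with a warning, so it is worth inspecting the converted columns before summing.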

score 0 · Answer 3 · answered Nov 06 '17 at 15:48

Seems simple. Since R is vectorized, multiply the vectors element-wise and then sum the result.

zirconium <- function(x){
    n <- 2*seq_along(x) - 1
    sum(x * (-1)^((n - 1)/2))
}

sapply(dT[-1], zirconium)
#Sample1 Sample2 
#   0.63    2.99

DATA.

dT <-
structure(list(Condition = structure(1:6, .Label = c("T1", "T2", 
"T3", "T4", "T5", "T6"), class = "factor"), Sample1 = c(6.99, 
2.05, 4.5, 4.71, 5.66, 9.76), Sample2 = c(5.8, 3.04, 4.69, 5.22, 
3.65, 2.89)), .Names = c("Condition", "Sample1", "Sample2"), class = "data.frame", row.names = c(NA, 
-6L))
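
The question also asks for S to be appended at the bottom of each column or saved in a separate table. A minimal sketch of both options follows, assuming the same two-sample data as above. `dT_demo`, `alt_sum`, `S_table` and `dT_with_S` are hypothetical names introduced only for this illustration.

```r
# Same values as the DATA above, with Condition kept as character
dT_demo <- data.frame(
  Condition = c("T1", "T2", "T3", "T4", "T5", "T6"),
  Sample1 = c(6.99, 2.05, 4.5, 4.71, 5.66, 9.76),
  Sample2 = c(5.8, 3.04, 4.69, 5.22, 3.65, 2.89),
  stringsAsFactors = FALSE
)

# Alternating-sign sum, as in the function above
alt_sum <- function(x) {
  n <- 2 * seq_along(x) - 1
  sum(x * (-1)^((n - 1) / 2))
}

# One value of S per sample column (the label column is dropped)
S <- sapply(dT_demo[-1], alt_sum)

# Option 1: a separate table that references each sample by name
S_table <- data.frame(Sample = names(S), S = unname(S),
                      stringsAsFactors = FALSE)

# Option 2: append S as an extra row at the bottom of the original table
S_row <- data.frame(Condition = "S", as.list(S), stringsAsFactors = FALSE)
dT_with_S <- rbind(dT_demo, S_row)
```

The separate table is usually easier to work with afterwards, because the appended row mixes a summary value into columns that otherwise hold raw measurements.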

Rui Barradas

It's still throwing the non-numeric argument to binary operator error when I try to apply it :/ class(myData) gives [1] "data.table" "data.frame". How can I coerce it? I tried purging NAs using data.table(t(na.omit(t(myData)))) and it didn't help. – zirconium Nov 06 '17 at 16:24
@zirconium I believe that if it gives that error, you should edit the question and post the output of `dput(dT)` there. (Not in a comment, in the question.) Like this we will have an exact copy of your data and test the several answers with it. – Rui Barradas Nov 06 '17 at 16:26
