I want to apply a custom function to every row of df and assign the value the function returns to a new column in that data frame. My function takes a vector of values from chosen columns (in my case, columns 12:17) and returns a calculated value (a diversity index). The function is defined as:

shannon <- function(p){
  if (0 %in% p) {
    p = replace(p,p==0,0.0001)
  } else {
    p
  }
  H = -sum(p*log(p))
  return (H)
}

A random row from the dataset looks like this:

p <- df[3000,12:17]
        x1        x2        x3        x4         x5 x6
 0.5777778 0.1777778 0.1555556 0.2888889 0.02222222  0

When I apply the custom function to this row, like this:

shannon(as.vector(t(p)))

It returns the correctly calculated value of 1.357692.

Now I want to store this value in a new column of my dataset by applying the custom function to those specific columns from my dataset. I tried to do it using mutate and sapply by running:

df <- mutate(df, shannon = sapply(as.vector(t(census[,12:17])), shannon))

but it returns

Error in `mutate()`:
! Problem while computing `shannonVal = sapply(as.vector(t(census[, 12:17])), shannon)`.
✖ `shannonVal` must be size 9467 or 1, not 56802.

The number of rows in my dataset is 9467, so the sapply is returning something that's 6 times as long. But why, and how can I fix it?

mankojag
    When you use `sapply` on a vector, it loops over each element of the vector, so `shannon` is applied to a single element at a time. Maybe you want to loop over columns? Not clear – akrun Nov 29 '22 at 18:18
    With a `dput` of some rows of your data or a reprex, it'll be easier to help you – Juan C Nov 29 '22 at 18:20
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 29 '22 at 18:22
    What about `df$shannon <- apply(df[,12:17], 1, shannon)`? However, you should explain what "census" is and, more generally, what your data looks like. – Ric Nov 29 '22 at 18:23
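
To see where the factor of 6 comes from, here is a minimal sketch on a made-up data frame. The `toy` data and its values are assumptions for illustration, not the asker's data, and `shannon` is a condensed version of the function from the question. Flattening the columns first produces one element per cell, so `sapply` returns one value per cell, while `apply` with `MARGIN = 1` hands each row to the function as a vector.

```r
# Hypothetical toy data: 3 rows, 2 proportion columns (names are illustrative)
toy <- data.frame(x1 = c(0.5, 0.2, 1.0), x2 = c(0.5, 0.8, 0.0))

# Condensed version of the question's function (same zero handling)
shannon <- function(p) {
  p <- replace(p, p == 0, 0.0001)
  -sum(p * log(p))
}

# Flattening gives one element per cell, so shannon is called on single numbers
flat <- as.vector(t(toy[, 1:2]))
per_cell <- sapply(flat, shannon)
length(per_cell)  # nrow(toy) * number of selected columns

# MARGIN = 1 passes each row as a vector, so there is one result per row
per_row <- apply(toy[, 1:2], 1, shannon)
length(per_row)   # nrow(toy)
```

With 9467 rows and 6 selected columns, the flattened version has 9467 * 6 = 56802 elements, which is the size reported in the error message.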

2 Answers

Building on Ric's comment, df <- mutate(df, shannon = apply(census[,12:17], 1, function(x) {shannon(t(x))})) might just do the trick.

Vons

Ric's answer works: df$shannon <- apply(df[,12:17], 1, shannon)

df and census are the same thing, sorry for the confusion
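
As a quick check, the row-wise `apply` can be run on a small data frame built from the row shown in the question. This is only a sketch: the `toy` data frame and its second row are made up for illustration, the columns are selected by name rather than by position 12:17, and `shannon` is a condensed version of the function from the question.

```r
# Condensed version of the question's function (same zero handling)
shannon <- function(p) {
  p <- replace(p, p == 0, 0.0001)
  -sum(p * log(p))
}

# First row: the values of df[3000, 12:17] from the question.
# Second row: invented values, only there to show one result per row.
toy <- data.frame(
  x1 = c(0.5777778, 0.25),
  x2 = c(0.1777778, 0.25),
  x3 = c(0.1555556, 0.25),
  x4 = c(0.2888889, 0.25),
  x5 = c(0.02222222, 0),
  x6 = c(0, 0)
)

cols <- c("x1", "x2", "x3", "x4", "x5", "x6")

# One Shannon value per row, stored as a new column
toy$shannon <- apply(toy[, cols], 1, shannon)
toy$shannon[1]  # approximately 1.357692, the value reported in the question
```

Selecting the columns by name is a design choice for the sketch only; `df[, 12:17]` works the same way as long as the positions are stable.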

mankojag