The title may be confusing, but I suspect there is a simple solution. I have my own function and I want to apply it to several groups of data, each consisting of two columns. However, I need to do a different calculation on each column of a group.

As an example mydata is:

    x1   x2   y1   y2   z1  z2
1  0.0  0.0  0.0  7.8  0.0 8.6
2  8.6  0.0  0.0  7.6  1.6 1.4
3 11.2  7.8  3.4  1.2  7.6 0.0
4  8.4  7.6 21.4 10.2 23.6 0.0
5  0.0  1.2  1.8  7.0  3.2 0.0
6  0.0 10.2  1.4  0.0  0.0 0.0

mydata<-structure(list(x1 = c(0, 8.6, 11.2, 8.4, 0, 0), x2 = c(0, 0, 
7.8, 7.6, 1.2, 10.2), y1 = c(0, 0, 3.4, 21.4, 1.8, 1.4), y2 = c(7.8, 
7.6, 1.2, 10.2, 7, 0), z1 = c(0, 1.6, 7.6, 23.6, 3.2, 0), z2 = c(8.6, 
1.4, 0, 0, 0, 0)), .Names = c("x1", "x2", "y1", "y2", "z1", "z2"
), class = "data.frame", row.names = c(NA, -6L))

And myfun function is:

myfun <- function(x) {
  means <- sapply(list(x), function(ss) mean(ss, na.rm = TRUE))
  # my point: vars <- sapply(list(y), function(ss) var(ss, na.rm = TRUE))
  mean <- means[[1]]
  # var <- vars[[1]]
  # lists <- list(mean, var)
  # names(lists) <- c("mean", "var")
  # return(lists)
  lists <- list(mean)
  names(lists) <- c("mean")
  return(lists)
}

The lines commented out with # are the parts I want to add to myfun later.

When I run

results<-lapply(mydata, myfun)

the same function, with the same calculation, is applied to every column.

As you can see, there are 2 columns (x1-x2, y1-y2, z1-z2) for each data group (x, y, z).

What I want is:

1) Obtaining means of first columns (x1, y1, z1)

2) Obtaining variances of second columns (x2, y2, z2)

3) As output, I want to see the results mean1 and var1 for each data group under lists named x, y and z, like:

x-> mean1 (mean of x1)
    var1  (var of x2)

y-> mean1 (mean of y1)
    var1  (var of y2)

4) Do all these in a loop with lapply or sapply or with any useful function.

Notes:

1) I did not group x1 and x2 under x, or y1 and y2 under y, because a solution that works on mydata in its current form would be more useful to me. If necessary, though, I can group them separately.

2) myfun currently finds the means of all 6 columns. The additional parts that would calculate the variances of the second columns are commented out with #.

2 Answers

1

I would start by splitting the dataframe to create a list of dataframes with 2 columns each. At that point you can use lapply or map_dfr to apply the function mean_var to each element of the list. The advantage of map_dfr is that it returns a dataframe, binding the rows of the function output.

library(purrr)

my_data_l <- split.default(mydata, rep(1:3, each = 2))

mean_var <- function(x) {
    list(mean = mean(x[,1]), var = var(x[,2]))
}

map_dfr(my_data_l, mean_var)
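
As a rough base-R sketch of the same idea, the split can be driven by column names rather than by position, so it would not depend on the columns arriving in the order x1, x2, y1, y2, z1, z2. It assumes every column name is a group prefix followed by 1 or 2; the names `sorted` and `groups` are illustrative and not part of the original answer.

```r
# Sample data from the question
mydata <- data.frame(
  x1 = c(0, 8.6, 11.2, 8.4, 0, 0),
  x2 = c(0, 0, 7.8, 7.6, 1.2, 10.2),
  y1 = c(0, 0, 3.4, 21.4, 1.8, 1.4),
  y2 = c(7.8, 7.6, 1.2, 10.2, 7, 0),
  z1 = c(0, 1.6, 7.6, 23.6, 3.2, 0),
  z2 = c(8.6, 1.4, 0, 0, 0, 0)
)

# Sort columns by name so that, within each group, the "1" column
# comes before the "2" column (assumes names like x1, x2, y1, ...)
sorted <- mydata[, order(names(mydata))]

# Strip the trailing digits to get the group of each column
groups <- sub("[0-9]+$", "", names(sorted))

# One 2-column data frame per group, named by group
my_data_l <- split.default(sorted, groups)

# Mean of the first column, variance of the second
mean_var <- function(d) {
  list(mean = mean(d[[1]], na.rm = TRUE),
       var  = var(d[[2]], na.rm = TRUE))
}

results <- lapply(my_data_l, mean_var)
```

Here `results$x$mean` is the mean of x1 and `results$x$var` is the variance of x2, and likewise for y and z. Unlike `map_dfr`, `lapply` returns a named list rather than a data frame.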

c1au61o_HH
  • A little explanation can help new and future useRs. And why `map_dfr` when `lapply` could work? Lightweight is the best weight! – Parfait Apr 19 '19 at 17:33
  • Good point, I will edit the answer. Indeed `lapply` would work too; `map_dfr` is convenient because it returns a dataframe by binding the rows of the function's output. – c1au61o_HH Apr 19 '19 at 17:43
  • By the way, your solution makes assumptions about column ordering. What if it were `x2 y2 z2 x1 y1 z1`? Maybe a sort is needed? – Parfait Apr 19 '19 at 17:44
  • Yes, that is a good point too. In that case I would `split` by column names, as shown in the second [answer](https://stackoverflow.com/questions/51297089/how-to-split-data-frame-by-column-names-in-r) here. – c1au61o_HH Apr 19 '19 at 17:52
  • I appreciate you both. These solutions will be useful for further studies too. – Hüsamettin Tayşi Apr 19 '19 at 17:56
1

Consider assigning your groups first, then iterate off this with lapply. In fact use sapply with simplify=FALSE for a named list.

grps <- unique(gsub("[0-9]", "", colnames(mydata)))
# [1] "x" "y" "z"

myfun <- function(grp)
             list(mean = mean(mydata[,paste0(grp, 1)]),
                  variance = var(mydata[,paste0(grp, 2)]))  

mean_var_list <- sapply(grps, myfun, simplify = FALSE)    

mean_var_list
# $x
# $x$mean
# [1] 4.7
# 
# $x$variance
# [1] 20.87467
# 
# $y
# $y$mean
# [1] 4.666667
# 
# $y$variance
# [1] 16.53467
# 
# $z
# $z$mean
# [1] 6
# 
# $z$variance
# [1] 11.85067

Or use the default, simplify=TRUE and return a matrix.

mean_var_mat <- sapply(grps, myfun)

mean_var_mat
#          x        y        z       
# mean     4.7      4.666667 6       
# variance 20.87467 16.53467 11.85067
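
One caveat worth noting: because the function returns a list, the matrix that `sapply` builds here is a matrix of list cells rather than a plain numeric matrix. If a numeric matrix is wanted, a sketch along the following lines with `vapply` may help. The name `mean_var_num` is illustrative, and the data frame is repeated only to keep the example self-contained.

```r
# Sample data from the question
mydata <- data.frame(
  x1 = c(0, 8.6, 11.2, 8.4, 0, 0),
  x2 = c(0, 0, 7.8, 7.6, 1.2, 10.2),
  y1 = c(0, 0, 3.4, 21.4, 1.8, 1.4),
  y2 = c(7.8, 7.6, 1.2, 10.2, 7, 0),
  z1 = c(0, 1.6, 7.6, 23.6, 3.2, 0),
  z2 = c(8.6, 1.4, 0, 0, 0, 0)
)

# Group prefixes: "x" "y" "z"
grps <- unique(gsub("[0-9]", "", colnames(mydata)))

# Return a named numeric vector per group; FUN.VALUE declares that
# each result must be a numeric vector of length 2
mean_var_num <- vapply(
  grps,
  function(grp) c(mean     = mean(mydata[[paste0(grp, 1)]]),
                  variance = var(mydata[[paste0(grp, 2)]])),
  FUN.VALUE = c(mean = 0, variance = 0)
)

# Individual values can then be used directly in arithmetic
mean_var_num["mean", "x"]
```

With this form, `is.numeric(mean_var_num)` is `TRUE`, so rows and columns can be passed straight to functions such as `round` or `colSums` without unlisting first.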
Parfait
  • Thanks a lot, this worked. But I have another question: for my calculations I will import data sets named 2005, 2006, etc., and in one session I will import 14 years of data (from 2005 to 2018). How should I handle this kind of naming? – Hüsamettin Tayşi May 07 '19 at 07:53
  • Please ask a new question with all necessary details including sample data, current attempted code, and desired results. Thanks! – Parfait May 07 '19 at 17:32