
I am trying to use lapply to compute simple descriptive statistics on a list of data frame columns. Here is the code that builds the list:

varlist <- list(
  datafile$Ho, 
  datafile$Hd, 
  datafile$Vo, 
  datafile$Vd, 
  datafile$TDC, 
  datafile$W, 
  datafile$Ao, 
  datafile$Ad, 
  datafile$Freq)

I then create an empty data frame called descript to store the results:

descript <- data.frame(
  mean = as.numeric(), 
  sd = as.numeric(), 
  range = as.numeric(),
  median = as.numeric())

All of that works fine. However, as soon as I run it through lapply, I get an error stating that the replacement has 2 rows, data has 1:

lapply(varlist,function(x){
  descript$mean <- mean(x,na.rm = TRUE)
  descript$sd <- sd(x,na.rm = TRUE)
  descript$range <- range(x,na.rm = TRUE)
  descript$median <- median(x,na.rm = TRUE)
})

I have looked at other questions of the same kind, but each answer seems to be specific to its application. I'm admittedly not the greatest at coding, so I would greatly appreciate it if someone could explain what causes the error and how to solve it. Thanks.

TylerT

2 Answers


I believe your problem comes from range(), which returns two numbers (the minimum and the maximum) rather than one. One way to fix this is to store each value in its own column:

descript$range_a <- range(x,na.rm = TRUE)[1]
descript$range_b <- range(x,na.rm = TRUE)[2]

(I am pretty sure that is your issue, but it would help to have a reproducible example so that I can run your code and double-check. For example, I am not sure what datafile looks like, so I cannot run your code as it is. The reprex package is a great resource for this.)
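To illustrate the point about range(), here is a minimal sketch. The vector x is made-up sample data, not taken from datafile, and diff() is shown only as one possible way to get a single-number spread if that is what is wanted.

```r
# Hypothetical sample vector (not from the original data)
x <- c(3, 1, 4, 1, 5, NA)

# range() returns a length-2 vector: c(minimum, maximum)
r <- range(x, na.rm = TRUE)
length(r)

# Split the pair into two scalars, one per column
range_low  <- r[1]
range_high <- r[2]

# If a single number is preferred, the width of the range is max - min
spread <- diff(r)
```

Storing r once and indexing it also avoids calling range() twice.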

Lucy
  • Thank you very much Lucy for your reply. Like hangler said, it fixed part of the problem but not completely. I do appreciate the help! – TylerT Oct 26 '17 at 14:47

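I think the problem is that you initialized an empty (zero-row) data frame and are then trying to fill it column by column. R requires a replacement column to match the data frame's row count (or to recycle evenly into it), so the assignment fails with a "replacement has ... rows, data has ..." error.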

In addition, as Lucy points out, range() outputs 2 numbers, so ideally you'll want to capture them each in a separate column.
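The two points above can be reproduced in isolation with a small sketch. It assumes a cut-down version of the empty descript data frame from the question. The helper f and the column name extra are hypothetical and exist only to show that assignments made inside a function do not reach the data frame outside it.

```r
# Empty (zero-row) data frame, similar to the one in the question
descript <- data.frame(mean = numeric(), sd = numeric())

# Assigning a length-1 value to a column of a zero-row data frame fails;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch({
  descript$mean <- 5
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Inside a function, `descript$... <-` modifies a local copy only,
# and the function returns the value of its last expression
f <- function(x) {
  descript$extra <- numeric(0)  # changes the local copy, not the global one
  median(x)                     # this is what the function returns
}
res <- f(c(1, 2, 3))

names(descript)  # the global data frame has no 'extra' column
```

This is why building one small data frame per list element and returning it from the function tends to be easier than filling a shared data frame from inside lapply.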

No idea how efficient it is, but try something like this (adapted from Lucy's answer and this answer on another question):

# Using some sample data
varlist <- list(c(1, 2, 2, 3), c(4, 4, 5, 6), c(7, 8, 9, 10))

tmp <- lapply(varlist, function(x) {
  mean <- mean(x, na.rm = TRUE)
  sd <- sd(x, na.rm = TRUE)
  range_low <- range(x, na.rm = TRUE)[1]
  range_high <- range(x, na.rm = TRUE)[2]
  median <- median(x, na.rm = TRUE)

  data.frame(mean, sd, range_low, range_high, median)
})

descript <- do.call(rbind, tmp)

> descript
  mean        sd range_low range_high median
1 2.00 0.8164966         1          3    2.0
2 4.75 0.9574271         4          6    4.5
3 8.50 1.2909944         7         10    8.5
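As an alternative sketch (not part of the original answer), the same table could be built with vapply(), which checks that every element yields exactly five numbers. It uses the same sample data as above. The names stats and descript2 are chosen only for illustration.

```r
# Same sample data as above
varlist <- list(c(1, 2, 2, 3), c(4, 4, 5, 6), c(7, 8, 9, 10))

# Each call returns a named numeric vector of length 5;
# vapply() stacks them as the columns of a 5 x 3 matrix
stats <- vapply(varlist, function(x) {
  r <- range(x, na.rm = TRUE)  # compute the range once
  c(mean       = mean(x, na.rm = TRUE),
    sd         = sd(x, na.rm = TRUE),
    range_low  = r[1],
    range_high = r[2],
    median     = median(x, na.rm = TRUE))
}, FUN.VALUE = numeric(5))

# Transpose so that each list element becomes one row
descript2 <- as.data.frame(t(stats))
descript2
```

Whether this is preferable to do.call(rbind, ...) is a matter of taste. It avoids creating one data frame per element, which may matter for long lists.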
hangler
  • The cause for failure was that a) failed assignment to col named 'mean', b) no named object was being given a value (which you fixed) and c) that the result returned would not have had any value except the median. – IRTFM Oct 26 '17 at 00:14
  • Your solution worked really well. Thank you very much for the response hangler. 42- thank you for the clarification on the error! – TylerT Oct 26 '17 at 14:47