I have a data frame:

            colA           colB
1           15.3           1.76
2           10.8           1.34
3            8.1           1.27
4           19.5           1.47
5            7.2           1.27
6            5.3           1.49
7            9.3           1.31
8           11.1           1.09
9            7.5           1.18
10          12.2           1.22
11           6.7           1.25
12           5.2           1.19
13          19.0           1.95
14          15.1           1.28
15           6.7           1.52
16           8.6             NA
17           4.2           1.12
18          10.3           1.37
19          12.5           1.19
20          16.1           1.05
21          13.3           1.32
22           4.9           1.03
23           8.8           1.12
24           9.5           1.70

How can I remove or change the NA values so that when I use sapply (i.e. sapply(x, mean)), the mean is taken over 24 rows for colA and 23 rows for colB?

I understand that all columns of a data frame must have the same number of rows, so something like na.omit() would not work: it would remove row 16 entirely, and I'd lose a row of data when calculating the mean for colA.

Thanks!

Jacob L
  • Try `colMeans(your_data, na.rm = TRUE)` – markus Feb 28 '19 at 12:49
  • I appreciate the responses, and while colMeans() works, the requirement is to use sapply() on the data frame. With this in mind, I'm not sure how I would manipulate the data frame such that NA is changed/removed so that it's not taken into account when calculating the means of each column. – Jacob L Feb 28 '19 at 12:54
  • Then try `sapply(your_data, mean, na.rm = TRUE)` – markus Feb 28 '19 at 12:55

1 Answer

Pass na.rm = TRUE to mean(). The NA values are then skipped when each column's mean is computed, and no rows are dropped from the data frame.

Example:

df <- data.frame(A = 1:3, B = c(NA, 1, 2))
apply(df, 2, mean, na.rm = TRUE)

#   A   B 
# 2.0 1.5 
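
Since the question asks for sapply() specifically, here is a minimal sketch of the same idea with sapply(). The data frame `x` and its four rows are made up for illustration (a cut-down stand-in for the data in the question), and `n_used` is a hypothetical helper variable added only to show how many values went into each mean.

```r
# Hypothetical cut-down version of the question's data: colB has one NA.
x <- data.frame(
  colA = c(15.3, 10.8, 8.6, 4.2),
  colB = c(1.76, 1.34, NA, 1.12)
)

# Extra arguments given to sapply() are forwarded to the function it calls,
# so na.rm = TRUE reaches mean() for every column.
means <- sapply(x, mean, na.rm = TRUE)

# Count the non-NA values per column, i.e. how many entered each mean.
n_used <- sapply(x, function(col) sum(!is.na(col)))

print(means)
print(n_used)
```

Here colA is averaged over all 4 rows and colB over its 3 non-NA rows. The data frame itself is left unchanged, which is the difference from na.omit().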
zx8754
Sonny
  • Since I'm using sapply(), I ended up using sapply(df, mean, na.rm=T) and it worked. Marking this one as correct as it helped steer me in the right direction. Thanks! – Jacob L Feb 28 '19 at 12:59