R computing mean and median from local file

Question

I'm brand new to R, so this might be obvious.

The code I have so far:

rm(list=ls())

kdata = read.table("data_fra_klassen_v20.txt", header = TRUE)

library(openxlsx)

kdata = read.xlsx("data_fra_klassen_v20.xlsx") 


head(kdata)

Here is the dataset:

  gender    shoe    height  colour
  Man        43      176    Green
  Woman      36      166    Brown
  Man        43      182    Other
  Man        36      151    Brown
  Woman      43      183    Blue
  Man        44      184    Blue
  Woman      38      164    Brown
  Woman      37      160    Brown
  Man        41      175    Brown

I'm trying to find the mean and median within each gender.

I was thinking maybe something like this:

heightmen = kdata$height[kdata$gender=="Man"]
mean(heightmen)

However, it seems like it can't find any values.

What do you mean it can't find any values? I don't see why not—your code gets me 173.6 — camille, Jan 21 '20 at 15:02
Actually your code should work fine. Check if your columns are in the right format, especially `class(kdata$height)` should yield `"numeric"` or `"integer"`. As a workaround try `as.numeric(as.character(kdata$height))[kdata$gender=="Man"]`. — jay.sf, Jan 21 '20 at 15:04
@jay.sf Someone gave me another solution, however, I'll experiment with what you said so I can get a better understanding of r. — Ben, Jan 21 '20 at 15:08
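
A minimal sketch of the check jay.sf describes, assuming the problem is that `height` was read in as a factor or text column rather than as numbers. The small data frame here is a hypothetical stand-in for `kdata`, not the real file:

```r
# Hypothetical stand-in for kdata: height has been read in as a factor
kdata <- data.frame(
  gender = c("Man", "Woman", "Man"),
  height = c("176", "166", "182"),
  stringsAsFactors = TRUE
)

class(kdata$height)  # "factor", so mean() cannot use it directly

# Convert via character so the original values, not the factor codes, are kept
kdata$height <- as.numeric(as.character(kdata$height))

heightmen <- kdata$height[kdata$gender == "Man"]
mean(heightmen)  # 179
```

If `class(kdata$height)` already reports "numeric" or "integer", the conversion is unnecessary and the cause lies elsewhere.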

sm925 · Accepted Answer · 2020-01-21T15:07:44.397

1

You can do this with the dplyr package.

Using mutate:

library(dplyr)
df %>%
  group_by(gender) %>%
  mutate(mean_height = mean(height)) %>%
  mutate(median_height = median(height)) %>%
  select(gender, mean_height, median_height) %>%
  unique()

Or using summarise:

df %>%
  group_by(gender) %>%
  summarise(mean_height = mean(height), median_height = median(height))


# A tibble: 2 x 3
# Groups:   gender [2]
#   gender mean_height median_height
#   <fct>        <dbl>         <dbl>
# 1 Man           174.           176
# 2 Woman         168.           165

data

df <- structure(
  list(
    gender = structure(c(1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L),
                       .Label = c("Man", "Woman"), class = "factor"),
    shoe = c(43L, 36L, 43L, 36L, 43L, 44L, 38L, 37L, 41L),
    height = c(176L, 166L, 182L, 151L, 183L, 184L, 164L, 160L, 175L),
    colour = structure(c(3L, 2L, 4L, 2L, 1L, 1L, 2L, 2L, 2L),
                       .Label = c("Blue", "Brown", "Green", "Other"),
                       class = "factor")
  ),
  class = "data.frame",
  row.names = c(NA, -9L)
)

edited Jan 21 '20 at 15:07

answered Jan 21 '20 at 14:56

sm925


@Ben First run `install.packages("tidyverse")`, then run the code above. – sm925 Jan 21 '20 at 14:59
I think it might be better to use `summarize` instead of `mutate` to obtain the mean by gender. – Jonathan V. Solórzano Jan 21 '20 at 15:03
That seems to have fixed it. – Ben Jan 21 '20 at 15:03
I'll see if that's better @JonathanV.Solórzano – Ben Jan 21 '20 at 15:05

score 1 · Answer 2 · answered Jan 21 '20 at 15:00


Another solution, similar to your original code but using `subset`:

mean(subset(kdata, gender == "Man")$height)
mean(subset(kdata, gender == "Woman")$height)
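
A possible base R variation that avoids repeating the call for each gender is `tapply`, which applies a function to one vector split by the groups in another. This is only a sketch: the data frame below rebuilds the `gender` and `height` columns from the question by hand and stands in for the asker's `kdata`.

```r
# Stand-in for kdata, typed in from the dataset shown in the question
kdata <- data.frame(
  gender = c("Man", "Woman", "Man", "Man", "Woman", "Man", "Woman", "Woman", "Man"),
  height = c(176, 166, 182, 151, 183, 184, 164, 160, 175)
)

# One value per gender, computed in a single call each
means   <- tapply(kdata$height, kdata$gender, mean)
medians <- tapply(kdata$height, kdata$gender, median)

means    # Man 173.60, Woman 168.25
medians  # Man 176, Woman 165
```

The results are named by gender, so `means[["Man"]]` picks out a single group.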

answered Jan 21 '20 at 15:00

Jonathan V. Solórzano


score 0 · Answer 3 · answered Jan 21 '20 at 15:08


Try this:

library(dplyr)
kdata %>%
  group_by(gender) %>%
  summarise(median = quantile(height, 0.5), mean = mean(height))
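
`quantile(height, 0.5)` is the 50th percentile, which under R's default quantile type gives the same number as `median(height)`. A quick check on the four women's heights from the question (the name `height_women` is only for illustration):

```r
# Heights of the four women in the question's dataset
height_women <- c(166, 183, 164, 160)

q50 <- quantile(height_women, 0.5)  # named result, labelled "50%"
med <- median(height_women)

unname(q50) == med  # TRUE, both are 165
```

The one visible difference is that `quantile` returns a named value, while `median` returns a plain number.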

answered Jan 21 '20 at 15:08

Gonzalo Falloux Costa

