I'm brand new to R, so this might be obvious.

The code I have so far:

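rm(list = ls())

# Read the plain-text version of the data
kdata = read.table("data_fra_klassen_v20.txt", header = TRUE)

# Read the Excel version of the same data (this overwrites kdata)
library(openxlsx)
kdata = read.xlsx("data_fra_klassen_v20.xlsx")

head(kdata)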

Here is the dataset:

  gender    shoe    height  colour
  Man        43      176    Green
  Woman      36      166    Brown
  Man        43      182    Other
  Man        36      151    Brown
  Woman      43      183    Blue
  Man        44      184    Blue
  Woman      38      164    Brown
  Woman      37      160    Brown
  Man        41      175    Brown

I'm trying to find the mean and median height within each gender.

I was thinking maybe something like this:

heightmen = kdata$height[kdata$gender=="Man"]
mean(heightmen)

However, it seems like it can't find any values.

Ben
  • What do you mean it can't find any values? I don't see why not—your code gets me 173.6 – camille Jan 21 '20 at 15:02
  • Actually your code should work fine. Check if your columns are in the right format, especially `class(kdata$height)` should yield `"numeric"` or `"integer"`. As a workaround try `as.numeric(as.character(kdata$height))[kdata$gender=="Man"]`. – jay.sf Jan 21 '20 at 15:04
  • @jay.sf Someone gave me another solution; however, I'll experiment with what you said so I can get a better understanding of R. – Ben Jan 21 '20 at 15:08
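
The checks suggested in the comments can be sketched as follows. This is only an illustration under stated assumptions: `kdata_demo` is a hypothetical stand-in that mimics one way an import could go wrong (height stored as text, trailing spaces in gender). It is not the asker's actual data, and the real cause may be different.

```r
# Hypothetical stand-in for a badly imported file (NOT the real data):
# height arrives as text and some gender labels carry trailing spaces.
kdata_demo <- data.frame(
  gender = c("Man ", "Woman", "Man ", "Woman"),
  height = c("176", "166", "182", "183"),
  stringsAsFactors = FALSE
)

# Inspect the column types; height should be "numeric" or "integer".
str(kdata_demo)
class(kdata_demo$height)

# Look at the exact labels; stray whitespace makes == "Man" match nothing.
unique(kdata_demo$gender)

# Repair: strip whitespace and convert the text column to numbers.
kdata_demo$gender <- trimws(kdata_demo$gender)
kdata_demo$height <- as.numeric(as.character(kdata_demo$height))

# The original subsetting approach now has values to work with.
heightmen <- kdata_demo$height[kdata_demo$gender == "Man"]
mean(heightmen)
```

If `class()` and `unique()` already look right on the real data, the subsetting code from the question should work unchanged.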

3 Answers

You can do this with the dplyr package.

Using mutate:

library(dplyr)

df %>%
  group_by(gender) %>%
  mutate(mean_height = mean(height),
         median_height = median(height)) %>%
  select(gender, mean_height, median_height) %>%
  unique()

Or using summarise:

df %>%
  group_by(gender) %>%
  summarise(mean_height = mean(height), median_height = median(height))


# A tibble: 2 x 3
# Groups:   gender [2]
#   gender mean_height median_height
#   <fct>        <dbl>         <dbl>
# 1 Man           174.           176
# 2 Woman         168.           165

data

df <- structure(list(gender = structure(c(1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L), .Label = c("Man", "Woman"), class = "factor"), shoe = c(43L,36L, 43L, 36L, 43L, 44L, 38L, 37L, 41L), height = c(176L, 166L,182L, 151L, 183L, 184L, 164L, 160L, 175L), colour = structure(c(3L,2L, 4L, 2L, 1L, 1L, 2L, 2L, 2L), .Label = c("Blue", "Brown", "Green", "Other"), class = "factor")), class = "data.frame", row.names = c(NA,-9L))
sm925

Another solution, similar to your original code but using subset():

mean(subset(kdata,gender == "Man")$height)
mean(subset(kdata,gender == "Woman")$height)
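
The per-gender calls above can also be written once per statistic with base R's tapply(), which applies a function within each group. A minimal sketch, assuming kdata has the gender and height columns shown in the question; the data frame below is retyped by hand from the posted table so the example runs on its own.

```r
# Retype the posted table by hand so the sketch is self-contained.
kdata <- data.frame(
  gender = c("Man", "Woman", "Man", "Man", "Woman",
             "Man", "Woman", "Woman", "Man"),
  height = c(176, 166, 182, 151, 183, 184, 164, 160, 175)
)

# Mean and median height within each gender.
mean_by_gender   <- tapply(kdata$height, kdata$gender, mean)
median_by_gender <- tapply(kdata$height, kdata$gender, median)

mean_by_gender
median_by_gender
```

This avoids repeating the subset for each gender and keeps working if more categories are added to the gender column.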

Try this; quantile(height, 0.5) returns the same value as median(height):

library(dplyr)

kdata %>%
  group_by(gender) %>%
  summarise(median = quantile(height, 0.5), mean = mean(height))