Vectorisation
It is important to understand the vectorised nature of R, and that you will almost never need a for loop.
What is a vector? For example, the column hgt in your data is essentially a vector: a variable named hgt containing multiple values.
Let's recreate an example vector (a variable named x containing multiple values):
x <- c(1, 2, 3, 4, 5)
Many operations in R are vectorised. This means they are applied to every element of the vector in a single call, so there is no need to go through the elements one at a time.
Here is an example:
x + 1
# 2 3 4 5 6
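To make the contrast with a for loop concrete, here is a small sketch that computes x + 1 both ways and checks that the two results agree. The names y_loop and y_vec are just illustrative.

```r
x <- c(1, 2, 3, 4, 5)

# Loop version: visit each element one at a time
y_loop <- numeric(length(x))   # pre-allocate the result vector
for (i in seq_along(x)) {
  y_loop[i] <- x[i] + 1
}

# Vectorised version: one call handles the whole vector
y_vec <- x + 1

identical(y_loop, y_vec)       # TRUE
```

The vectorised version is shorter, harder to get wrong, and usually faster.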
As a result, we get another vector, where the operation + 1 was carried out on each element of the original vector.
Therefore, you will not need a for loop.
Just replace the + 1 operation with the appropriate operation for your problem.
What you are looking for is:
- to check whether each element in hgt meets a certain condition, for example > 15
In R, such a "condition check" is done with comparison (relational) operators such as >, <, >=, <=, == (equal to) or != (not equal to).
Let's find out which values in x are > 3:
x > 3
# FALSE FALSE FALSE TRUE TRUE
What we get is yet another vector, this time a logical one, that contains the result (TRUE or FALSE) of the condition check for each element of x.
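The other comparison operators work the same way on the whole vector. The sketch below uses the same x and also shows &, a logical operator ("and") that combines two condition checks element by element; | ("or") works analogously.

```r
x <- c(1, 2, 3, 4, 5)

x <  3              #  TRUE  TRUE FALSE FALSE FALSE
x >= 3              # FALSE FALSE  TRUE  TRUE  TRUE
x <= 3              #  TRUE  TRUE  TRUE FALSE FALSE
x == 3              # FALSE FALSE  TRUE FALSE FALSE
x != 3              #  TRUE  TRUE FALSE  TRUE  TRUE

# Combine two checks element-wise with & (and)
(x > 1) & (x < 5)   # FALSE  TRUE  TRUE  TRUE FALSE
```

Use & and | for element-wise checks like these; && and || are meant for single TRUE/FALSE values.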
Now there is one more concept missing: how to extract certain values from a vector.
This is done with the index operator [ ].
For example, if I wanted to extract the values that are bigger than 3, I would write x[x > 3]. Read this in your mind as "Give me the values of x where x is bigger than 3".
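Indexing with a logical vector keeps exactly the elements where the condition is TRUE. Because TRUE counts as 1 and FALSE as 0, sum() and mean() on a condition check give the count and the proportion of matching elements. A small sketch; the data frame name my_data and its hgt values are made up for illustration, so substitute your own data.

```r
x <- c(1, 2, 3, 4, 5)

x[x > 3]        # 4 5   -> the matching values themselves
sum(x > 3)      # 2     -> how many elements meet the condition
mean(x > 3)     # 0.4   -> proportion of elements meeting the condition

# Hypothetical data frame: the name my_data and the hgt values are invented
my_data <- data.frame(hgt = c(12.1, 16.4, 15.0, 18.2, 14.7))
my_data$hgt[my_data$hgt > 15]   # 16.4 18.2
```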
Sampling Distribution
I want to point out that you are missing an important step that your teacher wants you to do: repeat the sampling process, plus the calculation of the required statistic for each sample, 1000 times in order to obtain a sampling distribution. Check this out for a real-life, hands-on example of why this is important.
(Remember that I told you to almost never use a for loop. This may be a case where one is appropriate: running the same function 1000 times.)
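A minimal sketch of the repeated sampling, under clearly hypothetical assumptions: the population (population_hgt, drawn here from a normal distribution), the sample size of 30 and the statistic (the mean) are placeholders, since the actual assignment is not shown. Swap in your own hgt data and the statistic your teacher asks for.

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical population of heights; replace with your own hgt column
population_hgt <- rnorm(10000, mean = 15, sd = 3)

n_samples   <- 1000   # how many times to repeat the sampling
sample_size <- 30     # hypothetical size of each sample

sample_means <- numeric(n_samples)  # pre-allocate the results
for (i in seq_len(n_samples)) {
  s <- sample(population_hgt, size = sample_size)  # draw one sample
  sample_means[i] <- mean(s)                       # statistic for this sample
}

# sample_means now holds the (simulated) sampling distribution of the mean
summary(sample_means)
```

Base R's replicate(1000, mean(sample(population_hgt, size = 30))) expresses the same loop in one line, and hist(sample_means) plots the shape of the resulting distribution.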