R function for finding mean of a population greater than a specific value

Question

I’m currently learning R and I’ve hit a wall. How do I write the code for the proportion of the population taller than 15 m? (Question 5 in the picture.)

number 5 question in this attachment- for a randomly generated sample in a dataset

I tried using a for loop but can’t figure out what to put in for the vector. I’m expecting a cut-off that keeps values of 15 m and above.

d<- read.csv('Eucalyptus1_.csv')

str(d)
hight <- (d$hgt)
sample(hight, size = 5)
sample1<-sample(hight, size = 5)
mean(sample1)
median(sample1)
sd(sample1)
quantile(sample1)

This is the line I’m struggling with -

for (sample1 > 15 in vector) { 
  
}

Please show your earnest attempt at the problem. See [How do I ask and answer homework questions?](https://meta.stackoverflow.com/q/334822/1422451) — Parfait, Jan 22 '23 at 00:05
https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Helpful — Philipp Neuber, Jan 22 '23 at 11:09

score 0 · Answer 1 · answered Jan 22 '23 at 14:58

Vectorisation

It is important to understand the vectorised nature of R and that you will almost never need a for loop.

What is a vector? For example, the column hgt in your data is essentially a vector: a variable named hgt containing multiple values.

Let's recreate an example vector (a variable named x containing multiple values):

x <- c(1, 2, 3, 4, 5)

Many operations in R are vectorised. This means they are applied to every element of the vector in a single call, so there is no need to go through the elements one at a time yourself.

Here is an example:

x + 1
# 2 3 4 5 6

As a result, we get another vector, where the operation + 1 was carried out on each element of the original vector.

Therefore, you will not need a for loop.

Just replace the + 1 operation with the appropriate operation for your problem.

What you are looking for is:

to check whether each element in hgt meets a certain condition, for example > 15

The "condition check" is done in R via comparison operators such as >, <, >=, <=, == and !=.

Let's find out which values in x are > 3.

x > 3
# FALSE FALSE FALSE TRUE TRUE

What we get is yet another vector that contains the result of the condition check for each element of x.

Now one other concept is missing: how to extract certain values from a vector.

This is done via the index operator [ ]. For example, if I wanted to extract values that are bigger than 3, I would write x[x > 3]. Read this in your mind as "Give me the values of x where x is bigger than 3".
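Putting the two ideas together: in arithmetic, R treats TRUE as 1 and FALSE as 0, so the mean of a logical vector is the proportion of TRUE values. Here is a minimal sketch using the example vector x from above. The `heights` vector is made-up data standing in for `d$hgt`, not values from your file.

```r
x <- c(1, 2, 3, 4, 5)

# Extract the values that meet the condition
x[x > 3]
# [1] 4 5

# Count how many values meet the condition (TRUE counts as 1)
sum(x > 3)
# [1] 2

# Proportion of values that meet the condition
mean(x > 3)
# [1] 0.4

# Same idea with heights (made-up numbers standing in for d$hgt)
heights <- c(12.1, 16.4, 15.0, 18.2, 9.7)
prop_tall <- mean(heights > 15)
prop_tall
# [1] 0.4
```

Note that `> 15` excludes a value of exactly 15; use `>= 15` if the cut-off should include it.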

Sampling Distribution

I want to point out that you are missing an important step that your teacher wants you to do: repeat the sampling process, and the calculation of the required statistic for each sample, 1000 times in order to obtain a sampling distribution. Check this out for a real-life, hands-on example of why this matters.

(Remember that I told you to almost never use a for loop. Maybe it is appropriate to use one to run the same function 1000 times.)
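As a rough sketch of what such a loop could look like, assuming the statistic of interest is the proportion taller than 15 in each sample of 5: the `heights` vector below is simulated and only stands in for `d$hgt`, and the names `n_rep`, `n_obs` and `prop_tall` are my own choices for illustration.

```r
set.seed(1)  # only so the sketch is reproducible

# Simulated heights standing in for d$hgt (an assumption, not the real data)
heights <- rnorm(200, mean = 15, sd = 3)

n_rep <- 1000                 # number of samples to draw
n_obs <- 5                    # observations per sample
prop_tall <- numeric(n_rep)   # pre-allocate one slot per sample

for (i in seq_len(n_rep)) {
  s <- sample(heights, size = n_obs)  # a fresh random sample on every iteration
  prop_tall[i] <- mean(s > 15)        # proportion taller than 15 in this sample
}

head(prop_tall)   # the first few sample proportions
mean(prop_tall)   # centre of the sampling distribution
```

With the real data you would replace `heights` with `d$hgt`. Calling `hist(prop_tall)` afterwards should give a picture of the sampling distribution.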

Thanks @uke. This provided a great insight that I needed. Each sample contains 5 values (n=5) and I need (n=5) 1000 times but going with the example in the link you attached, it just replicated same value 1000 times (1000 x5). I used the function rep(sample(d$hgt, size = 5), 1000). So how do I generate 1000 different samples of an experiment with n=5 observation per experiment? — Matthew Aniagu, Jan 23 '23 at 06:34
This is because `rep()` does not really evaluate its content 1000 times. As you pointed out, it rather calculates the content once and repeats the result 1000 times. A readymade function for repeating sampling n times is `moderndive::rep_sample_n()`. But I guess your teacher wants you to write your own approach, so I would suggest to go with a `for`-loop. — uke, Feb 01 '23 at 19:41
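
To illustrate the difference described in the comment above, here is a small sketch comparing `rep()` with base R's `replicate()`, which re-evaluates its expression on every repetition. `replicate()` is an alternative I am naming here, not something suggested in the thread, and the `heights` values are made up.

```r
set.seed(42)  # only so the sketch is reproducible

# Made-up heights standing in for d$hgt
heights <- c(12.1, 16.4, 15.0, 18.2, 9.7, 14.3, 17.8, 13.5, 19.0, 11.2)

# rep(): sample() runs once, and that single result is repeated 3 times
repeated <- rep(sample(heights, size = 5), 3)
m <- matrix(repeated, nrow = 5)
m            # every column is the same sample

# replicate(): sample() is re-evaluated for each of the 3 repetitions
resampled <- replicate(3, sample(heights, size = 5))
resampled    # a 5 x 3 matrix, one independent draw per column
```

Replacing 3 with 1000 gives one column per sample, and a statistic can then be computed for each column.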
