I couldn't find a question here on Stack Overflow that already answers mine, so I'm very sorry if this has already been asked and I just couldn't find it.
All in all, this question is more about understanding what happens to my data depending on the code I use.
So, I have a dataset with a few NAs in it.
I want to aggregate the data and use `na.rm=TRUE`, which tells R to ignore the NAs while calculating, right?
The output I received included NAs, and this led me to use the argument `na.action=na.pass` together with `na.rm=TRUE`.
This left me with significantly fewer NAs in my output.
To be honest, I don't understand why...
As I like to try things out and find out for myself, I looked at different variations of my aggregate function:
1. only with `na.rm=TRUE`
2. only with `na.action=na.pass`
3. with both `na.rm=TRUE` and `na.action=na.pass`
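To make the three variants concrete, here is a minimal sketch on made-up data. The data frame `df`, its columns `group`, `x`, `y`, and the use of `mean` with the formula interface are assumptions for illustration only, not my real data or my exact call.

```r
# Hypothetical toy data: each measured variable has one NA, in different rows
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  x     = c(1, 2, NA, 4, 5, 6),
  y     = c(10, NA, 30, 40, 50, 60)
)

# 1. only na.rm=TRUE; na.action is left at its default
#    (for the formula method that default is na.omit, per ?aggregate)
v1 <- aggregate(cbind(x, y) ~ group, data = df, FUN = mean, na.rm = TRUE)

# 2. only na.action=na.pass; NAs reach mean(), which then returns NA
v2 <- aggregate(cbind(x, y) ~ group, data = df, FUN = mean,
                na.action = na.pass)

# 3. both; NAs reach mean() and are dropped per variable by na.rm=TRUE
v3 <- aggregate(cbind(x, y) ~ group, data = df, FUN = mean,
                na.rm = TRUE, na.action = na.pass)

print(v1)  # group a: x = 1,   y = 10
print(v2)  # group a: x = NA,  y = NA
print(v3)  # group a: x = 1.5, y = 20
```

On this toy data group b has no NAs and comes out the same in all three variants, while group a differs between 1. and 3. as well.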
Using only 2., I get a lot of NAs, which makes sense because I told R to include all NAs in the calculation without having `na.rm=TRUE` in it.
At the same time, 1. and 3. don't give me the same results. Why is that?
I thought that `na.rm=TRUE` and `na.action=na.pass` meant the same thing... apparently they don't, because I get slightly different values for my variables' means.
What happens to my data when I use both `na.rm=TRUE` and `na.action=na.pass` in an aggregate function, compared to only using `na.rm=TRUE`? Which one is better to use?
Thank you very much, I appreciate your help!