I am using the Titanic data from Kaggle and want to count the missing values in each column with a simple function.
I can already get the count for each column, one column at a time, using the code below:
length(which(is.na(titanic_data$PassengerId)))
length(which(is.na(titanic_data$Survived)))
length(which(is.na(titanic_data$Pclass)))
length(which(is.na(titanic_data$Name)))
length(which(is.na(titanic_data$Sex)))
length(which(is.na(titanic_data$Age)))
length(which(is.na(titanic_data$SibSp)))
length(which(is.na(titanic_data$Parch)))
length(which(is.na(titanic_data$Ticket)))
length(which(is.na(titanic_data$Fare)))
length(which(is.na(titanic_data$Cabin)))
length(which(is.na(titanic_data$Embarked)))
I do not want to repeat this code for every column, so I wrote the following function:
missing_val <- function(x, y) {
  len <- length(which(is.na(x$y)))
  len
}
# create a list of all column names
cols <- colnames(titanic_data)
cols
# call the function
missing_val(titanic_data, cols)
When I call missing_val, I keep getting a single zero, even though I know for a fact that there are missing values in the Cabin and Embarked columns.
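The behaviour can be reproduced without the Kaggle file. The sketch below uses a made-up three-row data frame (`toy`, with hypothetical columns `a` and `b`) in place of `titanic_data`. It suggests that `x$y` looks for a column literally named `y` instead of using the value stored in `y`, so it returns `NULL` and the count comes out as 0 no matter what is passed in.

```r
# Toy stand-in for titanic_data (made-up values, not the Kaggle data)
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "x"))

# Same function as in the question
missing_val <- function(x, y) {
  length(which(is.na(x$y)))
}

# `$` does not evaluate y; toy has no column called "y"
col_lookup <- toy$y
is.null(col_lookup)   # TRUE

# Both columns contain NA, yet the function reports 0
result <- missing_val(toy, colnames(toy))
result                # 0
```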
What I am trying to get is something like 0,0,0,0,0,0,0,0,687,2, indicating that there are 687 missing values in the Cabin column and 2 missing values in the Embarked column.
What am I doing wrong here? Any hint would be appreciated. Thanks.
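For comparison, a per-column count of the kind described above could be sketched in base R as follows. This is only one possible approach, shown on a made-up data frame (`toy`, with hypothetical columns `id`, `cabin` and `port`) and not on the real `titanic_data`. It assumes that `[[` is used to look up a column by a name held in a variable, that `sapply` calls the function once per column name, and that `colSums(is.na(...))` gives the same counts in a single call.

```r
# Toy stand-in for titanic_data (made-up values, not the Kaggle data)
toy <- data.frame(
  id    = c(1, 2, 3),
  cabin = c(NA, NA, "C85"),
  port  = c(NA, "S", "C")
)

# Count NAs in one column; x[[y]] uses the value of y as the column name
missing_val <- function(x, y) {
  sum(is.na(x[[y]]))
}

# Apply the function to every column name
counts <- sapply(colnames(toy), function(col) missing_val(toy, col))
counts       # named vector: id = 0, cabin = 2, port = 1

# Same counts without a helper function
counts_alt <- colSums(is.na(toy))
counts_alt
```

On the real file, `is.na` only counts cells that were actually read in as `NA`. If blank cells were read in as empty strings, they would not be counted; the `na.strings` argument of `read.csv` controls this.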