I am new to R programming and want to tidy up a dataset stored in a CSV file. The file has a header row followed by multiple columns and rows. I have already removed every row that contains an NA. Next, I want to remove every row that has "Unassigned" in its first column. Here is what I have so far in R:
# Open the file
data <- read.csv("covid19data.csv", header = TRUE, sep = ",")
# Remove all rows that contain any NA values
na_data <- na.omit(data)
# Remove rows whose county name is "Unassigned"
tidy_data <- na_data[!(na_data$County.Name == "Unassigned"), ]
# Check whether "Unassigned" is still present in the first column
for (row in tidy_data[, 1]) {
  if (row == "Unassigned") {
    print("PRESENT")
  }
}
# Check all levels of the county column
county_names <- levels(tidy_data$County.Name)
print(county_names)
I used the for loop to check whether any "Unassigned" values were left in my data. It did not print anything, so I assumed there were no "Unassigned" elements remaining.
However, when I run print(county_names), "Unassigned" is still one of the elements. I thought I had removed "Unassigned", but for some reason it still appears when I print the levels.
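Here is a minimal reproduction of what I am seeing, assuming the county column is read in as a factor. The toy data frame and its values are made up for illustration and are not from my actual file:

```r
# Hypothetical toy data standing in for covid19data.csv
toy <- data.frame(
  County.Name = factor(c("Unassigned", "Autauga", "Baldwin")),
  Cases = c(0, 12, 15)
)

# Same filter as in my script: drop the "Unassigned" rows
toy_tidy <- toy[!(toy$County.Name == "Unassigned"), ]

# None of the remaining values equal "Unassigned"
print(any(toy_tidy$County.Name == "Unassigned"))  # prints FALSE

# Yet the factor still lists "Unassigned" among its levels
print(levels(toy_tidy$County.Name))  # prints "Autauga" "Baldwin" "Unassigned"
```

So the rows themselves appear to be gone, while the level is still listed.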
What am I doing wrong?
Thank you.