
I am new to R programming. I want to tidy up a dataset stored in a CSV file. The file has a header row followed by multiple rows and columns. I have already removed all the rows that contain NAs. Next, I want to remove all of the rows that have "Unassigned" in their first column. Here is what I have so far in R:

#### # Open the file

    data <- read.csv("covid19data.csv", header = TRUE, sep = ",")


#### # Remove all rows that have any number of NA's in them

    na_data <- na.omit(data)
 

#### # Remove rows that have "Unassigned" county name

    tidy_data <- na_data[!(na_data$County.Name=="Unassigned"),]


#### # Check to see if "Unassigned" present in dataframe

    for(row in tidy_data[,1]) {
        if(row == "Unassigned") {
            print("PRESENT")
        }
    }

#### # Check all levels of the county column

    county_names <- levels(tidy_data$County.Name)
    print(county_names)

I tried checking to see if there were any "Unassigned" in my data by using a for-loop, and it did not print anything so I assumed I did not have any "Unassigned" elements.

When I print(county_names), I get "Unassigned" as one of the elements. I thought I removed "Unassigned", but for some reason, it still appears when I print the levels.

What am I doing wrong?

Thank you.

1 Answer


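Subsetting a factor column removes the rows but keeps every original level, including ones that no longer occur in the data. We can use droplevels to remove the unused levels and reset the levels: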

    tidy_data <- droplevels(tidy_data)
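
As a rough illustration of what is happening, here is a minimal sketch on a made-up data frame. The toy data and the helper names (`has_rows`, `has_level_before`, `levels_after`) are assumptions for the example, not part of the original file. It assumes `County.Name` is a factor, which is what `read.csv` produces by default in R versions before 4.0.

```r
# Hypothetical toy data standing in for covid19data.csv
na_data <- data.frame(
  County.Name = factor(c("Autauga", "Unassigned", "Baldwin")),
  Cases = c(10, 3, 25)
)

# Same kind of filter as in the question
tidy_data <- na_data[na_data$County.Name != "Unassigned", ]

# Vectorised check: is any remaining row equal to "Unassigned"?
has_rows <- any(tidy_data$County.Name == "Unassigned")

# The level is still registered on the factor, even with no matching rows
has_level_before <- "Unassigned" %in% levels(tidy_data$County.Name)

# Drop unused levels from every factor column in the data frame
tidy_data <- droplevels(tidy_data)
levels_after <- levels(tidy_data$County.Name)

print(has_rows)
print(has_level_before)
print(levels_after)
```

In this sketch the row check finds nothing while the level is still listed before `droplevels` is called, which matches the behaviour described in the question. If the column is read as plain character data instead (for example with `stringsAsFactors = FALSE`), there are no levels to drop in the first place.
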
akrun