
I am cleaning a dataset in R that has a country abbreviation code attribute. I want to check the validity of each value in that column by matching it against a list of country abbreviations. How can I do this in R? I am a beginner in R.

A sample of the data set was attached as an image.

Thanks in advance!

NelsonGon
Hasangi
  • Do you have a list of county abbreviation to match it up with? Do you want to remove the rows which do not match? – Ronak Shah May 10 '19 at 07:43
  • So one of the attributes is Country.Code, and you want to validate its value for every ID? If so, which values should they match? – Gonçalo Peres May 10 '19 at 07:45
  • `df$County.Code %in% list_of_county_abbreviation` – Jaap May 10 '19 at 07:47
  • Possible duplicate of [Filter data.frame rows by a logical condition](https://stackoverflow.com/questions/1686569/filter-data-frame-rows-by-a-logical-condition) – camille May 10 '19 at 14:06
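
The `%in%` check from Jaap's comment can be sketched in base R as below. The data frame contents and the reference vector are made up for illustration; only the names `County.Code` and `list_of_county_abbreviation` come from the comment.

```r
# Hypothetical sample data; the column name follows Jaap's comment
df <- data.frame(
  ID = 1:4,
  County.Code = c("ABC", "DEF", "GHI", "WRONG"),
  stringsAsFactors = FALSE
)

# Hypothetical reference list of valid abbreviations
list_of_county_abbreviation <- c("ABC", "DEF", "GHI", "JKL", "MNO")

# TRUE where the code appears in the reference list, FALSE otherwise
is_valid <- df$County.Code %in% list_of_county_abbreviation

# Rows whose code is not in the reference list
invalid_rows <- df[!is_valid, ]

# Rows whose code is in the reference list
valid_rows <- df[is_valid, ]
```

Note that `%in%` returns `FALSE` for a missing value unless `NA` is itself in the reference list, so rows with an `NA` code would end up in `invalid_rows`.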

1 Answer


You can use anti_join() from dplyr to keep only the rows of your dataset whose values do not appear in another dataset:

library(tibble) # for tibble() (a modern data frame)
library(dplyr)  # for anti_join

# Create some data
df <- tibble(
  country = c("ABC", "DEF", "GHI", "WRONG"),
  other_data = rnorm(4)
)

df
#> # A tibble: 4 x 2
#>   country other_data
#>   <chr>        <dbl>
#> 1 ABC         -0.277
#> 2 DEF          1.09 
#> 3 GHI         -0.184
#> 4 WRONG       -0.150

countries <- tibble(
  country = c("ABC", "DEF", "GHI", "JKL", "MNO"),
  name = c("some", "long", "names", "or", "so")
)
countries
#> # A tibble: 5 x 2
#>   country name 
#>   <chr>   <chr>
#> 1 ABC     some 
#> 2 DEF     long 
#> 3 GHI     names
#> 4 JKL     or   
#> 5 MNO     so

# keep only the rows of df whose country is NOT in countries
anti_join(df, countries, by = "country")
#> # A tibble: 1 x 2
#>   country other_data
#>   <chr>        <dbl>
#> 1 WRONG       -0.150

Created on 2019-05-10 by the reprex package (v0.2.1)
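
If the goal is instead to drop the invalid rows, which is what Ronak Shah's comment asks about, or to keep every row and mark which ones are valid, the following is a minimal sketch of two possible variants. It reuses the same example values for `country`, uses fixed numbers in place of `rnorm()`, and the column name `is_valid` is a hypothetical choice.

```r
library(tibble) # for tibble()
library(dplyr)  # for semi_join() and mutate()

# Same illustrative data as above, with fixed values instead of rnorm()
df <- tibble(
  country = c("ABC", "DEF", "GHI", "WRONG"),
  other_data = c(1.5, 2.5, 3.5, 4.5)
)

countries <- tibble(
  country = c("ABC", "DEF", "GHI", "JKL", "MNO"),
  name = c("some", "long", "names", "or", "so")
)

# Variant 1: keep only the rows of df whose country IS in countries
# (semi_join() filters df and does not add columns from countries)
valid <- semi_join(df, countries, by = "country")

# Variant 2: keep every row and add a logical flag column
# (is_valid is a hypothetical column name)
flagged <- mutate(df, is_valid = country %in% countries$country)
```

Here `valid` holds the three rows with a recognised code, and `flagged` keeps all four rows with `is_valid` set to `FALSE` for the "WRONG" row.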

David