0

I am have a dataframe that has, among others, two columns: clean.data$bilateral and clean.data$if.bilateral.other.party. Inside the bilateral data, there are three observations: Y, N, and Bilateral (yes, I know the observation is basically the same as the column name and that this is bad. The observation is capitalized, whereas the column name is not).

clean.data <- data.frame("bilateral" = c("Y", "Bilateral", "N", "Y", "Bilateral", "N"),
       "if.bilateral.other.party" = c("Jordan", "Sweeden", NA, "Uk,Netherlands", "Russia,Poland", "NewZealand"))

Treaties that are Bilateral or Y should only have one observation in if.bilateral.other.party, however some don't. For example, Uk,Netherlands should not be listed as Bilateral, instead it should be N. I've already removed spaces in the if.bilateral.other.party column, and there are commas between parties.

I am trying to identify observations that are currently marked as Bilateral or Y that should not be, and change the observation to N in that case. I also need to do the reverse, changing observations N to Y if they have other parties listed.

How do I do this?

  • 1
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Dec 12 '20 at 21:26
  • I added one, sorry about that – checkmate363 Dec 12 '20 at 22:13

1 Answers1

0

Without an example data set, this solution is based on speculation and how you've described what you want done.

library("dplyr")
library("stringr")
library("purrr")

clean.data <- tribble(
  ~bilateral, ~if.bilateral.other.party,
  "Y", "UK,Netherlands",
  "Bilateral", "Sweden,France",
  "N", "Germany,UK"
)
clean.data
#> # A tibble: 3 x 2
#>   bilateral if.bilateral.other.party
#>   <chr>     <chr>                   
#> 1 Y         UK,Netherlands          
#> 2 Bilateral Sweden,France           
#> 3 N         Germany,UK

# Split and count countries and assign new bilateral column
clean.data %>%
  mutate(list_countries = str_split(if.bilateral.other.party, ",")) %>%
  mutate(num_countries = map_int(list_countries, function(x) { length(x) })) %>%
  mutate(new_bilateral = case_when(
    num_countries > 1 & bilateral %in% c("Y", "Bilateral") ~ "N",
    num_countries > 1 & bilateral == "N" ~ "Y",
    TRUE ~ bilateral
  ))
#> # A tibble: 3 x 5
#>   bilateral if.bilateral.other.party list_countries num_countries new_bilateral
#>   <chr>     <chr>                    <list>                 <int> <chr>        
#> 1 Y         UK,Netherlands           <chr [2]>                  2 N            
#> 2 Bilateral Sweden,France            <chr [2]>                  2 N            
#> 3 N         Germany,UK               <chr [2]>                  2 Y

Created on 2020-12-12 by the reprex package (v0.3.0)

Here are the results using the sample data you've provided.

clean.data <- data.frame(
  "bilateral" = c("Y", "Bilateral", "N", "Y", "Y", "N"),
  "if.bilateral.other.party" = c("Jordan", "Sweeden", NA, "Uk,Netherlands", "Russia,Poland", "NewZealand"), 
  stringsAsFactors = FALSE)

clean.data %>%
  mutate(list_countries = str_split(if.bilateral.other.party, ",")) %>%
  mutate(num_countries = map_int(list_countries, function(x) { length(x) })) %>%
  mutate(new_bilateral = case_when(
    num_countries > 1 & bilateral %in% c("Y", "Bilateral") ~ "N",
    num_countries > 1 & bilateral == "N" ~ "Y",
    TRUE ~ bilateral
  ))
#>   bilateral if.bilateral.other.party  list_countries num_countries
#> 1         Y                   Jordan          Jordan             1
#> 2 Bilateral                  Sweeden         Sweeden             1
#> 3         N                     <NA>              NA             1
#> 4         Y           Uk,Netherlands Uk, Netherlands             2
#> 5         Y            Russia,Poland  Russia, Poland             2
#> 6         N               NewZealand      NewZealand             1
#>   new_bilateral
#> 1             Y
#> 2     Bilateral
#> 3             N
#> 4             N
#> 5             N
#> 6             N

Created on 2020-12-12 by the reprex package (v0.3.0)

Eric Leung
  • 2,354
  • 12
  • 25