Hello research community members who are using R!

May I ask for your help with a simple question that I am stuck on?

I have a data.frame with the following structure:

`Date`, `CountryName`, `Variable 2`, `Variable 3`

I need to get a data.frame that contains all records for a specific country (say, Country_1), on the condition that Variable 3 is NA for that country.

In other words, if any record for Country_1 contains NA in Variable 3, the output should return all records for Country_1 in this data set. The same condition should be applied to every country in the data set so that, in the end, I have a data.frame of all records for Country_1 ... Country_N, provided that each of these countries has at least one NA in Variable 3. Countries whose records contain no NAs in Variable 3 should be excluded from the output.

I tried using an if_else condition, but it didn't work out. Since my functional programming skills are limited, I decided to post this question here.

Thanks a lot.

dkolkin
  • You can find a bunch of approaches if you do a simple Google search [Select records in data set based upon condition(s) in R](https://www.google.com/search?client=safari&rls=en&q=Select+records+in+data+set+based+upon+condition(s)+in+R&ie=UTF-8&oe=UTF-8). Could you edit your question to include why these solutions don't work? Good luck! – jpsmith Aug 22 '23 at 14:34
  • something like `mydata[Countryname %in% unique(mydata[is.na(mydata$Variable3), ]$Countryname), ]` could work – Wimpel Aug 22 '23 at 14:35
  • Country_1 == CountryName or is it a lookup vector with a subset of countries? – Andre Wildberg Aug 22 '23 at 14:35
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Show your code so we can see what might be wrong. – MrFlick Aug 22 '23 at 14:45
  • `is.na` and `complete.cases` go a long way – I_O Aug 22 '23 at 15:08

1 Answer

Inferring dplyr (from your mention of if_else), this will return all rows for every country in which one or more values of `Variable 3` are NA:

df %>%
  filter(any(is.na(`Variable 3`)), .by = CountryName)

(If you have dplyr older than 1.1, you'll need to add group_by(CountryName) before filter() and remove the .by= arg.)
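For older dplyr versions, the group_by() variant might look like the following minimal sketch. The sample data frame `df` and its values are hypothetical, made up only to mirror the column names in the question; ungroup() is added so that later steps are not affected by the grouping.

```r
library(dplyr)

# Hypothetical sample data (not from the question); column names mirror the
# structure described there.
df <- data.frame(
  CountryName  = c("A", "A", "B", "B", "C"),
  `Variable 3` = c(1, NA, 2, 3, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep every row of each country that has at least one NA in `Variable 3`
result <- df %>%
  group_by(CountryName) %>%
  filter(any(is.na(`Variable 3`))) %>%
  ungroup()
```

With this sample data, countries A and C are kept in full, while B (no NA values) is dropped.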

Demonstrated using mtcars:

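library(dplyr)
# Introduce NAs into one cyl == 6 row and one cyl == 4 row
mtcars$disp[2:3] <- NA
# Keep every cyl group that has at least one NA in disp
mtcars %>%
  filter(any(is.na(disp)), .by = cyl)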
#                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag  21.0   6    NA 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710     22.8   4    NA  93 3.85 2.320 18.61  1  1    4    1
# Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
# Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
# Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
# Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
# Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
# Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
# Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
# Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
# Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
# Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
# Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
# Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Note that only the rows with cyl 4 and 6 are returned; the cyl 8 rows (which have no NA values in disp) are filtered out.
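If dplyr is not an option, the same group-level filter can be sketched in base R, along the lines of the `%in%` approach suggested in the comments. The data frame `df` below is hypothetical sample data, assumed only for illustration.

```r
# Hypothetical sample data (not from the question)
df <- data.frame(
  CountryName  = c("A", "A", "B", "B", "C"),
  `Variable 3` = c(1, NA, 2, 3, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Countries that have at least one NA in `Variable 3`
na_countries <- unique(df$CountryName[is.na(df[["Variable 3"]])])

# All rows belonging to those countries
result <- df[df$CountryName %in% na_countries, ]
```

Here `[[ ]]` is used to access the column because its name contains a space.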

r2evans