I have a dataset that consists of two columns: one consists of a list of names and the other identifies a group to which the corresponding name belongs (example data and code for reproducing example data below).
group name
1 aaa
1 aaa
1 aaa
2 NA
2 NA
3 NA
3 bbb
4 ccc
4 ccc
5 ddd
5 eee
6 fff
6 ffg
> x <- c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
> y <- c("aaa", "aaa", "aaa", NA, NA, NA, "bbb", "ccc", "ccc", "ddd", "eee", "fff", "ffg")
> mydataframe <- data.frame( "group" = x, "name" = y)
The dataset consists of more than 500,000 rows (as well as other columns which are irrelevant to this question). Groups may include may have one member (one row) or many members (multiple rows). When a group has multiple members, the names of the members may be: a) the same (e.g., group 1 where all members are named "aaa"); b) blank (e.g., group 2 where all members are NA); c) a mix of names and blanks (e.g., group 3 that includes one blank name and one name "bbb"); or d) different names (e.g., group 5 that has "ddd" and "eee"). When there are different names they may be very different (e.g., group 5) or similar (group 6)
I need to filter the dataset to identify rows that fall into category d - those rows that have the same group number but different (non-NA) names (in the example below, groups 5 and 6).
The filter would produce a new dataset that looks looks like this:
group name
5 ddd
5 eee
6 fff
6 ffg
Bonus points if someone can help me not only produce the result above, but also distinguish between groups that have members with totally different names (group 5) and groups that have members with similar, but different names (group 6).
Thank you in advance!