I want to keep only the rows of a data frame in which two given columns contain the same value. My data look like this:
df <- data.frame(
  BGC1 = c("BGC1", "BGC1", "BGC1", "BGC2", "BGC2", "BGC2",
           "BGC3", "BGC3", "BGC3", "BGC4", "BGC4", "BGC4"),
  BGC2 = c("BGC2", "BGC3", "BGC4", "BGC1", "BGC3", "BGC4",
           "BGC1", "BGC2", "BGC4", "BGC1", "BGC2", "BGC3"),
  Family1 = c("Strepto_10", "Strepto_20", "Strepto_30", "Strepto_20",
              "Strepto_20", "Strepto_50", "Strepto_20", "Strepto_30",
              "Strepto_30", "Strepto_30", "Strepto_50", "Strepto_40"),
  Family2 = c("Strepto_10", "Strepto_10", "Strepto_10", "Strepto_20",
              "Strepto_20", "Strepto_20", "Strepto_30", "Strepto_30",
              "Strepto_30", "Strepto_40", "Strepto_40", "Strepto_40")
)
Example data frame:
BGC1  BGC2  Family1     Family2
BGC1  BGC2  Strepto_10  Strepto_10
BGC1  BGC3  Strepto_20  Strepto_10
BGC1  BGC4  Strepto_30  Strepto_10
BGC2  BGC1  Strepto_20  Strepto_20
BGC2  BGC3  Strepto_20  Strepto_20
BGC2  BGC4  Strepto_50  Strepto_20
BGC3  BGC1  Strepto_20  Strepto_30
BGC3  BGC2  Strepto_30  Strepto_30
BGC3  BGC4  Strepto_30  Strepto_30
BGC4  BGC1  Strepto_30  Strepto_40
BGC4  BGC2  Strepto_50  Strepto_40
BGC4  BGC3  Strepto_40  Strepto_40
I would like to keep only the rows where Family1 and Family2 are the same. For example:
Expected output:
BGC1  BGC2  Family1     Family2
BGC1  BGC2  Strepto_10  Strepto_10
BGC2  BGC1  Strepto_20  Strepto_20
BGC2  BGC3  Strepto_20  Strepto_20
BGC3  BGC2  Strepto_30  Strepto_30
BGC3  BGC4  Strepto_30  Strepto_30
BGC4  BGC3  Strepto_40  Strepto_40
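
One possible way to get this output is a row-wise comparison of the two family columns. This is a minimal base R sketch, not necessarily the best approach. The name same_family is only illustrative, and stringsAsFactors = FALSE is an assumption added so that the columns are compared as plain strings on older R versions too.

```r
# Rebuild the example data frame from the question.
# stringsAsFactors = FALSE is an assumption: it keeps the columns as
# character vectors, so `==` compares strings even on R < 4.0.
df <- data.frame(
  BGC1 = c("BGC1", "BGC1", "BGC1", "BGC2", "BGC2", "BGC2",
           "BGC3", "BGC3", "BGC3", "BGC4", "BGC4", "BGC4"),
  BGC2 = c("BGC2", "BGC3", "BGC4", "BGC1", "BGC3", "BGC4",
           "BGC1", "BGC2", "BGC4", "BGC1", "BGC2", "BGC3"),
  Family1 = c("Strepto_10", "Strepto_20", "Strepto_30", "Strepto_20",
              "Strepto_20", "Strepto_50", "Strepto_20", "Strepto_30",
              "Strepto_30", "Strepto_30", "Strepto_50", "Strepto_40"),
  Family2 = c("Strepto_10", "Strepto_10", "Strepto_10", "Strepto_20",
              "Strepto_20", "Strepto_20", "Strepto_30", "Strepto_30",
              "Strepto_30", "Strepto_40", "Strepto_40", "Strepto_40"),
  stringsAsFactors = FALSE
)

# Keep only the rows where Family1 and Family2 hold the same value.
# same_family is a hypothetical name chosen for this sketch.
same_family <- df[df$Family1 == df$Family2, ]
print(same_family)
```

If either column could contain NA, wrapping the condition in which() (that is, df[which(df$Family1 == df$Family2), ]) would drop those rows instead of returning rows filled with NA. With dplyr loaded, filter(df, Family1 == Family2) should express the same idea.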