Two columns of correlated pairs: remove 'duplicate' rows to collapse a data frame into groups? (hard to describe)

Question

I have a large dataset that looks something like this:

```
df <- read.table(text="Var1 Var2
K1 K2
K3 K2
K7 K2
K7 K3
K5 K9
K4 K9", header=TRUE, stringsAsFactors=FALSE)
```

These are all pairs with a correlation of 1, and I'm looking to group them into clusters so I can collapse a larger dataset later. Is there a simple way to remove rows like K7 K3, since they are already part of the K2 group? I want to group rows later based on column 2, so I don't want any 'duplicates', such as a separate K3 group.

Edit: expected output

```
K1       K2
K3       K2
K7       K2
K5       K9
K4       K9
```
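The grouping described in the question can be framed as finding connected components: each row links two IDs, and every ID reachable through a chain of links belongs to the same cluster. The sketch below uses union-find, a technique not taken from this thread; `cluster_pairs` and `find_root` are hypothetical names, and it assumes the columns are named `Var1`/`Var2` and hold character (not factor) values.

```r
# Sample data from the question
df <- read.table(text = "Var1 Var2
K1 K2
K3 K2
K7 K2
K7 K3
K5 K9
K4 K9", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper (not from the question): treat each row as an undirected
# edge and label every ID with the root of its connected component (union-find).
cluster_pairs <- function(pairs) {
  ids <- unique(c(pairs$Var1, pairs$Var2))
  parent <- setNames(ids, ids)  # every ID starts as its own root

  # Follow parent links until reaching a root (an ID that is its own parent)
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  for (i in seq_len(nrow(pairs))) {
    a <- find_root(pairs$Var1[i])
    b <- find_root(pairs$Var2[i])
    if (a != b) parent[[a]] <- b  # merge: point Var1's root at Var2's root
  }

  data.frame(id = ids,
             group = vapply(ids, find_root, character(1), USE.NAMES = FALSE),
             stringsAsFactors = FALSE)
}

groups <- cluster_pairs(df)

# Drop the roots themselves, keeping one row per non-root member
members <- groups[groups$id != groups$group, ]
members
```

Component membership does not depend on row order, but the group label can: each merge points the `Var1` side's root at the `Var2` side's root. With this sample data the roots happen to be K2 and K9, so `members` contains the same five pairs as the expected output.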

@IceCreamToucan this question is different from the one you linked it to. I did use that answer to remove duplicates like K1 K2 vs. K2 K1, but this is a different question, so could you remove that? – rholeepoly, Feb 03 '20 at 15:29
Would the solution just be to remove the duplicates in column 1? – rholeepoly, Feb 03 '20 at 15:40

score 0 · Accepted Answer · answered Feb 03 '20 at 15:44


OK, I think I answered my own question with:

```
newdf <- df[!duplicated(df$Var1), ]
```
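The one-liner works because `duplicated()` returns `TRUE` for every repeat of a value after its first occurrence, so `!duplicated(df$Var1)` keeps only the first row for each `Var1` value. A quick check on the sample data follows; `df_swapped` is a hypothetical reordering added only to show that the result depends on row order.

```r
# Sample data from the question
df <- read.table(text = "Var1 Var2
K1 K2
K3 K2
K7 K2
K7 K3
K5 K9
K4 K9", header = TRUE, stringsAsFactors = FALSE)

# TRUE marks every repeat of a Var1 value after its first occurrence
dup <- duplicated(df$Var1)
dup  # FALSE FALSE FALSE TRUE FALSE FALSE

# Keep the first row for each Var1 value
newdf <- df[!dup, ]
newdf

# Hypothetical reordering: same rows, but K7 K3 now comes before K7 K2
df_swapped <- df[c(1, 2, 4, 3, 5, 6), ]
kept_swapped <- df_swapped[!duplicated(df_swapped$Var1), ]
kept_swapped
```

With K7 K3 listed before K7 K2, the K7 K3 row is the one kept, so a K3 'group' reappears in column 2. The one-liner is therefore only reliable when each ID's first row already points at the group label you want.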

answered by rholeepoly

