I've got a df (300000 obs by 15 var). A simplified version looks like this:
Col1 Col2
1 TRGV3*01 YCAVWIGSTSWVKIFA
2 TRGV7*02 YCASWAGQSGSGFHKVFA
3 TRGV7*02 YCASWAGRGSGFHKVFA
4 TRGV3*01 YCAVWINLGTSWVKIFA
5 TRGV3*01 YCAVWSGTSWVKIFA
6 TRGV7*02 YCASWAGRESSGFHKVFA
7 TRGV7*02 YCASWAVYSSGFHKVFA
8 TRGV7*02 YCASWSSSGFHKVFA
9 TRGV2*01 YCAVWICGTSWVKIFA
I want to subset the df and create a new df such that for a particular value in col1 (TRGV7*02), all possible values of Col2 are subsetted and the output is something like this:
TRGV7*02 TRGV3*01 TRGV2*01
YCASWAGQSGSGFHKVFA YCAVWIGSTSWVKIFA YCAVWICGTSWVKIFA
YCASWAGRGSGFHKVFA YCAVWSGTSWVKIFA
YCAVWINLGTSWVKIFA
YCASWAGRESSGFHKVFA
YCASWAVYSSGFHKVFA
YCASWSSSGFHKVFA
Once I do this I need to do some analysis and I can't figure out how to do that either... 1. Spit out another df from the one just created above with the same format but only with all unique values in that particular column (lot of repeats in the original dataset) 2. Compare the values generated in each column above and figure out which values in each column are unique compared to the values in all the columns or when just trying to compare it with one other column.
Hope that made sense. Thank you so much in advance!