I am reposting this question with a bit more clarity. Unfortunately, I didn't get any solutions from my previous posting. Please help me with this.
Below is what I want to do:
I have a data frame named proteome. It has 14 columns and thousands of rows. Column 5 of the first four rows contains:
Row 1: GHFCLKPGCNFHAESTRGYR
Row 2: FCLKPGCNFHAESTRGYR
Row 3: GHFCLKPGCNFHAESTR
Row 4: GCNFHAESTR
A screenshot of part of the original data frame is at this link: i67.tinypic.com/2wd0ap3.png
So row 2 is row 1 with its first two letters missing, row 3 is row 1 with its last three letters missing, and row 4 is row 1 with its first seven and last three letters missing.
Rows 2, 3, and 4 are artifacts of the method I have been using to generate the data, so I want to remove these entries.
I want R to keep only one of the four rows, ideally row 1, and remove the rest. I imagine R could do this by first finding all rows that share a matching string of letters and then removing all but one of them. For example, in the data above, GCNFHAESTR occurs in all four rows, so I want R to return only one row, ideally the top one. But I don't know how to do this.
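The rule described above (drop any row whose sequence appears inside a longer sequence from another row, so that the longest one survives) can be sketched as follows. This is a Python sketch of the logic, not R code: the function name drop_contained_peptides and the extra unrelated peptide MKTAYIAKQR are hypothetical, and it assumes the column 5 sequences have already been pulled into a plain list. Exact duplicates are also dropped, keeping the first occurrence. The pairwise comparison is O(n^2), which should be manageable for thousands of rows but may get slow on much larger tables.

```python
from typing import List


def drop_contained_peptides(peptides: List[str]) -> List[int]:
    """Return the (0-based) indices of the rows to keep.

    Hypothetical helper: a row is dropped if its sequence occurs inside a
    strictly longer sequence from another row, or if it exactly duplicates
    an earlier row. Everything else is kept in its original order.
    """
    keep = []
    for i, seq in enumerate(peptides):
        redundant = False
        for j, other in enumerate(peptides):
            if i == j:
                continue
            # seq is a fragment of a longer sequence: drop it
            if len(other) > len(seq) and seq in other:
                redundant = True
                break
            # exact duplicate of an earlier row: keep only the first copy
            if other == seq and j < i:
                redundant = True
                break
        if not redundant:
            keep.append(i)
    return keep


# The four sequences from column 5, plus one unrelated (made-up) peptide.
rows = [
    "GHFCLKPGCNFHAESTRGYR",
    "FCLKPGCNFHAESTRGYR",
    "GHFCLKPGCNFHAESTR",
    "GCNFHAESTR",
    "MKTAYIAKQR",
]
print(drop_contained_peptides(rows))  # [0, 4]
```

Only row 1 (index 0) survives from the four related rows, while the unrelated peptide is kept. In R, the same containment test can be written with grepl(seq, other, fixed = TRUE), and the kept indices (1-based in R) would then select the rows of proteome to retain.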
I hope this makes more sense this time. I look forward to hearing from the experts.
Thanks!