I have a text file separated by tabs and newlines. The first column contains sample IDs, but these are duplicated:
1/16 info info info
1/16 info info info
2/16 info info info
2/16 info info info
2/16 info info info
3/16 info info info
3/16 info info info
I need to extract the first column of IDs so that I end up with a single column of unique values, i.e.:
1/16
2/16
3/16
I have managed to extract the column, but I am having difficulty removing the duplicates. Here is what I have:
import glob

path = './Documents/*txt'
for filename in glob.glob(path):
    my_file = open(filename, 'r+')
    for line in my_file:
        line = line.split('\t')
        id = line[0]
        print(id)
I have tried using another list, adding each ID to it only if it is not already there:
s = []
if id not in s:
    s.append(id)
But I am stuck on how to remove the duplicates from here.
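I think what I need is something along these lines, where a list of IDs seen so far (here called seen, just a name I picked) is built up inside the loop and each ID is only printed the first time it appears, but I am not sure this is the right approach (untested sketch, same path and file layout assumptions as above):

import glob

path = './Documents/*txt'
seen = []
for filename in glob.glob(path):
    with open(filename, 'r') as my_file:
        for line in my_file:
            # first tab-separated field is the sample ID
            id = line.split('\t')[0]
            # only print the ID the first time it appears
            if id not in seen:
                seen.append(id)
                print(id)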