So I have this list of terms:

[('GO:0090141', 1), ('GO:0030308', 1), ('GO:0000266', 1), ('GO:0016881', 1), ('GO:0031307', 1)]
[('GO:0050681', 1), ('GO:0031491', 1), ('GO:0008270', 1), ('GO:0003677', 1), ('GO:0070936', 1)]
[('GO:0050681', 1), ('GO:0031491', 1), ('GO:0008270', 1), ('GO:0003677', 1), ('GO:0070936', 1)]
[('GO:0050681', 1), ('GO:0031491', 1), ('GO:0008270', 1), ('GO:0003677', 1), ('GO:0070936', 1)]
[('GO:0016055', 1), ('GO:0016363', 1), ('GO:0008270', 1), ('GO:0003676', 1), ('GO:0003677', 1)]
[('GO:0016607', 1), ('GO:0016605', 1), ('GO:0006351', 1), ('GO:0005515', 1), ('GO:0016925', 1)]
[('GO:0045842', 1), ('GO:0000781', 1), ('GO:0019789', 1), ('GO:0007067', 1), ('GO:0007049', 1)]
[('GO:0016607', 1), ('GO:0016605', 1), ('GO:0006351', 1), ('GO:0005515', 1), ('GO:0016925', 1)]
[('GO:0006457', 1), ('GO:0019221', 1), ('GO:0000087', 1), ('GO:0010827', 1), ('GO:0042405', 1)]
[('GO:0003676', 1), ('GO:0008270', 1), ('GO:0019789', 1)]
[('GO:0019221', 1), ('GO:0050681', 1), ('GO:0019899', 1), ('GO:0008270', 1), ('GO:0003676', 1)]
[('GO:0008022', 1), ('GO:0019048', 1), ('GO:0019899', 1), ('GO:0016881', 1), ('GO:0045202', 1)]
[('GO:0008022', 1), ('GO:0019048', 1), ('GO:0019899', 1), ('GO:0016881', 1), ('GO:0045202', 1)]
[('GO:0016881', 1), ('GO:0016874', 1), ('GO:0019789', 1)]

How can I get a list of just the GO terms, without the 1? I just want GO:000221, GO:000241, etc. Also, can you help me get the frequency of each GO term? For example, GO:0008270 appears 6 times in my list.

  • Is this a file you are reading in, or an object that is a list of lists (of tuples)? – Nix Mar 18 '13 at 01:02

3 Answers

You have some lists of tuples, so you can extract the first element of each tuple using a list comprehension (suppose your list is called l):

 g = [e[0] for e in l]

Once you have the list of just the GO terms, you can get their frequencies using, for example (see https://stackoverflow.com/a/893499),

 from collections import Counter
 freqs = Counter(g)
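
As a minimal sketch of the two steps together, assuming l holds a single row copied from the question (for several rows you would repeat this per row or flatten them first):

```python
from collections import Counter

# One row copied from the question; `l` is the name assumed in the answer above.
l = [('GO:0050681', 1), ('GO:0031491', 1), ('GO:0008270', 1),
     ('GO:0003677', 1), ('GO:0070936', 1)]

# Keep only the first element (the GO term) of each tuple.
g = [e[0] for e in l]

# Tally how often each term occurs in this row.
freqs = Counter(g)

print(g)
print(freqs['GO:0008270'])
```

Indexing a Counter with a term that never occurred returns 0 rather than raising a KeyError, which is convenient when checking arbitrary GO IDs.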
Adam Obeng

list_of_GO = [item[0] for item in old_list]

What you have is a list of 2-tuples (GO:XXX, 1). The list comprehension above builds a new list containing only the first item (GO:XXX) of every tuple in old_list.

It seems like you have a bunch of lists, not just a single list. Can we see more of your code?

To count the frequency of each term, you can use collections.Counter, or you can count occurrences yourself with list.count(item).
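
A rough sketch of both counting options, assuming the lists are collected in a hypothetical variable called rows (only three rows from the question are copied here, so the counts are smaller than for the full data):

```python
from collections import Counter
from itertools import chain

# Three rows copied from the question; `rows` is a hypothetical name.
rows = [
    [('GO:0016055', 1), ('GO:0016363', 1), ('GO:0008270', 1),
     ('GO:0003676', 1), ('GO:0003677', 1)],
    [('GO:0003676', 1), ('GO:0008270', 1), ('GO:0019789', 1)],
    [('GO:0016881', 1), ('GO:0016874', 1), ('GO:0019789', 1)],
]

# Flatten all rows and keep only the GO term from each tuple.
list_of_GO = [item[0] for item in chain.from_iterable(rows)]

# list.count scans the whole list on every call; fine for a single lookup.
n_single = list_of_GO.count('GO:0008270')

# Counter tallies every term in one pass over the list.
counts = Counter(list_of_GO)

print(n_single)
print(counts['GO:0019789'])
```

list.count is enough for one or two terms; Counter is likely the better fit when you want the frequency of every term, since it avoids rescanning the list for each one.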

thkang

Assuming that the data is set up in "rows", so to speak, you could use a Counter object from the collections module.

>>> from collections import Counter
>>> counter = Counter()
>>> data = [
...     # Each row of data listed above
...     [('GO:0090141', 1), ('GO:0030308', 1), ('GO:0000266', 1), ('GO:0016881', 1), ('GO:0031307', 1)],
...     # Etc...
... ]
>>> for row in data:
...     # Add the GO term (first tuple element) of every entry in this row
...     counter.update(x[0] for x in row)
...
>>> print(counter)
Counter({'GO:0008270': 6, 'GO:0050681': 4, 'GO:0003677': 4, 'GO:0016881': 4,
'GO:0019899': 3, 'GO:0031491': 3, 'GO:0003676': 3, 'GO:0070936': 3,
'GO:0019789': 3, 'GO:0008022': 2, 'GO:0019221': 2, 'GO:0045202': 2,
'GO:0016607': 2, 'GO:0016605': 2, 'GO:0019048': 2, 'GO:0016925': 2,
'GO:0006351': 2, 'GO:0005515': 2, 'GO:0045842': 1, 'GO:0006457': 1,
'GO:0030308': 1, 'GO:0000266': 1, 'GO:0000087': 1, 'GO:0031307': 1,
'GO:0007067': 1, 'GO:0007049': 1, 'GO:0090141': 1, 'GO:0016363': 1,
'GO:0000781': 1, 'GO:0016874': 1, 'GO:0016055': 1, 'GO:0010827': 1,
'GO:0042405': 1})
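
Once the counter is filled, it can be queried like a dictionary. The sketch below is only an illustration on three rows copied from the question (so the counts differ from the full output above); the ID 'GO:0000000' is a made-up placeholder for a term that is absent:

```python
from collections import Counter

# Three rows copied from the question, not the full data set.
data = [
    [('GO:0050681', 1), ('GO:0031491', 1), ('GO:0008270', 1),
     ('GO:0003677', 1), ('GO:0070936', 1)],
    [('GO:0016055', 1), ('GO:0016363', 1), ('GO:0008270', 1),
     ('GO:0003676', 1), ('GO:0003677', 1)],
    [('GO:0003676', 1), ('GO:0008270', 1), ('GO:0019789', 1)],
]

counter = Counter()
for row in data:
    # Count the GO term (first tuple element) of every entry in the row.
    counter.update(x[0] for x in row)

# The single most frequent term and its count.
top_term, top_count = counter.most_common(1)[0]

# Terms that never occurred report a count of 0 (placeholder ID).
missing = counter['GO:0000000']

# The distinct GO terms, without their counts.
unique_terms = sorted(counter)

print(top_term, top_count)
print(unique_terms)
```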
Nitzle