The purpose of the code below is to count how many times a specific string occurs consecutively in a given string. But I could not understand the logic of `[sum(1 for _ in group)+1 for label, group in groups if label==''][0]`. I am looking for an explanation. I am writing down what I understood so that you can correct me. Any help and explanation is highly appreciated. Thank you for your time.

From `sum(1 for _ in group)+1`: add up a 1 for everything that is in `group`. But I think `group` is not defined. I don't know if it is something that comes with the library, but it is not colored.

From `[sum(1 for _ in group)+1 for label, group in groups if label==''][0]`: I basically cannot follow this. It checks whether `label` is an empty string, but I don't know what the `[0]` at the end does.

from itertools import groupby
checkstr = ['AGATC', 'AATG', 'TATC']
s = 'GCTAAATTTGTTCAGCCAGATGTAGGCTTACAAATCAAGCTGTCCGCTCGGCACGGCCTACACACGTCGTGTAACTACAACAGCTAGTTAATCTGGATATCACCATGACCGAATCATAGATTTCGCCTTAAGGAGCTTTACCATGGCTTGGGATCCAATACTAAGGGCTCGACCTAGGCGAATGAGTTTCAGGTTGGCAATCAGCAACGCTCGCCATCCGGACGACGGCTTACAGTTAGTAGCATAGTACGCGATTTTCGGGAAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTATCTATCTATCTATCTATCT'
for c in checkstr:
    groups = groupby(s.split(c))
    try:
        print(c,[sum(1 for _ in group)+1 for label, group in groups if label==''][0])
    except IndexError:
        print(c,0)
    print(sum(1 for _ in group)+1)
IDK
  • Sorry but `sum(1 for _ in group)` gave me a good laugh, just do `len(group)` for that one (: – Mandera Jun 15 '20 at 11:29
  • @Mandera ``group`` is a lazy iterator, it has no ``len``. – MisterMiyagi Jun 15 '20 at 11:29
  • Do you understand what ``[group for label, group in groups]`` would do? Are you familiar with basic comprehensions? – MisterMiyagi Jun 15 '20 at 11:30
  • @MisterMiyagi [So much for laughing, at least I learned something!](https://stackoverflow.com/questions/5384570/whats-the-shortest-way-to-count-the-number-of-items-in-a-generator-iterator) – Mandera Jun 15 '20 at 11:46
  • @Mandera it's not my code :/ that's why I have a hard time understanding it – IDK Jun 15 '20 at 12:11
  • `group` is defined; `sum(...)` is part of the larger list comprehension that iterates over `groups`, assigning values to `label` and `group` at each step. – chepner Jun 15 '20 at 12:23

3 Answers

I have broken down the list comprehension into a few steps to make the program flow clear. Make sure that you comment out your method when using my method. I couldn't get both methods to work together, most likely because `groupby` returns a one-shot iterator that is already exhausted once the first method has run.

from itertools import groupby
checkstr = ['AGATC', 'AATG', 'TATC']
s = 'GCTAAATTTGTTCAGCCAGATGTAGGCTTACAAATCAAGCTGTCCGCTCGGCACGGCCTACACACGTCGTGTAACTACAACAGCTAGTTAATCTGGATATCACCATGACCGAATCATAGATTTCGCCTTAAGGAGCTTTACCATGGCTTGGGATCCAATACTAAGGGCTCGACCTAGGCGAATGAGTTTCAGGTTGGCAATCAGCAACGCTCGCCATCCGGACGACGGCTTACAGTTAGTAGCATAGTACGCGATTTTCGGGAAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTATCTATCTATCTATCTATCT'
for c in checkstr:
    groups = groupby(s.split(c))
    try:
        """
        print(c,[sum(1 for _ in group)+1 for label, group in groups if label==''][0])
        """
        #same as
        my_list = []
        for label, group in groups:
            if label == '':
                for _ in group:
                    my_list.append(1)

        print(c,sum(my_list)+1)

    except IndexError:
        print(c,0)
    #print(sum(1 for _ in group)+1)

I get almost the same output.

But my method gives 1 as the output for 'AGATC'.

I can't get it to leave the try block and reach the except. I tried a few other methods too. This was the best way I could structure it to make clear what happens in the list comprehension.
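The two observations above (the methods not working together, and the except branch never being reached) can be reproduced on a tiny input. This is only a sketch: the string `"xxy"` and the pattern `"AB"` are made-up stand-ins for `s` and `c`, chosen so that the pattern does not occur at all.

```python
from itertools import groupby

pieces = "xxy".split("AB")        # hypothetical input with no match: ['xxy']
groups = groupby(pieces)

# groupby returns a one-shot iterator: the second pass sees nothing.
first_pass = [label for label, _ in groups]
second_pass = [label for label, _ in groups]
print(first_pass, second_pass)    # ['xxy'] []

empty = []                        # no group had the label ''
print(sum(empty) + 1)             # 1, because sum([]) is 0 and nothing is raised

try:
    empty[0]                      # indexing is what the original comprehension does
except IndexError:
    print("only indexing the empty list raises IndexError")
```

So the expanded loop prints 1 for a pattern that never occurs because `sum` of an empty list is 0, whereas the original `[...][0]` fails on the empty list and falls through to `print(c, 0)`.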

Hope this helps you clear your doubt.

EDIT

The accuracy of the code kept bothering me, because the code you posted in your question returns two words less. The code below works fine for me, and it uses my expanded form of the list comprehension.

from itertools import groupby
checkstr = ['AGATC', 'AATG', 'TATC']
s = 'GCTAAATTTGTTCAGCCAGATGTAGGCTTACAAATCAAGCTGTCCGCTCGGCACGGCCTACACACGTCGTGTAACTACAACAGCTAGTTAATCTGGATATCACCATGACCGAATCATAGATTTCGCCTTAAGGAGCTTTACCATGGCTTGGGATCCAATACTAAGGGCTCGACCTAGGCGAATGAGTTTCAGGTTGGCAATCAGCAACGCTCGCCATCCGGACGACGGCTTACAGTTAGTAGCATAGTACGCGATTTTCGGGAAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTATCTATCTATCTATCTATCT'
"""
for c in checkstr:
    groups = groupby(s.split(c))
    try:
        print(c,[sum(1 for _ in group)+1 for label, group in groups if label==''][0])

    except IndexError:
        print(c,0)
    print(sum(1 for _ in group)+1)
"""
for c in checkstr:
    groups = groupby(s.split(c))

    """
    print(c,[sum(1 for _ in group)+1 for label, group in groups if label==''][0])
    """
    #same as
    my_list = []
    for label, group in groups:
        if label == '':
            for _ in group:
                my_list.append(1)


    x= sum(my_list)
    if x == 0:
        print(c,0)
    else:
        print(c,x+2)

OUTPUT

AGATC 0
AATG 44
TATC 6

Proof on word count
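As a cross-check on the output above, the longest run can also be counted directly with a regular expression instead of `split` and `groupby`. This is a different technique from the one in the question, and `longest_run` is a hypothetical helper name used only for illustration.

```python
import re

def longest_run(s, c):
    """Hypothetical helper: largest number of back-to-back copies of c in s."""
    # (?:...)+ matches one or more consecutive copies of c as a single match.
    runs = re.findall(f"(?:{re.escape(c)})+", s)
    # Each match is a whole run, so its length divided by len(c) is the copy count.
    return max((len(run) // len(c) for run in runs), default=0)

print(longest_run("xxABABABy", "AB"))   # 3
print(longest_run("ABxABAB", "AB"))     # 2
print(longest_run("xyz", "AB"))         # 0
```

A run of k back-to-back copies inside the string leaves k-1 empty strings after `s.split(c)`, so the count is the number of empty strings plus 1. That suggests the adjustment in the loop should be `x+1` rather than `x+2`, but it is worth running the helper on the real `s` to confirm.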

AfiJaabb
  • Actually result should be 43 because we are looking for the longest consecutive line of the string so the long line of AATG's is the answer. Thank you so much for help it really helped. – IDK Jun 15 '20 at 14:00
  • Consider upvoting the answers you found useful...someone with the same problem will find it easier – AfiJaabb Jun 15 '20 at 14:05
  • For sure, but one last thing: what is the job of `group` here, I mean what does it do exactly? @AfiJaabb – IDK Jun 15 '20 at 14:06
  • Yes. It took me a while to break it down. Follow the [link](https://pastebin.com/ABSK3bru). Hope this helps. – AfiJaabb Jun 15 '20 at 15:48
  • Thanks a ton. It helped as hell. It is much easier to learn code with the help of this kind of community. – IDK Jun 15 '20 at 15:58
You might want to read about list comprehensions.

Let's break it down: `sum(1 for _ in group)+1` is the same as `len(group)+1` would be if `group` had a `__len__` method. If we assume we could do `len(group)`, then we could rewrite this comprehension as `[len(group)+1 for label, group in groups if label==''][0]`. Now look at `[len(group) for label, group in groups]`, which is just a list of the sizes of every single group in `groups`. With `if label==''` we keep only the entries whose label is an empty string and drop all the others. The `[0]` selects only the first remaining entry.

In words: the size (+1) of the first group that has an empty label.
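To see where the empty labels come from, here is a small sketch on a made-up string (not the DNA sequence from the question). Splitting on a pattern that repeats back to back leaves empty strings between the copies, and `groupby` then bundles those equal neighbours into one group.

```python
from itertools import groupby

s = "xxABABABy"              # hypothetical toy input: "AB" three times in a row
pieces = s.split("AB")       # consecutive matches leave '' between them
print(pieces)                # ['xx', '', '', 'y']

# groupby yields (label, group) pairs; each group is a lazy iterator,
# so it is counted with sum(1 for _ in group) instead of len(group).
runs = [(label, sum(1 for _ in group)) for label, group in groupby(pieces)]
print(runs)                  # [('xx', 1), ('', 2), ('y', 1)]

# Three copies of "AB" produce two empty strings, hence the +1.
first_empty_run = [n + 1 for label, n in runs if label == ""][0]
print(first_empty_run)       # 3
```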

JoKing

You can write out the code to make clear what exactly happens:

from itertools import groupby
checkstr = ['AGATC', 'AATG', 'TATC']
s = 'GCTAAATTTGTTCAGCCAGATGTAGGCTTACAAATCAAGCTGTCCGCTCGGCACGGCCTACACACGTCGTGTAACTACAACAGCTAGTTAATCTGGATATCACCATGACCGAATCATAGATTTCGCCTTAAGGAGCTTTACCATGGCTTGGGATCCAATACTAAGGGCTCGACCTAGGCGAATGAGTTTCAGGTTGGCAATCAGCAACGCTCGCCATCCGGACGACGGCTTACAGTTAGTAGCATAGTACGCGATTTTCGGGAAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTATCTATCTATCTATCTATCT'
for c in checkstr:
    slist = s.split(c)
    groups = groupby(slist)
    try:
        # print(c,[sum(1 for _ in group)+1 for label, group in groups if label==''][0])
        nameless_list = []
        for label, group in groups:
            if label=='':
                nameless_list.append(sum(1 for _ in group)+1)
        print(c, nameless_list[0])

    except IndexError:
        print(c,0)
    # Note: groups is already exhausted by the loop above, so this always prints 1.
    print(sum(1 for _ in groups)+1)

The list comprehension creates a list. If no group has an empty label, you are left with an empty list. Then the first element of the list (hence the [0] at the end) is printed. This raises an IndexError if the list is empty. That error is caught by the exception handler, which prints a 0 instead of the first list element.
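If the try/except feels heavy, one possible alternative is `next()` with a default value, which returns the first element or a fallback without building a list. This is a sketch, not the code from the question, and `first_run_length` is a hypothetical name.

```python
from itertools import groupby

def first_run_length(s, c):
    """Hypothetical helper: size of the first group of empty pieces, plus 1 (0 if none)."""
    groups = groupby(s.split(c))
    # Generator version of the list comprehension: nothing is computed until next() asks.
    lengths = (sum(1 for _ in group) + 1 for label, group in groups if label == "")
    # next() returns the first value, or the default 0 when there is none,
    # so no IndexError handling is needed.
    return next(lengths, 0)

print(first_run_length("xxABABABy", "AB"))   # 3
print(first_run_length("xxy", "AB"))         # 0
print(first_run_length("xABy", "AB"))        # 0, a single copy leaves no empty piece
```

The helper keeps the behaviour of the original expression, including its blind spots: a pattern that occurs only once in a row reports 0, and a string that starts or ends with the pattern gets an extra empty piece that inflates the count.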

Ronald