
I want to get to a certain point on a string that is opposite (from the negative side) to that of what I am given.

AAAAAAAAAACCCCCCCCCCTTTTTTTTTTGGGGGGGGGG
TTTTTTTTTTGGGGGGGGGGAAAAAAAAAACCCCCCCCCC

So you need to convert coordinates. On the bottom strand, base 0 (the right-most C) is opposite base 39 on the top strand. Base 1 is against base 38. Base 2 is against base 37. (Important point: notice what happens when you add these two numbers up, every time.) So base 10 is against base 29, and base 19 is against base 20.

So: if I want to find bases 10-19 on the bottom strand, I can look at bases 20-29 on the top (and then reverse-complement them).
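The arithmetic above can be sketched in Python. This is only an illustration, not the program from the question: `COMPLEMENT`, `bottom_to_top` and `bottom_strand_slice` are hypothetical names, and the sketch assumes 0-based, half-open coordinates counted from the right-hand end of the bottom strand.

```python
# Hypothetical helper names; coordinates are assumed to be 0-based and
# half-open, counted from the right-hand end of the bottom strand.
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def bottom_to_top(start, end, length):
    """Map bottom-strand [start, end) to the top-strand range opposite it."""
    return length - end, length - start

def bottom_strand_slice(top, start, end):
    """Return bottom-strand bases [start, end), read in bottom-strand order."""
    top_start, top_end = bottom_to_top(start, end, len(top))
    # Reverse the top-strand slice, then complement each base.
    return ''.join(COMPLEMENT[base.upper()]
                   for base in reversed(top[top_start:top_end]))

top = 'AAAAAAAAAACCCCCCCCCCTTTTTTTTTTGGGGGGGGGG'
print(bottom_to_top(10, 20, len(top)))   # (20, 30)
print(bottom_strand_slice(top, 10, 20))  # AAAAAAAAAA
```

Subtracting from the length, rather than using negative slice indices, avoids the case where a negative start lands after the negative end and the slice comes back empty.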

I have written the following:

fp = open(infile, 'r')
for line in fp:
   tokens = line.split()
   exonstarts = tokens[8][:-1].split(',')
   exonends = tokens[9][:-1].split(',')
   zipped = list(zip(exonstarts, exonends))
   chrom_len = len(chr_string)
   s = ''.join(bc[base.upper()] for base in chr_string[-starts-1:-ends-1] for starts, ends in zipped)+'\n'

Yet, every time I do this I get:

Error: global name 'starts' is not defined

How do I fix this??

Peter Hanson
  • What should I define starts as though? – Peter Hanson Apr 28 '12 at 02:18
  • Exonstarts refers to a list though...I want the first element of every pair to be starts such that [(1,2),(3,4),(5,6)] (this list is what I defined as 'zipped') starts would be the 1 then the 3 and 5 while ends would be the other number in the pair – Peter Hanson Apr 28 '12 at 02:20
  • I thought using this code with 'for starts,ends in zipped' did that for me by saying the two elements inside each parenthesis was start,end... – Peter Hanson Apr 28 '12 at 02:22
  • The string I am going through – Peter Hanson Apr 28 '12 at 03:08
  • It would be super helpful if you went ahead and defined it here and, instead of using `line in fp`, defined fp as a multi-line string and used `for line in fp.split('\n'):` so we can try to run it. – Skylar Saveland Apr 28 '12 at 03:11

3 Answers

3

Try adding parentheses around the last term:

s = ''.join(bc[base.upper()] for base in (chr_string[-starts-1:-ends-1]\
                                         ^
            for starts, ends in zipped)) +'\n'
                                      ^

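You're defining two different generators here. This is equivalent to:

# stage 1: one slice of chr_string per (starts, ends) pair
strands = (chr_string[-starts-1:-ends-1] for starts, ends in zipped)
# stage 2: look up each item produced by stage 1 in bc
complementary_strands = (bc[base.upper()] for base in strands)
# stage 3: join the results and terminate the line
joined_exons = ''.join(complementary_strands) + '\n'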
Joel Cornett
  • I'll give this a try as well and see what I come up with – Peter Hanson Apr 28 '12 at 02:47
  • Breaking this down into three stages is definitely the way to go (+1), but these stages should have more meaningful names than _1, _2, _3. – johnsyweb Apr 28 '12 at 02:49
  • Alright, sounds good. Does my last one need to be broken down as well, or is it fine as is? if strand == '+': s = ''.join([chr_string[starts:ends] for starts, ends in zipped]) – Peter Hanson Apr 28 '12 at 02:56
  • Breaking it down is useful for increasing readability and helping you figure out what's wrong with your code. You can keep it as one expression if you want. It doesn't affect the function of the generator. It's up to you. – Joel Cornett Apr 28 '12 at 03:01
  • The \ to break the line is not necessary or helpful. – Skylar Saveland Apr 28 '12 at 03:07
  • OK, also it just told me I was trying to apply - to a str. So to fix this I just change chr_string[-starts-1:-ends-1] to chr_string[-(int(starts)+1):-(int(ends)+1)], right? – Peter Hanson Apr 28 '12 at 03:11
  • But when I do that^^ I get KeyError: '' What does this mean?? – Peter Hanson Apr 28 '12 at 03:23
  • Could someone tell me if what I am doing is correct please?? – Peter Hanson Apr 28 '12 at 03:34
  • I thought starts and ends would be like "A" and "G" so I think I'm going to give up unless you define everything in the question. @Johnsyweb actually has the right answer for the original question, maybe... at least starts becomes defined .. – Skylar Saveland Apr 28 '12 at 03:38
  • No, starts is the numbers in zipped, the actual number coordinates used on the string – Peter Hanson Apr 28 '12 at 03:46
2

It looks like you are trying to do too much in your generator expression.

The two fors are the wrong way around. You mean:

s = ''.join(bc[base.upper()] for starts,ends in zipped for base in chr_string[-starts-1:-ends-1])+'\n'

Then starts and ends are defined for the second for.
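To see why the order matters: the `for` clauses of a generator expression are evaluated left to right, just like nested `for` statements. Here is a small sketch with made-up data; `text` and `pairs` are hypothetical stand-ins, and the coordinates are assumed to be integers already.

```python
text = 'ABCDEFGHIJ'
pairs = [(0, 2), (5, 8)]

# Outer clause first: start and end are bound before the slice is evaluated.
flat = ''.join(ch.lower() for start, end in pairs for ch in text[start:end])

# The same logic written as ordinary nested loops.
chars = []
for start, end in pairs:
    for ch in text[start:end]:
        chars.append(ch.lower())

print(flat)  # abfgh
assert flat == ''.join(chars)
```

With the clauses swapped, the slice would be evaluated before `start` and `end` exist, which is the kind of "not defined" error reported in the question.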

Given the questions you've asked today, I recommend reading a good book, such as Dive Into Python 3 so that you can solve these issues yourself.

johnsyweb
  • Haha, thanks, I'll check it out. All of this is pretty new to me – Peter Hanson Apr 28 '12 at 02:55
  • @skyl: I made no such implication. The book recommendation was intended to be helpful; it's apparent that Patrick has yet to read a good book on Python. – johnsyweb Apr 28 '12 at 03:29
  • @skyl: Done. Downvotes are for answers that are not useful. Please don't use them for other purposes. – johnsyweb Apr 28 '12 at 03:34
  • @Johnsyweb could you edit this answer to make starts and ends integers, because at the moment they are strings and will not allow the - or the -1 being added to them – Peter Hanson Apr 28 '12 at 03:41
  • ^^I think if you can get this, my program will be functional – Peter Hanson Apr 28 '12 at 03:46
  • @PatrickCampbell you already know how to make strings into integers with `int(start)`; you did it yourself in a comment to another answer! – weronika Apr 28 '12 at 03:49
  • You're right @weronika. But this did not work. I figured out what I needed to do though. I need to do what we were trying a few days ago and subtract the ends from the len of the string, and then subtract the starts from the length of the string. Do you know a way to do this?? – Peter Hanson Apr 28 '12 at 04:13
  • @PatrickCampbell: Please stop asking new questions in the comments of arbitrary answers; this is not how this community works. – johnsyweb Apr 28 '12 at 04:34
  • @PatrickCampbell You really should take Johnsyweb's suggestion and read a Python book or tutorial before continuing. It's extremely difficult to help you when you don't seem to understand how the language fits together at all. – weronika Apr 28 '12 at 04:36
1

You're defining exonstarts and then referring to starts, which is not defined.

John Gaines Jr.
  • Exonstarts refers to a list though...I want the first element of every pair to be starts such that [(1,2),(3,4),(5,6)] starts would be the 1 then the 3 and 5 while ends would be the other number in the pair – Peter Hanson Apr 28 '12 at 02:20
  • it is defined, you probably did the same thing I did at first and didn't scroll to see the end of that long line... – weronika Apr 28 '12 at 03:50