-1

I'm reading a data file into a list that contains the lines of the file. It looks like this:

>>> oz[:15]
[' 283283283283283283283283284284284284284284284284284284284284284284284284284\n',
' 284284284284284284284284284284284284284284284284284284284284284284284284284\n',
...
' 291291292292292292292292293293293293293293293293293293293293294294294294294\n',
' 294294294294294294294294295295   lat =  -89.5\n']

Now I want to store the numbers from this list in a sensible way: I need one list element for every 3 digits. If I just print the output like this, everything is OK:

for ll in range(0,60):
    for k in range(1,73+3,3):
        if k==31 and ((ll+1)%15==0): 
            break                       
        else: 
            print oz[ll][k:k+3]

I get the right output: the numbers 283, 283, ... But if I try to store them in a list, the contents of the list are wrong:

DU = []

# Populate DU array
for ll in range(0,2700):
    for k in range(1,73+3,3):
        if k==31 and ((ll+1)%15==0): 
            break                       
        else: 
            DU.append(oz[ll][k:k+3])

What am I doing wrong when filling the list DU?

EDIT: To explain better what I'm trying to achieve: I have a list oz in this format:

[' 283283283283283283283283284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284284284284284284284284284284284283283283\n', ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n', ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n', ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n', ' 283283283283283282282283283282282282282283283283283283283283283283283283283\n', ' 283283283283283283283283283283283283283283283283283283283283283284284284284\n', ' 284284284284284284284284284284284285285285285285285285285285285285285285285\n', ' 285285286286286286286286286287287287287287287288288288288288288288288288288\n', ' 288289289289289289289289289290290290290290290290290290291291291291291291291\n', ' 291291292292292292292292293293293293293293293293293293293293294294294294294\n', ' 294294294294294294294294295295   lat =  -89.5\n', ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284283283284284284284284284284284284284284284283\n', ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n', ' 283283283283283283283283283283284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284283283283283283283283283283283283283283\n', ' 283283283283283283283283283283283282282282282282282282282282282282282282281\n', ' 281281281281281281281281281281281281281281281281280280280280280280280280279\n', ' 279279279279279279279279279279279279279279278278278278278278278278278278278\n', ' 277277278278278278278278278278278278278278278278278278278278278278278278278\n', ' 
278278279279279279279279279279279279279279279279279279279279279279279279279\n', ' 279279280280280280280280280280280280280280280280280280280281281281281281281\n', ' 281282282282282282282282283283283283283283284284284284284284285285285285285\n', ' 286286286287287287287288288288288288288289289289289289290290290290291291291\n', ' 292292292292292292293293293293293293293293293294294294294295295295295295295\n', ' 296296296296296296296297297297   lat =  -88.5\n']

What I need is to fill a list with the triplets of digits, like ['283', '283', '283', '283'], keeping in mind that every 15th line ends with the "lat..." text, which I want to strip. I hope it's clearer now.

Brian Tompsett - 汤莱恩
luca.violino

4 Answers

2

Your code currently looks very hard-coded, and I'm not sure what you are trying to achieve, but try:

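DU = []

for index, line in enumerate(oz):
    # Every 15th line ends with a "lat = ..." label; keep only the digits before it.
    line = line.strip() if (index + 1) % 15 != 0 else line.strip().split(' ')[0]
    # Step through the whole line; stopping at len(line) - 3 would skip the last triplet.
    for i in range(0, len(line), 3):
        DU.append(line[i:i+3])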
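
As a quick check of the slicing approach, here is a minimal sketch run on two lines in the question's format. The helper name `split_triplets` is made up for illustration, and `full_line` is rebuilt to match the first line of the sample. Note that the range must run to `len(digits)`; stopping at `len(digits) - 3` would skip the last triplet of each line.

```python
def split_triplets(line):
    """Return the 3-character groups in the digit part of one line."""
    # split() drops surrounding whitespace; keeping only the first token
    # ignores a trailing "lat = ..." label if there is one.
    digits = line.split()[0]
    return [digits[i:i + 3] for i in range(0, len(digits), 3)]

full_line = ' ' + '283' * 8 + '284' * 17 + '\n'
lat_line = ' 294294294294294294294294295295   lat =  -89.5\n'

print(len(split_triplets(full_line)))  # 25
print(split_triplets(lat_line)[-2:])   # ['295', '295']
```

A full line should give 25 values and a "lat" line 10, so counting the results per line is an easy way to spot a dropped triplet.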

Or you could try a combination of the answers:

from itertools import izip

def grouped(iterable, n):
    "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

DU = []

for index, line in enumerate(oz):
    # Every 15th line ends with a "lat = ..." label; keep only the digits before it.
    line = line.strip() if (index + 1) % 15 != 0 else line.strip().split(' ')[0]
    # extend (rather than append) keeps DU a flat list of triplets.
    DU.extend(map(''.join, grouped(line, 3)))
geo_pythoncl
  • Try this. Is this what you're after? – geo_pythoncl Aug 28 '12 at 21:11
  • Hmm, this gives the same output as my code. It works with a small number of lines, like 30, but if I use 2700, which is len(oz), it breaks and repeats the same numbers: `DU = [] for ll in range(0,30): for k in range(1,73+3,3): if k==31 and ((ll+1)%15==0): break else: DU.append(oz[ll][k:k+3]) ` – luca.violino Aug 28 '12 at 21:17
  • Slight edit. OK, I've tried my code on the input you gave and it works. – geo_pythoncl Aug 28 '12 at 21:41
  • With both my code and yours I get the same output, but it's not right. I don't know whether there's some ordering going on somewhere, but the numbers don't match the original oz list... – luca.violino Aug 28 '12 at 21:55
  • Are you sure they don't match? You realise the input you gave starts at 283 and increases to 284 before going back to 283? – geo_pythoncl Aug 28 '12 at 22:00
1

You might be able to use something along the lines of the update in my answer to another question here to group the string of digits in each line. Specifically, use this code with the digit string as the iterable and a value of 3 for the n (group-size) argument:

from itertools import izip

def grouped(iterable, n):
    "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

digits = '283283283283283283283283284284284284284284284284284284284284284284284284284\n'

print map(''.join, grouped(digits.strip(), 3))

Output:

['283', '283', '283', '283', '283', '283', '283', '283', '284', '284', 
'284', '284', '284', '284', '284', '284', '284', '284', '284', '284', 
'284', '284', '284', '284', '284']

I note however that the last line of the data in your example, the:

'294294294294294294294294295295 lat = -89.5\n'

is not simply a string of digits, so it will have to be handled as a special case.

Update:

OK, now that I see the additional information you added to your question, I can provide a complete solution based on the grouped() function from my other answer, as I initially suggested. It handles the special line that occurs periodically in your input by splitting each line on whitespace and keeping only the first item (often the only one), which is always a string of digits. That string is then processed by my function.

from itertools import izip

def grouped(iterable, n):
    "s -> (s0,s1,s2,...sn-1), (sn,sn+1,sn+2,...s2n-1), (s2n,s2n+1,s2n+2,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

data = [' 283283283283283283283283284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284284284284284284284284284284284283283283\n', ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n', ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n', ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n', ' 283283283283283282282283283282282282282283283283283283283283283283283283283\n', ' 283283283283283283283283283283283283283283283283283283283283283284284284284\n', ' 284284284284284284284284284284284285285285285285285285285285285285285285285\n', ' 285285286286286286286286286287287287287287287288288288288288288288288288288\n', ' 288289289289289289289289289290290290290290290290290290291291291291291291291\n', ' 291291292292292292292292293293293293293293293293293293293293294294294294294\n', ' 294294294294294294294294295295   lat =  -89.5\n', ' 284284284284284284284284284284284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284283283284284284284284284284284284284284284283\n', ' 283283283283283283283283283283283283283283283283283283283283283283283283283\n', ' 283283283283283283283283283283284284284284284284284284284284284284284284284\n', ' 284284284284284284284284284284284284283283283283283283283283283283283283283\n', ' 283283283283283283283283283283283282282282282282282282282282282282282282281\n', ' 281281281281281281281281281281281281281281281281280280280280280280280280279\n', ' 279279279279279279279279279279279279279279278278278278278278278278278278278\n', ' 277277278278278278278278278278278278278278278278278278278278278278278278278\n', ' 
278278279279279279279279279279279279279279279279279279279279279279279279279\n', ' 279279280280280280280280280280280280280280280280280280280281281281281281281\n', ' 281282282282282282282282283283283283283283284284284284284284285285285285285\n', ' 286286286287287287287288288288288288288289289289289289290290290290291291291\n', ' 292292292292292292293293293293293293293293293294294294294295295295295295295\n', ' 296296296296296296296297297297   lat =  -88.5\n']

DU = []
for line in data:
    DU.extend(map(''.join, grouped(line.strip().split()[0], 3)))

print DU

Output:

['283', '283', '283', '283', '283', '283', '283', '283', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '282', '282', '283', '283', '282', '282', '282', '282', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '285', '285', '285', '285', '285', '285', '285', '285', '285', '285', '285', '285', '285', '285', '285', '285', '286', '286', '286', '286', '286', '286', '286', '287', 
'287', '287', '287', '287', '287', '288', '288', '288', '288', '288', '288', '288', '288', '288', '288', '288', '289', '289', '289', '289', '289', '289', '289', '289', '290', '290', '290', '290', '290', '290', '290', '290', '290', '291', '291', '291', '291', '291', '291', '291', '291', '291', '292', '292', '292', '292', '292', '292', '293', '293', '293', '293', '293', '293', '293', '293', '293', '293', '293', '293', '294', '294', '294', '294', '294', '294', '294', '294', '294', '294', '294', '294', '294', '295', '295', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '283', '283', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '284', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '283', '282', '282', '282', '282', '282', '282', '282', '282', '282', '282', '282', '282', '282', '281', '281', '281', '281', '281', '281', '281', '281', '281', '281', '281', '281', '281', '281', '281', '281', '281', '280', '280', '280', '280', '280', '280', '280', '280', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '277', '277', '278', '278', '278', '278', '278', '278', '278', '278', 
'278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '278', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '279', '280', '280', '280', '280', '280', '280', '280', '280', '280', '280', '280', '280', '280', '280', '280', '280', '280', '281', '281', '281', '281', '281', '281', '281', '282', '282', '282', '282', '282', '282', '282', '283', '283', '283', '283', '283', '283', '284', '284', '284', '284', '284', '284', '285', '285', '285', '285', '285', '286', '286', '286', '287', '287', '287', '287', '288', '288', '288', '288', '288', '288', '289', '289', '289', '289', '289', '290', '290', '290', '290', '291', '291', '291', '292', '292', '292', '292', '292', '292', '293', '293', '293', '293', '293', '293', '293', '293', '293', '294', '294', '294', '294', '295', '295', '295', '295', '295', '295', '296', '296', '296', '296', '296', '296', '296', '297', '297', '297']

You could also turn it into an efficient, if fairly unreadable, one-liner built on a generator expression:

from itertools import chain

DU = list(chain.from_iterable(map(''.join, grouped(line.strip().split()[0], 3))
                                             for line in data))
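
The snippets above use `itertools.izip`, which exists only in Python 2. Here is a minimal sketch of the same idea for Python 3, where the built-in `zip` is already lazy. The function name `parse_lines` and the shortened two-line `sample` are made up for illustration and are not taken from the real file.

```python
def grouped(iterable, n):
    """s -> (s0, ..., sn-1), (sn, ..., s2n-1), ..."""
    # The same iterator object is repeated n times, so zip pulls
    # n consecutive items into each tuple.
    return zip(*[iter(iterable)] * n)

def parse_lines(lines):
    """Return a flat list of 3-digit strings from all lines."""
    values = []
    for line in lines:
        # split()[0] keeps the digit run and drops any "lat = ..." label.
        values.extend(''.join(group) for group in grouped(line.split()[0], 3))
    return values

# Shortened, illustrative sample in the same layout as the question's data.
sample = [' 283283284\n', ' 294295   lat =  -89.5\n']
print(parse_lines(sample))  # ['283', '283', '284', '294', '295']
```

Keep in mind that `zip` silently discards an incomplete trailing group, so a digit run whose length is not a multiple of 3 loses its last one or two characters without any error.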
Community
martineau
0

How about using regular expressions to match sets of 3 consecutive digits:

import re

def oz_reader(oz):
    for line in oz:
        matches = re.findall(r"\d{3}", line)
        for num in matches:
            yield num

Note that the function is a generator, so calling it returns a generator object rather than a list. If you really need a list with the output, just use the list constructor on it:

result_list = list(oz_reader(oz))
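
With the sample data, the label "lat =  -89.5" contains no run of three digits, so the pattern only picks up real values. As a hedged variation, the sketch below cuts each line at the label before matching, in case a label ever did contain three consecutive digits. The name `iter_triplets`, the assumption that the label always starts with "lat", and the shortened `sample` are all illustrative.

```python
import re

TRIPLET = re.compile(r"\d{3}")

def iter_triplets(lines):
    """Yield every run of three digits, line by line."""
    for line in lines:
        # Assumption: the label always starts with "lat"; drop it so its
        # digits can never be mistaken for data.
        data_part = line.partition("lat")[0]
        for match in TRIPLET.findall(data_part):
            yield match

# Shortened, illustrative sample in the same layout as the question's data.
sample = [" 283283284\n", " 294295   lat =  -89.5\n"]
print(list(iter_triplets(sample)))  # ['283', '283', '284', '294', '295']
```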
Blckknght
0

Thank you, everybody, for the replies. The code was actually working fine; the problem was in another part of my program, and I was just tired. Thanks anyway for the different approaches you suggested for this code!

luca.violino