Skip list of list, go straight to list

Question

I'm reading lines from a text file, then inserting them into a list as overlapping 2-character pairs,
e.g. hello = ['he','el','ll','lo']. In my current code, each line is first turned into its own list, and THAT list is then put into another list, giving me a list of lists. Thus the two lines
hello
world
give me the list of lists [['he','el','ll','lo'], ['wo','or','rl','ld']]. I can take this and do singleList = sum(listlist, []), which will give me a single list; however, this is inefficient because it iterates over the data a second time (and I read a note saying this is a bad way to do it in the first place).
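A minimal sketch of the flattening step described above, using the question's listlist name. sum(listlist, []) starts from [] and adds each sublist with +, copying the partial result at every step, whereas a nested list comprehension visits each pair once. Both give the same flat list here:

```python
listlist = [['he', 'el', 'll', 'lo'], ['wo', 'or', 'rl', 'ld']]

# sum() computes [] + first + second ..., building a new list at each +
flat_sum = sum(listlist, [])

# a nested comprehension walks every pair exactly once
flat_comp = [pair for sublist in listlist for pair in sublist]

print(flat_sum == flat_comp)  # True
print(flat_comp)              # ['he', 'el', 'll', 'lo', 'wo', 'or', 'rl', 'ld']
```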

How can I change my code so that the values from my lines go into a single list on the first pass?

def countPairs():
    print()
    n = 2
    linsiz = []
    with open("hello.txt", "r") as inFile:
        for line in inFile:
            line = line.rstrip('\n')  # rstrip returns a new string
            # each line becomes its own list of n-char pairs -> list of lists
            linsiz.append([line[i:i+n] for i in range(len(line) - n + 1)])
    print(linsiz)
    singleList = sum(linsiz, [])
    print(singleList)
countPairs()

Whoever suggested the possible duplicate is tryharding to shut down threads. That doesn't even remotely answer my question. I'm quite convinced they didn't even read my post, and instead recognized something that had hello = ['he','el','ll','lo'] and thought "low point poster, let's shut it down". Idiotic.

take a peek at python's [itertools.chain](https://docs.python.org/2/library/itertools.html#itertools.chain) — Aaron, Jan 18 '18 at 21:17
@Aaron oh, that thing is sick. Definitely what I'm looking for. I had thought to stick with just basic Python and was trying to find a solution that way. — Podo, Jan 18 '18 at 21:19
Possible duplicate of [How to find overlapping matches with a regexp?](https://stackoverflow.com/questions/11430863/how-to-find-overlapping-matches-with-a-regexp) — r.ook, Jan 18 '18 at 22:11

score 2 · Accepted Answer · answered Jan 18 '18 at 21:19


This small change will do it:

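for line in inFile:
    line = line.rstrip('\n')  # rstrip returns a new string; it doesn't modify line
    # append each pair individually, so linsiz stays flat
    for i in range(len(line) - 1):
        linsiz.append(line[i:i+2])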
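Dropped back into the question's function, the change might look like the sketch below. It assumes the function takes any iterable of lines (an open file works) instead of opening hello.txt itself, which is a departure from the question, and it returns the flat list under the name singleList suggested in the comments. The result of rstrip is assigned because strings are immutable.

```python
def countPairs(lines):
    """Return every overlapping 2-character pair from all lines in one flat list."""
    singleList = []
    for line in lines:
        line = line.rstrip('\n')          # keep the stripped copy
        for i in range(len(line) - 1):    # stop before the last character
            singleList.append(line[i:i+2])
    return singleList

print(countPairs(["hello\n", "world\n"]))
# ['he', 'el', 'll', 'lo', 'wo', 'or', 'rl', 'ld']
```

With a real file, this would be called as `with open("hello.txt") as f: print(countPairs(f))`.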


user3375448


That's good, but I'd still get rid of the useless stuff (or fix it) and use their goal name `singleList`. – Stefan Pochmann Jan 18 '18 at 21:36
Wouldn't I need line = line.rstrip('\n')? Otherwise it doesn't seem like it's modifying the line. – Podo Jan 20 '18 at 20:49
Ultimately, this is what I ended up using :) It was the most basic fix, one I understood without having to look anything up – Podo Jan 20 '18 at 20:55

score 1 · Answer 2 · edited Jan 18 '18 at 21:25


There's a standard-library module entirely devoted to common iteration tools, called itertools.

Your application sounds most like it needs itertools.chain.from_iterable():

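from itertools import chain

with open('filename.txt') as f:
    lines = (line.rstrip('\n') for line in f)
    # one list of overlapping 2-char pairs per line; chain.from_iterable flattens them
    pairs = chain.from_iterable([line[i:i+2] for i in range(len(line) - 1)] for line in lines)
    singleList = list(pairs)  # consume while the file is still open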
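For reference, chain.from_iterable takes an iterable of iterables and yields their items one after another, i.e. it lazily flattens exactly one level of nesting. A small sketch on the question's list of lists; note that a string is itself an iterable of characters, so chaining the lines of a file directly yields single letters rather than pairs.

```python
from itertools import chain

nested = [['he', 'el', 'll', 'lo'], ['wo', 'or', 'rl', 'ld']]

# yields the items of each inner list in turn, without building intermediate lists
flat = list(chain.from_iterable(nested))
print(flat)  # ['he', 'el', 'll', 'lo', 'wo', 'or', 'rl', 'ld']

# strings are iterables of characters, newline included
print(list(chain.from_iterable(["hi\n", "yo\n"])))  # ['h', 'i', '\n', 'y', 'o', '\n']
```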


Christian Dean


answered Jan 18 '18 at 21:20

Aaron


score 0 · Answer 3 · answered Jan 18 '18 at 21:17

You are using a list comprehension, which returns a list, so your append line appends a new list to your list. You are pushing a new value into your list, BUT the value is itself a list, so you get a list of lists.

It seems that what you want is to add the elements of each new list to your list, so something like:

# linsiz is a list; += adds each pair as its own element
linsiz += [line[i:i+2] for i in range(len(line) - 1)]

This is the same as concatenating two lists, as in [] + [], except that += extends linsiz in place rather than building a new list.
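The comparison with [] + [] holds for the resulting contents, but the two forms differ in one respect worth knowing: += on a list extends the existing object in place, while a + b builds a new list and leaves the old one alone. A small sketch (the names are illustrative only):

```python
pairs = ['he', 'el']
same = pairs              # a second name for the same list object

pairs += ['ll', 'lo']     # in place: the existing list grows
print(same)               # ['he', 'el', 'll', 'lo']

pairs = pairs + ['wo']    # builds a brand-new list and rebinds `pairs`
print(same)               # ['he', 'el', 'll', 'lo']  (unchanged)
print(pairs)              # ['he', 'el', 'll', 'lo', 'wo']
```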

score 0 · Answer 4 · answered Jan 18 '18 at 21:19


Don't use linsiz.append() with the whole list; that inserts the list as a single new element. You want to concatenate the new list to the existing list, so do:

linsiz += [line[i:i+2] for i in range(len(line) - 1)]

or

for i in range(len(line) - 1):
    linsiz.append(line[i:i+2])
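The difference between append() and concatenation, sketched on a single line's pairs (the variable names are illustrative):

```python
chunk = ['he', 'el', 'll', 'lo']

nested = []
nested.append(chunk)   # adds the whole list as ONE element
print(nested)          # [['he', 'el', 'll', 'lo']]

flat = []
flat += chunk          # adds each element of chunk (same effect as flat.extend(chunk))
print(flat)            # ['he', 'el', 'll', 'lo']
```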


Barmar


score 0 · Answer 5 · answered Jan 18 '18 at 21:20

You can simply use extend:

def countPairs():
    print()
    with open("hello.txt") as lines:
        linsiz = []
        for line in lines:
            line = line.rstrip('\n')
            # extend() adds every pair from the generator as a separate element
            linsiz.extend(line[i:i+2] for i in range(len(line) - 1))
    print(linsiz)

countPairs()
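extend() is not limited to lists: it accepts any iterable and adds its items one by one, which is why passing a generator expression works without an intermediate list. A small sketch on a single hypothetical line; the last call shows the flip side, since a string is iterated character by character:

```python
linsiz = []
line = "hello"

# a generator expression is consumed item by item by extend()
linsiz.extend(line[i:i+2] for i in range(len(line) - 1))
print(linsiz)  # ['he', 'el', 'll', 'lo']

# a plain string is also an iterable: each character becomes its own element
linsiz.extend("ab")
print(linsiz)  # ['he', 'el', 'll', 'lo', 'a', 'b']
```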

score 0 · Answer 6 · edited Jan 19 '18 at 06:47

You should extend the list. You can also pass the generator expression directly, with no need for an intermediate list:

linsiz.extend(line[i:i+2] for i in range(len(line) - 1))


answered Jan 18 '18 at 21:21

Netwave


While I've seen a lot on Stack Overflow about general expressions, I haven't really been able to understand them fully. I need to look into them a bit more before it makes sense for me to try to use them; otherwise it's just plain copy and paste, which does me no good in the end. I'll look into it though. It's available in 3.6, I assume? – Podo Jan 18 '18 at 21:22
Why use a list comprehension, then? A list comprehension is just a list constructor that consumes the generator expresion within it. Think of a generator expresion as something that iterates and generates elements one by one; you have to iterate through it to consume it. The generator is exhausted when it has no more items to compute. – Netwave Jan 18 '18 at 21:25
Isn't that the same thing as this? https://docs.python.org/2/library/itertools.html#itertools.chain – Podo Jan 18 '18 at 21:26
No, it is not the same, but you can also use `chain` for your purpose – Netwave Jan 18 '18 at 21:27
*"a list comprehension is just a list constructor that consumes the generator expresion within it"* - Not really. But it's similar. – Stefan Pochmann Jan 18 '18 at 22:07
@StefanPochmann, just so he can figure out roughly how it works – Netwave Jan 18 '18 at 22:23
Yeah, I definitely agree with you that it's odd to use a list comprehension and be so reserved about generator expressions. Btw I just realized you wrote "expresion" three out of three times. It's "expression". Well, at least better than Podo's "*general* expressions" :-) – Stefan Pochmann Jan 18 '18 at 22:45
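Netwave's description of a generator expression (items produced one at a time on demand, then exhausted) can be checked directly. A minimal sketch on a single hypothetical line:

```python
line = "hello"
gen = (line[i:i+2] for i in range(len(line) - 1))  # no pairs computed yet

first = next(gen)      # 'he'  (computed on demand)
rest = list(gen)       # ['el', 'll', 'lo']  (everything that was left)
leftover = list(gen)   # []  (exhausted: a generator can only be consumed once)

# a list comprehension builds and stores every item up front, so it can be reused
pairs = [line[i:i+2] for i in range(len(line) - 1)]
print(first, rest, leftover)
print(pairs, len(pairs))
```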

score 0 · Answer 7 · answered Jan 18 '18 at 21:22

Here's one way to do it, using a generator function:

def readPairs(file):
    for line in file:
        line = line.rstrip('\n')  # drop the newline so it isn't paired
        for i in range(len(line) - 1):
            yield line[i:i+2]

def countPairs():
    with open("hello.txt", "r") as inFile:
        singleList = list(readPairs(inFile))  # consume the generator into one flat list
    print(singleList)

countPairs()
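Because readPairs only needs something it can iterate over line by line, an in-memory io.StringIO can stand in for hello.txt when trying it out. The sketch repeats the generator function (with the newline stripped) so it runs on its own, and shows that calling a generator function runs none of its body until the first item is requested.

```python
import io

def readPairs(file):
    """Generator: yields overlapping 2-char pairs from each line of file."""
    for line in file:
        line = line.rstrip('\n')
        for i in range(len(line) - 1):
            yield line[i:i+2]

# io.StringIO behaves like an open text file, so no hello.txt is needed here
fake_file = io.StringIO("hello\nworld\n")
pairs = readPairs(fake_file)   # a generator object; nothing has been read yet
first = next(pairs)            # 'he'
rest = list(pairs)             # ['el', 'll', 'lo', 'wo', 'or', 'rl', 'ld']
print(first, rest)
```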

score 0 · Answer 8 · answered Jan 18 '18 at 21:28

If you want something to debug and look into, use a small function that gives you the chunks for each word:

def splitWord(word):
    """Yields the overlapping 2-character strings from word."""
    if not word:  # an empty line has no pairs (and no word[0])
        return
    last = word[0]
    for n in word[1:]:  # skip the first char; it's already in last
        yield last + n
        last = n

def countPairs():
    print()
    linsiz = []
    with open("hello.txt", "r") as inFile:
        for line in inFile:
            line = line.rstrip('\n')  # rstrip returns a new string
            linsiz.extend(splitWord(line))  # adds all elements of the iterable to the list
    print(linsiz)

countPairs()
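The last/n bookkeeping in splitWord pairs each character with the one after it. The same pairing can be written with zip(word, word[1:]); this is a swapped-in alternative, not the answer's code. Because zip stops at the shorter input, one-character and empty lines simply yield nothing:

```python
def splitWord(word):
    """Yield overlapping 2-character strings, pairing each char with the next."""
    for first, second in zip(word, word[1:]):
        yield first + second

print(list(splitWord("hello")))  # ['he', 'el', 'll', 'lo']
print(list(splitWord("a")))      # []  (one character: no pairs)
print(list(splitWord("")))       # []  (empty line: no IndexError)
```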

score 0 · Answer 9 · 2018-01-19T20:11:29.087

Just put the list outside of the function?

final_list = []

def countPairs():
    with open("filea.txt", "r") as inFile:
        for line in inFile:
            line = line.strip()
            for j in range(len(line)):
                data = line[j:j+2].strip()
                if len(data) == 2:  # keep full pairs only, skip the leftover character
                    final_list.append(data)

countPairs()

print(final_list)

output:

['he', 'el', 'll', 'lo', 'wo', 'or', 'rl', 'ld']
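One consequence of keeping final_list at module level: every call appends to the same list, so calling the function twice duplicates the pairs. The sketch below contrasts that with building and returning a fresh list. The function names countPairsGlobal and countPairsLocal are hypothetical, and both take a list of lines instead of reading filea.txt.

```python
final_list = []

def countPairsGlobal(lines):
    """Appends pairs to the module-level final_list."""
    for line in lines:
        line = line.strip()
        for j in range(len(line) - 1):
            final_list.append(line[j:j+2])

countPairsGlobal(["hi\n"])
countPairsGlobal(["hi\n"])
print(final_list)  # ['hi', 'hi']  (the second call piles onto the first)

def countPairsLocal(lines):
    """Builds and returns a fresh list on every call."""
    result = []
    for line in lines:
        line = line.strip()
        for j in range(len(line) - 1):
            result.append(line[j:j+2])
    return result

print(countPairsLocal(["hi\n"]))  # ['hi']
print(countPairsLocal(["hi\n"]))  # ['hi']  (no leftover state)
```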


9 Answers