
Basically, I want to read strings from a text file, put them into lists of three, and then put all of those three-item lists into another list. Let me explain with an example :)

Text file (just an example, I can structure it however I want):

party    
sleep  
study    
--------   
party   
sleep  
sleep    
-----   
study  
sleep  
party   
---------

etc

From this, I want Python to create a list that looks like this:

List1 = [['party','sleep','study'],['party','sleep','sleep'],['study','sleep','party'], ...]

But it's super hard. I was experimenting with something like:

test2 = open('test2.txt','r')
List=[]

for line in 'test2.txt':
    a = test2.readline()
    a = a.replace("\n","")
    List.append(a)
    print(List)

But this just does horrible horrible things. How to achieve this?

K DawG
imfromsweden
  • This is a typical case of a temporary counter that you either reset or use modulus on :). – Dec 22 '13 at 14:48
  • Just to be clear, does the text file actually have lines of dashes in it (e.g. '--------') that you want to act as the delimiter between lists? – ChrisProsser Dec 22 '13 at 14:48
  • Well, I put them there myself, otherwise it's gonna be super hard to know where to end one sequence and start a new one, no? :o But I'm all open to suggestions how to structure the text file in a good way! – imfromsweden Dec 22 '13 at 14:50
  • You would need some sort of logic to determine where to stop a list and start a new one, is it just every three entries or is there more to it than that? – ChrisProsser Dec 22 '13 at 14:52
  • It's always just three entries that go into the so called mini-lists :) – imfromsweden Dec 22 '13 at 14:54
  • That will be nice and easy without adding the lines. I would work on an answer, but it looks like others have beaten me to it. – ChrisProsser Dec 22 '13 at 14:55

3 Answers

4

If you want to group the data into chunks of three, and the data in the text file is not divided by any separator, you can do the following.

Read the file sequentially and build a list. To group the lines, you can use any of the well-known grouper recipes:

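import pprint
from itertools import izip, imap  # Python 2 only; Python 3's built-in zip and map are already lazy

with open("test.txt") as fin:
    # The same stripped-line iterator is passed to izip three times,
    # so each tuple takes three consecutive lines.
    data = list(imap(list, izip(*[imap(str.strip, fin)]*3)))

pprint.pprint(data)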
[['party', 'sleep', 'study'],
 ['party', 'sleep', 'sleep'],
 ['study', 'sleep', 'party']]

Steps of Execution

  1. Open the file with a context manager.
  2. Strip each line (this removes the newline).
  3. Pass the same iterator to izip three times, so that consecutive items are grouped into tuples of three. Leftover lines that do not fill a group are dropped.
  4. Convert each tuple to a list.
  5. Collect the resulting iterator into a list.

Because imap and izip are lazy iterators, all of this happens in a single pass over the file.
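
For readers on Python 3, where izip and imap no longer exist, the same grouping idea can be sketched with the built-in zip and map. The helper name group_lines and its size parameter are made up for this illustration; they are not part of the original answer.

```python
def group_lines(lines, size=3):
    """Group stripped lines into lists of `size` items (hypothetical helper)."""
    stripped = map(str.strip, lines)  # lazy: strips each line on demand
    # The same iterator object is repeated `size` times, so zip pulls
    # consecutive items; an incomplete trailing group is dropped.
    return [list(group) for group in zip(*[stripped] * size)]


sample = ["party\n", "sleep\n", "study\n",
          "party\n", "sleep\n", "sleep\n",
          "extra\n"]
print(group_lines(sample))
# [['party', 'sleep', 'study'], ['party', 'sleep', 'sleep']]
```

Since a file object is itself an iterable of lines, it should be possible to pass one in directly, e.g. `with open("test.txt") as fin: data = group_lines(fin)`.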

If instead your data is divided into groups by a delimiter line such as ----------, you can use itertools.groupby:

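import pprint
from itertools import imap, groupby  # Python 2; on Python 3 use the built-in map

SEP = "----------"  # every separator line in the file must match this exactly

class Key(object):
    def __init__(self, sep):
        self.sep = sep
        self.count = 0
    def __call__(self, line):
        # Bump the counter at each separator so every block gets its own key.
        if line == self.sep:
            self.count += 1
        return self.count


with open("test.txt") as fin:
    groups = ([e for e in v if e != SEP]
              for k, v in groupby(imap(str.strip, fin), key=Key(SEP)))
    # Skip empty groups, e.g. the one produced by a trailing separator.
    data = [g for g in groups if g]


pprint.pprint(data)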
[['party', 'sleep', 'study'],
 ['party', 'sleep', 'sleep'],
 ['study', 'sleep', 'party']]

Steps of Execution

  1. Create a Key class that increments a counter whenever the separator is encountered. Calling an instance returns the current counter value, after conditionally incrementing it.
  2. Open the file with a context manager.
  3. Strip each line (this removes the newline).
  4. Group the data using itertools.groupby with the custom key.
  5. Remove the separator from each group and build a list of the groups.
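
As a variation on the steps above, here is a Python 3 sketch that swaps the counter class for a plain boolean key: groupby then yields alternating runs of separator and non-separator lines, and only the latter are kept. The names is_separator and split_on_separators are hypothetical, and the rule "a separator is any non-empty line made up only of dashes" is an assumption chosen so that dashed lines of different lengths all count.

```python
from itertools import groupby


def is_separator(line):
    """True for a non-empty line made up only of dashes (assumed format)."""
    return bool(line) and set(line) == {"-"}


def split_on_separators(lines):
    """Return the runs of lines found between separator lines."""
    stripped = map(str.strip, lines)
    # groupby yields alternating runs of separator and non-separator lines;
    # keep only the non-separator runs.
    return [list(run)
            for sep, run in groupby(stripped, key=is_separator)
            if not sep]


sample = ["party\n", "sleep\n", "study\n", "--------\n",
          "party\n", "sleep\n", "sleep\n", "-----\n"]
print(split_on_separators(sample))
# [['party', 'sleep', 'study'], ['party', 'sleep', 'sleep']]
```

Note that blank lines are not treated as separators in this sketch, so they would show up as empty strings inside a group.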
Abhijit
3

You can try with this:

res = []
tmp = []

for i, line in enumerate(open('file.txt'), 1):
    tmp.append(line.strip())
    if i % 3 == 0:
        res.append(tmp)
        tmp = []

print(res)

I've assumed that you don't have the dashes.

Edit:

Here is an example for when you have dashes:

res = []
tmp = []

# Count lines from 1 so that every fourth line (4, 8, 12, ...) is the dashed separator.
for i, line in enumerate(open('file.txt'), 1):
    if i % 4 == 0:
        res.append(tmp)
        tmp = []
        continue
    tmp.append(line.strip())

print(res)
smeso
  • Oh thank you, that's great! I only understand half of it, but oh well :D I see that you've stripped the "-----" from the text file here, right? If so, it's gonna be quite hard to keep track of where one sequence starts and the next one finishes. Assuming that you want to be able to edit these sequences of three easily in the text file, it might be quite annoying not to know where one starts and the other one ends :) Is there any way this could be used even with something in between the sequences? Thanks a lot btw! – imfromsweden Dec 22 '13 at 14:59
  • I didn't strip the "-----". I've assumed that they are not there: "I can structure it however I want". If the "sub-lists" are all of the same length (i.e. 3) you won't need any separator to know where one starts. – smeso Dec 22 '13 at 15:03
  • Thanks! I've never seen the expression "for i, line in (something)". What does it mean? I think I might get the gist of it: for every row whose number isn't divisible by 3, we just append that row to our list tmp. But every third row we append those three things to our final solution, the list res. We also reset tmp to empty and start the process over again. I actually understood more of it as I was trying to explain what I didn't understand haha. It's actually quite elegant. Well played sir. – imfromsweden Dec 22 '13 at 15:33
  • Still though, I do think it's quite good to have some sort of "----" between the sequences of three. The idea is that any kind of person should be able to edit these sequences to whatever they think is appropriate, and I do think it'd look better if it was already clearly divided into groups of three, so you don't have to calculate that by yourself. I have an idea though, maybe we can just divide it by four and do the same thing? Except, if the length of a line is shorter than, say 3, then it's just not added. That way we can divide sequences by "--", without it affecting our list. Thoughts?:D – imfromsweden Dec 22 '13 at 15:41
  • Every file in Python is an object which implements the [iterator protocol](http://docs.python.org/2/library/stdtypes.html#iterator-types). So you can iterate over a file object using a `for` statement to get its lines. `enumerate` is a built-in function that wraps an iterator (in this case a file) and yields tuples of two elements: a counter and the iterator's element. – smeso Dec 22 '13 at 15:43
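
To make the `for i, line in enumerate(...)` form discussed in these comments concrete, here is a small sketch that uses a plain list in place of a file. The sample data is invented for illustration.

```python
lines = ["party", "sleep", "study", "party"]

# enumerate yields (counter, item) tuples; the second argument sets the start.
pairs = list(enumerate(lines, 1))
print(pairs)
# [(1, 'party'), (2, 'sleep'), (3, 'study'), (4, 'party')]

# "for i, line in ..." unpacks each tuple into two names.
closing = []
for i, line in enumerate(lines, 1):
    if i % 3 == 0:          # every third item closes a group
        closing.append((i, line))
print(closing)
# [(3, 'study')]
```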
0

First big problem:

for line in 'test2.txt':

gives you

't', 'e', 's', 't', '2', '.', 't', 'x', 't'

You need to loop through the file you open:

for line in test2:

Or, better:

with open("test2.txt", 'r') as f:
    for line in f:

Next, you need to do one of two things:

  1. If the line contains "-----", create a new sub-list (myList.append([]))
  2. Otherwise, append the line to the last sub-list in your list (myList[-1].append(line))
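
Put together, the two rules above might look like the following sketch. The function name read_groups is hypothetical, and two choices are assumptions rather than part of the answer: blank lines are skipped, and empty sub-lists (such as the one left by a separator at the very end of the file) are dropped.

```python
def read_groups(lines):
    """Build sub-lists, starting a new one at each dashed line (sketch)."""
    groups = [[]]                    # start with one empty sub-list
    for line in lines:
        line = line.strip()
        if "-----" in line:          # separator: open a new sub-list
            groups.append([])
        elif line:                   # skip blank lines
            groups[-1].append(line)
    # A separator at the very end leaves an empty sub-list behind; drop those.
    return [g for g in groups if g]


sample = ["party\n", "sleep\n", "study\n", "--------\n",
          "study\n", "sleep\n", "party\n", "---------\n"]
print(read_groups(sample))
# [['party', 'sleep', 'study'], ['study', 'sleep', 'party']]
```

With a real file this would be called inside the `with` block, e.g. `read_groups(f)`.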

Finally, your print at the end should not be so far indented; currently, it prints for every line, rather than just when the processing is complete.

    List.append(a)
print(List)

Perhaps a better structure for your file would be:

party,sleep,study
party,sleep,sleep
...

Now each line is a sub-list:

for line in f:
    myList.append(line.split(','))
jonrsharpe
  • Great answer, thanks a lot! I actually even understand most of this post, haha :D Your suggestion to keep the structure as party,sleep,study sleep,study,sleep was really good, cheers. Only problem is that I get a "\n" in the list since I start a new row (which I do think is necessary, since you should be able to tell easily where one sequence of three starts and where it ends!), and I think that might cause some problems. I tried to use a replace("\n","") function, but Python wasn't having any of it. Not sure why tbh, it has its periods it seems :) – imfromsweden Dec 22 '13 at 15:10
  • You can remove blank space and `'\n'` characters from the start and end of the line using `line = line.strip()` – jonrsharpe Dec 22 '13 at 17:13
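
Combining the `strip()` suggestion with the comma-separated layout could look roughly like this. The helper name parse_rows is made up for this sketch, and skipping blank lines is an added assumption.

```python
def parse_rows(lines):
    """Turn 'a,b,c' lines into lists, ignoring blank lines (sketch)."""
    rows = []
    for line in lines:
        line = line.strip()          # removes the trailing '\n'
        if line:
            rows.append(line.split(","))
    return rows


sample = ["party,sleep,study\n", "party,sleep,sleep\n", "\n"]
print(parse_rows(sample))
# [['party', 'sleep', 'study'], ['party', 'sleep', 'sleep']]
```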