I have a file which looks like the following:

@ junk
...
@ junk
    1.0  -100.102487081243
    1.1  -100.102497023421
    ...   ...
    3.0  -100.102473082342
&
@ junk
...

I am interested only in the two columns of numbers given between the @ and & characters. These characters may appear anywhere else in the file but never inside the number block.

I want to create two lists, one with the first column and one with the second column.

List1 = [1.0, 1.1,..., 3.0]
List2 = [-100.102487081243, -100.102497023421,..., -100.102473082342]

I've been using shell scripting to prep these files for a simpler Python script that builds the lists. However, I'm trying to migrate these steps to Python for a more consistent workflow. Any ideas? I have limited experience with Python and file handling.

Edit: I should mention, this number block appears in two places in the file. Both number blocks are identical.

Edit2: A general function would be most satisfactory for this as I will put it into a custom library.

Current Efforts

I currently use a shell script to trim out everything but the number block and split it into two separate columns. From there it is trivial for me to use the following function:

def ReadLL(infile):
    List = open(infile).read().splitlines()
    intL = [int(i) for i in List]
    return intL

by calling it from my main script:

import sys
import eLIBc
infile = sys.argv[1]
sList = eLIBc.ReadLL(infile)

The problem is knowing how to extract the number block from the original file with Python rather than using shell scripting.

Martijn Pieters
LordStryker
  • It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive. – Martijn Pieters Jan 24 '13 at 17:30
  • @MartijnPieters I am indeed looking for some help. I've included some more code for you. – LordStryker Jan 24 '13 at 17:37
  • Do you have numbers anywhere else in the file? – ATOzTOA Jan 24 '13 at 17:39
  • @ATOzTOA Numbers, words, characters, etc. can and do appear anywhere else in the file. However, it seems as though any line not within this number block begins with an '&' or '@'. – LordStryker Jan 24 '13 at 17:40
  • There is a `@ junk` line at the top too; how do you recognize the number block? Does a `@` on the line always signal that the next lines will be a number block? – Martijn Pieters Jan 24 '13 at 17:55
  • @MartijnPieters Almost every line begins with an `@` followed by some string of characters/numbers. The only time `@` does not appear is in the number block(s), or immediately following the number block, in which case there is an `&` all by itself as shown in the MWE. Furthermore, there are many `@ junk` lines before and after each number block. – LordStryker Jan 24 '13 at 18:10
  • @LordStryker: Ah, so as soon as you have a line *without* the `@` at symbol, you have a number block. I've updated my answer. That kind of detail is *crucial* to understanding how to solve your problem, btw. – Martijn Pieters Jan 24 '13 at 18:15
  • Please don't add 'solved' to a question; that's what the checkmark is for. You can leave comments on the other answers if you like though. :-) – Martijn Pieters Jan 24 '13 at 18:28
  • See [What should I keep out of my posts and titles?](http://meta.stackexchange.com/q/131009) – Martijn Pieters Jan 24 '13 at 18:29
  • @MartijnPieters Okay I fail to see how putting 'solved' and 'thanks' INTO the post goes against the stated guidelines. – LordStryker Jan 24 '13 at 18:32
  • @LordStryker: The point of SO is that the questions and answers are useful to a wider public. Not just your problem is solved, hopefully others with the same problem will find your question, and make their own decision as to what helps them. By voting on the question, answers, they can show that agreement. But it helps to keep the question clear of clutter. Use comments for that kind of information instead. – Martijn Pieters Jan 24 '13 at 18:35
  • SOLVED - Thanks to everyone who contributed. I've used @MartijnPieters' example as the best solution especially for making this in `def` form. – LordStryker Jan 24 '13 at 18:37
  • @MartijnPieters I suggest someone update your 'Posts & Titles' page. – LordStryker Jan 24 '13 at 18:38
  • @LordStryker: You can help with that yourself. Meta is a Q&A site just like this one; you can suggest edits there too! – Martijn Pieters Jan 24 '13 at 18:40
  • @LordStryker: Also, the post I linked to already covers your edit: the comments section in the first answer (full title 'Comments, Comments, Comments'). :-) – Martijn Pieters Jan 24 '13 at 18:45
  • @MartijnPieters It is still ambiguous (evident of our recent commentation). I understand ex post facto. By the way your code was just what I needed and now I've automated this all the way to graphing. You got the snowball rolling down a very steep hill. – LordStryker Jan 24 '13 at 18:53
  • @LordStryker: Glad to have been of help in any case! :-) – Martijn Pieters Jan 24 '13 at 18:54

2 Answers

You want to loop over the file itself, and set a flag for when you find the first line without a @ character, after which you can start collecting numbers. Break off reading when you find the & character on a line.

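def readll(infile):
    with open(infile) as data:
        floatlist1, floatlist2 = [], []
        reading = False

        for line in data:
            if not reading:
                # Skip leading '@' lines; the first line without '@' starts the block.
                if '@' not in line:
                    reading = True
                else:
                    continue

            if '&' in line:
                # '&' closes the number block; stop here.
                return floatlist1, floatlist2

            # A list (rather than a bare map()) can be indexed on Python 3 too.
            numbers = [float(value) for value in line.split()]
            floatlist1.append(numbers[0])
            floatlist2.append(numbers[1])

    # Reached only if the file ends without a closing '&' line.
    return floatlist1, floatlist2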

So the above:

  • sets 'reading' to False, and only when a line without '@' is found, is that set to True.
  • when 'reading' is True:
    • returns the data read if the line contains &
    • otherwise it's assumed the line contains two float values separated by whitespace, which are added to their respective lists

By returning, the function ends, with the file closed automatically. Only the first block is read, the rest of the file is simply ignored.
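
Since the question asks for a general library function, here is a hedged sketch of how the same idea might be extended to collect every number block rather than only the first. The name `read_blocks` and its return shape (a list of `(col1, col2)` pairs) are assumptions made for illustration, not part of the answer above. It also assumes, as described in the question's comments, that every line outside a number block contains `@` or `&`.

```python
def read_blocks(infile):
    """Return a list of (col1, col2) pairs, one pair per number block.

    Hypothetical helper: assumes every line outside a number block
    contains '@' or '&', as described in the question.
    """
    blocks = []
    col1, col2 = [], []
    with open(infile) as data:
        for line in data:
            if '@' in line or '&' in line:
                # A marker line ends the block being collected, if any.
                if col1:
                    blocks.append((col1, col2))
                    col1, col2 = [], []
                continue
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            col1.append(float(fields[0]))
            col2.append(float(fields[1]))
    # Keep a final block that is not followed by a marker line.
    if col1:
        blocks.append((col1, col2))
    return blocks
```

With identical blocks, as in the question, `list1, list2 = read_blocks(infile)[0]` would give the two columns of the first block.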

Martijn Pieters

Try this out:

with open("i.txt") as fp:
    lines = fp.readlines()

data = False  # becomes True once the first number line has been seen
List1 = []
List2 = []
for line in lines:
    if line[0] not in ['&', '@']:
        line = line.split()
        List1.append(line[0])
        List2.append(line[1])
        data = True
    elif data:
        # First marker line after the numbers: the block is finished.
        break

print(List1)
print(List2)

This should give you the first block of numbers.

Input:

@ junk
@ junk
1.0  -100.102487081243
1.1  -100.102497023421
3.0  -100.102473082342
&
@ junk
1.0  -100.102487081243
1.1  -100.102497023421

Output:

['1.0', '1.1', '3.0']
['-100.102487081243', '-100.102497023421', '-100.102473082342']
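
Note that the lists above hold strings, because `split()` does not convert anything. A minimal sketch of turning them into floats, which is what the question asks for; the names `values1` and `values2` are only illustrative:

```python
# The string lists produced by the code above.
List1 = ['1.0', '1.1', '3.0']
List2 = ['-100.102487081243', '-100.102497023421', '-100.102473082342']

# Convert each entry to a float.
values1 = [float(x) for x in List1]
values2 = [float(x) for x in List2]
```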

Update

If you need both blocks, then use this:

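with open("i.txt") as fp:
    lines = fp.readlines()

List1 = []
List2 = []
for line in lines:
    # Every line that does not start with a marker is treated as data,
    # so the numbers from both blocks end up in the same two lists.
    if line[0] not in ['&', '@']:
        line = line.split()
        List1.append(line[0])
        List2.append(line[1])

print(List1)
print(List2)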
ATOzTOA
  • @MartijnPieters OP said `@` lines are not needed. – ATOzTOA Jan 24 '13 at 17:58
  • That's not what I meant. He needs to read the number block, not the whole file minus the `@` or `&` lines. There are two blocks (identical) with numbers, the rest of the file can be ignored. It is not yet clear how the number blocks are to be recognized; for now my answer assumes a number block starts after the first `@` line. – Martijn Pieters Jan 24 '13 at 18:00
  • OP said the number blocks are identical, so he only needs to read one block, right? – ATOzTOA Jan 24 '13 at 18:27