
Hello, please see the link below to a previously solved post.

Copy specific lines from text files into excel

This VBA code allowed me to open all the text files in a folder and load specific lines into an Excel spreadsheet.

However, I am now working with bigger files, and more of them. Using VBA is no longer an option, as it takes too long and has a data limit.

I'm wondering if there is any existing Python code that can extract specific lines from each data file and either save them as a new document or save over the file under its current name.

Sample data:

DATASET UNSTRUCTURED_GRID
POINTS 5 float
0.096853 0.000000 0.111997
0.096853 -0.003500 0.111997
0.096890 0.000000 0.084015
0.096853 -0.003500 0.111997
0.096890 -0.003500 0.084015
CELL_DATA 5
SCALARS pressure float 1
LOOKUP_TABLE default
-0.000000
-0.000000
-3.000000
-2.000000
-6.000000

Any tips on this would be highly appreciated. Thanks, Jon

Jon

2 Answers


Try this:

# read the file once and locate the CELL_DATA header
with open(filename, 'r') as infile:       # open for reading, not writing
    lines = infile.readlines()

for counter, line in enumerate(lines):
    if line.split(' ')[0] == 'CELL_DATA':
        i = counter + 3                   # values start 3 lines after CELL_DATA (skip SCALARS and LOOKUP_TABLE)
        j = int(line.split(' ')[1])       # CELL_DATA gives the number of values
        break

This part finds the line you should start retrieving data from. Then you can do whatever you like with the data. For example:

data = []
for line in lines[i:i + j]:
    data.append(float(line))    # collect the pressure values

To save the data somewhere, just use one of Python's options for writing to Excel, CSV, or any other file format. Good luck!
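
For instance, here is a minimal sketch of writing the extracted values to a CSV file (which Excel opens directly) using the standard csv module; the file name 'pressures.csv' is just an assumed example:

import csv

# write one pressure value per row; Excel can open .csv files directly
with open('pressures.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for value in data:
        writer.writerow([value])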

Lior T
  • Hello Lior, thanks for your comment. I've since added some sample data to the question. It's the pressure values that I'm trying to select from each file. My issue is that I have around 10,000 files to do this for, so I'm trying to find the best (quickest) approach. – Jon Mar 27 '18 at 10:56
  • Will the data you're interested in always follow the same string (i.e. 'LOOKUP_TABLE default')? Is this string unique, or does it appear more than once in each file? – Lior T Mar 27 '18 at 11:05
  • 'LOOKUP_TABLE default' can appear more than once. The string "SCALARS pressure float 1" above the 'LOOKUP_TABLE default' string is the unique marker above the pressure values. I should also say that the pressure values are not always at the very end of the text file. Thanks – Jon Mar 27 '18 at 11:35
  • The number beside the string "CELL_DATA" specifies the number of pressure values present. – Jon Mar 27 '18 at 11:56
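
Based on the clarifications in the comments above, here is a minimal sketch that keys off the unique 'SCALARS pressure' marker and the count on the preceding CELL_DATA line, so it does not rely on the values being at the end of the file (the helper name read_pressures is only illustrative):

def read_pressures(filename):
    # read all lines once, find the unique 'SCALARS pressure' header,
    # and take the value count from the CELL_DATA line just above it
    with open(filename) as infile:
        lines = infile.readlines()
    for idx, line in enumerate(lines):
        if line.startswith('SCALARS pressure'):
            count = int(lines[idx - 1].split()[1])   # CELL_DATA <count>
            return [float(x) for x in lines[idx + 2: idx + 2 + count]]
    return []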

You may try this:

# use python 3.6
from pathlib import Path 

txt_file_content = """DATASET UNSTRUCTURED_GRID
POINTS 5 float
0.096853 0.000000 0.111997
0.096853 -0.003500 0.111997
0.096890 0.000000 0.084015
0.096853 -0.003500 0.111997
0.096890 -0.003500 0.084015
CELL_DATA 5
SCALARS pressure float 1
LOOKUP_TABLE default
-0.000000
-0.000000
-3.000000
-2.000000
-6.000000"""

# creating sample file
Path('sample.txt').write_text(txt_file_content)

The code above creates a sample file; the code below reads it back and parses it:

# read the file back; iterate over many files if needed
doc = Path('sample.txt').read_text()

# NOTE:
# you can walk over *.txt files in a specific folder with
# https://docs.python.org/3/library/glob.html#glob.glob


# assume the desired text block is
#   (1) always after 'LOOKUP_TABLE default'
#   (2) at the end of txt file
last_text_segment = doc.split('LOOKUP_TABLE default')[1]

values = [float(x) for x in last_text_segment.split('\n') if x]

# alternatively as a function:

def extract_pressure(filename):
    doc = Path(filename).read_text()
    last_text_segment = doc.split('LOOKUP_TABLE default')[1]
    return [float(x) for x in last_text_segment.split('\n') if x]

You may want to assemble the data into a pandas DataFrame for further numeric operations.
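
For example, a minimal sketch that collects the values from several files into a DataFrame, assuming pandas is installed and the extract_pressure() function above is in scope (the file list is just an example):

import pandas as pd

filenames = ['sample.txt']   # extend with your own list of files
frames = {name: pd.Series(extract_pressure(name)) for name in filenames}
df = pd.DataFrame(frames)    # one column per file, rows are pressure values
print(df)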

Evgeny
  • Thanks Evgeny for this code; it will go a long way to getting me started. I will just need to tweak it, as the pressure values are not always at the end of the txt file. I will also now look at 'glob' to try and do this for all files in the folder. Many thanks! – Jon Mar 27 '18 at 12:15
  • You can think of pseudocode/an algorithm for how you extract the values, based on the data that you have. A more general way is to split your text file into blocks with a header and a body, and then check whether a desired block contains a marker text in its header. I use this approach here: https://github.com/mini-kep/parser-rosstat-kep/blob/dev/src/kep/csv2df/parser.py#L75-L95, but that code is a bit more sophisticated; you probably need something simpler. Feel free to mark as accepted answer, if useful. – Evgeny Mar 27 '18 at 12:27
  • The `glob` case is also a click away in the suggested links: https://stackoverflow.com/questions/3964681/find-all-files-in-a-directory-with-extension-txt-in-python?rq=1 – Evgeny Mar 27 '18 at 12:28
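
For completeness, a minimal sketch of the glob approach mentioned in the comments, assuming the text files sit in a hypothetical folder called 'data' and the extract_pressure() function from the answer above is in scope:

from glob import glob

# collect the pressure values from every .txt file in the folder
results = {filename: extract_pressure(filename) for filename in glob('data/*.txt')}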