I have a huge data file and need to extract every line that starts with a given prefix, say U(1.0 ----), regardless of its line number, because the line numbers vary from run to run.
I tried splitting and reading the file, but the output was unmanageable. Can anyone help me?

Luca Davanzo

Bhagya Msubrayan
- I'm new to Python, could you please explain a bit more? – Bhagya Msubrayan Jul 29 '14 at 11:00
- Oh, in Python! Not bash? – Luca Davanzo Jul 29 '14 at 11:00
- https://developers.google.com/edu/python/regular-expressions?hl=fr – Luca Davanzo Jul 29 '14 at 11:03
- Yes, how do I do the same in Python? – Bhagya Msubrayan Jul 29 '14 at 11:04
- 1) Show us a sample of the input. 2) Tell us which lines of the input you want to extract and why. 3) Show us what you have done in Python to solve the problem. – Jul 29 '14 at 11:06
- possible duplicate of [Grep and Python](http://stackoverflow.com/questions/1921894/grep-and-python) – Luca Davanzo Jul 29 '14 at 11:11
- Please don't post hw here! – MarmiK Jul 29 '14 at 12:14
3 Answers
- First, read the file (https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files)
- Then loop through the lines and take the first part of each line.
- Then check whether it matches a regular expression you design for that task.
Hope it helps you :)
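The steps above could be sketched roughly like this. This is only a minimal illustration: the pattern and the sample lines are assumptions based on the U(1.0 ----) example in the question, and the name matching_lines is made up for this sketch. With a real file you would pass the open file object instead of the sample list.

```python
import re

# Assumed pattern: the literal text "U(1.0" followed by whitespace.
# Adjust it to whatever the real lines look like.
PATTERN = re.compile(r"U\(1\.0\s")


def matching_lines(lines):
    """Yield each line whose beginning matches PATTERN."""
    for line in lines:
        # re.match only matches at the start of the string
        if PATTERN.match(line):
            yield line


# Made-up sample data standing in for the lines of the data file
sample = ["U(1.0 ----) first\n", "V(2.0 ----) skip\n", "U(1.0234 x) skip\n"]
selected = list(matching_lines(sample))
```

For a fixed prefix like this one, a plain string comparison is probably enough; a regular expression mainly pays off if the prefix can vary.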

recluising
Use the startswith() string method on each line and add the matching lines to a separate list for analysis:

    # "whatever" is a placeholder for the path to your data file
    with open("whatever") as f:
        data = f.readlines()
    results = []
    for line in data:
        if line.startswith("U(1.0"):
            results.append(line)

manicphase
Similar to manicphase's answer, use Python's startswith() string method to pick out the lines you are interested in:

    with open('mydata.txt') as data:
        for line in data:
            if line.startswith('U(1.0 '):
                # Do stuff here
                pass  # placeholder so the block is valid Python
This is a little simpler than manicphase's solution and quicker, because it iterates over the file directly instead of first reading every line into a list and then looping over that list again, which might have an adverse effect if you have a lot of data.
I don't have enough reputation to comment on manicphase's answer, so I shall make a note here instead: the space after the 1.0 is important if the values can have more than one decimal place (the question doesn't specify); otherwise the check might match U(1.0234 xxxx) as well.
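To make the point about the trailing space concrete, here is a small comparison on made-up lines (the sample data is an assumption, since the question shows no real input):

```python
# Made-up sample lines; the real file format is not shown in the question
lines = [
    "U(1.0 ----) keep\n",
    "U(1.0234 xxxx) other value\n",
    "X(1.0 ----) different label\n",
]

# Without the trailing space, the prefix also matches U(1.0234 ...)
loose = [line for line in lines if line.startswith("U(1.0")]

# With the trailing space, only the exact value 1.0 matches
strict = [line for line in lines if line.startswith("U(1.0 ")]
```

Here loose picks up the first two lines, while strict keeps only the first one.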

Mike Wild