I have a text file with two columns, as follows:

44333373    -5.829738285
3007762     -5.468521083
16756295    -5.247183569
46197456    -5.216096421
46884567    -5.195179321
44333390    -5.162411562
44420579    -5.133122186
6439190     -5.028260409
...

I want to extract the rows whose values are greater than -5.162411562. The ideal output should look like this:

Output

44333373    -5.829738285
3007762     -5.468521083
16756295    -5.247183569
46197456    -5.216096421
46884567    -5.195179321

To accomplish this task, I wrote a simple Python script:

f1=open("file.txt","r")
n=0
for line in f1.readlines():
     if float(n) > -5.162411562:
        print line

But it just prints every line in the file. I know it is a very simple task, but I can't figure out where I am going wrong. Can anybody help?

phihag
nit
  • One more thing: your example output doesn't show lines with values greater than -5.162411562, it shows lines with values *less than* -5.162411562. They're negative. You will need to write `<` instead of `>` in the code in either of the answers below. – Tom Anderson Oct 15 '11 at 10:33
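To make the sign issue concrete, here is a small Python 3 illustration. The three values are taken from the question's data; the variable names are only for illustration.

```python
threshold = -5.162411562
values = [-5.829738285, -5.162411562, -5.028260409]

# For negative numbers, "further from zero" means smaller, not greater.
below = [v for v in values if v < threshold]
above = [v for v in values if v > threshold]

print(below)  # [-5.829738285]
print(above)  # [-5.028260409]
```

Note that the threshold value itself falls in neither list, since both comparisons are strict.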

3 Answers


Well, you need to actually set n to a value aside from zero. How about:

with open('file.txt') as f1:
    for line in f1:  # readlines is not necessary here
        n = float(line.split()[1])  # line.split()[0] is the first number
        if n > -5.162411562:
            print(line.rstrip('\r\n'))  # rstrip to remove the existing EOL in line
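As a minimal Python 3 sketch of the same idea, the filter can be wrapped in a generator that takes the threshold as a parameter. The name `rows_below` and the `sample` list are made up for illustration. The comparison here uses `<` to match the output shown in the question, which is an assumption about what was intended; swap in `>` if you really want the larger values.

```python
def rows_below(lines, threshold):
    """Yield each line whose second column is less than threshold."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        if float(fields[1]) < threshold:
            yield line.rstrip('\r\n')  # drop the existing end-of-line

# Hypothetical in-memory stand-in for the lines of file.txt.
sample = [
    "44333373    -5.829738285\n",
    "46884567    -5.195179321\n",
    "44333390    -5.162411562\n",
    "6439190     -5.028260409\n",
]

for row in rows_below(sample, -5.162411562):
    print(row)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample`.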
phihag

Each line contains two fields, e.g. 44333373 -5.829738285. When looping through the lines, you need to split each line and take the second element (index 1), and you don't need n. Then compare. So the code changes to -

f1=open("file.txt","r")
for line in f1.readlines():
    if float(line.split()[1]) > -5.162411562:
        print line

A slight modification: readlines reads the entire file into memory in one go, so a very large file could cause problems. A file object in Python is an iterator, so you can loop over it directly. How cool is that! Also, open opens a file in read mode by default. So the code further simplifies to -

for line in open('file.txt'):
    if float(line.split()[1]) > -5.162411562:
        print line

Hope this helps...

Srikar Appalaraju
  • Note that one should also close the file, and probably remove `n=0`. Also, the call to `readlines` makes this program unnecessarily allocate memory of the file's size, which can be significant. – phihag Oct 15 '11 at 10:25
  • The updated version actually makes it impossible to ever close the file; you may be leaking the descriptor (although some Python implementations free it once the file is no longer reachable). – phihag Oct 15 '11 at 10:31
  • that's what mostly happens. python will take care of closing once the file is no longer used or reachable. I believe it closes the file descriptors once a function exit happens. – Srikar Appalaraju Oct 15 '11 at 10:33
  • **cpython** indeed exhibits this behavior. However, many other implementations [don't](http://stackoverflow.com/questions/2404430/does-filehandle-get-closed-automatically-in-python-after-it-goes-out-of-scope/2404671#2404671). – phihag Oct 15 '11 at 10:43
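For what it's worth, a `with` block sidesteps the question of when the file gets closed, whatever the Python implementation. Here is a rough Python 3 sketch; the temporary file and the names `path`, `matches` and `closed_after_block` exist only to keep the example self-contained, and the `<` comparison is an assumption based on the output shown in the question.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("44333373    -5.829738285\n6439190     -5.028260409\n")
    path = tmp.name

# The file is closed when the with block exits, even if an exception is raised.
with open(path) as f1:
    matches = [line.rstrip('\r\n') for line in f1
               if float(line.split()[1]) < -5.162411562]

closed_after_block = f1.closed  # True: the with block closed the file
os.remove(path)                 # clean up the throwaway file
print(matches, closed_after_block)
```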

The issue with the code you have presented is that the value of n never changes, so the if statement always evaluates to True, and therefore every line is printed:

f1=open("file.txt","r")
n=0  # the `n` is set here
for line in f1.readlines():
     if float(n) > -5.162411562:  # `n` is always `0`, so this is always `True`
        print line

You'll want to update the variable n with the number extracted from the second column of each line.

Furthermore, the comparison operator in the if condition has to change from > (greater than) to < (less than), as the values you show in your output are "less than -5.162411562", not "greater than".

Also, note that the initial n=0 is not required.

With those changes, we get the following code:

f1 = open("file.txt","r")
for line in f1.readlines():
    n = line.split()[1]          # get the second column
    if float(n) < -5.162411562:  # changed the direction of the comparison
        print line.rstrip()      # remove the newline from the line read
                                 # from the file to prevent doubling of newlines
                                 # from the print statement
f1.close()                       # for completeness, close the file

The resulting output is:

44333373        -5.829738285
3007762         -5.468521083
16756295        -5.247183569
46197456        -5.216096421
46884567        -5.195179321
coobird