1

I am trying to classify a data set with 21 columns and a lot of rows. I've gotten to the point where I can import the data as a CSV and print out separate columns. There are two things I have left to do. First, I want to be able to print out specific data points, for example the data point located in row 2, column 4. The second task is to classify the rows of data based on columns 4 and 5. These columns are latitude and longitude, and I am trying to get the rows that fall in a specific part of the world. My idea for doing this was:

if  60 > row[4] > 45 and 165 > row[1] > 150:

(i.e., like the math expression 9 > x > 5)

I'm not sure what the proper way to do the above procedure is.

I have pasted the code below. I am new to programming in Python, so feel free to point out errors.

import csv
path = r'C:\Documents and Settings\eag29278\My Documents\python test code\test_satdata.csv'
with open(path, 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print row [0]
        #this prints out the first column 

    var1 = []

    for row in f:

       if  60 > row[4] > 45 and 165 > row[1] > 150:

          var1.append(row)

print var1

UPDATE 1

Okay, so I updated the code, but when I run the module I get this output:

2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 []

So I see that the program prints out var1, but it is empty.

lejlot
erik.garcia294
  • `9 > x > 5` works fine in python. – Ashwini Chaudhary Aug 05 '13 at 20:21
  • It looks like your code should do exactly what it sounds like you want it to do. So… do you have a question here? If you just want working code reviewed, use [Code Review](http://codereview.stackexchange.com), not Stack Overflow. – abarnert Aug 05 '13 at 20:35
  • I'm guessing that code review is a site specifically for troubleshooting then? – erik.garcia294 Aug 05 '13 at 20:40
  • `for row in f` will yield a `str` object for each line, so that's why none of the rows are being appended. `60 > 'foo' > 45` won't raise an exception in Python -- it will just return False. – Chris Barker Aug 05 '13 at 20:51
  • @erik.garcia294 No, CodeReview is for when you have working code, but you think it could be written more cleanly or made to run faster. – SethMMorton Aug 05 '13 at 21:30
  • @ChrisBarker: You have hit the nail on the head. You should post that as an answer. However, it is worth noting that Python 3 does away with comparisons between strings and ints (well, you get TypeError if you try to compare them). – John Y Aug 05 '13 at 22:15

6 Answers

5

From the docs:

Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).
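As a small illustration of the quoted rule, here is a sketch written for Python 3. The helper `middle()` and the `calls` list are made up for this example; they only exist to count how often the middle operand is evaluated.

```python
calls = []

def middle():
    # Hypothetical helper: records each call so evaluations can be counted.
    calls.append(1)
    return 50

# Chained form: the middle operand is evaluated once.
chained = 45 < middle() < 60
n_chained = len(calls)

del calls[:]

# Expanded form with an explicit "and": the middle operand is evaluated twice.
expanded = 45 < middle() and middle() < 60
n_expanded = len(calls)

print(chained, expanded, n_chained, n_expanded)
```

Both forms give the same truth value; they differ only in how many times the middle expression runs.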

Ashwini Chaudhary
  • This answer, while certainly presenting correct information, does nothing to address the OP's real problem(s). – John Y Aug 06 '13 at 07:20
1

This line:

if  60 > row[4] > 45 and 165 > row[1] > 150:

is comparing 60 > [the fifth character in the row, as a string] > 45 .... I'm guessing that's not what you want. `for row in f` yields a string for each line in the file. I think you meant to do some parsing before you made these comparisons. Or maybe you wanted to iterate over `reader` instead of `f`: `f` is the file itself, not the CSV reader.

This should work better:

with open(path, 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    var1 = [] # This is a very poorly named variable, by the way.
    for row in reader:
        print row [0]
        if  60 > row[4] > 45 and 165 > row[1] > 150:
            var1.append(row)
Chris Barker
  • I know now that I should not be using f. Is there an alternative that will give me an output? Someone recommended using the line `for row in reader`, but I still get the same output. What do you mean by more parsing? – erik.garcia294 Aug 05 '13 at 22:25
  • Parsing means coercing a string to an int, in this situation. You might need to do something like `int(row[4])` if you're iterating over the csv reader, or `int(line.split(',')[4])` if you're iterating over f. – Chris Barker Aug 05 '13 at 22:28
  • Hmm. Well, you pointed out several problems: (1) CSV elements are always strings, (2) strings don't necessarily compare in a useful or intuitive way with integers, and (3) OP seems to be confused about the difference between a file and a CSV reader over that file. But then your code snippet doesn't fix (1) or (2). – John Y Aug 06 '13 at 06:35
1

All the answers about "chained comparison" (e.g. 60 > foo > 45) completely miss the point. You're not having a problem with chained comparison. But you've got lots of issues in your code.

First, the rows that are returned by a CSV reader always have strings as elements. So if the CSV looks like

10,20,abc,40

what it becomes in Python when you use a CSV reader is

['10', '20', 'abc', '40']  # list of strings
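To see this directly, here is a minimal sketch for Python 3. It assumes an in-memory `io.StringIO` object as a stand-in for an open file; the sample line is the one shown above.

```python
import csv
import io

# Stand-in for an open CSV file (assumption: Python 3 text mode).
sample = io.StringIO("10,20,abc,40\n")

row = next(csv.reader(sample))
print(row)                            # every element is a str
print([type(x).__name__ for x in row])

# A numeric value has to be converted explicitly before comparing.
first_value = float(row[0])
```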

In Python 2, comparing strings with numbers "works" in the sense that you can do it, and it doesn't raise any exceptions. But it's not usually what you want. For example:

Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 1 < '2'
True
>>> 2 < '1'
True

Note that Python 3 won't even let you compare strings with numbers:

Python 3.2.3 (default, Apr 11 2012, 07:12:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 1 < '2'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()
>>>

So, one thing you need to do is convert the strings in the CSV to integers:

>>> 1 < '2' < 3  # Python 2
False
>>> 1 < int('2') < 3
True

Another thing you need to do is make sure you are reading CSV rows, rather than plain old lines in the file. Where you have

var1 = []
for row in f:
   if  60 > row[4] > 45 and 165 > row[1] > 150:
      var1.append(row)

What you are doing is comparing the 5th character of each line with 60 and 45, and the 2nd character of each line with 165 and 150. You almost certainly meant

var1 = []
for row in reader:
    if 60 > int(row[4]) > 45 and 165 > int(row[1]) > 150:
        var1.append(row)

But unfortunately, that's still not all. You already "used up" all the rows in the CSV when you did

for row in reader:
    print row [0]

At the end of that loop, reader has no more rows to read. The most straightforward thing to do is to reopen the file and use a new reader for each loop:

with open(path, 'rb') as f:
    reader = csv.reader(f, delimiter=',')  # why specify the delimiter?
    for row in reader:
        print row[0]
        #this prints out the first column 

with open(path, 'rb') as f:  # we open the file a second time
    reader = csv.reader(f)
    var1 = []
    for row in reader:
        if 60 > int(row[4]) > 45 and 165 > int(row[1]) > 150:
            var1.append(row)

For beginners, and even most experienced Python programmers, this is sufficient. The code is clear to the point of obviousness, which is usually a Good Thing. If special circumstances dictate fancier measures, look at these past questions for possible alternatives:

Can iterators be reset in Python?

Proper way to reset csv.reader for multiple iterations?
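One possible alternative to reopening the file, sketched here for Python 3, is to read every row into a list once and loop over that list as often as needed. The sample data and the use of `io.StringIO` in place of the real file are assumptions for illustration, as is the choice of `float` for the conversion; the column positions (4 and 1) follow the question's code.

```python
import csv
import io

# Hypothetical sample rows standing in for the real file.
sample = io.StringIO(
    "2010,152.5,a,b,50.1\n"
    "2010,10.0,a,b,8.67\n"
)

# Consume the reader once and keep every row in memory.
rows = list(csv.reader(sample))

# First pass: print the first column.
for row in rows:
    print(row[0])

# Second pass: a list can be iterated again, unlike the reader.
var1 = []
for row in rows:
    if 45 < float(row[4]) < 60 and 150 < float(row[1]) < 165:
        var1.append(row)

print(var1)
```

This trades memory for simplicity, so it may not suit a file too large to hold in memory at once.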

Community
John Y
  • First off, thanks for taking the time to help me out. I did exactly what you recommended, but I received this error about half the time I executed the code: Traceback (most recent call last): File "C:\Python27\test code\12345.py", line 9, in if 60 > int(row[3]) > 45 and 165 > int(row[4]) > 150: ValueError: invalid literal for int() with base 10: '8.67' 1 The rest of the time I received an empty variable when I printed. – erik.garcia294 Aug 06 '13 at 15:11
  • Well, hopefully you understand that `int` is for converting to integer. You should probably use `float` instead. When you read people's answers, you really should try to understand what they are saying, not just blindly type their code. If you don't understand, try looking things up in the documentation. Or use Google. Not only will you learn more by putting in the effort, other people are more likely to be willing to help you. People generally don't want to do all your work for you. People definitely don't want to do all your *thinking* for you! – John Y Aug 06 '13 at 17:52
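A minimal sketch of the `float` suggestion follows, written for Python 3. The function name `in_box` and its parameters are hypothetical, and the default column positions are taken from the question's code rather than from any knowledge of the real file. Rows that cannot be converted (a header, a blank cell, a short row) are treated as not matching, which is one possible choice among several.

```python
def in_box(row, lat_col=4, lon_col=1):
    """Return True if the row's latitude and longitude fall inside the box.

    Hypothetical helper: column positions are assumptions based on the
    question's code.
    """
    try:
        lat = float(row[lat_col])
        lon = float(row[lon_col])
    except (ValueError, IndexError):
        # Non-numeric cells or rows that are too short cannot be classified.
        return False
    return 45 < lat < 60 and 150 < lon < 165


print(in_box(['2010', '152.5', 'a', 'b', '50.1']))
print(in_box(['2010', '152.5', 'a', 'b', '8.67']))
```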
0

That'll actually work just fine in Python. Most other languages wouldn't let you do it; you'd have to write 60 > row[4] and row[4] > 45 and ....

user2357112
0

You'd normally write it using < to make it look more like a BETWEEN operation...

if  (45 < row[4] < 60) and (150 < row[1] < 165):
Jon Clements
0

Chaining works with other operators too, e.g. ==, is, in. There is an implicit and between each pair of comparisons.
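A short sketch of what that implicit and means for `==` and `in`, written for Python 3. The variable names are made up for illustration.

```python
# Chained: evaluated as (False == False) and (False in [False]).
chained = False == False in [False]

# Parenthesised: evaluated as True in [False], which is a different question.
grouped = (False == False) in [False]

print(chained, grouped)
```

The two results differ, so adding parentheses to a chained comparison can change its meaning.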

You could use a list comprehension here

var1 = [row for row in reader if 45 < float(row[4]) < 60 and 150 < float(row[1]) < 165]

I agree with @Jon. Using < reads more naturally to me than >

John La Rooy