How can I read a text file and only print specific lines where the value in one column is above a threshold?

Question

I have to read a comma-separated text file that looks like this:

ID, x, y, soil_temp
1, 10, 6, 8
2, 21, 11, 12
3, 11, 7, 7
4, 32, 12, 8
5, 9, 29, 6
6, 17, 16, 9
7, 22, 9, 11
8, 14, 31, 7
9, 26, 21, 6
10, 19, 19, 10

I then have to print only the ID and soil_temp columns, for the lines where the soil temperature is above 10. So the result should look something like this:

ID, soil_temp
2, 12
7, 11

Important! No modules such as pandas or csv may be used, which makes the exercise frustrating for me. It is probably quite easy for most people here.

So I have written code that looks like this to print the soil_temp column:

tempLine = []
with open('soil_temp.txt', 'r') as f:
    read_data = f.readlines()

for line in read_data:
    tempLine.append(line.split())

for item in tempLine:
    print(item[3])

This code is also based on advice given in the exercise. My problem is that to print only the lines above 10, I would expect a simple if statement in the last part of the code to work, something like this:

for item in tempLine:
    if item[3] > 10:
        print(item[3])

But of course this does not work since the data is stored as strings. I have tried different solutions to change them into integers but since it is multiple strings, I can't find a solution.

Tips: `line.split(",")` # split on commas, then perhaps s.strip() to remove whitespace, then int(s) to cast a string to an integer. — , Jan 26 '21 at 19:24
Does this answer your question? [How do I read and write CSV files with Python?](https://stackoverflow.com/questions/41585078/how-do-i-read-and-write-csv-files-with-python) — Tomerikoo, Jan 26 '21 at 19:26
Hi there, a pandas DataFrame might give a more concise solution: ```import pandas as pd; df = pd.read_csv('soil_temp.txt'); df[df[' soil_temp']>10][['ID', ' soil_temp']] ``` — kabhel, Jan 26 '21 at 19:36
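
The tip in the first comment (split on commas, strip the whitespace, convert with int) can be sketched on a single data line roughly like this. It is only an illustration; the helper name `parse_line` is made up here and does not come from the thread.

```python
def parse_line(line):
    """Split one comma-separated data line and convert every field to int."""
    # split(",") leaves the spaces and the trailing newline on each piece;
    # strip() removes them before int() does the conversion.
    return [int(field.strip()) for field in line.split(",")]


row = parse_line("2, 21, 11, 12\n")
print(row)          # [2, 21, 11, 12]
print(row[3] > 10)  # True, because the comparison is now between two ints
```

Note that this only works for the data lines; the header line would raise a ValueError and has to be skipped first.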

score 0 · Answer 1 · answered Jan 26 '21 at 19:30

As others have pointed out, you have to split on commas. I have reworked your code a bit and used a list comprehension to make tempLine a list of lists containing the row data.

with open('soil_temp.txt', 'r') as f:
    f.readline() # get rid of header line
    read_data = f.readlines()

tempLine = []
for line in read_data:
    tempLine.append([int(x) for x in line.split(',')]) # make a list of ints out of each dataline

print(tempLine) # to show the structure

for item in tempLine:
    if item[3] > 10:
        print(item[3])

The output from running it is as follows:

[[1, 10, 6, 8], [2, 21, 11, 12], [3, 11, 7, 7], [4, 32, 12, 8], [5, 9, 29, 6], [6, 17, 16, 9], [7, 22, 9, 11], [8, 14, 31, 7], [9, 26, 21, 6], [10, 19, 19, 10]]
12
11

This is not idiomatic Python; I wanted to show how to change your existing code to get over your bump.
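
The code above prints only the temperature, while the question asks for the ID as well. A minimal sketch of that extension, assuming the same four-column layout, might look like the following. The function name `rows_above` and the `sample` list (a few lines copied from the question, standing in for the result of `f.readlines()`) are made up for illustration.

```python
def rows_above(lines, threshold=10):
    """Return (ID, soil_temp) pairs for data lines whose soil_temp exceeds threshold."""
    result = []
    for line in lines[1:]:            # skip the header line
        if not line.strip():          # ignore blank lines, e.g. at the end of the file
            continue
        fields = [int(x) for x in line.split(",")]
        if fields[3] > threshold:
            result.append((fields[0], fields[3]))
    return result


# A few lines from the question, as readlines() would return them.
sample = [
    "ID, x, y, soil_temp\n",
    "1, 10, 6, 8\n",
    "2, 21, 11, 12\n",
    "7, 22, 9, 11\n",
    "10, 19, 19, 10\n",
]

print("ID, soil_temp")
for row_id, temp in rows_above(sample):
    print(f"{row_id}, {temp}")
```

With the real file, `sample` would be replaced by the list read from `soil_temp.txt`.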

score 0 · Accepted Answer · answered Jan 26 '21 at 19:32

I might do it like this:

with open('soil_temp.txt', 'r') as f:
    read_data = f.readlines()[1:]

id_and_temp = [(i, t)
    for i, _, _, t in (line.strip().split(", ") for line in read_data)
    if int(t) > 10
]

print("ID, soil_temp")
for i, t in id_and_temp:
    print(f"{i}, {t}")
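
One thing to be aware of: `split(", ")` relies on every separator being exactly a comma followed by one space. If the spacing in the file might vary, a slightly more tolerant variant could split on the comma alone and strip each piece. The sketch below is an assumption-laden illustration, not part of the answer above; `ids_above` is a hypothetical name, and it accepts any iterable of lines, so an open file object can be passed directly instead of calling `readlines()`.

```python
def ids_above(lines, threshold=10):
    """Yield (ID, soil_temp) string pairs where soil_temp exceeds threshold."""
    rows = iter(lines)
    next(rows, None)                       # skip the header line, if there is one
    for line in rows:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:                # skip blank or malformed lines
            continue
        row_id, _, _, temp = parts
        if int(temp) > threshold:
            yield row_id, temp


# A list of strings stands in for the open file here.
demo = ["ID, x, y, soil_temp\n", "2,21,11,12\n", "3, 11, 7, 7\n", "\n"]
for row_id, temp in ids_above(demo):
    print(f"{row_id}, {temp}")
```

With the real file this would be used as `with open('soil_temp.txt') as f:` followed by the same loop over `ids_above(f)`.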
