Find a specific value/line in a txt file in Python

Question

My txt file looks like this:

AL South 4863000

AK West 742000

AZ West 6931000

AR South 2988000

CA West 39250000

CO West 5541000

CT Northeast 3576000

DE South 952000

FL South 20612000

GA South 10310000

HI West 1429000

ID West 1683000

IL Midwest 12802000

IN Midwest 6633000

IA Midwest 3135000

KS Midwest 2907000

... and so on

I need to find and print the state with the highest population in the Midwest region.

Example output:

Highest population state in the Midwest is: IL 12802000

This is what I have so far:

f = open('States.txt','r')
columns = list(zip(*(map(str, row.split()) for row in f)))

t = columns[2]
result = tuple(int(x[0:10]) for x in t)
print(max(result))

I'm not sure how to filter by "Midwest"; so far I was only able to find the largest integer value in column 3.

And what is the question exactly? What prevents you from doing that? Where are you stuck? – Pac0 Sep 17 '21 at 06:22
Don't try to do this in base Python, use pandas. This would be a 3-liner in pandas. – smci Sep 17 '21 at 06:51

score 0 · Answer 1 · answered Sep 17 '21 at 06:28

You can read the text file with numpy.genfromtxt() or pandas. Then select the rows whose second column is "Midwest" and sort them by the numbers in the third column. The row with the largest number gives you "IL".
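
The steps described above do not depend on a particular library. As a minimal sketch, here they are in plain Python rather than NumPy or pandas, with a few rows from the question inlined as a string. SAMPLE and top_state are hypothetical names; a real script would read States.txt instead.

```python
# A few rows copied from the question; an assumption standing in for States.txt.
SAMPLE = """\
CA West 39250000
IL Midwest 12802000
IN Midwest 6633000
IA Midwest 3135000
KS Midwest 2907000
"""


def top_state(text, region):
    # Step 1: read the rows, skipping blank lines.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Step 2: keep only the rows whose second column equals the region.
    matches = [row for row in rows if row[1] == region]
    # Step 3: sort by the third column as an integer, largest first.
    matches.sort(key=lambda row: int(row[2]), reverse=True)
    return matches[0] if matches else None


state, _, population = top_state(SAMPLE, "Midwest")
print("Highest population state in the Midwest is:", state, population)
```

Comparing the whole second column with == matters here: a substring test such as "West" in line would also match the Midwest rows.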

Jiwoong Park


Hi, your answer should be a comment on the original post. If you want to keep answering, provide a code example or solution please ;) – Maelig Sep 17 '21 at 07:16
@Maelig Even though this answer could be improved by some code example, it seems to me it is still a proper attempt to answer the question. – Pac0 Sep 17 '21 at 18:26

score 0 · Answer 2 · answered Sep 17 '21 at 07:22

How to debug

Add a simple print for each intermediate result.

f = open('States.txt','r')
columns = list(zip(*(map(str, row.split()) for row in f)))
print(columns)

t = columns[2]
print(t)

result = tuple(int(x[0:10]) for x in t)
print(result)

print(max(result))

The output looks like:

[('AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS'), ('South', 'West', 'West', 'South', 'West', 'West', 'Northeast', 'South', 'South', 'South', 'West', 'West', 'Midwest', 'Midwest', 'Midwest', 'Midwest'), ('4863000', '742000', '6931000', '2988000', '39250000', '5541000', '3576000', '952000', '20612000', '10310000', '1429000', '1683000', '12802000', '6633000', '3135000', '2907000')]

('4863000', '742000', '6931000', '2988000', '39250000', '5541000', '3576000', '952000', '20612000', '10310000', '1429000', '1683000', '12802000', '6633000', '3135000', '2907000')

(4863000, 742000, 6931000, 2988000, 39250000, 5541000, 3576000, 952000, 20612000, 10310000, 1429000, 1683000, 12802000, 6633000, 3135000, 2907000)

39250000

Find the statement to modify

Since you need the region column to filter for "Midwest", the first statement needs a filter added:

columns = list(zip(*(map(str, row.split()) for row in f)))

So we can decompose it. Then try to add a filter after each row is split.

Decompose the nested statement

Unwrap the parentheses step by step. This kind of refactoring is known as Extract Variable:

mapped = (map(str, row.split()) for row in f)  # generator mapping needs to be decomposed
zipped = zip(*mapped) # unpacked as tuples

columns = list(zipped)

See the Python docs for map(), zip() and list(). For the unpacking asterisk *, see also "unpacking function arguments" and the Python docs on the call argument list.
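
To see what the asterisk and zip() do together, here is a small sketch on three rows taken from the question. The name rows is only used for illustration.

```python
rows = [
    ["IL", "Midwest", "12802000"],
    ["IN", "Midwest", "6633000"],
    ["CA", "West", "39250000"],
]

# zip(*rows) is the same call as zip(rows[0], rows[1], rows[2]):
# it pairs up the first items, then the second items, and so on,
# which turns a list of rows into a sequence of columns.
columns = list(zip(*rows))

print(columns[0])  # ('IL', 'IN', 'CA')
print(columns[2])  # ('12802000', '6633000', '39250000')
```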

Now let's further decompose the complex generator-mapping:

mapped = (map(str, row.split()) for row in f)

# could call it a file-reading spliterator
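
A possible continuation, as a sketch only: once the generator is written out as a plain loop, a filter on the region can be added right after each row is split. The lines list is an assumption standing in for the open file f, and rows, populations and best are hypothetical names.

```python
# Stand-in for the open file f; in the real script these lines come from States.txt.
lines = [
    "CA West 39250000\n",
    "\n",
    "IL Midwest 12802000\n",
    "IN Midwest 6633000\n",
    "KS Midwest 2907000\n",
]

rows = []
for line in lines:
    # map(str, ...) from the original is dropped: split() already returns strings.
    row = line.split()  # e.g. ['IL', 'Midwest', '12802000']
    # The filter: keep a row only if its region column is exactly "Midwest".
    if len(row) == 3 and row[1] == "Midwest":
        rows.append(row)

# Same transposition as before, now on the filtered rows only.
columns = list(zip(*rows))
populations = tuple(int(x) for x in columns[2])

# Look up the state at the same position as the largest population.
best = populations.index(max(populations))
print("Highest population state in the Midwest is:", columns[0][best], populations[best])
```

This sketch assumes at least one row matches; with no Midwest rows, columns would be empty and columns[2] would raise an IndexError.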

score 0 · Answer 3 · answered Sep 17 '21 at 08:28

You should keep the data as rows instead of columns; that makes it easier to extract related information from the same row. You can use a variable to hold the row with the highest population while reading through the text file.

def find_max(region):
    # use "with open" so the file is closed for you automatically
    with open("States.txt") as f:
        result = []
        for row in f:
            row = row.split()
            # only consider complete rows whose region column matches exactly
            # (a substring test like `"West" in row` would also match "Midwest")
            if len(row) == 3 and row[1] == region:
                # update result when we have no result yet, or when the
                # population of the current row is greater than our result
                if not result or int(row[2]) > int(result[2]):
                    result = row
    if result:
        print(result[0], result[2])

find_max("Midwest")
find_max("West")

Output:

IL 12802000
CA 39250000
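
The same row-based idea can be written more compactly with the built-in max() and a key function. This is a sketch under the assumption that every data row has exactly three whitespace-separated fields; find_max_row and the inlined lines are hypothetical stand-ins for reading States.txt.

```python
def find_max_row(lines, region):
    # Split every line and keep only complete rows for the requested region.
    rows = (line.split() for line in lines)
    in_region = [row for row in rows if len(row) == 3 and row[1] == region]
    # default=None avoids a ValueError when no row matches the region.
    return max(in_region, key=lambda row: int(row[2]), default=None)


lines = [
    "AZ West 6931000",
    "CA West 39250000",
    "IL Midwest 12802000",
    "IN Midwest 6633000",
]

for region in ("Midwest", "West"):
    row = find_max_row(lines, region)
    if row:
        print(row[0], row[2])
```

Any iterable of lines works, so an open file object can be passed in place of the list.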
