How would I sort an outputted text file by the nth column - Python

Question

My code:

infile = open("ALE.txt", "r")
outfile = open("ALE_sorted.txt", "w")

for line in infile:
    data = line.strip().split(',')
    wins = int(data[2])
    percentage = 162 / wins
    p = str(data[0]) + ", " + data[1] + ", " + data[2] + ", " + str(round(percentage, 3)) + "\n"
    outfile.write(p)
infile.close()
outfile.close()

The original infile("ALE.txt") is just the first three columns below. The text file that is output from the code above looks like this:

Baltimore, 93, 69, 2.348
Boston, 69, 93, 1.742
New York, 95, 67, 2.418
Tampa Bay, 90, 72, 2.25
Toronto, 73, 89, 1.82

I know the code correctly calculates the win percentage (column 2/total wins), but I would like to sort this list by the 4th column (win percentage).
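A minimal sketch of the general "sort by the nth column" technique, assuming comma-separated lines like the output above: split each line into stripped fields, sort the rows with a key that converts the chosen column to a number, then join them back together. The helper `sort_lines_by_column` and its `n` parameter (a 0-based column index) are hypothetical names chosen for illustration.

```python
def sort_lines_by_column(lines, n, reverse=False):
    """Sort comma-separated lines by the numeric value of column n (0-based).

    Hypothetical helper for illustration; assumes every line has at least
    n + 1 comma-separated fields and that column n holds a number.
    """
    # Split each line into stripped fields, skipping blank lines.
    rows = [[field.strip() for field in line.split(",")]
            for line in lines if line.strip()]
    # Convert the key column to float so that "10.5" sorts after "2.0".
    rows.sort(key=lambda row: float(row[n]), reverse=reverse)
    # Re-join the fields with the same ", " separator the file uses.
    return [", ".join(row) for row in rows]


sample = [
    "Baltimore, 93, 69, 2.348",
    "Boston, 69, 93, 1.742",
    "New York, 95, 67, 2.418",
    "Tampa Bay, 90, 72, 2.25",
    "Toronto, 73, 89, 1.82",
]

# Column index 3 is the 4th column; Boston comes first, New York last.
for line in sort_lines_by_column(sample, 3):
    print(line)
```

With a real file, the lines could come from `open("ALE_sorted.txt").readlines()`; the trailing newlines are removed by the per-field `strip()`.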

You might find this a lot simpler if you use pandas or at least, the `csv` module. Also `eval`? — pvg, Oct 23 '17 at 02:20
@pvg changed eval to int. Didn't think it mattered - got same result — , Oct 23 '17 at 02:24
Create a tuple and sort it based on the fourth column. https://stackoverflow.com/questions/12087905/pythonic-way-to-sorting-list-of-namedtuples-by-field-name — , Oct 23 '17 at 02:25
@S.R. I've tried adding the line sorted(p, key=lambda x: x.percentage) but to no avail — , Oct 23 '17 at 02:30
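The attempt in the last comment cannot work as written: `p` is one formatted string per line, so `sorted(p, ...)` iterates over its characters, and a `str` has no `percentage` attribute. Following the namedtuple suggestion, a minimal sketch collects one record per team first and sorts the whole list afterwards. The `Team` type, its field names, and the inline `lines` list standing in for ALE.txt are assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical record type: one line of ALE.txt plus the computed value.
Team = namedtuple("Team", ["name", "col2", "col3", "percentage"])

# Stand-in for the lines of ALE.txt (the first three columns).
lines = ["Baltimore, 93, 69", "Boston, 69, 93", "New York, 95, 67",
         "Tampa Bay, 90, 72", "Toronto, 73, 89"]

teams = []
for line in lines:
    name, col2, col3 = [field.strip() for field in line.split(",")]
    # Same formula as the question's code: 162 divided by the third column.
    percentage = round(162 / int(col3), 3)
    teams.append(Team(name, col2, col3, percentage))

# sorted() returns a new list, so assign the result instead of discarding it.
teams_sorted = sorted(teams, key=lambda team: team.percentage)

for team in teams_sorted:
    print(f"{team.name}, {team.col2}, {team.col3}, {team.percentage}")
```

Because `percentage` is stored as a float, the attribute lookup in the key compares numbers rather than strings.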

Keerthana Prabhakaran · Accepted Answer · 2017-10-23T05:05:12.380

Append your data to a list, say d.

Sort it by the item at index 3 (the 4th column) of each sublist. Reference - operator.itemgetter

Write the sorted data to your output file.

Contents of input file

[kiran@localhost ~]$ cat infile.txt
Baltimore, 93, 69
Boston, 69, 93
New York, 95, 67
Tampa Bay, 90, 72
Toronto, 73, 89

Code:

>>> from operator import itemgetter
>>> d=[]
>>> with open('infile.txt','r') as infile:
...     for line in infile:
...             data = line.strip().split(',')
...             wins = int(data[2])
...             percentage = 162 / float(wins)
...             data.append(str(round(percentage, 3))) #add percentage to your list that already contains the name and two scores.
...             d.append(data) # add the line to a list `d`
...
>>> print(d)
[['Baltimore', ' 93', ' 69', '2.348'], ['Boston', ' 69', ' 93', '1.742'], ['New York', ' 95', ' 67', '2.418'], ['Tampa Bay', ' 90', ' 72', '2.25'], ['Toronto', ' 73', ' 89', '1.82']]
>>> d.sort(key=itemgetter(3)) #sort the list `d` by the item at index 3 (the 4th column) of each sublist.
>>> print(d)
[['Boston', ' 69', ' 93', '1.742'], ['Toronto', ' 73', ' 89', '1.82'], ['Tampa Bay', ' 90', ' 72', '2.25'], ['Baltimore', ' 93', ' 69', '2.348'], ['New York', ' 95', ' 67', '2.418']]
>>> #write the items in list d to your output file
>>>
>>> with open('outfile.txt','w') as outfile:
...     for line in d:
...             outfile.write(','.join(line)+'\n')
...
>>>

Content of output file:

[kiran@localhost ~]$ cat outfile.txt
Boston, 69, 93,1.742
Toronto, 73, 89,1.82
Tampa Bay, 90, 72,2.25
Baltimore, 93, 69,2.348
New York, 95, 67,2.418
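Note that `itemgetter(3)` compares the percentage column as strings. That gives the right order here only because every value has a single digit before the decimal point; with values such as `'10.5'` and `'9.8'`, a string sort puts `'10.5'` first. A small sketch of converting the key to `float` instead (the sample rows are made up for illustration):

```python
from operator import itemgetter

rows = [["A", "10.5"], ["B", "9.8"], ["C", "2.25"]]

# String comparison: "1" < "2" < "9", so "10.5" sorts first.
by_string = sorted(rows, key=itemgetter(1))

# Numeric comparison: convert the column to float inside the key.
by_number = sorted(rows, key=lambda row: float(row[1]))

# Highest value first: same key, plus reverse=True.
by_number_desc = sorted(rows, key=lambda row: float(row[1]), reverse=True)

print([r[0] for r in by_string])       # ['A', 'C', 'B']
print([r[0] for r in by_number])       # ['C', 'B', 'A']
print([r[0] for r in by_number_desc])  # ['A', 'B', 'C']
```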

This is the best answer. Thank you. I was getting "itemgetter is not defined", so I had to change the line where you sorted the list to "d.sort(key=lambda x: x[3], reverse=True)", but it works and it's a little easier to understand. — , Oct 23 '17 at 04:22
Glad to have helped. Sorry, I forgot to add the import statement. `from operator import itemgetter` should get you through the error. — Keerthana Prabhakaran, Oct 23 '17 at 05:06
I've added comments for your reference. Let me know if you need further explanation. — Keerthana Prabhakaran, Oct 23 '17 at 05:08

Erick Shepherd · Answer 2 · 2017-10-23T03:35:58.280

Try this:

infile  = open("ALE.txt", "r")
outfile = open("ALE_sorted.txt", "w")

master_data = []

# Load in data from the infile and calculate the win percentage.
for line in infile:

    data = line.strip().split(', ')

    wins = int(data[2])
    percentage = 162 / wins
    data.append(str(round(percentage, 3)))

    master_data.append(data)

# Sort by the last column in reverse order by value and store the 
# sorted values and original indices in a list of tuples.
sorted_column = sorted([(float(data[-1]), index)
                        for index, data in enumerate(master_data)],
                       reverse=True)

# Reassign master_data according to the sorted positions.
master_data   = [master_data[data[1]] for data in sorted_column]

# Write each line to the outfile.
for data in master_data:

    outfile.write(", ".join(data) + "\n")

infile.close()
outfile.close()

Where the contents of infile are the following:

Baltimore, 93, 69
Boston, 69, 93
New York, 95, 67
Tampa Bay, 90, 72
Toronto, 73, 89

The resultant outfile contains the following sorted by the values of the newly generated fourth column from highest to lowest:

New York, 95, 67, 2.418
Baltimore, 93, 69, 2.348
Tampa Bay, 90, 72, 2.25
Toronto, 73, 89, 1.82
Boston, 69, 93, 1.742
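The index-based step above (building `(value, index)` tuples, sorting them, then looking the rows back up by index) can also be written as a single `sorted()` call with a `key` function. A minimal sketch, assuming `master_data` holds rows in the same shape as in that code (filled in here with the rows shown above):

```python
master_data = [
    ["Baltimore", "93", "69", "2.348"],
    ["Boston", "69", "93", "1.742"],
    ["New York", "95", "67", "2.418"],
    ["Tampa Bay", "90", "72", "2.25"],
    ["Toronto", "73", "89", "1.82"],
]

# Sort by the numeric value of the last column, highest first.
master_data = sorted(master_data, key=lambda data: float(data[-1]), reverse=True)

for data in master_data:
    print(", ".join(data))
```

One small difference: on ties, the tuple version puts the later row first, while `sorted()` with `reverse=True` keeps tied rows in their original order.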

I forgot to mention that the infile ("ALE.txt") is just the first 3 columns shown above. The one shown is the intended format, just not sorted correctly. Is there a way to simply sort by the 4th column without changing my code too much? — , Oct 23 '17 at 03:10
I just edited the post. Let me know if the changes better reflect what you were going for. — Erick Shepherd, Oct 23 '17 at 03:27
Why not use `csv.writer` instead of joining a list on commas anyway? — OneCricketeer, Oct 23 '17 at 03:40
@cricket_007 That would definitely be more flexible, but was trying to work within the given format. — Erick Shepherd, Oct 23 '17 at 03:54
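For comparison, a sketch of the `csv` module route suggested in these comments, assuming the same three-column ALE.txt and the question's `162 / third column` calculation. The function name `sort_teams` is hypothetical, and in-memory `io.StringIO` files stand in for the real ones. `skipinitialspace=True` drops the space after each comma when reading; note that `csv.writer` separates fields with a bare `,` by default, so the output loses the `, ` spacing of the original format.

```python
import csv
import io


def sort_teams(infile, outfile):
    """Read name, col2, col3 rows, append 162 / col3, write them sorted by it."""
    reader = csv.reader(infile, skipinitialspace=True)
    rows = []
    # Assumes every row has exactly three fields (no blank lines).
    for name, col2, col3 in reader:
        # Same calculation as the question's code.
        rows.append([name, col2, col3, round(162 / int(col3), 3)])
    # Highest value first; the float in column 3 is compared numerically.
    rows.sort(key=lambda row: row[3], reverse=True)
    csv.writer(outfile).writerows(rows)


# Demo with in-memory files; with real files, use
# open("ALE.txt", newline="") and open("ALE_sorted.txt", "w", newline="").
src = io.StringIO("Baltimore, 93, 69\nBoston, 69, 93\nNew York, 95, 67\n"
                  "Tampa Bay, 90, 72\nToronto, 73, 89\n")
dst = io.StringIO()
sort_teams(src, dst)
print(dst.getvalue())
```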

score 0 · Answer 3 · answered Oct 23 '17 at 02:51


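First, it is preferable to split each line first and then strip every field, since a list has no `.strip()` method: `[i.strip() for i in line.split(',')]`. The `csv` module can do the splitting for you: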

import csv

with open('ALE.txt', 'r', newline='') as infile:
    reader = csv.reader(infile)
    data = []
    for line in reader:
        formatted_line = [i.strip() for i in line]
        wins = int(formatted_line[1])
        losses = int(formatted_line[2])
        total_games = wins + losses  # 162 games per team in this data
        percentage = 100 * wins / total_games
        formatted_line.append(str(round(percentage, 3)))
        data.append(formatted_line)

# Sort numerically by the 4th column; sorted() takes key= as a keyword argument.
data = sorted(data, key=lambda x: float(x[3]))

with open('ALE_sorted.txt', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(data)

answered by N M

I just get the error "total wins" is not defined when trying to use this code – Oct 23 '17 at 03:18
@Choco Well, looking at the variables, can't you see that `wins` probably is the correct name? – OneCricketeer Oct 23 '17 at 03:38
Yes, that would be the correct variable, hence my confusion as to why you would write total_wins – Oct 23 '17 at 03:44
@ChocolateGoosePoosey I don't understand why you use 162/wins to get percentage. I simply included the wins/total_wins * 100 to get you to check your calculations. – N M Oct 23 '17 at 04:14
total_wins is still not defined. Sorry, I don't follow – Oct 23 '17 at 04:19
@ChocolateGoosePoosey To clarify, I meant the total number of games played. I apologize if that was unclear. What I meant was: wouldn't the win percentage be 93/162*100? – N M Oct 23 '17 at 05:16
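To make the calculation discussed in these comments concrete: a conventional win percentage divides wins by games played and multiplies by 100. A minimal sketch, assuming the second column is wins and the third is losses (so games played is their sum, 162 for every team here); `win_percentage` is a hypothetical helper name.

```python
def win_percentage(wins, losses):
    """Wins divided by games played, times 100 (e.g. 93/162*100)."""
    games = wins + losses  # 162 for every team in this data
    return 100 * wins / games


teams = [("Baltimore", 93, 69), ("Boston", 69, 93), ("New York", 95, 67),
         ("Tampa Bay", 90, 72), ("Toronto", 73, 89)]

# Highest win percentage first.
for name, wins, losses in sorted(teams, key=lambda t: win_percentage(t[1], t[2]),
                                 reverse=True):
    print(f"{name}, {wins}, {losses}, {win_percentage(wins, losses):.3f}")
```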

score 0 · Answer 4 · answered Oct 23 '17 at 03:57

One way to sort by the 4th column is to load your file with pandas. Here's how to do it:

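import pandas as pd

# The file has no header row, so header=None numbers the columns 0, 1, 2, 3;
# skipinitialspace=True drops the space after each comma.
outfile = pd.read_csv("ALE_sorted.txt", header=None, skipinitialspace=True)
columns = outfile.columns.tolist()  # [0, 1, 2, 3]; 3 is your fourth column

# sort_values returns a new DataFrame rather than sorting in place.
outfile = outfile.sort_values(by=3)

print(outfile[3])  # to see the sorted column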
