
I'm working on a small project that I've gotten stuck on. I've used Python to take a list of timestamps and heatmap data and separate them by line (always 1-100). I am aware of max(), but despite searching Google and Stack Overflow at length, I haven't found a way to get the line numbers of the several largest values in descending order.

Here is a sample of the csv I am working with:

0.00088006474088529383
0.00015301444453664169
0.0001578056486084342
4.8472783963609083e-05
0.00018440120509040085
7.766234473424159e-05

What I would ideally need is a list of the line numbers of the 20 largest values in the CSV, for example:

6
4
1
5
3
2

I'm unsure how to start this, but I have experimented with:

with open('heatmap.csv', 'r') as heatnum:
 for line in heatnum:
     print(max(heatnum))

Unfortunately, this only gives me the single largest value. I'm unsure how to get the top 20 values in descending order, or how to output their line numbers.

theboy
    Finding top k among n elements is a well-known algorithm. [Top K Frequent Words using heaps in Python](https://stackoverflow.com/questions/64778567/top-k-frequent-words-using-heaps-in-python) – Abhijit Sarkar Aug 14 '23 at 02:53
    Your example output doesn't actually correspond to your example input, right? The order is different. Descending order would be `[1 5 3 2 6 4]`. – Reinderien Aug 14 '23 at 05:00

1 Answer


Use NumPy. Pandas is overkill, and bare Python is possible but inconvenient.

First generate test data:

import numpy as np
from numpy.random import default_rng

rand = default_rng(seed=0)
np.savetxt(fname='heatmap.csv', X=rand.uniform(low=1e-5, high=1e-3, size=100))

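Then load, sort and slice. argsort() returns zero-based indices in ascending order of value, so the reversed slice picks the 20 largest, and adding 1 turns them into line numbers: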

heatnum = np.loadtxt(fname='heatmap.csv')
top_lines = 1 + heatnum.argsort()[-1:-21:-1]
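
For comparison, the bare-Python route mentioned above could look roughly like the following. This is only a sketch, not part of the NumPy approach: it uses the standard-library heapq.nlargest, and the function name top_line_numbers and the sample_lines list (the six values from the question) are made up for illustration. It assumes one number per line and skips blank lines without shifting the line count.

```python
import heapq


def top_line_numbers(lines, k=20):
    """Return the 1-based line numbers of the k largest values, largest first.

    `lines` is any iterable of strings holding one number each, for example
    an open file object. Blank lines are skipped but still counted.
    """
    numbered = (
        (line_no, float(text))
        for line_no, text in enumerate(lines, start=1)
        if text.strip()
    )
    # nlargest keeps only k candidates at a time instead of sorting everything
    largest = heapq.nlargest(k, numbered, key=lambda pair: pair[1])
    return [line_no for line_no, _ in largest]


# The six sample values from the question, as they would be read from the file
sample_lines = [
    "0.00088006474088529383\n",
    "0.00015301444453664169\n",
    "0.0001578056486084342\n",
    "4.8472783963609083e-05\n",
    "0.00018440120509040085\n",
    "7.766234473424159e-05\n",
]
print(top_line_numbers(sample_lines, k=6))  # [1, 5, 3, 2, 6, 4]
```

With the real file, the call would be something like `with open('heatmap.csv') as heatnum: print(top_line_numbers(heatnum))`. For only 100 lines the difference from a full sort is negligible; the heap mainly pays off on much larger inputs.
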
Reinderien