I can't find a way to write a loop that takes the values in the top 20 percent of a data frame's first row and stores the associated column headers in a named list, for example "top_2017".

It should do this for every row and create a separate list per row, but I cannot make the left-hand side of the assignment dynamic.

I tried different loops with "i", but once I write it on the left-hand side of the assignment, it is treated as a literal letter.

  • Please always post your code as text, not using screenshots. See [Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/q/285551/). Also, please include a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with a sample of your data, your current result, and your desired output. Also see [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391). – AlexK Mar 07 '23 at 02:53

1 Answer

Preparing the data:

import pandas as pd
from random import sample
data = [sample(range(i * 100, (i + 1) * 100), 100) for i in range(10)]

df = pd.DataFrame(data)
df

First output: the DataFrame, with 10 rows and 100 columns. Row i holds the integers from 100*i to 100*i + 99 in random order.

matriz = []
for _, ro in df.iterrows():  # one row at a time
    lista = list(enumerate(ro))  # (position, value) pairs
    # sort by value, largest first, so the top values come first
    matriz.append(sorted(lista, key=lambda x: x[1], reverse=True))

top20 = len(matriz[0]) // 5  # number of columns in the top 20% of each row
lista20 = []  # one list of column positions per row
for ro in matriz:
    # keep the positions of the top20 largest values in this row
    lista20.append([p for p, v in ro[:top20]])
print(lista20)

The printed lista20 is a list of ten lists, one per row, each holding the 20 column positions of that row's largest values.
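
To get named results such as "top_2017", one option is to store each row's list in a dictionary under a key built at run time, instead of trying to create a separate variable per row. The sketch below is only an illustration under assumptions: the frame `df_years`, its year labels and its column headers are made up, and `Series.nlargest` is used in place of the manual sort above. It returns column headers rather than positions.

```python
import pandas as pd

# Hypothetical frame: row labels are years, column headers are letters.
df_years = pd.DataFrame(
    [[5, 1, 9, 3, 7, 2, 8, 4, 6, 0],
     [0, 6, 4, 8, 2, 7, 3, 9, 1, 5]],
    index=[2017, 2018],
    columns=list("abcdefghij"),
)

n_top = len(df_years.columns) // 5  # 20% of 10 columns = 2

# A dictionary key can be built at run time, unlike a variable name.
top = {
    f"top_{year}": row.nlargest(n_top).index.tolist()
    for year, row in df_years.iterrows()
}

print(top["top_2017"])  # → ['c', 'g']
```

Each list is then read back with `top["top_2017"]`, `top["top_2018"]`, and so on, and the keys can be looped over like any other dictionary.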