3

I'm using Pandas to come up with new column that will search through the entire column with values [1-100] and will count the values where it's less than the current row.

See [df] example below:

[A][NewCol]
 1 0
 3 2
 2 1
 5 4
 8 5
 3 2

Essentially, for each row I need to look at the entire Column A, and count how many values are less than the current row. So for Value 5, there are 4 values that are less (<) than 5 (1,2,3,3).

What would be the easiest way of doing this?

Thanks!

Cesar
  • 617
  • 3
  • 8
  • 17
  • seems like what you are looking for is your custom version of the rank(A) - checkout https://pandas.pydata.org/pandasdocs/stable/reference/api/pandas.DataFrame.rank.html to see if you can achieve the result from the rank function as that would be the cleanest implementation here – marwan Mar 29 '19 at 20:02

6 Answers6

7

One way to do it like this, use rank with method='min':

df['NewCol'] = (df['A'].rank(method='min') - 1).astype(int)

Output:

   A  NewCol
0  1       0
1  3       2
2  2       1
3  5       4
4  8       5
5  3       2
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
7

I am using numpy broadcast

s=df.A.values
(s[:,None]>s).sum(1)
Out[649]: array([0, 2, 1, 4, 5, 2])

#df['NewCol']=(s[:,None]>s).sum(1)

timing

df=pd.concat([df]*1000)

%%timeit
s=df.A.values
(s[:,None]>s).sum(1)
10 loops, best of 3: 83.7 ms per loop
%timeit (df['A'].rank(method='min') - 1).astype(int)
1000 loops, best of 3: 479 µs per loop
BENY
  • 317,841
  • 20
  • 164
  • 234
1

Try this code

A = [Your numbers]
less_than = []
    for element in A:
        counter = 0
        for number in A:
            if number < element:
                counter += 1
        less_than.append(counter)
SdahlSean
  • 573
  • 3
  • 13
1

You can do it this way:

import pandas as pd

df = pd.DataFrame({'A': [1,3,2,5,8,3]})

df['NewCol'] = 0
for idx, row in df.iterrows():
    df.loc[idx, 'NewCol'] = (df.loc[:, 'A'] < row.A).sum()

print(df)
   A  NewCol
0  1       0
1  3       2
2  2       1
3  5       4
4  8       5
5  3       2
Nathaniel
  • 3,230
  • 11
  • 18
1

Another way is sort and reset index:

m=df.A.sort_values().reset_index(drop=True).reset_index()
m.columns=['new','A']
print(m)

   new  A
0    0  1
1    1  2
2    2  3
3    3  3
4    4  5
5    5  8
anky
  • 74,114
  • 11
  • 41
  • 70
1

You didn't specify if speed or memory usage was important (or if you had a very large dataset). The "easiest" way to do it is straightfoward: calculate how many are less then i for each entry in the column and collect those into a new column:

df=pd.DataFrame({'A': [1,3,2,5,8,3]})
col=df['A']
df['new_col']=[ sum(col<i) for i in col ]

print(df)

Result:

   A  new_col
0  1        0
1  3        2
2  2        1
3  5        4
4  8        5
5  3        2

There might be more efficient ways to do this on large datasets, such as sorting your column first.

SolverWorld
  • 605
  • 1
  • 5
  • 12