I have a CSV of daily maximum temperatures and am trying to assign a "rank" to each observation. I first sorted the daily maximum temperatures from lowest to highest, then created a new column called rank.

# Sort data smallest to largest
ValidFullData_Sorted = ValidFullData.sort_values(by="TMAX")
# Count total observations
n = ValidFullData_Sorted.shape[0]
# Add a numbered column 1 -> n to use in the return calculation for rank
ValidFullData_Sorted.insert(0, "rank", range(1, 1 + n))

How can I make the rank the same for values of daily maximum temperature that are the same? (I.e., every time the daily maximum temperature reaches 95°, each of those instances should get the same rank.)

Here is some sample data (it's daily temperature data, so it's thousands of lines long):

Date    TMAX  TMIN
1/1/00  22    11
1/2/00  26    12
1/3/00  29    14
1/4/00  42    7
1/5/00  42    21

And I want to add a TMAXRank column that would look like this:

Date    TMAX  TMIN  TMAXRank
1/1/00  22    11    4
1/2/00  26    12    3
1/3/00  29    14    2
1/4/00  42    7     1
1/5/00  42    21    1
Soviut
Megan Martin
  • The temperature itself is the rank for all intents and purposes? – roganjosh Dec 03 '18 at 23:59
  • See the pandas [`DataFrame.rank`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rank.html) and [`Series.rank`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.rank.html) methods – root Dec 03 '18 at 23:59
  • See if this is [helpful](https://stackoverflow.com/questions/23279238/custom-sorting-with-pandas). If you're only working with one file and know where the ranks for values >95 F fall, you could use an if-then statement to assign the rank manually. Without looking at the data it's kind of tricky. – pizza lover Dec 04 '18 at 00:25
  • @roganjosh I need actual rankings because I want to look at probabilities – Megan Martin Dec 04 '18 at 01:25
  • @KenDekalb unfortunately I don't know where they are – Megan Martin Dec 04 '18 at 01:26
  • I am going to attach the data so you can have a better idea of what I mean – Megan Martin Dec 04 '18 at 01:26

1 Answer

ValidFullData['TMAXRank'] = ValidFullData[ValidFullData['TMAX'] < 95]['TMAX'].rank(ascending=False, method='dense')

Output:

    Unnamed: 0  TMAX  TMIN  TMAXRank
17          17    88    14       1.0
16          16    76    12       2.0
15          15    72    11       3.0
14          14    64    21       4.0
8            8    62     7       5.0
7            7    58    14       6.0
13          13    58     7       6.0
18          18    55     7       7.0
3            3    42     7       8.0
4            4    42    21       8.0
6            6    41    12       9.0
12          12    37    14      10.0
5            5    36    11      11.0
2            2    29    14      12.0
1            1    26    12      13.0
0            0    22    11      14.0
9            9    98    21       NaN
10          10   112    11       NaN
11          11    98    12       NaN
19          19    95    21       NaN
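
For comparison, here is a minimal sketch that ranks every row, with no 95° cutoff, using the five sample rows from the question. The frame name `df` is an assumption for illustration. `Series.rank` with `method="dense"` gives tied temperatures the same rank without leaving gaps, and `ascending=False` makes the hottest day rank 1.

```python
import pandas as pd

# Sample rows copied from the question; `df` is a hypothetical name.
df = pd.DataFrame({
    "Date": ["1/1/00", "1/2/00", "1/3/00", "1/4/00", "1/5/00"],
    "TMAX": [22, 26, 29, 42, 42],
    "TMIN": [11, 12, 14, 7, 21],
})

# method="dense": equal TMAX values share one rank, and ranks have no gaps.
# ascending=False: the highest temperature gets rank 1.
# astype(int) is safe here because no row is excluded, so there are no NaNs.
df["TMAXRank"] = df["TMAX"].rank(method="dense", ascending=False).astype(int)

print(df)
# TMAXRank column: 4, 3, 2, 1, 1
```

Both 42° days get rank 1, which matches the TMAXRank column asked for in the question.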
Conner
  • How is this a valid answer? It does not assign same rank to rows when Temperature values are greater than 95 F. – pizza lover Dec 04 '18 at 03:49
  • @KenDekalb not sure what you mean. See my edits above. It ranks 98 as 2 in both instances just fine. – Conner Dec 04 '18 at 03:58
  • @KenDekalb Ah, I didn't see that in the question. I've edited my answer to accommodate. – Conner Dec 04 '18 at 16:41