Values will be given as a max() from a pandas data frame. For each item, I would like to get a rounded max value to create y-ticks for a matplotlib plot with the number of ticks = 10.

The data frame I am using is the official Johns Hopkins Covid data. The preceding code returns the data frames categorized by countries or states, daily totals or cumulative, cases or deaths.

I have written code in the for loop that takes the max, which could be over 20 million or as low as 6, gets the leading digit, adds 1, and then appends extra zeros if needed. I would rather have the value rounded down if the next digit is small, as this code creates small gaps at the top of some charts.

Is the code I wrote, which converts back and forth between str and int, Pythonic? Is there a simple way to add a round method to that code? Or is there just a better, more efficient way to do what I'm trying to do?

# Per Capita ## (identical version for daily totals on dfs1)
cumulative2 = dfs2.T[default[ind]]
daily_cases2 = cumulative2.diff()
d_max2 = daily_cases2.max().max()
c_max2 = cumulative2.max().max()

...

plot1 = daily_cases1.plot(kind='area', stacked=False, ax=ax1, lw=2, ylim=(0, d_max1))
plot2 = daily_cases2.plot(kind='area', stacked=False, ax=ax2, lw=2, ylim=(0, d_max2))
plot3 = cumulative1.plot(kind='area', stacked=False, ax=ax3, lw=2, ylim=(0, c_max1))
plot4 = cumulative2.plot(kind='area', stacked=False, ax=ax4, lw=2, ylim=(0, c_max2))

plots = [plot1, plot2, plot3, plot4]
maxes = [d_max1, d_max2, c_max1, c_max2]
for i, plot in enumerate(plots):
    rnd_max = int(f'{str(int(str(int(maxes[i]))[0]) + 1) + "0" * (len(str(int(maxes[i]))) - 1)}')
    yticks = np.arange(0, rnd_max, 1 if rnd_max < 10 else rnd_max // 10)
    ytick_labels = pd.Series(yticks).apply(lambda value: f"{int(value):,}")
    plot.set_yticks(yticks)
    plot.set_yticklabels(ytick_labels)

EDIT: I would like the leading digit to be 3 if the value is 2,750,000, or 4 if the value is 41. So the result is not a true power of 10, but a power of 10 multiplied by the rounded leading digit.

cumulative:

State    California  Arizona  Florida  New York    Texas  Illinois
11/4/20      950920   250633   821123    519890  1003342    443803
3/14/20         372       12       76       557       60        64
5/22/20       90281    15624    49451    360818    53817    105444

daily:

State    California  Arizona  Florida  New York    Texas  Illinois
4/3/20       1226.0    173.0   1260.0   10675.0    771.0    1209.0
6/25/20      5088.0   3091.0   5004.0     814.0   5787.0     894.0
11/3/20      4990.0   1679.0   4637.0    2069.0   9721.0    6516.0

c_max and d_max are just floats/ints (equal to the max value of the pd series being plotted), for example 63817.0 and 2675262.

Here's the output of a series of plots. You can see that the ticks on the first graph go much higher than the actual max value of that chart (ignore the plot placement; it's on best fit for now). This is the result of rounding a low number up, which I would like to avoid. The goal is to get the cleanest tick values I can while keeping the plots nice and tight.

1 of a series of plots

Mr. T
AgentJRock
  • "The data frame I am using is the official John Hopkins Covid Data." Please include that data in your question. To make this question answerable, you need to include `daily_cases1`, `daily_cases2`, `cumulative1` and `cumulative2` in your question. Please see how to create a minimal reproducible example here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Jan 11 '21 at 20:52
  • Thanks for the tip. I wanted to include it but the length of the code I used to clean up, add, drop, and download the data is quite long though pretty easy to navigate. It's 100+ lines of code. I will gladly edit the question and add it if you think I should. – AgentJRock Jan 11 '21 at 21:38
  • I would include a sample dataframe of the output of the four dataframes (like 10 rows or so prior to plotting). You can randomly select n rows with this answer: https://stackoverflow.com/a/32606673/6366770 and then do `daily_cases1.head(10)` etc. for each dataframe. – David Erickson Jan 11 '21 at 21:40
  • I've edited, is this what you were looking for? – AgentJRock Jan 11 '21 at 21:57
  • Almost, now include the expected output of the data and rounding. It is not completely clear how you want to round. – David Erickson Jan 11 '21 at 22:04
  • I added an image though not sure how to make it inline, and then explained a little more in the edit – AgentJRock Jan 11 '21 at 22:17

3 Answers

If you really want just one significant digit for your 10 steps, you can replicate your (no, not really Pythonic I would say) string-converting expression with something that uses the base-10 logarithm, e.g.

import math

def round10(n):
    return 10 ** math.ceil(math.log10(n))

But as you have noticed yourself, this doesn't really produce useful results. For example, if the maximum value is 1001, the y ticks would go from 0 to 10000, meaning everything would basically be squeezed below the first tick. The built-in autoscaling is more sophisticated and maximizes the usable area.
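To illustrate why a smarter scaling keeps the plot tighter, here is a rough sketch of the common "nice step" idea: pick a step of 1, 2, 5 or 10 times a power of 10, then extend the axis only to the next multiple of that step. This is not matplotlib's actual autoscaling algorithm; `nice_step` and `nice_upper_limit` are hypothetical helper names, and the sketch assumes a positive maximum.

```python
import math

def nice_step(max_value, n_ticks=10):
    """Return a tick step of 1, 2, 5 or 10 times a power of 10.

    Assumes max_value > 0 (hypothetical helper, not a matplotlib API).
    """
    raw = max_value / n_ticks                      # ideal but "ugly" step
    magnitude = 10 ** math.floor(math.log10(raw))  # power of 10 just below raw
    for factor in (1, 2, 5, 10):
        if factor * magnitude >= raw:
            return factor * magnitude

def nice_upper_limit(max_value, n_ticks=10):
    """Smallest multiple of the nice step that still covers max_value."""
    step = nice_step(max_value, n_ticks)
    return math.ceil(max_value / step) * step

print(nice_upper_limit(1001))   # 1200, rather than 10000
print(nice_upper_limit(63817))  # 70000
```

With a maximum of 1001 the axis would end at 1200 with a step of 200, so the data still fills most of the plot.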

Krumelur
  • Thank you for your response. I found a similar approach that is a little closer to what I needed. I guess I wasn't clear, as I can't think of the right words, but I'm not looking for a true power of 10; it's just that my values span a wide range of magnitudes. I still need the leading digit to be in the range 0-9, just rounded based on the first two digits. – AgentJRock Jan 11 '21 at 21:42
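from math import floor, log10

def round_first(x):
    # p is the power of 10 of the leading digit, e.g. 1000 for 5123;
    # log10 avoids the floating-point error of log(x, 10)
    p = 10 ** floor(log10(x))
    return round(x / p) * p

>>> round_first(5123)
5000
>>> round_first(5987)
6000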

Edit: If you care about performance, put all your data into numpy arrays and use a vectorized approach. The code below is vectorized and also doesn't choke on zero or negative numbers.

>>> import numpy as np
>>> def round_first(x):
...     xa = np.abs(x)
...     xs = np.sign(x)
...     nonzero = x != 0
...     p = 10 ** np.floor(np.log10(xa[nonzero]))
...     out = np.zeros(x.shape)
...     out[nonzero] = np.round(xa[nonzero] / p) * p * xs[nonzero]
...     return out
...
>>> x = np.arange(-1000, 2001, 67)
>>> x
array([-1000,  -933,  -866,  -799,  -732,  -665,  -598,  -531,  -464,
        -397,  -330,  -263,  -196,  -129,   -62,     5,    72,   139,
         206,   273,   340,   407,   474,   541,   608,   675,   742,
         809,   876,   943,  1010,  1077,  1144,  1211,  1278,  1345,
        1412,  1479,  1546,  1613,  1680,  1747,  1814,  1881,  1948])
>>> round_first(x)
array([-1000.,  -900.,  -900.,  -800.,  -700.,  -700.,  -600.,  -500.,
        -500.,  -400.,  -300.,  -300.,  -200.,  -100.,   -60.,     5.,
          70.,   100.,   200.,   300.,   300.,   400.,   500.,   500.,
         600.,   700.,   700.,   800.,   900.,   900.,  1000.,  1000.,
        1000.,  1000.,  1000.,  1000.,  1000.,  1000.,  2000.,  2000.,
        2000.,  2000.,  2000.,  2000.,  2000.])

Also, your question asks for rounding to the nearest value (you say 41 becomes 40 instead of 50), but your self-answer uses ceil(), which would make 41 go to 50.
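The difference between the two behaviours can be seen side by side in a small sketch. The helper names `leading_power`, `round_nearest` and `round_up` are made up for this illustration, and positive inputs are assumed.

```python
from math import ceil, floor, log10

def leading_power(x):
    """Largest power of 10 not exceeding x (assumes x > 0)."""
    return 10 ** floor(log10(x))

def round_nearest(x):
    """Round x to the nearest multiple of its leading power of 10."""
    p = leading_power(x)
    return round(x / p) * p

def round_up(x):
    """Round x up to the next multiple of its leading power of 10."""
    p = leading_power(x)
    return ceil(x / p) * p

for value in (41, 2_750_000, 63_817):
    print(value, round_nearest(value), round_up(value))
# 41 40 50
# 2750000 3000000 3000000
# 63817 60000 70000
```

Rounding to nearest can leave the top tick below the data maximum (60000 for 63817), while rounding up always covers it. Note also that Python's built-in `round` rounds exact halves to the nearest even integer, so `round(2.5)` is 2.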

Matt Miguel
  • Thank you!!! This is exactly what I was looking for – AgentJRock Jan 11 '21 at 22:26
  • @AgentJRock Awesome - happy to help. Please mark this as the accepted answer when you have time. Thanks! – Matt Miguel Jan 12 '21 at 01:32
  • Yeah, the graph was coming up short with floor, so I adapted your code to behave the way my abomination of a code did. How can I hit accept? I looked earlier. – AgentJRock Jan 12 '21 at 06:36
  • Your code did exactly what I asked for just to be clear – AgentJRock Jan 12 '21 at 06:36
  • @AgentJRock thanks, just click the checkmark besides the answer to toggle it from grayed out to filled in. – Matt Miguel Jan 12 '21 at 06:50
  • Awesome. And if anyone wants to help me out: this was my first post and it got a downvote. I'm just trying to get to the point where I can upvote posts and make comments, so any help is appreciated! – AgentJRock Jan 12 '21 at 06:56
from math import ceil, floor, log10

def round10_first(x):
    # Round x up to the next multiple of its leading power of 10
    p = 10 ** floor(log10(x))
    return ceil(x / p) * p
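As a sketch of how a rounded-up maximum could feed the tick creation, the snippet below builds ten evenly spaced steps and comma-formatted labels with the standard library only. `round_up_first_digit` and `make_ticks` are hypothetical names, and the sketch assumes the maximum is at least 1 (fractional per-capita values would need a float-based step). Unlike `np.arange(0, rnd_max, step)`, which excludes the stop value, the range here includes the top tick.

```python
from math import ceil, floor, log10

def round_up_first_digit(x):
    """Round x up to the next multiple of its leading power of 10."""
    p = 10 ** floor(log10(x))
    return ceil(x / p) * p

def make_ticks(max_value, n_steps=10):
    """Return (ticks, labels) from 0 up to the rounded maximum, inclusive.

    Assumes max_value >= 1 so that all ticks are integers.
    """
    top = round_up_first_digit(max_value)
    step = max(top // n_steps, 1)            # never use a step of 0
    ticks = list(range(0, top + 1, step))    # top + 1 so the last tick is kept
    labels = [f"{tick:,}" for tick in ticks] # thousands separators
    return ticks, labels

ticks, labels = make_ticks(63817.0)
print(ticks[-1], labels[1])  # 70000 7,000
```

The resulting lists could then be passed to `set_yticks` and `set_yticklabels` as in the question's loop.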

Thank you both for your help. I combined your answers for my solution. I ran timeit on both versions and they are the same speed, but I will use the one built from your answers because it is more Pythonic.

%timeit -n 10000000 function1
%timeit -n 10000000 function2

16.7 ns ± 0.108 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
16.8 ns ± 0.13 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
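One caveat on the measurement: `%timeit function1` without parentheses evaluates only the name, not a call, which may be why both timings are nearly identical at around 17 ns. A minimal sketch of timing actual calls with the standard `timeit` module follows. The two functions are stand-ins written for this illustration (a reworked form of the question's string approach and the log-based version), not the exact `function1` and `function2` that were timed.

```python
import timeit
from math import ceil, floor, log10

def round_via_strings(x):
    """Leading digit + 1, padded with zeros (reworked string approach)."""
    digits = str(int(x))
    return int(str(int(digits[0]) + 1) + "0" * (len(digits) - 1))

def round_via_log(x):
    """Round x up to the next multiple of its leading power of 10."""
    p = 10 ** floor(log10(x))
    return ceil(x / p) * p

n_calls = 100_000
for func in (round_via_strings, round_via_log):
    # The lambda makes timeit measure the call itself, not just the name lookup
    seconds = timeit.timeit(lambda: func(2_675_262), number=n_calls)
    print(f"{func.__name__}: {seconds / n_calls * 1e9:.0f} ns per call")
```

The two stand-ins agree on most inputs but not all: for a value that is already round, such as 3,000,000, the string version returns 4,000,000 while the log version returns 3,000,000.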
AgentJRock
  • If you are really interested in performance you should vectorize everything with numpy arrays. Let me know if you want me to modify my answer to do that. – Matt Miguel Jan 12 '21 at 02:40