Pandas: How assign to a new DF values in quantiles, using greater than and smaller than?

Question

I am new to coding, and my English isn't that good so please be patient with me =D

This is the main DF (df_mcred_pf). I posted all data and code in full below.

From the main DF, I created a DF with all values from the 1st quantile and it worked:

df_mcred_pf_Q1 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100)]
df_mcred_pf_Q1.head(30)

Now I need to create a new DF with the values of the 2nd quantile: all values greater than the values of the 1sq quantile (vQ1_mcred_pf) and smaller than the values of the 2nd quantile (vQ2_mcred_pf). I tried this but it didn't work:

df_mcred_pf_Q2 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']>np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100) & df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ2_mcred_pf/100)]

I got this error: TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]

And I'm stuck here. Could you help me, please?

Full code here:

import pandas as pd
import numpy as np
    
df_mcred_pf = pd.DataFrame([[2, 12, "F", 1, 1, 12.55, 437],
[2, 12, "F", 1, 1, 17.81, 437],
[2, 12, "F", 1, 1, 18.14, 437],
[2, 12, "F", 1, 1, 20.43, 437],
[2, 12, "F", 1, 1, 21.19, 437],
[2, 12, "F", 1, 1, 22.73, 437],
[2, 12, "F", 1, 1, 23.73, 437],
[2, 12, "F", 1, 1, 25.26, 437],
[2, 12, "F", 1, 1, 25.34, 437],
[2, 12, "F", 1, 1, 26.02, 437],
[2, 12, "F", 1, 1, 26.78, 437],
[2, 12, "F", 1, 1, 26.79, 437],
[2, 12, "F", 1, 1, 26.83, 437],
[2, 12, "F", 1, 1, 27.59, 437],
[2, 12, "F", 1, 1, 27.83, 437],
[2, 12, "F", 1, 1, 28.32, 437],
[2, 12, "F", 1, 1, 28.32, 437],
[2, 12, "F", 1, 1, 28.83, 437],
[2, 12, "F", 1, 1, 29.08, 437],
[2, 12, "F", 1, 1, 29.13, 437],
[2, 12, "F", 1, 1, 29.33, 437],
[2, 12, "F", 1, 1, 29.84, 437],
[2, 12, "F", 1, 1, 29.85, 437],
[2, 12, "F", 1, 1, 30.36, 437],
[2, 12, "F", 1, 1, 30.62, 437],
[2, 12, "F", 1, 1, 30.87, 437],
[2, 12, "F", 1, 1, 31.38, 437],
[2, 12, "F", 1, 1, 31.39, 437],
[2, 12, "F", 1, 1, 31.89, 437],
[2, 12, "F", 1, 1, 32.92, 437]], columns=['cd_mod_pri', 'cd_mod_sec', 'id_tp_pes', 'cd_idx_pri', 'cd_idx_sec', 'vr_tx_jrs', 'quantidade'])
    


MAX_mcred = df_mcred_pf['vr_tx_jrs'].max()    

MIN_mcred = df_mcred_pf['vr_tx_jrs'].min()
    
vQ1_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(0.25)
vQ2_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(0.50)
vQ3_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(0.75)
vQ4_mcred_pf = df_mcred_pf['vr_tx_jrs'].quantile(1.00)

df_mcred_pf_Q1 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100)]
df_mcred_pf_Q1.head(30)

MEDIAN_mcred = df_mcred_pf_Q1["vr_tx_jrs"].median()

df_mcred_pf_Q2 = df_mcred_pf[df_mcred_pf['vr_tx_jrs']>np.quantile(df_mcred_pf['vr_tx_jrs'], vQ1_mcred_pf/100) & df_mcred_pf['vr_tx_jrs']<=np.quantile(df_mcred_pf['vr_tx_jrs'], vQ2_mcred_pf/100)]

I'm sorry, I tried but I don't know how to do it. I tried to post in HTML but it didn't work. — Nilton, Jan 07 '21 at 15:25
`print(df.head(10))` - copy/paste? More options are described in the thread [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — Mr. T, Jan 07 '21 at 15:37
Hey, buddy. Copy and paste don't work, the table gets crazy. So I typed all the code to build the DF in the "full code section" of my question. Thanks a lot for the tips and I'm really sorry for that. Could you help, me please, about my question? Thanks and have a great day! — Nilton, Jan 07 '21 at 16:27
Strange, why pasting the table should be a problem. But I appreciate your effort and gave it a go, although me and pandas .... well, we are not the best friends. — Mr. T, Jan 07 '21 at 17:19

Mr. T · Accepted Answer · 2021-01-07T17:33:29.827

1

I would address this problem differently and create a column with a quantile descriptor:

import pandas as pd
import numpy as np
    
#your dataframe here
    
quant = [0, .25, .5, .75, 1]
s = df_mcred_pf["vr_tx_jrs"].quantile(quant)

df_mcred_pf["Quartil"] = pd.cut(df_mcred_pf["vr_tx_jrs"], s, include_lowest=True, labels=["Q1", "Q2", "Q3", "Q4"])

This returns the following output:

    cd_mod_pri  cd_mod_sec id_tp_pes  ...  vr_tx_jrs  quantidade  Quartil
0            2          12         F  ...      12.55         437     Q1
1            2          12         F  ...      17.81         437     Q1
2            2          12         F  ...      18.14         437     Q1
3            2          12         F  ...      20.43         437     Q1
4            2          12         F  ...      21.19         437     Q1
5            2          12         F  ...      22.73         437     Q1
6            2          12         F  ...      23.73         437     Q1
7            2          12         F  ...      25.26         437     Q1
8            2          12         F  ...      25.34         437     Q2
9            2          12         F  ...      26.02         437     Q2
10           2          12         F  ...      26.78         437     Q2
...
28           2          12         F  ...      31.89         437     Q4
29           2          12         F  ...      32.92         437     Q4

[30 rows x 8 columns]

Now, you can filter the dataframe by quartile:

print(df_mcred_pf[df_mcred_pf["Quartil"]=="Q2"])

You can also choose to code the quartile as a number, e.g.,

labels=range(len(quant)-1)

Then, you could get quartiles up to 0.75 with

print(df_mcred_pf[df_mcred_pf["Quartil"]<3])

Maybe there are easier ways to achieve this, let's see what other people will come up with.

edited Jan 07 '21 at 17:33

answered Jan 07 '21 at 17:00

Mr. T

11,960
10
32
54

Hey, buddy. Thanks again! I tried but I'm getting the following error: :4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. Do you know whats happening? I adress the question like I did, because after I get all the Q1, Q2, Q3 and Q4 (dynamicly, because the data can change), I need to get the median of each "Q". – Nilton Jan 07 '21 at 17:18
1

No idea why iPython would say this, and I have no experience with iPython. No error message in my environment. What line of my code causes this warning? – Mr. T Jan 07 '21 at 17:23
this line: df_mcred_pf["Quartil"] = pd.cut(df_mcred_pf["vr_tx_jrs"], s, include_lowest=True, labels=["Q1", "Q2", "Q3", "Q4"]) – Nilton Jan 07 '21 at 17:23
1

Not `df_mcred_pf[1, "Quartil"] `, `df_mcred_pf["Quartil"] ` it is. We define a new column. Your version, of course, would raise this warning because you try to set just one cell value. – Mr. T Jan 07 '21 at 17:24
Didn't work either. I put 1 to test too, because o the error that I print to you here. But it didn't work nor this way [1, "Quartil"] , nor this way ["Quartil"]. Both same error – Nilton Jan 07 '21 at 17:27
1

Unclear why this would happen. What about `df_mcred_pf.loc[:, "Quartil"] = pd.cut....`, then? – Mr. T Jan 07 '21 at 17:30
Hey, buddy! It worked, thanks so much. Actually, your first code worked, but in jupyter notebook, it was the only code that I saw that says there's an error but, even son, the code runs from behind. A tip to help others: your code changes all datatypes to objects and quartil to categorical, I changed all back. Thanks and have a great day! – Nilton Jan 07 '21 at 18:40
1

Of course, `Quartil` is categorical because we defined it as such. The dtypes at the end of the script are `cd_mod_pri int64 cd_mod_sec int64 id_tp_pes object cd_idx_pri int64 cd_idx_sec int64 vr_tx_jrs float64 quantidade int64 Quartil category`, so again I do not know what iPython does there. If this answer solved your problem, please consider [marking it as accepted](https://stackoverflow.com/help/accepted-answer). – Mr. T Jan 07 '21 at 18:48
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/227044/discussion-between-nilton-and-mr-t). – Nilton Jan 08 '21 at 15:05

Pandas: How assign to a new DF values in quantiles, using greater than and smaller than?

1 Answers1

Linked