0

Updated:

I have a huge dataframe, providing small version of it.

header = [np.array([' ',' ',' ','X','X','Y','Y']),
         np.array(['A','B','C','D','E','F','G'])]
df = pd.DataFrame(columns=header)
df[' ','A'] = ['n','n','m','m','m','p']
df[' ','B'] = ['q','r','s','t','u','v']
df[' ','C'] = [5,6,7,8,9,4]
df['X','D'] = ['1.5','2.9','3.6','2.5','7.1','0.4']
df['X','E'] = ['0.7%','3.9%','3.2%','1.5%','4.1%','2.4%']
df['Y','F'] = ['ab','bc','cd','de','ef','gh']
df['Y','G'] = ['5.5','2.6','8.6','4.5','0.1','3.4']

df =df.style.hide_index()

In real, 'B' is getting dynamically generated from another dataframe and depending on value of 'B', 'A' is being populated manually.

I want to group my dataframe on column 'A' and sort the dataframe on column 'A' too

I tried this code:

def func(x):
   return x.sort_values([('','A')],ascending=False)

dfResult = df.groupby([('','A')])
dfResult1 = dfResult.apply(func)
dfResult1

     |   |   |   |   |  X        |    Y 
(,A) |   | A | B | C |  D |  E   |    F | G
-----|---|---|---|---|----|------|------|----
  n  | 0 | n | q | 5 |1.5 | 0.7% |   ab | 5.5
     | 1 | n | r | 6 |2.9 | 3.9% |   bc | 2.6
--- -|---|---|---|---|----|------|------|----
  m  | 2 | m | s | 7 |3.6 | 3.2% |   cd | 8.6
     | 3 | m | t | 8 |2.5 | 1.5% |   de | 4.5
     | 4 | m | u | 9 |7.1 | 4.1% |   ef | 0.1
-----|---|---|---|---|----|------|------|---- 
  p  | 5 | p | v | 4 |0.4 | 2.4% |   gh | 3.4

Expected output:

dfExpected = pd.DataFrame({(' ','C'): [5,6,7,8,9,4],
                    ('X', 'D'): ['1.5','2.9','3.6','2.5','7.1','0.4'],
                    ('X', 'E'): ['0.7%','3.9%','3.2%','1.5%','4.1%','2.4%'],
                    ('Y', 'F'): ['ab','bc','cd','de','ef','gh'],
                   ('Y', 'G'): ['5.5','2.6','8.6','4.5','0.1','3.4']},
              index=pd.MultiIndex.from_arrays([['n','n','m','m','m','p'],
                                               ['q','r','s','t','u','v']], 
                                              names=['A', 'B']))

printing dfResult1 does not gives me the desired dataframe.

Also when I am applying styles on dfResult1, the grouping doesn't not exists anymore, it's taking the form of original dataframe after applying styles. I need to apply styles on my dataframe for the dashboard.

can anyone pls help?

Maleficent
  • 57
  • 8
  • Please [do not post images](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) of your data. You can include [code that creates a dataframe or the output of `print(df)`](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) (or of a few rows and columns that allow to reproduce the example) – Cimbali Jul 07 '21 at 13:19
  • Assuming (a,b) is a multiindex, there is no difference between your two dataframes in python. Unless you want to replace the values with empty strings? If this is not what you want, please provide your input/output data as text. – mozway Jul 07 '21 at 13:27
  • actually i have multi-indexed columns which i was not able to form in table here, so provided image – Maleficent Jul 07 '21 at 13:27
  • @mozway I have updated my question – Maleficent Jul 07 '21 at 14:18

1 Answers1

0

applicable answer

You can reset_index and rename_axis:

df.set_index([(' ', 'A'),(' ', 'B')]).rename_axis(['A', 'B'])

output:

          X         Y     
     C    D     E   F    G
A B                       
n q  5  1.5  0.7%  ab  5.5
  r  6  2.9  3.9%  bc  2.6
m s  7  3.6  3.2%  cd  8.6
  t  8  2.5  1.5%  de  4.5
  u  9  7.1  4.1%  ef  0.1
p v  4  0.4  2.4%  gh  3.4

previous answer

Your two dataframes are identical.

Here is how to create your dataframe:

pd.DataFrame({('x', 'c'): ['ab', 'bc', 'cd', 'ef', 'gh', 'ij', 'kl'],
              ('y', 'd'): [1.5, 2.3, 2.4, 1.2, 3.1, 5.2, 7.3]},
              index=pd.MultiIndex.from_arrays([[0,0,1,1,1,2,2],
                                               [5,6,7,8,9,3,4]], 
                                              names=['a', 'b'])
            )

output:

      x    y
      c    d
a b         
0 5  ab  1.5
  6  bc  2.3
1 7  cd  2.4
  8  ef  1.2
  9  gh  3.1
2 3  ij  5.2
  4  kl  7.3
mozway
  • 194,879
  • 13
  • 39
  • 75
  • sorry for the confusion, have corrected my question – Maleficent Jul 07 '21 at 14:17
  • Can you provide the input and output dataframes as `pd.DataFrame(…)` commands? This way the dataframe identities will be non ambiguous. See my answer on how to make the dataframes. – mozway Jul 07 '21 at 14:48
  • yes the dataframe is same, the way you have generated through command pd.Datframe() except the index 'b' is getting dynamically generated from another dataframe and depending on value of 'b', 'a' is being populated – Maleficent Jul 09 '21 at 06:26
  • As I said, can you use my way of generating the dataframe to produce your input and output dataframe and amend your question with this code? Then I (and others) can have a look to try solving your problem. See [this post on how to make a good pandas question](https://stackoverflow.com/a/20159305/16343464), especially when working with multiindexes. Right now, your dataframes are ambiguous and it is impossible to know what you want to do. – mozway Jul 09 '21 at 06:38
  • Yes, perfect, I provided an answer. In case you want to keep A/B columns in addition to the index you can use the `drop=False` option in `set_index` – mozway Jul 09 '21 at 15:31
  • thanks for the answer, i am able to get proper grouped table now as expected, but again after applying styles it is going away. this is some of the styles i am applying: `df = (df.style.format('{:,.0f%}', na_rep='-',subset=[col for col in df.columns if col[0] in ('A','B','C')])).apply(highlight_cells, axis = None)\ .set_table_styles([{'selector':'th','props':[('border','1px solid black'),('font-size','8pt')]}])` – Maleficent Jul 10 '21 at 13:37