
I've been searching for this error here, but the only solution I found doesn't work in my case. Can anybody guide me on how to solve it?

My dataset (df2) looks like this:

      id_cl  id_sup total_t cl_ind  cl_city  sup_ind  sup_city  same_city
0   1000135 1797029  414.85  I5610  11308.0    G4711   10901.0   no
1   1000135 1798069  19.76   I5610  11308.0    G4719   10901.0   no
2   1000135 1923186  302.73  I5610  11308.0    G4630   10901.0   no
3   1000135 2502927  1262.86 I5610  11308.0    G4630   11308.0   yes
4   1000135 2504288  155.04  I5610  11308.0    G4711   11308.0   yes

I need to group this dataset as follows:

df_sup = df2.groupby(['cl_city','cl_ind','same_city']).agg({'id_sup':'nunique', 'total_t':'sum'})

But when I run this, I get the following error:

ValueError: Grouper for 'cl_city' not 1-dimensional

The result I need looks like this:

                                 id_sup      total_t
cl_city     cl_ind  same_city       
  10701      A0112         no         2    21964.22
                          yes        31     3530.40
             A0122         no      2374 23328061.47
                          yes      1228  2684408.12
             A0127         no        11    19962.68
                          yes         7      915.44
             A0163         no       357   574827.97
                          yes       140     60385.7
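For reference, the groupby/agg call itself is correct when the columns are a plain one-dimensional index. This minimal sketch rebuilds a few rows from the sample above and runs the same aggregation; printing `df2.columns` is the quickest way to check whether a MultiIndex is sneaking in:

```python
import pandas as pd

# Minimal frame mirroring the sample rows above
df2 = pd.DataFrame({
    'id_cl':     [1000135, 1000135, 1000135],
    'id_sup':    [1797029, 1798069, 2502927],
    'total_t':   [414.85, 19.76, 1262.86],
    'cl_ind':    ['I5610', 'I5610', 'I5610'],
    'cl_city':   [11308.0, 11308.0, 11308.0],
    'same_city': ['no', 'no', 'yes'],
})

# With a plain (one-dimensional) column index, the groupby works
df_sup = df2.groupby(['cl_city', 'cl_ind', 'same_city']).agg(
    {'id_sup': 'nunique', 'total_t': 'sum'})
print(df_sup)

# If this prints a MultiIndex instead of a plain Index, that is the
# usual cause of "Grouper for '...' not 1-dimensional"
print(df2.columns)
```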
  • What is the output of `df2.columns`? – Peter Leimbigler Nov 21 '18 at 22:47
  • as an output I need to know: how many unique 'id_sup' and the sum of 'total_t' of every item of cl_city, cl_ind and same_city. Basically a table with columns: cl_city | cl_ind | same_city | count of unique id_sup | sum of total_T – PAstudilloE Nov 21 '18 at 22:54
  • Possible duplicate of https://stackoverflow.com/questions/43298192/valueerror-grouper-for-something-not-1-dimensional – Kevin Fang Nov 21 '18 at 22:58
  • I mean, could you please post the exact text output when you run `df2.columns` – Peter Leimbigler Nov 21 '18 at 22:58
  • @PAstudilloE, we just need to see the full list of column names in your actual DataFrame. The most likely source of this error message is if you have multiple columns with the same name. – Peter Leimbigler Nov 21 '18 at 22:59
  • @PeterLeimbigler no, there are no duplicate columns. It's exactly as I posted. I solved the issue by converting df2 to a CSV file and then loading it again. I get to df2 by merging and grouping several datasets. Apparently Python is reading df2 as multi-index, but it's not multi-index. =( – PAstudilloE Nov 21 '18 at 23:07
  • @PAstudilloE, glad to hear you solved the issue. The DataFrame sample that you posted does *not* have multiple levels in either the columns or index. Did you delete something from above the column names? Again, if you would just paste the exact output of `df2.columns`, that would reveal the presence of a MultiIndex. – Peter Leimbigler Nov 21 '18 at 23:09
  • @PeterLeimbigler this is what it shows me when I run df2.columns: MultiIndex(levels=[[u'cl_city', u'cl_ind', u'id_client', u'id_supr', u'sup_city', u'sup_industry', u'total_t', u'same_city']], labels=[[2, 3, 6, 1, 0, 5, 4, 7]]) – PAstudilloE Nov 21 '18 at 23:11
  • @PAstudilloE, thanks! This does reveal the immediate cause of the error: for some terrible reason, the columns of `df2` are a highly nested MultiIndex, where instead of each string being a *label*, each string is actually the name of a separate *index level*(!) The fix for this comes from https://stackoverflow.com/q/14507794, and is this: `df.columns = [' '.join(col).strip() for col in df.columns.values]`. You mentioned `df2` is the result of many merges; I suspect those merges are not written optimally, leading to this pathological MultiIndex situation. – Peter Leimbigler Nov 21 '18 at 23:21
  • Thanks @PeterLeimbigler!! That's exactly what happened. I solved the issue as you recommended. – PAstudilloE Nov 21 '18 at 23:26

1 Answer


I don't know why Python was raising this error; df2 is the result of merging several previous datasets and it does not have any duplicate columns.

I solved this issue in a roundabout way, but it worked: I converted df2 to a CSV file and then loaded it again. After that, everything works fine. (I still can't figure out why Python was raising that error.) Hope it helps.
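The column-flattening fix mentioned in the comments can also solve this without a CSV round trip. The two-level columns below are a hypothetical reconstruction of what the merges may have produced; the point is only to show the symptom and the one-line fix:

```python
import pandas as pd

# Hypothetical reconstruction: after some merges, each column label
# has become an entry in a two-level MultiIndex
df2 = pd.DataFrame(
    [[11308.0, 'I5610', 'no', 1797029, 414.85],
     [11308.0, 'I5610', 'no', 1798069, 19.76]])
df2.columns = pd.MultiIndex.from_tuples(
    [('cl_city', ''), ('cl_ind', ''), ('same_city', ''),
     ('id_sup', ''), ('total_t', '')])

# df2.groupby('cl_city') would now raise
# "ValueError: Grouper for 'cl_city' not 1-dimensional",
# because df2['cl_city'] selects a sub-DataFrame, not a single Series.

# Fix from the comments: flatten the column index to plain strings
df2.columns = [' '.join(col).strip() for col in df2.columns.values]

df_sup = df2.groupby(['cl_city', 'cl_ind', 'same_city']).agg(
    {'id_sup': 'nunique', 'total_t': 'sum'})
print(df_sup)
```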

  • I also get this issue with merging several DFs of identical schema, and it seems to depend on either the number of rows or the number of DFs. e.g. `df.append([df_pt2, df_pt3, df_pt4, df_pt5, df_pt6], ignore_index=True)` works fine, but `df.append([df_pt2, df_pt3, df_pt4, df_pt5, df_pt6, df_pt7], ignore_index=True)` fails. – Brendan Jul 21 '20 at 21:01