0

Im trying this code

df_aggregated[['modeWinddirectiondiscrete','CAMPAÑA', 'ID_ESTACION','Month']].pivot_table(index=['CAMPAÑA', 'ID_ESTACION'], columns='Month')

Where the column 'modeWinddirectiondiscrete' have values like "North, West, South, East ..."

And I have as result just the index. I tried it with other numeric columns and works well so the problem is that the column has Strings. Is it possible to have a correct solution with the Strings? or I need to transform the directions to numbers?

EDIT: For example I have this table

mode  modeWinddirectiondiscrete  CAMPAÑA  ID_ESTACION  Month
    0                     South       16            0      1
    1                     North       16            0      1
    2                     North       16            0      1
    3                     East        16            0      2
    4                     West        16            0      2
    5                     East        16            0      2
    6                     South       17            1      3
    7                     West        17            1      3
    8                     North       17            1      3
    9                     West        17            1      3

And i want something like:

                                                   DewpointLocalDayAvg
              Month   1   10   11   12   2   3   4   5   6   7   8   9
CAMPAÑA ID_ESTACION
     16           0 North .............East...........................
     17           1 .......................West.......................

The result must be something like that, I don't know if this example is well understood.

  • It'd help to provide a [mre] with some example data and your desired output. For specifics see [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Jul 17 '23 at 14:46
  • 1
    `pivot_table` does an aggregation if more than 1 row is found. If your datatype is string then it can not aggregate multiple values into one. – Matteo Zanoni Jul 17 '23 at 14:50

1 Answers1

1

What about this?

import pandas as pd

# toy dataset
data = {
    'modeWinddirectiondiscrete': ['North', 'South', 'East', 'West', 'North'],
    'CAMPAÑA': [1, 1, 2, 2, 3],
    'ID_ESTACION': [1, 2, 1, 2, 3],
    'Month': ['January', 'February', 'January', 'February', 'January']
}

df_aggregated = pd.DataFrame(data)

# turn modeWinddirectiondiscrete into categorical data (equivalent of factors in R)
df_aggregated['modeWinddirectiondiscrete'] = pd.Categorical(df_aggregated['modeWinddirectiondiscrete'], categories=['North', 'South', 'East', 'West'])

df_aggregated.pivot_table(index=['CAMPAÑA', 'ID_ESTACION'], columns='Month', values='modeWinddirectiondiscrete', aggfunc='first')
Month               February January
CAMPAÑA ID_ESTACION                 
1       1                NaN   North
        2              South     NaN
2       1                NaN    East
        2               West     NaN
3       3                NaN   North

Let me know if this doesn't work for the actual data, and I'll update it :-)

Update: ouroboros1 helpfully pointed out that the issue with your code is that pivot_table defaults to aggregating values using their mean, which doesn't make any sense for categorical data, as the categories don't correspond to numbers which can be averaged.

Read here for more information on data types and the operations which can be performed on them.

Mark
  • 7,785
  • 2
  • 14
  • 34
  • 2
    This doesn't really address the problem, which is that `aggfunc` defaults to `'mean'` (see the [docs](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html), which is a nonsensical operation for strings as well as categorical data. I.e., the above code works because you have passed an appropriate function to `aggfunc`, but it will work without `df_aggregated['modeWinddirectiondiscrete'] = pd.Categorical(df_aggregated['modeWinddirectiondiscrete'], categories=['North', 'South', 'East', 'West'])` as well. – ouroboros1 Jul 17 '23 at 15:01
  • 1
    @ouroboros1 thanks for the tip! I was translating in my head what I would do for R, hence the comments and the needlessly complicated way of doing it – Mark Jul 17 '23 at 15:03
  • 1
    @ouroboros1 updated! :D – Mark Jul 17 '23 at 15:10