
I'm looking for a way to get the typical marital status for each age.

For example, for people who are 34 years old the median marital status is Single, for 35 it is also Single, and so on.

I grouped the DataFrame like this:

df_edad_estado_civil.groupby(['Estado_Civil', 'Edad'], as_index=False).mean()

but it raises an error:

DataError: No numeric types to aggregate

Basically, this is a part of my DataFrame:


    Edad    Estado_Civil
0   38  Soltero
1   26  Casado
2   26  Soltero
4   38  Soltero
5   24  Soltero
6   28  Soltero
7   30  Casado
8   32  Soltero
9   19  Soltero
10  28  Soltero
11  45  Casado
12  27  Soltero
13  41  Casado
14  45  Casado
15  38  Soltero

I need to get the median status for every age, like this:

years_old  status_median
34         single
.          .
36         single
37         married
38         married
...
45         divorced

and so on.
FChm

3 Answers


Convert the Estado_Civil column to a numeric type, with 0 for married and 1 for single, using something along these lines: `df.Estado_Civil = df.Estado_Civil.apply(lambda x: 1*(x == 'Soltero'))`. Then perform the groupby.

It would also be useful to group by age only and select the column before applying .mean(), e.g. df.groupby('Edad', as_index=False)['Estado_Civil'].mean(). With the 0/1 encoding, the mean is the fraction of single people at each age.
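A minimal sketch of this encoding approach, using the sample rows from the question. It assumes 'Soltero' is the label for single; the helper column name es_soltero is made up for illustration, so the original Estado_Civil column stays untouched.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Edad": [38, 26, 26, 38, 24, 28, 30, 32, 19, 28, 45, 27, 41, 45, 38],
    "Estado_Civil": ["Soltero", "Casado", "Soltero", "Soltero", "Soltero",
                     "Soltero", "Casado", "Soltero", "Soltero", "Soltero",
                     "Casado", "Soltero", "Casado", "Casado", "Soltero"],
})

# Hypothetical helper column: 1 for Soltero, 0 for any other status
df["es_soltero"] = (df["Estado_Civil"] == "Soltero").astype(int)

# The mean of a 0/1 column is the fraction of single people at each age
share_single = df.groupby("Edad")["es_soltero"].mean()
print(share_single)
```

A value above 0.5 means most people of that age in the sample are single. With more than two statuses (for example divorced), a single 0/1 column no longer describes the data and this approach would need one column per status.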

hellmean

I think this data looks roughly linear, because people tend to get married as they get older, so it could be described by a linear equation and fitted with a linear regression model. I assume you are building some kind of machine learning model. Anyway, here is sample code that computes the average age for each marital status.

import pandas as pd

data = [[38, 'Soltero'],
        [26, 'Casado'],
        [26, 'Soltero'],
        [38, 'Soltero'],
        [24, 'Soltero'],
        [28, 'Soltero'],
        [30, 'Casado'],
        [19, 'Soltero'],
        [28, 'Soltero'],
        [45, 'Casado'],
        [27, 'Soltero'],
        [41, 'Casado'],
        [45, 'Casado'],
        [38, 'Soltero']]

# Column names follow the order of the values in each row: age first, then status
df_edad_estado_civil = pd.DataFrame(data, columns=['Edad', 'Estado_Civil'])
# Mean age for each marital status
result = df_edad_estado_civil.groupby('Estado_Civil')['Edad'].mean()
print(df_edad_estado_civil)
print(result)

The result:

    Edad  Estado_Civil
0     38       Soltero
1     26        Casado
2     26       Soltero
3     38       Soltero
4     24       Soltero
5     28       Soltero
6     30        Casado
7     19       Soltero
8     28       Soltero
9     45        Casado
10    27       Soltero
11    41        Casado
12    45        Casado
13    38       Soltero
Estado_Civil
Casado     37.400000
Soltero    29.555556
yaho cho

It appears what you're looking for is the statistical mode, which is the most frequently occurring value:

df_edad_estado_civil.groupby('Edad')['Estado_Civil'].agg(pd.Series.mode)

See this answer for more details.
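One caveat worth sketching: when two statuses are tied at some age, the mode is not unique and pd.Series.mode yields several values for that group. The example below, built on the sample rows from the question, keeps only the first value that mode() returns. The helper name most_common is hypothetical, and picking the first value is just one possible tie-breaking rule.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Edad": [38, 26, 26, 38, 24, 28, 30, 32, 19, 28, 45, 27, 41, 45, 38],
    "Estado_Civil": ["Soltero", "Casado", "Soltero", "Soltero", "Soltero",
                     "Soltero", "Casado", "Soltero", "Soltero", "Soltero",
                     "Casado", "Soltero", "Casado", "Casado", "Soltero"],
})

def most_common(statuses: pd.Series) -> str:
    """Return one most frequent status; on a tie, the first one mode() lists."""
    return statuses.mode().iloc[0]

# One status per age, even for ages where two statuses are equally frequent
typical = df.groupby("Edad")["Estado_Civil"].agg(most_common)
print(typical)
```

In this sample, age 26 has one Casado and one Soltero, so it is the only age where the tie-breaking rule matters.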

kev8484