0

I have dataframe which is having 170 columns but while doing groupby it is generating 8 columns only. I am not sure how it is happening. Could you please help me how to solve this?

Example:

print(list(df.columns))

['nameid', 'postal_code', 'responder_18_20_years', 'responder_21_24_years', 'responder_25_34_years', 'responder_35_44_years', 'responder_45_49_years', 'responder_50_54_years', 'responder_55_59_years', 'responder_60_64_years', 'responder_65_69_years', 'responder_70_74_years', 'responder_75_order', 'age_prefer_not_to_answer', 'main_responder_female', 'main_responder_male', 'gender_prefer_not_to_answer', 'marital_status_single', 'marital_status_married', 'household_number_people_1', 'household_number_people_2', 'household_number_people_3', 'household_number_people_4_more', 'household_infant_0_2_months_female', 'household_infant_0_2_months_male', 'household_infant_3_4_months_female', 'household_infant_3_4_months_male', 'household_infant_5_6_months_female', 'household_infant_5_6_months_male', 'household_infant_7_12_months_female', 'household_infant_7_12_months_male', 'household_infant_13_23_months_female', 'household_infant_13_23_months_male', 'household_children_24_35_months_female', 'household_children_24_35_months_male', 'household_children_3_4_years_female', 'household_children_3_4_years_male', 'household_children_5_8_years_female', 'household_children_5_8_years_male', 'household_children_9_12_years_female', 'household_children_9_12_years_male', 'household_children_13_15_years_female', 'household_children_13_15_years_male', 'household_children_16_17_years_female', 'household_children_16_17_years_male', 'household_adult_18_20_years_female', 'household_adult_18_20_years_male', 'household_adult_21_24_years_female', 'household_adult_21_24_years_male', 'household_adult_25_34_years_female', 'household_adult_25_34_years_male', 'household_adult_35_44_years_female', 'household_adult_35_44_years_male', 'household_adult_45_49_years_female', 'household_adult_45_49_years_male', 'household_adult_50_54_years_female', 'household_adult_50_54_years_male', 'household_adult_55_59_years_female', 'household_adult_55_59_years_male', 'household_adult_60_64_years_female', 'household_adult_60_64_years_male', 'household_adult_65_69_years_female', 'household_adult_65_69_years_male', 'household_adult_70_74_years_female', 'household_adult_70_74_years_male', 'household_adult_75_over_years_female', 'household_adult_75_over_years_male', 'household_income_less_20k', 'household_income_20k_40k', 'household_income_40k_60k', 'household_income_60k_80k', 'household_income_80k_100k', 'household_income_100k_over', 'household_income_prefer_not_to_answer', 'plan_diet_to_lose_weight', 'plan_diet_to_more_natural_organic_food', 'plan_diet_to_more_plant_based_food', 'plan_diet_to_eat_gluten_free', 'supplements_reason_increase_fibre', 'supplements_reason_relieve_mucle_or_join_pain', 'food_shopping_habits_regularly_buy_new_food', 'food_shopping_habits_pay_more_for_better_quality', 'food_shopping_habits_food_convenient_to_use', 'household_use_adult_pain_relievers_yes', 'household_use_adult_pain_relievers_no', 'hemorrhoids_usual_brand_anusol', 'hemorrhoids_usual_brand_preparation_h', 'constipation_usual_brand_dulcolax', 'constipation_usual_brand_ex_lax', 'constipation_usual_brand_metamucil', 'constipation_usual_brand_restoralax', 'constipation_usual_brand_senokot', 'household_use_multivitamins_yes', 'household_use_specific_vitamins_yes', 'household_use_vitamins_no', 'rx_med_brand_name_importance_very_important', 'rx_med_brand_name_importance_somewhat_important', 'rx_med_brand_name_importance_not_important', 'rx_med_do_not_use', 'rx_med_pay_assist_pgm_innoviscares_savings_current_use', 'rx_med_pay_assist_pgm_innoviscares_savings_previously_use', 'rx_med_pay_assist_pgm_merck_savings_current_use', 'rx_med_pay_assist_pgm_merck_savings_previously_use', 'rx_med_pay_assist_pgm_pfizer_savings_current_use', 'rx_med_pay_assist_pgm_pfizer_savings_previously_use', 'rx_med_pay_assist_pgm_rxhelp_one_savings_current_use', 'rx_med_pay_assist_pgm_rxhelp_one_savings_previously_use', 'rx_med_pay_assist_pgm_other_savings_current_use', 'rx_med_pay_assist_pgm_other_savings_previously_use', 'rx_med_pay_assist_pgm_none', 'ailment_arthritis', 'ailment_chronic_pain', 'ailment_depression', 'ailment_dry_itchy_skin', 'ailment_heartburn', 'ailment_high_blood_pressure', 'ailment_high_cholesterol', 'ailment_muscle_aches_body_pain', 'ailment_osteoarthritis', 'ailment_poor_leg_circulation', 'ailment_rheumatoid_arthritis', 'household_drug_plan_private_company', 'household_drug_plan_government', 'household_drug_plan_does_not_apply', 'household_psoriasis_yes', 'household_psoriasis_no', 'travel_freq_4_more_trips_per_years_business', 'travel_freq_4_more_trips_per_year_personal', 'travel_freq_1_3_trips_per_year_business', 'travel_freq_1_3_trips_per_year_personal', 'travel_do_not_travel_business', 'travel_do_not_travel_personal', 'hobbies_interests_camping_hiking', 'hobbies_interests_casino_gambling', 'hobbies_interests_cigar_smoking', 'hobbies_interests_coin_collecting', 'hobbies_interests_collectables', 'hobbies_interests_education_career_advancement', 'hobbies_interests_environment_friendly', 'hobbies_interests_exercise_1_2_times_per_week', 'hobbies_interests_exercise_3_more_per_week', 'hobbies_interests_regular_buy_latest_fashion_trends', 'hobbies_interests_self_improvement_courses', 'hobbies_interests_sweepstakes_lotteris', 'hobbies_interests_theater_performing_arts', 'hobbies_interests_yoga_pilates', 'hobbies_interests_none_above', 'loyalty_reward_aeroplan', 'loyalty_reward_air_miles', 'loyalty_reward_hbc', 'loyalty_reward_pc_optimum', 'loyalty_reward_petro_points', 'loyalty_reward_sobeys', 'loyalty_reward_other', 'loyalty_reward_does_not_apply', 'language_english', 'language_french', 'index', 'fsa', 'region', 'age', 'income', 'gender', 'hemorrhoids_usual_brands', 'constipation_usual_brands', 'household_use_vitamins', 'comorbidities_count', 'comorbidities_1_and_more', 'comorbidities_2_and_more', 'comorbidities_3_and_more']

I am doing below operation

base = df.groupby(['fsa'])[list(df.columns)].mean().reset_index()

print(list(base.columns)) # 8 Columns

['fsa', 'index', 'hemorrhoids_usual_brands', 'constipation_usual_brands', 'household_use_vitamins', 'comorbidities_1_and_more', 'comorbidities_2_and_more', 'comorbidities_3_and_more']

df:

          nameid postal_code responder_18_20_years responder_21_24_years responder_25_34_years  ... household_use_vitamins comorbidities_count comorbidities_1_and_more comorbidities_2_and_more comorbidities_3_and_more
0      395456804      T5A5J2                     0                     0                     0  ...                      1                   4                        1                        1                        1
1      395457268      B0N2T0                     0                     0                     0  ...                      1                   1                        1                        0                        0
2      395457996      A0P1E0                     0                     0                     0  ...                      1                   0                        0                        0                        0
3      395459756      R2L1K4                     0                     0                     1  ...                      1                   1                        1                        0                        0
4      395460066      K7S0G5                     0                     0                     0  ...                      0                   3                        1                        1                        1
...          ...         ...                   ...                   ...                   ...  ...                    ...                 ...                      ...                      ...                      ...
30495  415165256      J1K3A1                     0                     0                     0  ...                      1                   1                        1                        0                        0
30496  415167960      J4Z0A8                     0                     0                     0  ...                      0                   0                        0                        0                        0
30497  415172676      J4W2V7                     0                     0                     0  ...                      1                   0                        0                        0                        0
30498  415173494      J9J1N4                     0                     0                     1  ...                      0                   2                        1                        1                        0
30499  415178060      J2N1S6                     0                     0                     0  ...                      0                   0                        0                        0                        0

[30500 rows x 170 columns]

base:

      fsa         index  hemorrhoids_usual_brands  constipation_usual_brands  household_use_vitamins  comorbidities_1_and_more  comorbidities_2_and_more  comorbidities_3_and_more
0     A0A  14210.485714                  0.185714                   0.128571                0.642857                  0.742857                  0.485714                  0.214286
1     A0B  17482.240000                  0.160000                   0.160000                0.520000                  0.680000                  0.400000                  0.240000
2     A0C  13111.312500                  0.375000                   0.312500                0.562500                  0.750000                  0.562500                  0.312500
3     A0E  12802.424242                  0.212121                   0.090909                0.727273                  0.757576                  0.545455                  0.303030
4     A0G  12073.046512                  0.255814                   0.116279                0.604651                  0.744186                  0.488372                  0.302326
...   ...           ...                       ...                        ...                     ...                       ...                       ...                       ...
1578  V9Z   9706.947368                  0.210526                   0.105263                0.736842                  0.789474                  0.473684                  0.368421
1579  X0A  10696.500000                  0.000000                   0.500000                0.500000                  1.000000                  0.000000                  0.000000
1580  X0E   6063.333333                  0.166667                   0.000000                0.666667                  0.500000                  0.166667                  0.000000
1581  X1A  15681.444444                  0.111111                   0.000000                0.777778                  0.444444                  0.444444                  0.111111
1582  Y1A   9006.333333                  0.222222                   0.111111                0.666667                  0.777778                  0.555556                  0.222222

[1583 rows x 8 columns]

How can achieve all the columns?

Shivika
  • 209
  • 3
  • 15
  • Why are you using ```[list(df.columns)]```? If you want all the columns just use the ```.mean()``` directly after the ```.groupby()``` function – Robert Jul 06 '22 at 14:33
  • @Robert still giving 8 columns only – Shivika Jul 06 '22 at 14:43
  • Can you provide a minimal data working example? This way we can help you better. E.g. the dataset with your columns and 5 rows. – Robert Jul 06 '22 at 14:59
  • I cannot see all the columns, but pandas only take the mean of the columns that contains **ALL** floats or integers. I think this is what causing your problems? – Robert Jul 06 '22 at 15:09

0 Answers0