I have dataframe which is having 170 columns but while doing groupby it is generating 8 columns only. I am not sure how it is happening. Could you please help me how to solve this?
Example:
print(list(df.columns))
['nameid', 'postal_code', 'responder_18_20_years', 'responder_21_24_years', 'responder_25_34_years', 'responder_35_44_years', 'responder_45_49_years', 'responder_50_54_years', 'responder_55_59_years', 'responder_60_64_years', 'responder_65_69_years', 'responder_70_74_years', 'responder_75_order', 'age_prefer_not_to_answer', 'main_responder_female', 'main_responder_male', 'gender_prefer_not_to_answer', 'marital_status_single', 'marital_status_married', 'household_number_people_1', 'household_number_people_2', 'household_number_people_3', 'household_number_people_4_more', 'household_infant_0_2_months_female', 'household_infant_0_2_months_male', 'household_infant_3_4_months_female', 'household_infant_3_4_months_male', 'household_infant_5_6_months_female', 'household_infant_5_6_months_male', 'household_infant_7_12_months_female', 'household_infant_7_12_months_male', 'household_infant_13_23_months_female', 'household_infant_13_23_months_male', 'household_children_24_35_months_female', 'household_children_24_35_months_male', 'household_children_3_4_years_female', 'household_children_3_4_years_male', 'household_children_5_8_years_female', 'household_children_5_8_years_male', 'household_children_9_12_years_female', 'household_children_9_12_years_male', 'household_children_13_15_years_female', 'household_children_13_15_years_male', 'household_children_16_17_years_female', 'household_children_16_17_years_male', 'household_adult_18_20_years_female', 'household_adult_18_20_years_male', 'household_adult_21_24_years_female', 'household_adult_21_24_years_male', 'household_adult_25_34_years_female', 'household_adult_25_34_years_male', 'household_adult_35_44_years_female', 'household_adult_35_44_years_male', 'household_adult_45_49_years_female', 'household_adult_45_49_years_male', 'household_adult_50_54_years_female', 'household_adult_50_54_years_male', 'household_adult_55_59_years_female', 'household_adult_55_59_years_male', 'household_adult_60_64_years_female', 'household_adult_60_64_years_male', 'household_adult_65_69_years_female', 'household_adult_65_69_years_male', 'household_adult_70_74_years_female', 'household_adult_70_74_years_male', 'household_adult_75_over_years_female', 'household_adult_75_over_years_male', 'household_income_less_20k', 'household_income_20k_40k', 'household_income_40k_60k', 'household_income_60k_80k', 'household_income_80k_100k', 'household_income_100k_over', 'household_income_prefer_not_to_answer', 'plan_diet_to_lose_weight', 'plan_diet_to_more_natural_organic_food', 'plan_diet_to_more_plant_based_food', 'plan_diet_to_eat_gluten_free', 'supplements_reason_increase_fibre', 'supplements_reason_relieve_mucle_or_join_pain', 'food_shopping_habits_regularly_buy_new_food', 'food_shopping_habits_pay_more_for_better_quality', 'food_shopping_habits_food_convenient_to_use', 'household_use_adult_pain_relievers_yes', 'household_use_adult_pain_relievers_no', 'hemorrhoids_usual_brand_anusol', 'hemorrhoids_usual_brand_preparation_h', 'constipation_usual_brand_dulcolax', 'constipation_usual_brand_ex_lax', 'constipation_usual_brand_metamucil', 'constipation_usual_brand_restoralax', 'constipation_usual_brand_senokot', 'household_use_multivitamins_yes', 'household_use_specific_vitamins_yes', 'household_use_vitamins_no', 'rx_med_brand_name_importance_very_important', 'rx_med_brand_name_importance_somewhat_important', 'rx_med_brand_name_importance_not_important', 'rx_med_do_not_use', 'rx_med_pay_assist_pgm_innoviscares_savings_current_use', 'rx_med_pay_assist_pgm_innoviscares_savings_previously_use', 'rx_med_pay_assist_pgm_merck_savings_current_use', 'rx_med_pay_assist_pgm_merck_savings_previously_use', 'rx_med_pay_assist_pgm_pfizer_savings_current_use', 'rx_med_pay_assist_pgm_pfizer_savings_previously_use', 'rx_med_pay_assist_pgm_rxhelp_one_savings_current_use', 'rx_med_pay_assist_pgm_rxhelp_one_savings_previously_use', 'rx_med_pay_assist_pgm_other_savings_current_use', 'rx_med_pay_assist_pgm_other_savings_previously_use', 'rx_med_pay_assist_pgm_none', 'ailment_arthritis', 'ailment_chronic_pain', 'ailment_depression', 'ailment_dry_itchy_skin', 'ailment_heartburn', 'ailment_high_blood_pressure', 'ailment_high_cholesterol', 'ailment_muscle_aches_body_pain', 'ailment_osteoarthritis', 'ailment_poor_leg_circulation', 'ailment_rheumatoid_arthritis', 'household_drug_plan_private_company', 'household_drug_plan_government', 'household_drug_plan_does_not_apply', 'household_psoriasis_yes', 'household_psoriasis_no', 'travel_freq_4_more_trips_per_years_business', 'travel_freq_4_more_trips_per_year_personal', 'travel_freq_1_3_trips_per_year_business', 'travel_freq_1_3_trips_per_year_personal', 'travel_do_not_travel_business', 'travel_do_not_travel_personal', 'hobbies_interests_camping_hiking', 'hobbies_interests_casino_gambling', 'hobbies_interests_cigar_smoking', 'hobbies_interests_coin_collecting', 'hobbies_interests_collectables', 'hobbies_interests_education_career_advancement', 'hobbies_interests_environment_friendly', 'hobbies_interests_exercise_1_2_times_per_week', 'hobbies_interests_exercise_3_more_per_week', 'hobbies_interests_regular_buy_latest_fashion_trends', 'hobbies_interests_self_improvement_courses', 'hobbies_interests_sweepstakes_lotteris', 'hobbies_interests_theater_performing_arts', 'hobbies_interests_yoga_pilates', 'hobbies_interests_none_above', 'loyalty_reward_aeroplan', 'loyalty_reward_air_miles', 'loyalty_reward_hbc', 'loyalty_reward_pc_optimum', 'loyalty_reward_petro_points', 'loyalty_reward_sobeys', 'loyalty_reward_other', 'loyalty_reward_does_not_apply', 'language_english', 'language_french', 'index', 'fsa', 'region', 'age', 'income', 'gender', 'hemorrhoids_usual_brands', 'constipation_usual_brands', 'household_use_vitamins', 'comorbidities_count', 'comorbidities_1_and_more', 'comorbidities_2_and_more', 'comorbidities_3_and_more']
I am doing below operation
base = df.groupby(['fsa'])[list(df.columns)].mean().reset_index()
print(list(base.columns)) # 8 Columns
['fsa', 'index', 'hemorrhoids_usual_brands', 'constipation_usual_brands', 'household_use_vitamins', 'comorbidities_1_and_more', 'comorbidities_2_and_more', 'comorbidities_3_and_more']
df:
nameid postal_code responder_18_20_years responder_21_24_years responder_25_34_years ... household_use_vitamins comorbidities_count comorbidities_1_and_more comorbidities_2_and_more comorbidities_3_and_more
0 395456804 T5A5J2 0 0 0 ... 1 4 1 1 1
1 395457268 B0N2T0 0 0 0 ... 1 1 1 0 0
2 395457996 A0P1E0 0 0 0 ... 1 0 0 0 0
3 395459756 R2L1K4 0 0 1 ... 1 1 1 0 0
4 395460066 K7S0G5 0 0 0 ... 0 3 1 1 1
... ... ... ... ... ... ... ... ... ... ... ...
30495 415165256 J1K3A1 0 0 0 ... 1 1 1 0 0
30496 415167960 J4Z0A8 0 0 0 ... 0 0 0 0 0
30497 415172676 J4W2V7 0 0 0 ... 1 0 0 0 0
30498 415173494 J9J1N4 0 0 1 ... 0 2 1 1 0
30499 415178060 J2N1S6 0 0 0 ... 0 0 0 0 0
[30500 rows x 170 columns]
base:
fsa index hemorrhoids_usual_brands constipation_usual_brands household_use_vitamins comorbidities_1_and_more comorbidities_2_and_more comorbidities_3_and_more
0 A0A 14210.485714 0.185714 0.128571 0.642857 0.742857 0.485714 0.214286
1 A0B 17482.240000 0.160000 0.160000 0.520000 0.680000 0.400000 0.240000
2 A0C 13111.312500 0.375000 0.312500 0.562500 0.750000 0.562500 0.312500
3 A0E 12802.424242 0.212121 0.090909 0.727273 0.757576 0.545455 0.303030
4 A0G 12073.046512 0.255814 0.116279 0.604651 0.744186 0.488372 0.302326
... ... ... ... ... ... ... ... ...
1578 V9Z 9706.947368 0.210526 0.105263 0.736842 0.789474 0.473684 0.368421
1579 X0A 10696.500000 0.000000 0.500000 0.500000 1.000000 0.000000 0.000000
1580 X0E 6063.333333 0.166667 0.000000 0.666667 0.500000 0.166667 0.000000
1581 X1A 15681.444444 0.111111 0.000000 0.777778 0.444444 0.444444 0.111111
1582 Y1A 9006.333333 0.222222 0.111111 0.666667 0.777778 0.555556 0.222222
[1583 rows x 8 columns]
How can achieve all the columns?