To start with: I'm a total beginner with pandas, so descriptive help would be much appreciated.

I have one dataframe, called df_persons. It contains 2 columns, "Age" and "Gender". The ages span 0 to 100 years.

My main goal is to create a pie chart showing the number of people in each age group.

What I wanted to do was create a new dataframe with 4 columns, one per age group. Let's say I want to name this new dataframe test_df.

"Under 18" "Between 18 - 40" "Between 40-60" "60+"

In order to achieve this, I have tried the following:

test_df['Under 18'] = df[(person_df['Age'] >=18]

But without success.

I managed to get the columns in place by doing:

test_df['Under 18'] = df_persons['Age']

But I have not been able to populate my 4 new columns, based on the dataframe I need to pull the information from.

test_df = pd.DataFrame(columns=['Under 18', 'Between 18 -40', 'Between 40-60', 'Over 60'])

test_df['Under 18'] = test_df['Under 18'].astype(str).astype(int)

test_df['Under 18'] = df_persons[df_persons['Age']>18]

What is the best approach to achieve this? Any help, tips, or recommendations are very welcome.

sponkae
  • FYI in your first line you don't need `(`. Also this [`cut`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html) method should be able to help you. You could also look at this [example](https://stackoverflow.com/questions/45273731/binning-column-with-python-pandas). Now this is mainly to get one column. You could easily then [pivot](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot_table.html#pandas.DataFrame.pivot_table) that into different columns. Or `Transpose` – Buckeye14Guy Jul 18 '19 at 19:04
  • Please take a look at [How to create good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and provide a [mcve] including a sample input and preferred output, as that can help us to help you better – G. Anderson Jul 18 '19 at 19:11

1 Answer

You can always try conditional selection (boolean indexing); you were really close in your first statement. So you have two dataframes: df_test (test_df in your question) and df_persons. We want to slice df_persons by age group and place each slice in df_test. I add .copy() at the end of each statement so that df_test holds its own data rather than something that points back to df_persons.

See if this works for your scenario:

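# Select only the 'Age' column: assigning the whole two-column slice to a single column raises a ValueError.
# Columns are aligned on the index, so rows outside a group show up as NaN in that column.
df_test['Under 18'] = df_persons.loc[df_persons['Age'] < 18, 'Age'].copy()

df_test['Between 18-40'] = df_persons.loc[(df_persons['Age'] >= 18) & (df_persons['Age'] < 40), 'Age'].copy()

df_test['Between 40-60'] = df_persons.loc[(df_persons['Age'] >= 40) & (df_persons['Age'] < 60), 'Age'].copy()

df_test['60+'] = df_persons.loc[df_persons['Age'] >= 60, 'Age'].copy()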
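
Since the end goal is a pie chart of how many people fall into each age group, another option is the `cut` route mentioned in the comments: label each row with its group, then count the labels. The following is only a minimal sketch. The sample data is made up to stand in for your df_persons, and the "Age group" column name is my own choice, not something from your question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df_persons
df_persons = pd.DataFrame({
    "Age": [5, 17, 18, 25, 39, 40, 59, 60, 83],
    "Gender": ["F", "M", "F", "M", "F", "M", "F", "M", "F"],
})

# Bin edges and labels for the four groups from the question.
# The last edge is 101 so that an age of 100 still lands in "60+".
bins = [0, 18, 40, 60, 101]
labels = ["Under 18", "Between 18-40", "Between 40-60", "60+"]

# right=False makes each bin include its left edge and exclude its right edge:
# [0, 18), [18, 40), [40, 60), [60, 101)
df_persons["Age group"] = pd.cut(
    df_persons["Age"], bins=bins, labels=labels, right=False
)

# Count how many people fall into each group (not sorted by frequency)
counts = df_persons["Age group"].value_counts(sort=False)
print(counts)
```

If matplotlib is installed, `counts.plot.pie()` should draw the pie chart directly from these counts, so a separate four-column dataframe may not be needed at all.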