
I have a DataFrame in which each row has an Id, Sex, Age and so on. I first separated out the Sex and Age columns:

    import numpy as np
    import pandas as pd
    # Keep only the Sex and Age columns and drop rows with missing values
    age_distinct = titanic_df[['Sex','Age']].dropna()
    print(age_distinct)

I get a result like this:

       Sex   Age
0      male  22.0
1    female  38.0
2    female  26.0
3    female  35.0
4      male  35.0
6      male  54.0
7      male   2.0
8    female  27.0
9    female  14.0
10   female   4.0
11   female  58.0
12     male  20.0
13     male  39.0
14   female  14.0
15   female  55.0
16     male   2.0
18   female  31.0
20     male  35.0
21     male  34.0
22   female  15.0
23     male  28.0
24   female   8.0
25   female  38.0
27     male  19.0
30     male  40.0
33     male  66.0
34     male  28.0
35     male  42.0
37     male  21.0
38   female  18.0
..      ...   ...
856  female  45.0
857    male  51.0

But I don't know the next step. How can I get two sets of data, one containing only the males and one containing only the females?

Clark 123

2 Answers


What you're looking for is:

    titanic_df[titanic_df['Sex'] == 'male']

This is basically a SELECT * FROM titanic_df WHERE Sex = 'male', if you're familiar with SQL.
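
To make the filter less opaque, here is a minimal sketch of what that expression does. The `toy_df` frame and its values are made up for illustration and are not the real Titanic data. The comparison produces a boolean Series, and indexing with it keeps only the rows where it is True.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df; the values are made up.
toy_df = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Age': [22.0, 38.0, 26.0, 35.0],
})

# The comparison yields one True/False value per row.
is_male = toy_df['Sex'] == 'male'

# Indexing with the mask keeps the True rows; ~ inverts the mask.
male_rows = toy_df[is_male]
other_rows = toy_df[~is_male]

print(male_rows)
print(other_rows)
```

The original row labels are preserved in both results, so the rows can still be matched back to the source frame.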

Edit: If you want to create two different pandas.DataFrame objects from each level of Sex, you can store each DataFrame in a dictionary, like this:

    distinct_dfs = {}
    # Build one DataFrame per distinct value of Sex, skipping missing values
    for level in titanic_df['Sex'].dropna().unique():
        level_df = titanic_df[titanic_df['Sex'] == level]
        distinct_dfs[level] = level_df

That's just one approach you could take, and it would be advantageous if Sex had many different values. But since you only have two values, this would be easiest:

    female_df = titanic_df[titanic_df['Sex'] == 'female']
    male_df = titanic_df[titanic_df['Sex'] == 'male']
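
As a possible alternative to the explicit loop, the per-value split could also be sketched with `groupby`, which is a different technique from the one used in this answer. The `toy_df` frame below is a made-up stand-in for `titanic_df`, and `dfs_by_sex` is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df; the values are made up.
toy_df = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male', None],
    'Age': [22.0, 38.0, 26.0, 35.0, 54.0],
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs.
# Rows whose key is missing are left out by default.
dfs_by_sex = {sex: group for sex, group in toy_df.groupby('Sex')}

female_df = dfs_by_sex['female']
male_df = dfs_by_sex['male']

print(female_df)
print(male_df)
```

This scans the column once instead of once per value, which may matter when the column has many distinct values.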
blacksite

I think you need boolean indexing or query:

    print(age_distinct[age_distinct.Sex == 'male'])
    print(age_distinct.query('Sex == "male"'))

Sample:

    import numpy as np
    import pandas as pd

    titanic_df = pd.DataFrame({'Sex':['male','female',np.nan],
                               'Age':[40,50,60]})

    print(titanic_df)
       Age     Sex
    0   40    male
    1   50  female
    2   60     NaN

    age_distinct = titanic_df[['Sex','Age']].dropna()

    print(age_distinct[age_distinct.Sex == 'male'])
        Sex  Age
    0  male   40

    print(age_distinct.query('Sex == "male"'))
        Sex  Age
    0  male   40
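
If the value to filter on lives in a Python variable, `query` can reference it with the `@` prefix. The sketch below uses a made-up `toy_df` and a hypothetical `wanted` variable, and checks that both forms select the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for titanic_df; the values are made up.
toy_df = pd.DataFrame({'Sex': ['male', 'female', np.nan, 'male'],
                       'Age': [40, 50, 60, 70]})
age_distinct = toy_df[['Sex', 'Age']].dropna()

wanted = 'male'  # hypothetical variable holding the value to keep

# Boolean indexing and query, both driven by the same variable.
by_mask = age_distinct[age_distinct['Sex'] == wanted]
by_query = age_distinct.query('Sex == @wanted')

# Report whether the two selections are identical.
print(by_mask.equals(by_query))
```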
jezrael
  • Sure. By the way, never use pictures instead of text: they cannot be copied, and you will get a lot of downvotes. It is better to use a small sample; see [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – jezrael Jan 07 '17 at 17:48
  • This is my first time posting a question here, so I am sorry for using the wrong format! Thanks for the reminder; I will be more careful next time. – Clark 123 Jan 07 '17 at 18:18