Find out the most popular male/female name from dataframe

Question

The solution that came to my mind is:

dataset['Name'].loc[dataset['Sex'] == 'female'].value_counts().idxmax()

But this is not so straightforward, because for married women the husband's name follows Mrs., so I need to split the name somehow.

Input data:

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris', 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'Heikkinen, Miss. Laina', 'Futrelle, Mrs. Jacques Heath (Lily May Peel)', 'Allen, Mr. William Henry', 'Moran, Mr. James', 'McCarthy, Mr. Timothy J', 'Palsson, Master. Gosta Leonard', 'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)', 'Nasser, Mrs. Nicholas (Adele Achem)'],
                   'Sex': ['male', 'female', 'female', 'female', 'male', 'male', 'male', 'male', 'female', 'female'],
                   })



Task 4: Name the most popular female name on the ship.
'some code'
Output: Anna      #The most popular female name
Task 5: Name the most popular male name on the ship.
'some code'
Output: Wilhelm   #The most popular male name

you need to provide a sample of your dataset as text, images of data/code are not allowed — mozway, Mar 31 '22 at 07:51
Your question is somewhat vague. Could you please give the output of your current code and what you expect? — KarelZe, Mar 31 '22 at 07:53
Output is: Cumings, Mrs. John Bradley (Florence Briggs Thayer); I need just female name — Edward, Mar 31 '22 at 08:06
can you [edit](https://stackoverflow.com/posts/71688793/edit) your question to provide the output of `df[['Name', 'Sex']].head(10).to_dict('list')`? — mozway, Mar 31 '22 at 08:33
@Edward and what should be the output names? first? last? 'Braund' and 'Thayer'? You should elaborate on the logic — mozway, Mar 31 '22 at 08:45
I need just one first name of the most popular name of female and one first name of the most popular name of male — Edward, Mar 31 '22 at 08:51
First word in round brackets for example in Futrelle, Mrs. Jacques Heath (Lily May Peel) name will be 'Lily' — Edward, Mar 31 '22 at 09:01
@Edward hopefully I understood, check [my answer](https://stackoverflow.com/a/71690190/16343464) — mozway, Mar 31 '22 at 09:35
Yes, thanks, that's what I needed, but how can I separate it into two pieces of code, one for male and one for female? — Edward, Mar 31 '22 at 09:41
You don't need to separate, you can keep using `groupby`. Please provide a better example with non-unique names **and the explicit expected output** if you need help with that part. — mozway, Mar 31 '22 at 09:48
I just have 2 tasks to find the most popular female and male name and i need to insert code in every task — Edward, Mar 31 '22 at 09:58
@Edward I fully understand the goal, no need to explain. What you need to provide is the **explicit** input/output. Please read [how to ask?](https://stackoverflow.com/help/how-to-ask) and [how to make good pandas examples?](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — mozway, Mar 31 '22 at 10:00
Output can be the same as in your code but i need to first get female most popular names without male names and then another code to display male most popular names without female names — Edward, Mar 31 '22 at 10:08
I am sorry but unless you [edit](https://stackoverflow.com/posts/71688793/edit) your question to provide a clear input/output, I cannot help you — mozway, Mar 31 '22 at 10:09
@Edward thanks for your edit. A few remarks, please do not post duplicate questions. Then, ensure your input matches the expected output. Here, in the provided example, **Anna** and **Wilhelm** are **NOT** the most used names. Have you tested my code? It should answer the question. — mozway, Mar 31 '22 at 11:21

Denver Dang · Answer 1 · 2022-03-31T08:09:53.777

Quick and dirty would be something like:

from collections import Counter

# Random list of names
your_lst = ["Mrs Braun", "Allen, Mr. Timothy J", "Allen, Mr. Henry William"]

# Split each name on spaces and flatten the result into a single list of words.
your_lst_flat = [word for name in your_lst for word in name.split(" ")]

# Count occurrences. With this you will get a count of all the values, including Mr and Mrs. But you can just ignore these.
Counter(your_lst_flat).most_common()
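If ignoring the titles by eye is not practical, they can be filtered out before counting. The following is only a minimal sketch: the sample names and the `TITLES` set are assumptions made for illustration, and the set may need extending for the real data. Note that it still counts middle names as well as first names.

```python
from collections import Counter

# Hypothetical sample names in the same "Last, Title. First ..." format as the question
names = ["Allen, Mr. William Henry", "Moran, Mr. James", "Braund, Mr. William Owen"]

# Titles to discard (an assumption; extend for the real dataset)
TITLES = {"Mr", "Mrs", "Miss", "Master"}

tokens = []
for name in names:
    # Drop the surname (everything up to the first comma), then split on whitespace
    rest = name.split(",", 1)[-1]
    for word in rest.split():
        word = word.strip(".()")  # remove surrounding dots and parentheses
        if word and word not in TITLES:
            tokens.append(word)

counts = Counter(tokens)
print(counts.most_common(1))  # [('William', 2)]
```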

mozway · Answer 2 · 2022-03-31T09:41:02.680

IIUC, you can use a regex to extract either the first name or, for Mrs., the first name inside the parentheses:

# Dots are escaped so that "." matches a literal period rather than any character
s = df['Name'].str.extract(r'((?:(?<=Mr\. )|(?<=Miss\. )|(?<=Master\. ))\w+|(?<=\()\w+)',
                           expand=False)
s.groupby(df['Sex']).value_counts()

output:

Sex     Name     
female  Adele        1
        Elisabeth    1
        Florence     1
        Laina        1
        Lily         1
male    Gosta        1
        James        1
        Owen         1
        Timothy      1
        William      1
Name: Name, dtype: int64


Once you have `s`, to get the most frequent female name(s):

s[df['Sex'].eq('female')].mode()
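For the two separate tasks requested in the comments, the same idea can be wrapped in a small helper and called once per sex. This is a sketch under stated assumptions: `most_popular` is a hypothetical helper name, the data is the ten-row sample from the question, and the regex is the one above with the dots escaped.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris',
             'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
             'Heikkinen, Miss. Laina',
             'Futrelle, Mrs. Jacques Heath (Lily May Peel)',
             'Allen, Mr. William Henry',
             'Moran, Mr. James',
             'McCarthy, Mr. Timothy J',
             'Palsson, Master. Gosta Leonard',
             'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)',
             'Nasser, Mrs. Nicholas (Adele Achem)'],
    'Sex': ['male', 'female', 'female', 'female', 'male',
            'male', 'male', 'male', 'female', 'female'],
})

# First word after "Mr. ", "Miss. " or "Master. ", else first word after "("
pattern = r'((?:(?<=Mr\. )|(?<=Miss\. )|(?<=Master\. ))\w+|(?<=\()\w+)'
first_names = df['Name'].str.extract(pattern, expand=False)

def most_popular(sex):
    """Return the most frequent first name(s) for one sex (hypothetical helper)."""
    return first_names[df['Sex'].eq(sex)].mode().tolist()

# Task 4: most popular female name(s)
female_top = most_popular('female')
print(female_top)

# Task 5: most popular male name(s)
male_top = most_popular('male')
print(male_top)
```

In this sample every first name occurs exactly once, so `mode()` returns all five names for each sex. On the full dataset it would return only the name or names tied for the highest count.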
