
DataFrame

The solution that came to mind is:

dataset['Name'].loc[dataset['Sex'] == 'female'].value_counts().idxmax()

But this is not so straightforward: for married women the husband's name follows "Mrs.", so I need to split the string somehow.

Input data:

import pandas as pd

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris', 'Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'Heikkinen, Miss. Laina', 'Futrelle, Mrs. Jacques Heath (Lily May Peel)', 'Allen, Mr. William Henry', 'Moran, Mr. James', 'McCarthy, Mr. Timothy J', 'Palsson, Master. Gosta Leonard', 'Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)', 'Nasser, Mrs. Nicholas (Adele Achem)'],
                   'Sex': ['male', 'female', 'female', 'female', 'male', 'male', 'male', 'male', 'female', 'female'],
                   })



Task 4: Name the most popular female name on the ship.
'some code'
Output: Anna      #The most popular female name
Task 5: Name the most popular male name on the ship.
'some code'
Output: Wilhelm   #The most popular male name
Edward
  • you need to provide a sample of your dataset as text, images of data/code are not allowed – mozway Mar 31 '22 at 07:51
  • Your question is somewhat vague. Could you please give the output of your current code and what you expect? – KarelZe Mar 31 '22 at 07:53
  • Output is: Cumings, Mrs. John Bradley (Florence Briggs Thayer); I need just female name – Edward Mar 31 '22 at 08:06
  • can you [edit](https://stackoverflow.com/posts/71688793/edit) your question to provide the output of `df[['Name', 'Sex']].head(10).to_dict('list')`? – mozway Mar 31 '22 at 08:33
  • yes i have done it – Edward Mar 31 '22 at 08:41
  • @Edward and what should be the output names? first? last? 'Braund' and 'Thayer'? You should elaborate on the logic – mozway Mar 31 '22 at 08:45
  • I need just one first name of the most popular name of female and one first name of the most popular name of male – Edward Mar 31 '22 at 08:51
  • Yes but **explicitly** how do you define the **name** – mozway Mar 31 '22 at 08:51
  • First word in round brackets for example in Futrelle, Mrs. Jacques Heath (Lily May Peel) name will be 'Lily' – Edward Mar 31 '22 at 09:01
  • @Edward hopefully I understood, check [my answer](https://stackoverflow.com/a/71690190/16343464) – mozway Mar 31 '22 at 09:35
  • yes, thanks, that's what I needed, but how can I separate it into two parts of code, for male and female – Edward Mar 31 '22 at 09:41
  • You don't need to separate, you can keep using `groupby`. Please provide a better example with non unique names **and the explicit expected output** if you need help with that part. – mozway Mar 31 '22 at 09:48
  • I just have 2 tasks to find the most popular female and male name and i need to insert code in every task – Edward Mar 31 '22 at 09:58
  • @Edward I fully understand the goal, no need to explain. What you need to provide is the **explicit** input/output. Please read [how to ask?](https://stackoverflow.com/help/how-to-ask) and [how to make good pandas examples?](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – mozway Mar 31 '22 at 10:00
  • Output can be the same as in your code but i need to first get female most popular names without male names and then another code to display male most popular names without female names – Edward Mar 31 '22 at 10:08
  • I am sorry but unless you [edit](https://stackoverflow.com/posts/71688793/edit) your question to provide a clear input/output, I cannot help you – mozway Mar 31 '22 at 10:09
  • I tried to edit to make it clear – Edward Mar 31 '22 at 10:19
  • @Edward thanks for your edit. A few remarks, please do not post duplicate questions. Then, ensure your input matches the expected output. Here, in the provided example, **Anna** and **Wilhelm** are **NOT** the most used names. Have you tested my code? It should answer the question. – mozway Mar 31 '22 at 11:21
  • rather `s.groupby(df['Sex']).value_counts()` – mozway Mar 31 '22 at 11:47

2 Answers


Quick and dirty would be something like:

from collections import Counter

# Random list of names
your_lst = ["Mrs Braun", "Allen, Mr. Timothy J", "Allen, Mr. Henry William"]

# Split names by space, and flatten the list.      
your_lst_flat = [item for sublist in [x.split(" ") for x in your_lst ] for item in sublist]

# Count occurrences. With this you will get a count of all the values, including Mr and Mrs. But you can just ignore these.
Counter(your_lst_flat).most_common()
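
As a rough illustration of "ignoring" the titles, here is a minimal sketch that filters them out before counting. It is not part of the original answer: the sample names, the `TITLES` set and the helper `given_name_tokens` are hypothetical. It also counts every given-name token, middle names included, and for "Mrs." entries it would still count the husband's names.

```python
from collections import Counter

# Hypothetical sample in the question's "Last, Title. First ..." format.
names = [
    "Braund, Mr. Owen Harris",
    "Allen, Mr. William Henry",
    "Moran, Mr. James",
    "Kelly, Mr. James",
]

# Honorifics to skip; this set is an assumption, extend it as needed.
TITLES = {"Mr", "Mrs", "Miss", "Master"}

def given_name_tokens(full_name):
    # Keep only the part after the comma, i.e. drop the surname.
    _, _, rest = full_name.partition(",")
    # Strip surrounding punctuation so "Mr." compares equal to "Mr".
    tokens = [t.strip(".()") for t in rest.split()]
    return [t for t in tokens if t and t not in TITLES]

counts = Counter(tok for name in names for tok in given_name_tokens(name))
print(counts.most_common(1))  # [('James', 2)]
```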
Denver Dang

IIUC, you can use a regex to extract either the first name after the title or, for "Mrs.", the first name inside the parentheses:

s = df['Name'].str.extract(r'((?:(?<=Mr. )|(?<=Miss. )|(?<=Master. ))\w+|(?<=\()\w+)',
                           expand=False)
s.groupby(df['Sex']).value_counts()

output:

Sex     Name     
female  Adele        1
        Elisabeth    1
        Florence     1
        Laina        1
        Lily         1
male    Gosta        1
        James        1
        Owen         1
        Timothy      1
        William      1
Name: Name, dtype: int64
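
To see what each branch of the pattern picks up, here is a small sketch using plain `re` on single strings. It is an illustration rather than the answer's own code: the helper `first_name` is hypothetical, and the dots are escaped (`Mr\. `) so that they match a literal "." instead of any character.

```python
import re

# Same idea as the pandas pattern above, with the dots escaped.
PATTERN = re.compile(
    r"(?:(?<=Mr\. )|(?<=Miss\. )|(?<=Master\. ))\w+"  # word right after Mr./Miss./Master.
    r"|(?<=\()\w+"                                     # or the first word after "("
)

def first_name(full_name):
    # Return the first match, or None if the name has no known title or parentheses.
    match = PATTERN.search(full_name)
    return match.group(0) if match else None

print(first_name("Braund, Mr. Owen Harris"))                       # Owen
print(first_name("Futrelle, Mrs. Jacques Heath (Lily May Peel)"))  # Lily
print(first_name("Heikkinen, Miss. Laina"))                        # Laina
```

"Mrs. " never satisfies the `Mr\. ` lookbehind, so married women fall through to the parentheses branch.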


Once you have `s`, to get the most frequent female name(s):

s[df['Sex'].eq('female')].mode()
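
If separate code is wanted for each task, one possible sketch is a small helper that filters `s` by sex and takes the top entry of `value_counts()`. The data below is hypothetical (it repeats some first names so that a single winner exists), and `most_popular_first_name` is an illustrative name, not something from the question.

```python
import pandas as pd

# Hypothetical data with repeated first names.
df = pd.DataFrame({
    "Name": [
        "Moran, Mr. James",
        "Kelly, Mr. James",
        "Braund, Mr. Owen Harris",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
        "Smith, Miss. Lily",
        "Heikkinen, Miss. Laina",
    ],
    "Sex": ["male", "male", "male", "female", "female", "female"],
})

# Extract the first name as above (dots escaped to match a literal ".").
s = df["Name"].str.extract(
    r"((?:(?<=Mr\. )|(?<=Miss\. )|(?<=Master\. ))\w+|(?<=\()\w+)",
    expand=False,
)

def most_popular_first_name(sex):
    # idxmax returns the name with the highest count; on a tie only one
    # of the tied names is returned, so use .mode() to see all of them.
    return s[df["Sex"].eq(sex)].value_counts().idxmax()

print(most_popular_first_name("female"))  # Task 4: Lily
print(most_popular_first_name("male"))    # Task 5: James
```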
mozway