
I have read in the following dataset:

from io import StringIO

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://www.un.org/securitycouncil/sites/www.un.org.securitycouncil/files/consolidated.xml'
soup = bs(requests.get(url, headers=headers).text, 'lxml')
# pandas 2.1+ deprecates passing literal XML to read_xml; wrap it in StringIO
df = pd.read_xml(StringIO(str(soup)), xpath='.//individual')

In that dataset there are three columns called:

  • first_name
  • second_name
  • third_name

I need to concatenate those three columns to get a new column called name. Based on answers to similar questions, I tried the following code:

df['name'] = [''.join(i) for i in zip(df["first_name"].map(str),df["second_name"].map(str), df['third_name'].map(str))]

However, the resulting dataset is not what I want.

So, this is what I get from the code above:

[Screenshot of the output (image not available)]

Basically:

  • there is no space between the concatenated names
  • when one of the names is missing, the literal string "None" is concatenated.
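Both symptoms can be reproduced offline with a small hypothetical frame. The names are invented, and the columns are built as object dtype holding `None`, which is an assumption consistent with the "None" strings described above. `''.join` puts nothing between the pieces, and `map(str)` turns each `None` into the text `'None'`.

```python
import pandas as pd

# Hypothetical sample: object columns where None marks a missing name part
df = pd.DataFrame(
    {
        "first_name": ["ANNA", "JOHN"],
        "second_name": ["MARIA", None],
        "third_name": [None, "SMITH"],
    },
    dtype=object,
)

# Same approach as the attempt above: str(None) gives 'None',
# and ''.join uses an empty separator between the pieces
df["name"] = [
    "".join(parts)
    for parts in zip(
        df["first_name"].map(str),
        df["second_name"].map(str),
        df["third_name"].map(str),
    )
]
print(df["name"].tolist())  # ['ANNAMARIANone', 'JOHNNoneSMITH']
```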

What I'd like to get is this:

[Screenshot of the desired output (image not available)]

Can anyone help me please?

Giampaolo Levorato
  • Don't use code blindly; try to understand it. `''.join` concatenates the strings without a separator. You need `' '.join` to get spaces: `df['name'] = [' '.join(i) for i in zip(df["first_name"].map(str),df["second_name"].map(str), df['third_name'].map(str))]`. Or better: `df["first_name"].astype(str)+' '+df["second_name"].astype(str)+' '+df["third_name"].astype(str)`. Try `'-#-'.join` to see what the method does. – mozway Jun 27 '23 at 08:29
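Trying a few separators, as the comment suggests, shows that the string `join` is called on is what goes between the pieces. The second half is a sketch of the vectorized `+` version on hypothetical rows that have all three parts present. Neither form skips missing parts on its own.

```python
import pandas as pd

parts = ("ANNA", "MARIA", "KOVA")
print("".join(parts))     # ANNAMARIAKOVA
print(" ".join(parts))    # ANNA MARIA KOVA
print("-#-".join(parts))  # ANNA-#-MARIA-#-KOVA

# Hypothetical rows with all three parts present
df = pd.DataFrame(
    {
        "first_name": ["ANNA", "JOHN"],
        "second_name": ["MARIA", "PAUL"],
        "third_name": ["KOVA", "SMITH"],
    },
    dtype=object,
)

# Vectorized alternative: element-wise string concatenation of whole columns
df["name"] = (
    df["first_name"].astype(str) + " "
    + df["second_name"].astype(str) + " "
    + df["third_name"].astype(str)
)
print(df["name"].tolist())  # ['ANNA MARIA KOVA', 'JOHN PAUL SMITH']
```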
  • Regarding the blanks, you might need to `fillna('')` first, then possibly `.str.strip()` your output. Or, with your list comprehension: `df['name'] = [' '.join(x for x in i if x) for i in zip(df["first_name"], df["second_name"], df['third_name'])]` – mozway Jun 27 '23 at 08:39
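A minimal sketch that skips missing parts, assuming a missing value shows up as either `None` or `NaN` (the sample frame is hypothetical). `pd.notna` catches both, whereas the `if x` filter above only drops falsy values such as `None` and `''`. `NaN` is truthy and, being a float, would make `str.join` raise `TypeError`. The second variant follows the `fillna('')` idea. `.str.strip()` alone trims only the ends, so a missing middle name would leave a double space; splitting on whitespace and re-joining collapses it.

```python
import pandas as pd

cols = ["first_name", "second_name", "third_name"]

# Hypothetical sample mixing both markers of a missing value
df = pd.DataFrame(
    {
        "first_name": ["ANNA", "JOHN", "LI"],
        "second_name": ["MARIA", None, float("nan")],
        "third_name": [None, "SMITH", None],
    },
    dtype=object,
)

# Variant 1: per row, keep parts that are present and non-blank,
# then join them with a single space
df["name"] = [
    " ".join(str(x) for x in row if pd.notna(x) and str(x).strip())
    for row in zip(*(df[c] for c in cols))
]

# Variant 2: blank out missing values, join every row with spaces,
# then split on whitespace and re-join to collapse doubled or edge spaces
alt = df[cols].fillna("").apply(" ".join, axis=1).str.split().str.join(" ")

print(df["name"].tolist())  # ['ANNA MARIA', 'JOHN SMITH', 'LI']
print(alt.tolist())         # ['ANNA MARIA', 'JOHN SMITH', 'LI']
```

Variant 1 relies only on the three column names from the question, so it should carry over to the `df` read from the XML as well.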
