0

I have a dataframe containing name of companies and people. For instance, in the same row, in the first column I have the name of the company and in its second column I have the people's names that are part of the company (each line a name). And the number of people varies for each company and I have more than 100 companies.

This is a sample of the dataset:

Company name                DM Full name

LA CAMPAGNOLA SA            Mr Victor Daniel Martin
                            Mr Fernando Luis Falco
                            Mr Gustavo Pablo Macchi
                            Mr Marcelo Dario Siano
INVERSORA ELECTRICA S.A.    Mr Luis Pablo Rogelio Pagano
                            Mr Damian Eduardo Sanfilippo
                            Mr Sebastian Cordova Moyano
                            Ms Sylvina Gabriela Sanchez
                            Mr Luis Rodolfo Secco
                            Mr Jaime Javier Barba

What I need is to create a new dataframe where each row is an independent item that includes the name of the company and the name of the employee as follows:

Company name                DM Full name

LA CAMPAGNOLA SA            Victor Daniel Martin
LA CAMPAGNOLA SA            Fernando Luis Falco
LA CAMPAGNOLA SA            Gustavo Pablo Macchi
LA CAMPAGNOLA SA            Marcelo Dario Siano
INVERSORA ELECTRICA S.A.    Luis Pablo Rogelio Pagano
INVERSORA ELECTRICA S.A.    Damian Eduardo Sanfilippo
INVERSORA ELECTRICA S.A.    Sebastian Cordova Moyano
INVERSORA ELECTRICA S.A.    Sylvina Gabriela Sanchez
INVERSORA ELECTRICA S.A.    Luis Rodolfo Secco
INVERSORA ELECTRICA S.A.    Jaime Javier Barba

I've found different examples as ways to do it, but none of them work. For instance: Split cell into multiple rows in pandas dataframe

Can somebody give me a hand on how to do it?

PAstudilloE
  • 659
  • 13
  • 24
  • `df['Company name'].replace('', np.nan).ffill()` – user3483203 Sep 24 '18 at 19:42
  • 3
    Try `df = df.reset_index()`. – Scott Boston Sep 24 '18 at 19:42
  • could you please attach a code or a smaller example which one could reproduce? – PEBKAC Sep 24 '18 at 20:01
  • @PEBKAC... this is copied directly from the CSV file: Company name,"DM Full name" LA CAMPAGNOLA SA,"Mr Fernando Luis Falco Mr Gustavo Pablo Macchi Mr Marcelo Dario Siano" INVERSORA ELECTRICA DE BUENOS AIRES S.A.,"Mr Luis Pablo Rogelio Pagano Mr Damian Eduardo Sanfilippo Mr Sebastian Cordova Moyano Ms Sylvina Gabriela Sanchez Mr Luis Rodolfo Secco Mr Jaime Javier Barba" – PAstudilloE Sep 25 '18 at 00:09
  • could I ask you how did you save this csv? I copy-pasted it and saved it in a blank file. When I open it with `pd.read_csv` it looks like a combination of comma and space delimitation ...and it gives me an empty output. could you tell me the command of how you opened it? – PEBKAC Sep 25 '18 at 09:11

2 Answers2

0

Looks like the company name is in your index, to get this out as a column you can try running:

df = df.reset_index()

If this isn't the case try creating a small reproducible example

Edit: From your comment I'm not 100% clear but I think you have blank strings in your company name column for each of your rows which are just a person?

In which case first replace the blank strings with missings, and then forward fill.

df = df.replace(r'^\s+$', np.nan, regex=True)
df["Company name"] = df["Company name"].fillna(method="ffill")
Sven Harris
  • 2,884
  • 1
  • 10
  • 20
  • this is an example @Sven. This is how the .csv file actually looks: Company name,"DM Full name" LA CAMPAGNOLA SA,"Mr Fernando Luis Falco Mr Gustavo Pablo Macchi Mr Marcelo Dario Siano" INVERSORA ELECTRICA DE BUENOS AIRES S.A.,"Mr Luis Pablo Rogelio Pagano Mr Damian Eduardo Sanfilippo Mr Sebastian Cordova Moyano Ms Sylvina Gabriela Sanchez Mr Luis Rodolfo Secco Mr Jaime Javier Barba" – PAstudilloE Sep 25 '18 at 00:08
0

I managed to solve the issue:

colNames = ('Company name','DM Full name')

# Define a dataframe with the required column names
masterDF = pd.DataFrame(columns = colNames)

for i in range (0, len(df)):
    names=df[df.columns[1]][i]
    names=names.splitlines()
    for name in names:
        count+=1
        if name.replace(' ','').isalpha():
            name=name.replace('Mr ','').replace('Ms ','')
            company=df[df.columns[0]][i]
            company=company.replace('\n',' ')
        else:
            pass

        a = str(company)
        b = str(name)

        masterDF = masterDF.append({'Company name': a, 'DM Full name': b}, ignore_index=True)

print masterDF
PAstudilloE
  • 659
  • 13
  • 24