I am doing a network analysis with networkx and noticed that some nodes are treated as different nodes only because their names have extra leading spaces.

I tried to remove the spaces using the following code, but I cannot seem to turn the output back into strings.

rhedge = pd.read_csv(r"final.edge.csv")
rhedge

_________________
 source | to
 niala  | Sana, Sana
 Wacko  | Ana, Aisa

rhedge['to'][1]
'Sana, Sana'

rhedge['splitted_users2'] = rhedge['to'].apply(lambda x:x.split(','))

#I need to split them so they will be included as different nodes

The problem shows up in the next step:

rhedge['splitted_users2'][1]
['Sana', ' Sana']

As you can see the second Sana has a leading space.

I tried to do this:

split_users = []

for i in rhedge['splitted_users2']:
    row = [x.strip() for x in i]
    split_users.append(row)

pd.Series(split_users)

But when I try to split them by "," again, it fails because each value is now a list rather than a string. I believe that removing the spaces would make networkx treat the names as one node, instead of creating a separate node for the one with a leading space.

Thank you.

Mtrinidad

1 Answer

Changing the lambda expression

import pandas as pd

# dataframe creation
df = pd.DataFrame({'source': ['niala', 'Wacko'], 'to': ['Sana, Sana', 'Ana, Aisa']})

# split and strip with a list comprehension
df['splitted_users2'] = df['to'].apply(lambda x:[y.strip() for y in x.split(',')])

print(df['splitted_users2'][0])

>>> ['Sana', 'Sana']
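
Since the goal is for each name to become its own node, one possible next step is to turn the stripped lists into one row per source/target pair. The following is only a sketch under assumptions: the column name `target` and the variable `edges` are made up for illustration, and `DataFrame.explode` needs pandas 0.25 or later.

```python
import pandas as pd

# same sample data as above
df = pd.DataFrame({'source': ['niala', 'Wacko'], 'to': ['Sana, Sana', 'Ana, Aisa']})

# split on commas, then strip whitespace from each name
df['splitted_users2'] = df['to'].apply(lambda x: [y.strip() for y in x.split(',')])

# one row per (source, target) pair; explode repeats 'source' for each list element
# 'target' and 'edges' are illustrative names, not part of the original question
edges = (
    df[['source', 'splitted_users2']]
    .explode('splitted_users2')
    .rename(columns={'splitted_users2': 'target'})
    .reset_index(drop=True)
)

print(edges)
```

A frame shaped like this should be usable as an edge list, for example with networkx's `from_pandas_edgelist`, although that step is not shown or tested here.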

Alternatively

Option 1

  • Split on ', ' instead of ','
df['to'] = df['to'].str.split(', ')

Option 2

  • Replace ' ' with '' and then split on ','
  • This has the benefit of removing any whitespace around either name (e.g. [' Sana, Sana', ' Ana, Aisa'])
df['to'] = df['to'].str.replace(' ', '').str.split(',')
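
One caveat: replacing every ' ' also removes spaces inside a name, so 'Ana Maria' would become 'AnaMaria'. If that matters for your data, a possible alternative is to split on a comma together with any whitespace around it. This is a sketch using the standard `re` module; the sample values and the column name `targets` are hypothetical.

```python
import re
import pandas as pd

# hypothetical data: padded names, one of which contains an internal space
df = pd.DataFrame({'to': [' Sana ,Sana', 'Ana Maria,  Aisa ']})

# strip the outer whitespace, then split on a comma plus any surrounding whitespace;
# spaces inside a name are left untouched
df['targets'] = df['to'].apply(lambda s: re.split(r'\s*,\s*', s.strip()))

print(df['targets'].tolist())
```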
Trenton McKinney