3

I have a DataFrame and need to split it into two: rows whose Name contains an underscore (_) and rows whose Name does not.

import pandas as pd

# sample data: plain names plus names with an "_<number>" suffix
Name = ['Hello',
        'Spider',
        'Captain',
        'Superman',
        'Hello_1',
        'Superman_1']
dfName = pd.DataFrame(Name, columns=['Name'])

Current output

    Name
0   Hello
1   Spider
2   Captain
3   Superman
4   Hello_1
5   Superman_1

Expected output

df1

    Name      
0   Hello
1   Spider
2   Captain
3   Superman

df2

    Name_
0   Hello_1
1   Superman_1
  • Possible duplicate of [Pandas split DataFrame by column value](https://stackoverflow.com/questions/33742588/pandas-split-dataframe-by-column-value) – Georgy Oct 24 '19 at 14:33

4 Answers

1

Use Series.str.contains to build a boolean mask, then filter with boolean indexing: invert the mask with ~ for df1 (names without _) and use it as-is for df2 (names with _). Finally, call DataFrame.reset_index(drop=True) to get a default RangeIndex:

m = dfName['Name'].str.contains('_')

# for the sample data, .reset_index(drop=True) is not needed on df1; it is included so the solution works in general
df1 = dfName[~m].reset_index(drop=True)
print(df1)
       Name
0     Hello
1    Spider
2   Captain
3  Superman

df2 = dfName[m].reset_index(drop=True)
print(df2)
         Name
0     Hello_1
1  Superman_1   
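
One detail worth knowing about the mask above: Series.str.contains treats its pattern as a regular expression by default. That is harmless for '_', which has no special meaning in a regex, but it matters for other separators. A minimal sketch, assuming a hypothetical variant of the data that uses '.' as the separator (not part of the question):

```python
import pandas as pd

# Hypothetical data: "." instead of "_" as the suffix separator
df = pd.DataFrame({"Name": ["Hello", "Spider", "Hello.1", "Superman.1"]})

# regex=False makes "." a literal dot; as a regex it would match any character
mask = df["Name"].str.contains(".", regex=False)

plain = df[~mask].reset_index(drop=True)     # names without the separator
suffixed = df[mask].reset_index(drop=True)   # names with the separator
```

Without regex=False, every non-empty name would match and the first frame would come out empty.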
jezrael
1

You may need to split your original list into two sublists first:

>>> name = 'Hello Spider Captain Superman Hello_1 Superman_1'.split()
>>> name
['Hello', 'Spider', 'Captain', 'Superman', 'Hello_1', 'Superman_1']
>>> col1 = [n for n in name if '_' not in n]
>>> col2 = [n for n in name if '_' in n]
>>> col1
['Hello', 'Spider', 'Captain', 'Superman']
>>> col2
['Hello_1', 'Superman_1']

Note: by convention (PEP 8), variable names should be lowercase to distinguish them from classes: https://www.python.org/dev/peps/pep-0008/#function-and-variable-names
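
The session above stops at two plain lists. A possible way to finish the job, sketched here under the assumption that the column should still be called Name as in the question, is to build one DataFrame from each sublist:

```python
import pandas as pd

names = ["Hello", "Spider", "Captain", "Superman", "Hello_1", "Superman_1"]

# Split the raw list before any DataFrame exists
col1 = [n for n in names if "_" not in n]
col2 = [n for n in names if "_" in n]

# Each frame gets a fresh 0..n-1 index, so no reset_index is needed
df1 = pd.DataFrame(col1, columns=["Name"])
df2 = pd.DataFrame(col2, columns=["Name"])
```

This only works when the source list is still available; if you only have the DataFrame, filtering with a boolean mask is the more direct route.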

alfajet
1

You can use this code to split your data frame:

df1 = dfName[~dfName["Name"].str.contains('_1', na=False)].reset_index(drop=True)
df2 = dfName[dfName["Name"].str.contains('_1', na=False)].reset_index(drop=True)

The output of df1:

    Name
0   Hello
1   Spider
2   Captain
3   Superman

The output of df2:

    Name
0   Hello_1
1   Superman_1
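
The na=False argument in the code above decides how missing names are treated: they count as "does not contain the pattern", so the mask holds only True and False. A small sketch, using a made-up column with one missing value (the question's data has none):

```python
import pandas as pd

# Hypothetical data with a missing name
df = pd.DataFrame({"Name": ["Hello", None, "Hello_1"]})

# na=False maps the missing entry to False, so the mask is purely boolean
mask = df["Name"].str.contains("_", na=False)

without_suffix = df[~mask].reset_index(drop=True)  # includes the missing row
with_suffix = df[mask].reset_index(drop=True)
```

Note that the missing row ends up in the "without" frame; drop it first with dropna if that is not what you want.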
Vahid Vaezinia
0
dfnamewithout_regex = dfName[~dfName['Name'].str.contains('_')]
dfnamewithout_regex
    Name
0   Hello
1   Spider
2   Captain
3   Superman

dfnamewith_regex = dfName[dfName['Name'].str.contains('_')]
dfnamewith_regex
    Name
4   Hello_1
5   Superman_1

If you want to discard the original index and renumber the rows from 0, append .reset_index(drop=True) to either expression.
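
To illustrate the difference, here is a short sketch on the question's data; the variable names are chosen for this example only:

```python
import pandas as pd

dfName = pd.DataFrame(
    {"Name": ["Hello", "Spider", "Captain", "Superman", "Hello_1", "Superman_1"]}
)

# Boolean indexing keeps the original row labels
with_underscore = dfName[dfName["Name"].str.contains("_")]
original_labels = list(with_underscore.index)

# drop=True discards the old labels instead of adding them as a column
renumbered = with_underscore.reset_index(drop=True)
new_labels = list(renumbered.index)
```

Here original_labels holds 4 and 5, while new_labels starts again at 0.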

sre