
Having trouble with a particular str.split error

My dataframe contains a number followed by text:

(Names are made up.)

    print(df)
    Date         Entry
    20/2/2019  6 John Smith
    20/2/2019  8 Matt Princess
    21/2/2019  4 Nick Dromos
    21/2/2019  4 Adam Force
    21/2/2019  5 Gary
    21/2/2019  4 El Chaparro
    21/2/2019  7 Mike O Malley
    21/2/2019  8 Jason
    22/2/2019  7 Mitchell

I am simply trying to split the Entry column into two columns: the leading number and the text that follows it.

Code I have tried:

df['number','name'] = df['Entry'].str.split('([0-9])',n=1,expand=True)

ValueError: Wrong number of items passed 3, placement implies 1

Then I tried splitting on the space alone:

df['number','name'] = df['Entry'].str.split(" ",n=1,expand=True)

ValueError: Wrong number of items passed 2, placement implies 1

Ideally the df looks like:

    print(df)
    Date       number        name
    20/2/2019  6             John Smith
    20/2/2019  8             Matt Princess
    21/2/2019  4             Nick Dromos
    21/2/2019  4             Adam Force
    21/2/2019  5             Gary
    21/2/2019  4             El Chaparro
    21/2/2019  7             Mike O Malley
    21/2/2019  8             Jason
    22/2/2019  7             Mitchell

I feel like it may be something small, but I can't seem to get it working. Any help would be great! Thanks very much.

smci
SOK
  • The main issue was simply [How to add multiple columns to pandas dataframe in one assignment?](https://stackoverflow.com/questions/39050539/how-to-add-multiple-columns-to-pandas-dataframe-in-one-assignment). Your left-hand side `df['number','name'] = ...` was meaningless, it needed to be `df[['number','name']] = ...` – smci Apr 19 '20 at 07:46
  • A more descriptive title would be *"How to convert pandas string column into multiple new columns, using str.split and regex"*. The actual error message you got from `str.split` is not very informative. Also, don't say "Python dataframe" when you mean "pandas dataframe". – smci Apr 19 '20 at 07:57
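
To illustrate the bracket point from the first comment: with single brackets, pandas reads `'number','name'` as one tuple key, that is, a single column label, so assigning a two-column result raises a `ValueError` (the exact message varies by pandas version). A minimal sketch on a made-up two-row frame, not the original data:

```python
import pandas as pd

# Small made-up frame in the same shape as the question's data.
df = pd.DataFrame({"Entry": ["6 John Smith", "8 Matt Princess"]})
parts = df["Entry"].str.split(" ", n=1, expand=True)  # two columns: 0 and 1

# Single brackets: the key is the tuple ('number', 'name'), i.e. ONE column.
try:
    df["number", "name"] = parts
    single_bracket_error = None
except ValueError as exc:
    single_bracket_error = exc

# Double brackets: a list of two labels, one per column of `parts`.
df[["number", "name"]] = parts
print(df)
```

The double-bracket form assigns the columns of `parts` positionally, which is why the number of labels has to match the number of columns returned by the split.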

1 Answer

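Use double brackets (`[[...]]`) on the left-hand side. If you also want to remove the original column, use `DataFrame.pop`. Because the pattern contains a capture group, the split returns an empty first column, which is removed with `drop`. The pattern is changed to `[0-9]+` so that numbers with more than one digit, such as 10 or 567, are matched as well: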

df[['number','name']] = df.pop('Entry').str.split('([0-9]+)',n=1,expand=True).drop(0, axis=1)
print (df)
        Date number            name
0  20/2/2019      6      John Smith
1  20/2/2019      8   Matt Princess
2  21/2/2019      4     Nick Dromos
3  21/2/2019      4      Adam Force
4  21/2/2019      5            Gary
5  21/2/2019      4     El Chaparro
6  21/2/2019      7   Mike O Malley
7  21/2/2019      8           Jason
8  22/2/2019      7        Mitchell
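
For readers wondering where the dropped column `0` comes from, here is a small sketch on made-up rows (the two-digit entry is an assumption added for illustration). Splitting on a pattern with a capture group keeps the matched digits, so each row yields three pieces: the empty text before the number, the number, and the rest. The rest still carries its leading space, so the sketch strips it as an optional extra step that is not part of the answer above.

```python
import pandas as pd

# Made-up sample rows; "10 Mike O Malley" is added to show multi-digit numbers.
df = pd.DataFrame({
    "Date": ["20/2/2019", "20/2/2019", "21/2/2019"],
    "Entry": ["6 John Smith", "8 Matt Princess", "10 Mike O Malley"],
})

# The capture group keeps the digits in the result: columns 0, 1 and 2.
parts = df["Entry"].str.split(r"([0-9]+)", n=1, expand=True)
print(parts)

# Column 0 is the (empty) text before the number, so drop it.
df[["number", "name"]] = parts.drop(0, axis=1)

# Optional: remove the space left in front of each name.
df["name"] = df["name"].str.strip()
print(df)
```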

Solution with Series.str.extract:

df[['number','name']] = df.pop('Entry').str.extract('([0-9]+)(.*)')
# alternative (raw string, so \d is not treated as a string escape)
# df[['number','name']] = df.pop('Entry').str.extract(r'(\d+)(.*)')
print(df)
        Date number            name
0  20/2/2019      6      John Smith
1  20/2/2019      8   Matt Princess
2  21/2/2019      4     Nick Dromos
3  21/2/2019      4      Adam Force
4  21/2/2019      5            Gary
5  21/2/2019      4     El Chaparro
6  21/2/2019      7   Mike O Malley
7  21/2/2019      8           Jason
8  22/2/2019      7        Mitchell

`pop` selects the column and removes it from the DataFrame in one step, so there is no need to drop it afterwards. These two snippets therefore do the same thing:

df[['number','name']] = df.pop('Entry').str.extract(r'(\d+)(.*)')

vs

df[['number','name']] = df['Entry'].str.extract(r'(\d+)(.*)')
df = df.drop('Entry', axis=1)
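
A possible variation on the `extract` approach, offered as a sketch rather than as part of the answer: adding `\s*` between the two groups keeps the separating space out of the name, and `astype(int)` turns the extracted digits into integers. Both steps are assumptions about what is wanted; the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows in the same shape as the question's data.
df = pd.DataFrame({
    "Date": ["20/2/2019", "21/2/2019", "22/2/2019"],
    "Entry": ["6 John Smith", "7 Mike O Malley", "7 Mitchell"],
})

# Group 1: the leading digits. Group 2: everything after the optional spaces.
extracted = df.pop("Entry").str.extract(r"(\d+)\s*(.*)")
df[["number", "name"]] = extracted

# extract returns strings; convert the number column if integers are needed.
df["number"] = df["number"].astype(int)
print(df)
```

Note that `astype(int)` would fail on rows where the pattern does not match, since `extract` returns missing values there.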
jezrael
  • Thanks @jezrael. I have been trying `pop` since you helped me on my other question. At the moment I am getting a "re.error: unterminated character set at position 1" – SOK Apr 19 '20 at 07:20
  • @SOK - Which solution failed, the first or the second? Btw, `pop` is only a shortcut; give me a sec for an alternative – jezrael Apr 19 '20 at 07:21
  • 1
    Sorry i didnt see the update. I ran `df[['number','name']] = df.pop('Entry').str.split('([0-9]+)',n=1,expand=True).drop(0, axis=1)` and it worked so thanks very much (again haha) – SOK Apr 19 '20 at 07:24