How to split a column based on several string indices using pandas

Question

I would like to split each row into new columns based on several indices:

6ABCDE0218594STRING

to

6 ABCDE 021 8594 STRING

This seems like it'd have been asked at least once before, but I keep finding only variations on the question (separating by a delimiter as in pandas: How do I split text in a column into multiple rows?, separating into new rows using rather than new columns, again with a delimiter: Split pandas dataframe string entry to separate rows).

I apologize in advance if this is a duplicate!

You can select parts of a string with simply use 'df.new=df.originallongstring.str()[1:5]' (untested) — Balzer82, Aug 11 '14 at 20:59

score 4 · Accepted Answer · answered Aug 11 '14 at 21:08

4

One way is to use a regex and str.extract to pull out the columns:

In [11]: df = pd.DataFrame([['6ABCDE0218594STRING']])

You could just do it with index, so something like this:

In [12]: df[0].str.extract('(.)(.{5})(.{3})(.{4})(.*)')
Out[12]:
   0      1    2     3       4
0  6  ABCDE  021  8594  STRING

Or you could be a bit more cautious and ensure each column is the correct form:

In [13]: df[0].str.extract('(\d)(.{5})(\d{3})(\d{4})(.*)')
Out[13]:
   0      1    2     3       4
0  6  ABCDE  021  8594  STRING

Note: You can also use named groups (see the docs).

answered Aug 11 '14 at 21:08

Andy Hayden

359,921
101
625
535

Thank you! I'm working on this now -- had a bit of a hiccup because I was using pandas 0.12. – iff_or Aug 11 '14 at 22:48
@Andy Hayden is there a way I can replicate this to a large set on indices on to a data frame? – user12229564 Mar 21 '20 at 15:22
Interestingly, this seems to convert strings consisting of only spaces into NaNs. Is there a way to avoid that, or should I just convert NaNs back to spaces? – AstroFloyd Sep 25 '21 at 16:03

score 0 · Answer 2 · answered Aug 11 '14 at 21:11

0

Try this:

string = '6ABCDE0218594STRING'
indices = [1,5,3,4]
myList = []

for index in indices:
    token, string = string[:index],string[index:]
    myList.append(token)

myList.append(string)

>>> Output: ['6', 'ABCDE', '021', '8594', 'STRING']

answered Aug 11 '14 at 21:11

javon27

316
2
10

score 0 · Answer 3 · answered Aug 11 '14 at 22:44

0

Or in case you don't know the number of digits, letters etc.:

import re

m = re.match('(\d*)([A-Z]*)(\d*)([A-Z]*)', '6ABCDE0218594STRING').groups()
print m[0], m[1], m[2], m[3]

Output:

6 ABCDE 0218594 STRING

answered Aug 11 '14 at 22:44

Falko

17,076
13
60
105

Yeah, regex is great! However, one issue with this particular application is that I will often need to break up a series of integers too (e.g. 0123456 -> 012 34 56). – iff_or Aug 11 '14 at 22:59
Oh, I overlooked that one number is split up. Then it is, of course, not working as required. :) – Falko Aug 11 '14 at 23:03

How to split a column based on several string indices using pandas

3 Answers3

Linked