Separating a column only once per separator in Pandas

Question

I have data like

abc:def How I met her
ter:kpefe Hi I am this

I want the column to be like this

a       b         c
abc    def     How I met her
ter    kpefe   Hi I am this

I want to separate it into 3 columns, so I am using

data = pd.read_csv(input_file, sep=" |:", header = None, names=["a", "b", "c"])

which gives me many columns besides a, b and c.

Could you make your data clearer and maybe show the result that you want as well. thx — Phurich.P, Apr 06 '17 at 03:44

score 3 · Accepted Answer · edited May 23 '17 at 12:10

My stab at this uses the csv library to read the input into lists since it needs to be (heavily) sanitized before it can be neatly put into a DataFrame like you want.

# Python 3.5
import csv

import pandas as pd

col1 = []
col2 = []
col3 = []

with open('path/to/the/file.txt', newline='') as txt:
    reader = csv.reader(txt)
    for row in reader:
        if not row:
            continue  # skip blank lines
        # csv.reader splits on commas; rejoin to recover the full line
        str_row = ','.join(row)
        # Get the first column's element (split on the first ':' only)
        split1 = str_row.split(':', 1)
        col1.append(split1[0])
        # Get the second column's element (split on the first space only)
        split2 = split1[1].split(' ', 1)
        col2.append(split2[0])
        # Everything after that first space is the third column's element
        col3.append(split2[1] if len(split2) > 1 else '')

df = pd.DataFrame({'a': col1, 'b': col2, 'c': col3})
print(df)

Produces

     a      b              c
0  abc    def  How I met her
1  ter  kpefe   Hi I am this

Note that I'm making the naive assumption that all of your data is structured in this way. Also, if you don't want to type in the column names by hand (for scalability), you can use this nifty trick to build the DataFrame row by row, which automatically uses integers as column names (referencing this SO thread):

# Gives the same rows, with columns labelled 0, 1, 2 instead of a, b, c
df = pd.DataFrame(list(map(list, zip(col1, col2, col3))))
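
To make the difference in column labels concrete, here is a minimal sketch. The three lists are hard-coded with the sample values from the question (an assumption, so the snippet runs without the input file), and the `columns=` argument shows one way to get the names back when building row-wise:

```python
import pandas as pd

# Sample values hard-coded for illustration; in the answer above
# these lists are filled while reading the file.
col1 = ['abc', 'ter']
col2 = ['def', 'kpefe']
col3 = ['How I met her', 'Hi I am this']

# Row-wise construction: columns are labelled 0, 1, 2 by default.
df_int = pd.DataFrame(list(zip(col1, col2, col3)))

# Same rows, with explicit column names.
df_named = pd.DataFrame(list(zip(col1, col2, col3)), columns=['a', 'b', 'c'])

print(list(df_int.columns))    # [0, 1, 2]
print(list(df_named.columns))  # ['a', 'b', 'c']
```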

score 2 · Answer 2 · answered Apr 06 '17 at 05:06

setup

from io import StringIO
import pandas as pd

txt = """abc:def How I met her
ter:kpefe Hi I am this"""

s = pd.read_csv(StringIO(txt), sep='|', header=None).squeeze('columns')

s

0     abc:def How I met her
1    ter:kpefe Hi I am this
Name: 0, dtype: object

solution
use str.extract

s.str.extract(r'(?P<a>\S+):(?P<b>\S+)\s+(?P<c>.*)', expand=True)

     a      b              c
0  abc    def  How I met her
1  ter  kpefe   Hi I am this
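
An alternative sketch, not taken from either answer: since the goal is to split only once per separator, `Series.str.split` with `n=1` may also work. This assumes every line has the shape `a:b rest`, with exactly one space after the second field; the sample lines are hard-coded here for illustration.

```python
import pandas as pd

# The two sample lines from the question, as a Series of raw strings.
s = pd.Series(['abc:def How I met her', 'ter:kpefe Hi I am this'])

# Split once on the first ':' -> column a and the remainder of the line.
left = s.str.split(':', n=1, expand=True)

# Split the remainder once on the first space -> columns b and c.
right = left[1].str.split(' ', n=1, expand=True)

df = pd.DataFrame({'a': left[0], 'b': right[0], 'c': right[1]})
print(df)
```

Because each split is capped at one, the remaining spaces in the third field are left alone, which the `sep=" |:"` pattern in the question cannot do.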
