I try doing some test and choose the best method.
The fastest one is creating list from column _id
and split by native python split('~')
:
df[['one', 'two', 'three']] = pd.DataFrame([ x.split('~') for x in df['_id'].tolist() ])
import pandas as pd
#test list
x =['82283344~Electronics~Mobile Cases & Covers', '82283346~Electronics~Mobile Cases & Covers', '82283343~Electronics~Mobile Cases & Covers']
#100000 lists
x = x * 100000
#create new df with column _id
df = pd.DataFrame({'_id': x })
print df.head()
_id
0 82283344~Electronics~Mobile Cases & Covers
1 82283346~Electronics~Mobile Cases & Covers
2 82283343~Electronics~Mobile Cases & Covers
3 82283344~Electronics~Mobile Cases & Covers
4 82283346~Electronics~Mobile Cases & Covers
def DF(df):
df[['one', 'two', 'three']] = pd.DataFrame([ x.split('~') for x in df['_id'].tolist() ])
def AP(df):
df['one'] = df._id.apply(lambda x: x.split('~')[0])
df['two'] = df._id.apply(lambda x: x.split('~')[1])
df['three'] = df._id.apply(lambda x: x.split('~')[2])
def EX(df):
df[['one', 'two', 'three']] = df._id.str.split('~', expand=True)
def SP(df):
df['one'] = df['_id'].str.split('~').str[0]
df['two'] = df['_id'].str.split('~').str[1]
df['three'] = df['_id'].str.split('~').str[2]
DF(df)
print df.head()
AP(df)
print df.head()
EX(df)
print df.head()
SP(df)
print df.head()
4 times is repeating:
_id one two \
0 82283344~Electronics~Mobile Cases & Covers 82283344 Electronics
1 82283346~Electronics~Mobile Cases & Covers 82283346 Electronics
2 82283343~Electronics~Mobile Cases & Covers 82283343 Electronics
3 82283344~Electronics~Mobile Cases & Covers 82283344 Electronics
4 82283346~Electronics~Mobile Cases & Covers 82283346 Electronics
three
0 Mobile Cases & Covers
1 Mobile Cases & Covers
2 Mobile Cases & Covers
3 Mobile Cases & Covers
4 Mobile Cases & Covers
Timing:
In [125]: %timeit DF(df)
...: %timeit AP(df)
...: %timeit EX(df)
...: %timeit SP(df)
...:
1 loops, best of 3: 332 ms per loop
1 loops, best of 3: 564 ms per loop
1 loops, best of 3: 668 ms per loop
1 loops, best of 3: 1.09 s per loop