A column in a dataframe has values like 'abc,def,ghi'. I want to make an array like this: ['abc','def','ghi']
This gives a much more detailed answer: http://stackoverflow.com/questions/40784200/pandas-convert-column-to-list – bigbounty May 08 '17 at 17:28
2 Answers
7
Use str.split:
df['col'] = df['col'].str.split(',')
Sample:
import pandas as pd
df = pd.DataFrame({'col':['abc,def,ghi','abc,def,ghi']})
df['col'] = df['col'].str.split(',')
print (df)
col
0 [abc, def, ghi]
1 [abc, def, ghi]
print (df.loc[0, 'col'])
['abc', 'def', 'ghi']
print (type(df.loc[0, 'col']))
<class 'list'>
If the column never contains NaN values, a list comprehension also works:
df['col'] = [x.split(',') for x in df['col'].values.tolist()]
print (df)
col
0 [abc, def, ghi]
1 [abc, def, ghi]
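The NaN caveat above can be illustrated with a small sketch. The two-row frame and the `guarded` variable are made up for illustration and are not part of the original answer. It shows `str.split` passing a missing value through, the plain list comprehension failing on it, and one possible way to guard the comprehension.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one normal string and one missing value
df = pd.DataFrame({'col': ['abc,def,ghi', np.nan]})

# .str.split leaves the missing value as NaN instead of raising
split_col = df['col'].str.split(',')
print(split_col.tolist())

# The plain list comprehension calls .split on a float NaN and fails
try:
    [x.split(',') for x in df['col'].values.tolist()]
except AttributeError as exc:
    print('list comprehension failed:', exc)

# One possible guard: only split actual strings, pass everything else through
guarded = [x.split(',') if isinstance(x, str) else x for x in df['col']]
print(guarded)
```

The guard keeps the comprehension usable on columns with gaps, at the cost of an extra type check per element.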

jezrael
@Aravindh don't forget to accept this answer and up vote if you found it helpful. – piRSquared May 08 '17 at 17:56
1
Consider the dataframe df, where each row holds a random number of strings separated by commas.
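import numpy as np
import pandas as pd
np.random.seed([3,1415])
k = 10
df = pd.DataFrame(
    np.random.choice(list('ABCD,'), (k, 20))
# regex=True is needed on recent pandas, where str.replace is literal by default
).sum(1).str.strip(',').str.replace(',+', ',', regex=True).to_frame('col1')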
df
col1
0 ADCDCCDCDACAA,ACCA,B
1 DC,DDD,DBDA,CCAC
2 A,B,CCAC,DB,C,CD,D
3 ADDBAA,DA,BD,C,AACA
4 DADBB,D,DBD,ADCAADB
5 CBCBA,CA,B,AA,CDCBDB
6 BD,D,DDB,AC,B,C,ABBA
7 C,CABBBADCD,DBCC,ACD
8 CC,A,BCAAAACBBA,BD
9 AC,A,ADBBD,BDCCDDABD
I like to use numpy's functionality for splitting (np.core.defchararray.split, also available as np.char.split):
df.assign(col1=np.core.defchararray.split(df.col1.values.astype(str), ','))
col1
0 [ADCDCCDCDACAA, ACCA, B]
1 [DC, DDD, DBDA, CCAC]
2 [A, B, CCAC, DB, C, CD, D]
3 [ADDBAA, DA, BD, C, AACA]
4 [DADBB, D, DBD, ADCAADB]
5 [CBCBA, CA, B, AA, CDCBDB]
6 [BD, D, DDB, AC, B, C, ABBA]
7 [C, CABBBADCD, DBCC, ACD]
8 [CC, A, BCAAAACBBA, BD]
9 [AC, A, ADBBD, BDCCDDABD]
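As a smaller, deterministic illustration of the same call, here is a sketch on a made-up two-row frame. It uses `np.char.split`, the public alias of `np.core.defchararray.split`; the sample strings are assumptions, not data from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame, small enough to check by eye
df = pd.DataFrame({'col1': ['A,B', 'CC,D,E']})

# astype(str) turns the object column into a fixed-width unicode array,
# which is the input type the numpy string routines operate on
as_unicode = df['col1'].values.astype(str)

# The result is a 1-D object array whose elements are Python lists
arr = np.char.split(as_unicode, ',')
print(arr.dtype)

# assign returns a new frame and leaves df itself unchanged
out = df.assign(col1=arr)
print(out)
```

Note that `astype(str)` would also turn a NaN into the literal string 'nan', so this route, like the list comprehension, is best suited to columns without missing values.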
Fast for small data
%timeit df.assign(col1=np.core.defchararray.split(df.col1.values.astype(str), ','))
1000 loops, best of 3: 204 µs per loop
%timeit df.assign(col1=df['col1'].str.split(','))
1000 loops, best of 3: 327 µs per loop
%timeit df.assign(col1=[x.split(',') for x in df['col1'].values.tolist()])
1000 loops, best of 3: 210 µs per loop
Not as fast for large data
np.random.seed([3,1415])
k = 10000
df = pd.DataFrame(
    np.random.choice(list('ABCD,'), (k, 100))
# regex=True is needed on recent pandas, where str.replace is literal by default
).sum(1).str.strip(',').str.replace(',+', ',', regex=True).to_frame('col1')
%timeit df.assign(col1=np.core.defchararray.split(df.col1.values.astype(str), ','))
10 loops, best of 3: 19.6 ms per loop
%timeit df.assign(col1=df['col1'].str.split(','))
100 loops, best of 3: 13.5 ms per loop
%timeit df.assign(col1=[x.split(',') for x in df['col1'].values.tolist()])
100 loops, best of 3: 11.5 ms per loop
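%timeit is an IPython magic, so the comparisons above only run in IPython or Jupyter. A rough sketch of the same comparison with the standard-library timeit module might look like the following. The test data, the `candidates` dictionary and the repeat counts are assumptions chosen for illustration, and the numbers will differ from those above depending on machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 1,000 comma-separated strings of varying length
df = pd.DataFrame({'col1': ['AB,C,DDD,ABCA,B', 'DC,DDD'] * 500})

# The three approaches compared in this answer, wrapped as zero-argument callables
candidates = {
    'np.char.split': lambda: df.assign(
        col1=np.char.split(df['col1'].values.astype(str), ',')),
    'str.split': lambda: df.assign(col1=df['col1'].str.split(',')),
    'list comprehension': lambda: df.assign(
        col1=[x.split(',') for x in df['col1'].values.tolist()]),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 10 calls per repeat, converted to ms per call
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    timings[name] = best * 1e3
    print(f'{name}: {timings[name]:.3f} ms per call')
```

Taking the minimum over the repeats mirrors the "best of 3" figure that %timeit reported in the runs above.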

piRSquared