string vector to list python

Question

I'm working in Python and I have a DataFrame column whose values are strings that look like this:

df['set'] 

0  [911,3040]
1  [130055, 99832, 62131]
2  [19397, 3987, 5330, 14781]
3  [76514, 70178, 70301, 76545]
4  [79185, 38367, 131155, 79433]

I would like it to be:

['911','3040'],['130055','99832','62131'],['19397','3987','5330','14781'],['76514','70178','70301','76545'],['79185','38367','131155','79433']

in order to be able to run Word2Vec:

model = gensim.models.Word2Vec(df['set'] , size=100)

Thanks !

Ah, it is a string? In that case, you'll need to convert it first. How many rows do you have? Less than 100? — cs95, Jan 22 '18 at 13:27
So the column is a list of ints and you want a list of strings? — James, Jan 22 '18 at 13:27

score 1 · Accepted Answer · answered Jan 22 '18 at 13:31

If you have a column of strings, I'd recommend looking here at different ways of parsing it.

Here's how I'd do it, using ast.literal_eval.

>>> import ast
>>> [list(map(str, x)) for x in df['set'].apply(ast.literal_eval)]

Or, using pd.eval -

>>> [list(map(str, x)) for x in df['set'].apply(pd.eval)]  # 100 rows or less

Or, using yaml.safe_load (recent PyYAML versions require an explicit Loader argument for yaml.load) -

>>> import yaml
>>> [list(map(str, x)) for x in df['set'].apply(yaml.safe_load)]

[
     ['911', '3040'], 
     ['130055', '99832', '62131'], 
     ['19397', '3987', '5330', '14781'], 
     ['76514', '70178', '70301', '76545'],
     ['79185', '38367', '131155', '79433']
 ]
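As a minimal sketch of the ast.literal_eval route, here is the same conversion on a plain Python list of strings, so it runs without pandas. The helper name parse_to_str_list and the raw_rows sample are made up for illustration; with a real DataFrame you would pass the helper to df['set'].apply(...) instead.

```python
import ast

def parse_to_str_list(cell):
    """Parse a string such as '[911,3040]' and return ['911', '3040'].

    Hypothetical helper, not part of pandas or gensim.
    """
    values = ast.literal_eval(cell)   # safely evaluates the list literal
    return [str(v) for v in values]   # Word2Vec expects string tokens

# Sample cells copied from the question, stored as strings.
raw_rows = ["[911,3040]", "[130055, 99832, 62131]"]

sentences = [parse_to_str_list(cell) for cell in raw_rows]
print(sentences)
# [['911', '3040'], ['130055', '99832', '62131']]
```

ast.literal_eval only accepts Python literals, so a malformed cell raises an error rather than executing arbitrary code.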

score 0 · Answer 2 · answered Jan 22 '18 at 13:29

I think you need:

model = gensim.models.Word2Vec([[str(y) for y in x] for x in df['set']] , size=100)

L = [[str(y) for y in x] for x in df['set']]
print (L)

[['911', '3040'],
 ['130055', '99832', '62131'], 
 ['19397', '3987', '5330', '14781'],
 ['76514', '70178', '70301', '76545'], 
 ['79185', '38367', '131155', '79433']]
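Note that the comprehension above assumes each cell already holds a list. If the cells are strings, as the question states, iterating over a cell walks its characters instead. A small sketch of a guard that handles both cases (to_tokens is a hypothetical name, not from any library):

```python
import ast

def to_tokens(cell):
    """Return the items of one cell as strings.

    Hypothetical helper: accepts either an actual list or its string form.
    """
    if isinstance(cell, str):
        cell = ast.literal_eval(cell)  # "[911,3040]" -> [911, 3040]
    return [str(item) for item in cell]

# Iterating directly over a string cell yields characters, not numbers.
chars = [str(y) for y in "[911,3040]"]
print(chars[:4])                 # ['[', '9', '1', '1']

print(to_tokens("[911,3040]"))   # ['911', '3040']
print(to_tokens([911, 3040]))    # ['911', '3040']
```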

score 0 · Answer 3 · answered Jan 22 '18 at 13:37

To create a new column (str_set) with the items in the set column converted to string:

df["str_set"] = [[str(item) for item in df.loc[row, "set"]] for row in range(len(df["set"]))]

Toby Petty

score 0 · Answer 4 · answered Jan 22 '18 at 14:09

Convert every single element to a string with a simple list comprehension and overwrite your old column:

df['set']  = [[str(i) for i in row] for row in df['set']]

Executed on the data provided:

data_col = [911,3040], [130055, 99832, 62131], [19397, 3987, 5330, 14781], [76514, 70178, 70301, 76545],[79185, 38367, 131155, 79433]

out = [[str(i) for i in row] for row in data_col]

out

[['911', '3040'],
 ['130055', '99832', '62131'],
 ['19397', '3987', '5330', '14781'],
 ['76514', '70178', '70301', '76545'],
 ['79185', '38367', '131155', '79433']]

Not sure if this is the fastest way for a big data set as there are a lot of iterations.

Steven

Sorry, but it is the same as my answer :( – jezrael Jan 22 '18 at 14:09
Sorry, just refreshed the page and saw it :( Voted yours up. – Steven Jan 22 '18 at 14:12
Yeah, I don't want to say you copied my answer, but it would be best to remove it, since it is the same. – jezrael Jan 22 '18 at 14:18
