Effective way to get array from pandas dataframe text column

Question

Can I do the following conversion to an array using constructs like df.col.apply(lambda x: ...), without 'traditional' for-loops (one iterating over the rows and another iterating over the words within each row's string value)?

All my attempts gave error messages like: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".

Example:

import numpy as np
import pandas as pd
d = {'foo': [1, 2, 3], 'bar': [-2, -2, -3]}
df = pd.DataFrame({'col': ['foo mur bar', 'foo', 'mur mur']}, index=[1, 2, 3])

Expected output is:

np.array([
    [[1,2,3],[-2,-2,-3]],
    [[1,2,3]],
    [[]]
])

Yes. There are 385 existing results for [*\[python\] create word vector*](https://stackoverflow.com/search?q=%5Bpython%5D+create+word-vector), please search through them. This is a duplicate. — smci, Oct 30 '19 at 08:01
Possible duplicate of [Convert pandas dataframe to NumPy array](https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array) — Ahmad, Oct 30 '19 at 08:04

score 1 · Accepted Answer · answered Oct 30 '19 at 08:06

Try using:

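# Split each string into words, then map every word to its vector (unknown words become NaN)
a = df['col'].str.split().apply(lambda x: pd.Series(x).map(d)).values
# Drop the NaNs per row; dtype=object is needed on recent NumPy because the rows differ in length
a = np.array([pd.Series(i).dropna().values for i in a], dtype=object)
print(a)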

Output:

[array([[1, 2, 3], [-2, -2, -3]], dtype=object)
 array([[1, 2, 3]], dtype=object) array([], dtype=object)]

Answered by U13-Forward

Thank you. I could do it with `np.array(df1.col.apply(lambda x: np.array([d[item] for item in x.split(' ') if d.get(item, 0) != 0])))`, but your way is quicker, IMHO. – MaratSR Oct 30 '19 at 08:17
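For reference, the apply-based approach from the comment above can be sketched roughly as follows. This is only an illustration, not the accepted answer's method: the helper name `words_to_vectors` is made up here, and it uses a plain membership test (`w in mapping`) in place of the `d.get(item, 0) != 0` comparison, which avoids comparing a looked-up value against 0 at all. Assuming that is acceptable, it needs no intermediate DataFrame and no NaN handling.

```python
import pandas as pd

d = {'foo': [1, 2, 3], 'bar': [-2, -2, -3]}
df = pd.DataFrame({'col': ['foo mur bar', 'foo', 'mur mur']}, index=[1, 2, 3])


def words_to_vectors(text, mapping):
    """Hypothetical helper: vectors of the known words in `text`, in order."""
    # Words missing from `mapping` (e.g. 'mur') are skipped
    return [mapping[w] for w in text.split() if w in mapping]


# Series.apply keeps each returned list as a single cell, so to_numpy()
# yields a 1-D object array with one (possibly empty) list per row
result = df['col'].apply(lambda s: words_to_vectors(s, d)).to_numpy()
print(result)
```

Note that a row with no known words comes out as an empty list here, rather than the `[[]]` shown in the question's expected output.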
