
Can I do the following conversion to an array using constructs like df.col.apply(lambda x: ...), without 'traditional' for-loops (one iterating over the column's values and another iterating over the words within each string)?

All my attempts gave error messages like "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".
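That error comes from NumPy (and pandas) whenever an array with more than one element is used where Python needs a single `True`/`False`, such as an `if` condition or a comprehension filter. The attempts themselves aren't shown, so the following is only a generic reproduction, not the asker's code:

```python
import numpy as np

arr = np.array([1, 2, 3])
message = ""

try:
    # arr == 1 is element-wise and yields array([True, False, False]);
    # using that array as an if-condition raises ValueError.
    if arr == 1:
        pass
except ValueError as exc:
    message = str(exc)

print(message)

# Reducing explicitly gives one bool, which is what the error suggests.
print((arr == 1).any(), (arr == 1).all())  # True False
```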

Example:

import numpy as np
import pandas as pd
d = {'foo': [1, 2, 3], 'bar': [-2, -2, -3]}
df = pd.DataFrame({'col': ['foo mur bar', 'foo', 'mur mur']}, index=[1, 2, 3])

Expected output is:

np.array([
    [[1,2,3],[-2,-2,-3]],
    [[1,2,3]],
    [[]]
], dtype=object)  # rows have different lengths, so only an object array can hold them
MaratSR
  • Yes. There are 385 existing results for [*\[python\] create word vector*](https://stackoverflow.com/search?q=%5Bpython%5D+create+word-vector), please search through them. This is a duplicate. – smci Oct 30 '19 at 08:01
  • Possible duplicate of [Convert pandas dataframe to NumPy array](https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array) – Ahmad Oct 30 '19 at 08:04

1 Answer


Try using:

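# Map each word through d; apply() expands the returned Series into a NaN-padded DataFrame
a = df['col'].str.split().apply(lambda x: pd.Series(x).map(d)).values
# Drop the NaNs per row; the rows differ in length, so build an object array
a = np.array([pd.Series(i).dropna().values for i in a], dtype=object)
print(a)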

Output:

[array([[1, 2, 3], [-2, -2, -3]], dtype=object)
 array([[1, 2, 3]], dtype=object) array([], dtype=object)]
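The answer relies on a pandas detail worth spelling out: when the function passed to `Series.apply` returns a Series, pandas expands the results into a DataFrame with one column per word position, padding shorter rows with NaN; unknown words such as `mur` also become NaN through `map(d)`. The same stages written out one at a time (the names `words`, `wide` and `rows` are illustrative, and the final object array is built with `pd.Series(...).to_numpy()` rather than `np.array` so NumPy never has to infer a shape from ragged rows):

```python
import pandas as pd

d = {'foo': [1, 2, 3], 'bar': [-2, -2, -3]}
df = pd.DataFrame({'col': ['foo mur bar', 'foo', 'mur mur']}, index=[1, 2, 3])

# Stage 1: split each string into a list of words.
words = df['col'].str.split()

# Stage 2: the lambda returns a Series, so apply() builds a DataFrame,
# one column per word position, with NaN for unknown words and padding.
wide = words.apply(lambda x: pd.Series(x).map(d))

# Stage 3: drop the NaN cells row by row. Rows now differ in length,
# so they are collected into a 1-D object array of lists.
rows = [pd.Series(r).dropna().tolist() for r in wide.to_numpy()]
result = pd.Series(rows).to_numpy()
```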
U13-Forward
  • Thank you. I could do it with `np.array(df1.col.apply(lambda x: np.array([d[item] for item in x.split(' ') if d.get(item, 0) != 0])))`, but your way is quicker, IMHO. – MaratSR Oct 30 '19 at 08:17
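A minimal sketch of the comment's approach, reusing the question's `d` and `df` (the comment's `df1` is assumed to be that same frame). The list comprehension keeps only words that are keys of `d`. `w in d` is a plain membership test that always yields a single bool, whereas a filter like `d.get(item, 0) != 0` would raise the truth-value error from the question if the vectors in `d` were NumPy arrays instead of lists. Because the Series already holds one list per row, `to_numpy()` returns a 1-D object array directly:

```python
import pandas as pd

d = {'foo': [1, 2, 3], 'bar': [-2, -2, -3]}
df = pd.DataFrame({'col': ['foo mur bar', 'foo', 'mur mur']}, index=[1, 2, 3])

# One list of vectors per row; words missing from d are skipped.
vectors = df['col'].str.split().apply(lambda words: [d[w] for w in words if w in d])

# The Series holds one Python list per row, so to_numpy() gives a 1-D
# object array without NumPy trying to infer a ragged shape.
result = vectors.to_numpy()
```

`apply` still calls the lambda once per row in Python, so this removes the explicit loops rather than the per-row work itself.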