
I have a dataframe, dt, and a list of column names, nn_language.

EDIT: added sample data

import pandas as pd

dt = pd.DataFrame({"language1": ["english", "english123", "ingles", "ingles123", "14.0", "13", "french"],
                   "language2": ["englesh", "english123", "ingles", "ingles123", "14", "13", "french"]})
# Select every column whose name contains "language"
nn_language = dt.columns[dt.columns.str.contains("language")]

All the elements of dt[nn_language] are of object type. I would like to replace each value in dt[nn_language] with "english" if it looks like "english", "ingles" or 14, and with "other" otherwise.

I have tried: dt[nn_language].apply(lambda x: 'english' if x.str.contains('^engl|^ingl|14.0') else 'other')

but I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This and this did not help me

cs95
quant

1 Answer


Use isin for exact matches:

import numpy as np

check = ["english", "ingles", "14"]
dt[nn_language].apply(lambda x: np.where(x.isin(check), 'english', 'other'))

Or:

dt[nn_language].apply(lambda x: pd.Series(np.where(x.isin(check), 'english', 'other')))

If you need partial matches, it seems you need str.contains:

dt[nn_language].apply(lambda x: np.where(x.str.contains('^engl|^ingl|14.0'), 'english', 'other'))
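To compare the two approaches on the sample data from the question, here is a minimal sketch. It assumes the goal is to write the result back into dt. The names `exact`, `prefix` and `pattern` are made up for illustration, and the `^14` part of the pattern is an assumption: the question's `14.0` matches the string "14.0" but not "14".

```python
import numpy as np
import pandas as pd

dt = pd.DataFrame({
    "language1": ["english", "english123", "ingles", "ingles123", "14.0", "13", "french"],
    "language2": ["englesh", "english123", "ingles", "ingles123", "14", "13", "french"],
})
nn_language = dt.columns[dt.columns.str.contains("language")]

# Exact matching: only values equal to an entry of `check` become "english".
check = ["english", "ingles", "14"]
exact = dt[nn_language].apply(
    lambda col: pd.Series(np.where(col.isin(check), "english", "other"), index=col.index)
)

# Prefix matching: values starting with "engl", "ingl" or "14" become "english".
# "^14" is an assumption made here so that both "14" and "14.0" are caught.
pattern = r"^engl|^ingl|^14"
prefix = dt[nn_language].apply(
    lambda col: pd.Series(np.where(col.str.contains(pattern), "english", "other"), index=col.index)
)

# apply returns a new DataFrame, so assign it back to change dt itself.
dt[nn_language] = prefix
```

With isin, values such as "english123" or "14.0" end up as "other", while the prefix pattern maps them to "english".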
jezrael
  • In your answer I think `14` is missing quotation marks, since all the elements are of type `object`. So I think it should be `["english","ingles",'14']`. Other than that it works, and I think you were slightly faster than @coldspeed! Quick follow-up question: do you know where the error I had came from? – quant Sep 14 '17 at 11:52
  • @quant No, I was first. If you must decide which answer to accept by timings. – cs95 Sep 14 '17 at 11:53
  • I think you need `'14'` if string, or `14` if number. Or both. – jezrael Sep 14 '17 at 11:54
  • Also, would `isin` work if, for instance, I had a value `english123`? – quant Sep 14 '17 at 11:56
  • No, `isin` needs an exact match. – jezrael Sep 14 '17 at 11:57
  • It seems you need `str.contains`. – jezrael Sep 14 '17 at 11:59
  • Why does `dt[nn_language].apply(lambda x: 'english' if np.where(x.str.contains('^engl|^ingl|14.0')) else 'other')` yield an error while `dt[nn_language].apply(lambda x: np.where(x.str.contains('^engl|^ingl|14.0') , 'english', 'other'))` works? – quant Sep 14 '17 at 12:04
  • Because `np.where` works with arrays, while your code works with single items (scalars). – jezrael Sep 14 '17 at 12:14
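As a small illustration of where the ValueError in the question comes from: DataFrame.apply hands each column to the lambda as a whole Series, so `x.str.contains(...)` is a boolean Series, and a plain `if ... else` cannot reduce it to a single True or False. The sketch below uses a made-up three-row column (`col`) rather than the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical column, standing in for what apply passes to the lambda.
col = pd.Series(["english", "ingles123", "french"])
mask = col.str.contains("^engl|^ingl")  # boolean Series, one entry per row

# `if` needs one truth value, so a multi-element Series raises ValueError.
try:
    label = "english" if mask else "other"
except ValueError as err:
    error_message = str(err)

# np.where evaluates the condition element by element instead.
labels = np.where(mask, "english", "other")
```

Here `labels` holds one label per row, which is why the `np.where(condition, 'english', 'other')` form works inside apply.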