
This is a follow-up question to "Regex inside findall vs regex inside count".

.str.count('\w') works for me when called on the column of a dataframe, but not when called on a Series.

X_train[0:7] is a Series:

872     I'll text you when I drop x off
831     Hi mate its RV did u hav a nice hol just a mes...
1273    network operator. The service is free. For T &...
3314    FREE MESSAGE Activate your 500 FREE Text Messa...
4929    Hi, the SEXYCHAT girls are waiting for you to ...
4249    How much for an eighth?
3640    You can stop further club tones by replying \S...
Name: text, dtype: object

X_train[0:7].str.count('\w') returns:

872     0
831     0
1273    0
3314    0
4929    0
4249    0
3640    1
Name: text, dtype: int64

When called on the same Series, converted into a dataframe column:

d = X_train[0:7]

df = pd.DataFrame(data=d)

df['col1'].str.count('\w') returns:

872      23
831     101
1273     50
3314    120
4929     98
4249     18
3640     98
Name: col1, dtype: int64

Why does it work on a dataframe column, but not on a series? Grateful for your advice.
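For reference, here is a minimal sketch of the comparison being asked about. It assumes two of the messages shown above that are not truncated (rows 4249 and 872), since the real `X_train` is not available, and it uses a raw string `r"\w"` for the pattern. The column is accessed as `df["text"]` on the assumption that the DataFrame column keeps the Series name; the `col1` name in the question is not reproduced here.

```python
import pandas as pd

# Two untruncated messages copied from the question; the rest of X_train is unknown.
s = pd.Series(
    ["How much for an eighth?", "I'll text you when I drop x off"],
    index=[4249, 872],
    name="text",
)

# Count word characters (letters, digits, underscore) directly on the Series.
series_counts = s.str.count(r"\w")

# Wrap the same Series in a DataFrame; the column takes the Series name "text".
df = pd.DataFrame(data=s)
column_counts = df["text"].str.count(r"\w")

print(series_counts.tolist())               # [18, 23]
print(series_counts.equals(column_counts))  # True
```

With this input both calls give the same counts, and they match the 18 and 23 reported for those rows in the DataFrame output, so the difference seen in the question is unlikely to come from Series versus DataFrame column as such.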

ZakS
  • Also works on a `Series` for me – MaxNoe Oct 27 '18 at 20:36
  • I'm using a Jupyter notebook and definitely getting the results I posted. "return type(X_train[0:7]), X_train[0:7].str.count('/w')" gives me the confirmation it is a series, plus the results above – ZakS Oct 27 '18 at 20:39
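
Note that the snippet quoted in the comment above is written with `'/w'` (forward slash), whereas the question uses `'\w'` (backslash). If the forward-slash version is what was actually run on the Series, that alone could explain the near-zero counts, although this cannot be confirmed from the post. A small sketch of how the two patterns differ, using one message from the question and one hypothetical string containing "/w":

```python
import pandas as pd

# First string is from the question; the second is a hypothetical example.
s = pd.Series(["How much for an eighth?", "meet b/w 9 and 5"])

# "/w" is a literal slash followed by the letter w.
literal_counts = s.str.count("/w")

# r"\w" is the regex class for word characters (letters, digits, underscore).
word_char_counts = s.str.count(r"\w")

print(literal_counts.tolist())    # [0, 1]
print(word_char_counts.tolist())  # [18, 11]
```

The literal pattern only matches where the text happens to contain "/w", which would give mostly zeros on ordinary messages.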

0 Answers