It is a scikit-learn convention: estimators accept matrices of numbers, not strings or other data types. This keeps them agnostic to the kind of data: the same estimator can work with tabular data, text, images, etc. But it means you need to convert your data (text, in your case) to numbers first.
There are many ways to convert text to numbers. One of the simplest is called "Bag of Words": there is a column for each possible word, and a document has 1 (or the word count) in a column if that word is present in the document, and 0 otherwise. scikit-learn provides CountVectorizer for that (as well as a few other vectorizers):
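from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# docs: your list of text documents; y: one label per document
vec = CountVectorizer()
X = vec.fit_transform(docs)  # sparse matrix of word counts, one row per document

clf = RandomForestClassifier()
clf.fit(X, y)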
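To make the "one column per word" idea concrete, here is a rough hand-written sketch of a bag-of-words count matrix. It is not CountVectorizer's actual implementation; the function name bag_of_words and the toy sentences are made up for illustration, and the tokenization (lowercasing, keeping words of two or more characters, sorting the vocabulary alphabetically) is only meant to resemble CountVectorizer's defaults.

```python
import re
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for a list of strings.

    Hypothetical helper for illustration only.
    """
    # Lowercase each document and keep word tokens of 2+ characters
    tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in docs]
    # One column per distinct word, in alphabetical order
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter gives 0 for words that do not occur in this document
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


docs = ["the cat sat", "the dog sat on the mat"]  # toy data
vocabulary, matrix = bag_of_words(docs)
print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
for row in matrix:
    print(row)
# [1, 0, 0, 0, 1, 1]
# [0, 1, 1, 1, 1, 2]
```

Each row is one document and each column is one word, which is exactly the "matrix of numbers" shape an estimator expects.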
See http://scikit-learn.org/stable/auto_examples/text/document_classification_20newsgroups.html for a complete example and http://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction for more information about text vectorization.
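One practical detail when you later classify new text: the new documents have to go through the same fitted vectorizer, using transform rather than fit_transform, so that the columns line up with what the classifier was trained on. A minimal sketch, assuming toy spam/ham data that I invented for illustration (n_estimators and random_state are set only to keep the example small and repeatable):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy training data, invented for illustration
train_docs = [
    "cheap pills buy now",
    "meeting at noon tomorrow",
    "buy cheap watches",
    "lunch meeting moved to friday",
]
train_labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
# fit_transform learns the vocabulary and returns the count matrix
X_train = vec.fit_transform(train_docs)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_train, train_labels)

new_docs = ["buy pills", "meeting on friday"]
# transform reuses the learned vocabulary; words never seen in training are ignored
X_new = vec.transform(new_docs)
predictions = clf.predict(X_new)
print(X_new.shape, predictions)
```

Calling fit_transform on the new documents instead would build a different vocabulary, and the resulting columns would no longer match the ones the classifier was fitted on.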