Text analysis in Python

Question

Have you seen any good tutorials on text analysis, either in Python or purely as theory? I mean something like determining the topic of a text, analyzing words, etc.

There are a number of good examples of natural language processing and other text mining techniques out there - search for them and then ask your specific questions when you have trouble. As it stands, this question is far too opinion based. — LinkBerest, Nov 28 '15 at 23:04
Questions asking us to **recommend or find a book, tool, software library, tutorial or other off-site resource** are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, [describe the problem](http://meta.stackoverflow.com/questions/254393) and what has been done so far to solve it. — MattDMo, Nov 28 '15 at 23:12

score 2 · Accepted Answer · edited May 23 '17 at 11:52

You can use Apache Spark. It supports four languages (Java, Scala, Python and R), and it works with IPython and Jupyter after some tricky configuration.

There are some courses you can audit:

Here is a short PDF that introduces the subject:

Text Analysis with Apache Spark

And here is a small word-count example using Apache Spark, though Spark is not limited to this (it also has PCA, SVD, and much more):

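from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # reuses the existing context in a PySpark shell or notebook
documentRDD = sc.parallelize(["Hello", "world", "from", "the", "python", "world"])
# Pair each word with a count of 1, then sum the counts per word.
tokensTupleRDD = documentRDD.map(lambda word: (word, 1))
tokensCountRDD = tokensTupleRDD.reduceByKey(lambda a, b: a + b)
print(tokensCountRDD.collect())
# e.g. [('Hello', 1), ('world', 2), ...] (order may vary)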
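If you don't have Spark installed, the same map and reduce-by-key idea can be sketched in plain Python. This is only an illustration of what the word count computes, not how Spark works internally; the variable names `pairs` and `counts` are mine.

```python
words = ["Hello", "world", "from", "the", "python", "world"]

# "map" step: pair every word with a count of 1
pairs = [(word, 1) for word in words]

# "reduceByKey" step: add up the counts of pairs that share the same word
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(sorted(counts.items()))
```

Spark does the same grouping and summing, but spreads the pairs across the cluster's workers.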

Another alternative is scikit-learn, which is widely used, easy to pick up, and covers this area too. The only bad thing is that its algorithms can't run on a cluster, so they don't scale nicely.

They even have a very easy tutorial on their site:

Scikit-learn Tutorial
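For comparison, here is a rough sketch of the same word count with scikit-learn's `CountVectorizer`. It assumes scikit-learn and NumPy are installed; the two sample documents are made up for illustration. Note that `CountVectorizer` lowercases text and drops one-character tokens by default.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Two tiny made-up documents (an assumption for this sketch)
docs = ["Hello world", "from the python world"]

vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(docs)  # sparse document-term matrix

# Sum each column to get the total count of each term across all documents
totals = np.asarray(matrix.sum(axis=0)).ravel()

# vocabulary_ maps each term to its column index in the matrix
word_counts = {term: int(totals[idx]) for term, idx in vectorizer.vocabulary_.items()}
print(sorted(word_counts.items()))
```

The resulting document-term matrix is also the usual input for scikit-learn's classifiers and topic models, which is closer to what the question asks about.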

So if you're trying to learn, I would suggest scikit-learn; but if you're trying to apply Big Data at work, I would suggest studying both and using Apache Spark.
