
Have you seen any good tutorials for text analysis, either in Python or purely theoretical? I mean things like determining the topic of a text, analyzing words, etc.

Brian Tompsett - 汤莱恩
CuriousGuy
  • There are a number of good examples of natural language processing and other text mining techniques out there - search for them and then ask your specific questions when you have trouble. As it stands, this question is far too opinion based. – LinkBerest Nov 28 '15 at 23:04
  • Questions asking us to **recommend or find a book, tool, software library, tutorial or other off-site resource** are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, [describe the problem](http://meta.stackoverflow.com/questions/254393) and what has been done so far to solve it. – MattDMo Nov 28 '15 at 23:12

1 Answer


You can use Apache Spark. It supports four languages (Java, Scala, Python and R), and it works with IPython and Jupyter after some tricky configuration.

There are some courses you can audit:

Here is a short PDF that introduces the subject.

Here is a small word-count example using Apache Spark, although Spark is not limited to this task (it also offers PCA, SVD, and much more):

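from pyspark import SparkContext
sc = SparkContext.getOrCreate()  # the pyspark shell already provides `sc`
documentRDD = sc.parallelize(["Hello", "world", "from", "the", "python", "world"])
# Pair every word with an initial count of 1
tokensTupleRDD = documentRDD.map(lambda word: (word, 1))
# Add up the counts of identical words
tokensCountRDD = tokensTupleRDD.reduceByKey(lambda a, b: a + b)
print(tokensCountRDD.collect())
# e.g. [('Hello', 1), ('world', 2), ...] (element order is not guaranteed)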
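To see what the map and reduceByKey steps are doing without a Spark installation, the same logic can be sketched in plain Python. This is only an illustration of the idea, not how Spark runs it internally; the function name `word_count` is made up for this example.

```python
from collections import defaultdict


def word_count(words):
    """Count words using the same two steps as the Spark example."""
    # "map" step: pair every word with an initial count of 1
    pairs = [(word, 1) for word in words]

    # "reduceByKey" step: add up the counts that share the same word
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


counts = word_count(["Hello", "world", "from", "the", "python", "world"])
print(counts)
```

Spark applies the same two steps, but distributes the pairs across the workers of a cluster and combines the partial sums.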

Another alternative is scikit-learn, which is widely used, easy to learn, and covers this area too. Its main drawback is that its algorithms don't run on clusters and don't scale as well to very large datasets.

They even have a very easy tutorial on their site:

So if you're trying to learn, I would suggest scikit-learn, but if you need to apply Big Data techniques at work, I'd suggest studying both and using Apache Spark.

Community
Alberto Bonsanto