Highest Voted 'text-analysis' Questions

81

votes

4 answers

Stemmers vs Lemmatizers

Natural Language Processing (NLP), especially for English, has evolved to the stage where stemming would become an archaic technology if "perfect" lemmatizers existed. That's because stemmers change the surface form of a word/token into some…

asked Jun 26 '13 at 10:19

alvas

115,346
109
446
738

72

votes

4 answers

How to extract common / significant phrases from a series of text entries

I have a series of text items: raw HTML from a MySQL database. I want to find the most common phrases in these entries (not the single most common phrase, and ideally, not enforcing word-for-word matching). My example is any review on Yelp.com,…

nlp text-extraction nltk text-analysis

asked Mar 16 '10 at 08:42

arronsky

721
1
6
3

57

votes

6 answers

Training data for sentiment analysis

Where can I get a corpus of documents that have already been classified as positive/negative for sentiment in the corporate domain? I want a large corpus of documents that provide reviews for companies, like reviews of companies provided by analysts…

nlp machine-learning text-analysis sentiment-analysis training-data

asked Sep 26 '11 at 06:18

London guy

27,522
44
121
179

23

votes

1 answer

How to find common phrases in a large body of text

I'm working on a project at the moment where I need to pick out the most common phrases in a huge body of text. For example, say we have three sentences like the following: The dog jumped over the woman. The dog jumped into the car. The dog jumped…

data-structures graph data-mining text-analysis

asked Dec 18 '09 at 15:52

benmcredmond

1,702
2
15
22

20

votes

3 answers

How to remove stopwords efficiently from a list of ngram tokens in R

Here's an appeal for a better way to do something that I can already do inefficiently: filter a series of n-gram tokens using "stop words" so that the occurrence of any stop word term in an n-gram triggers removal. I'd very much like to have one…

r performance n-gram stop-words text-analysis

asked Oct 12 '15 at 00:09

Ken Benoit

14,454
27
50

17

votes

2 answers

tag generation from a small text content (such as tweets)

I have already asked a similar question earlier, but I have noticed that I have a big constraint: I am working on small text sets such as user Tweets to generate tags (keywords). And it seems like the accepted suggestion (point-wise mutual information…

twitter nlp text-extraction nltk text-analysis

asked May 04 '10 at 09:20

Hellnar

62,315
79
204
279

15

votes

5 answers

Error using langdetect in python: "No features in text"

Hey I have a CSV with multilingual text. All I want is a column appended with the language detected. So I coded as below, from langdetect import detect import csv with open('C:\\Users\\dell\\Downloads\\stdlang.csv') as csvinput: with…

python text-analysis language-detection

asked Nov 24 '16 at 10:06

user7140275

215
1
3
9

15

votes

3 answers

Use brain.js neural network to do text analysis

I'm trying to do some text analysis to determine if a given string is... talking about politics. I'm thinking I could create a neural network where the input is either a string or a list of words (ordering might matter?) and the output is whether…

neural-network text-analysis brain.js

asked May 05 '16 at 06:10

Andrew Rasmussen

14,912
10
45
81

15

votes

1 answer

Trying to get tf-idf weighting working in R

I am trying to do some very basic text analysis with the tm package and get some tf-idf scores; I'm running OS X (though I've tried this on Debian Squeeze with the same result); I've got a directory (which is my working directory) with a couple text…

r tm tf-idf text-analysis

asked Feb 11 '13 at 20:49

cforster

577
2
7
19

15

votes

1 answer

Very simple text classification by machine learning?

Possible Duplicate: Text Classification into Categories I am currently working on a solution to get the type of food served in a database with 10k restaurants based on their description. I'm using lists of keywords to decide which kind of food is…

python algorithm machine-learning text-analysis

asked Dec 09 '12 at 14:20

Dieter

441
1
5
15

14

votes

5 answers

Check if a string is a possible abbreviation for a name

I'm trying to develop a Python algorithm to check if a string could be an abbreviation for another word. For example, fck is a match for fc kopenhavn because it matches the first characters of the word. fhk would not match. fco should not match fc…

python string-matching slug abbreviation text-analysis

asked Sep 07 '11 at 09:20

Björn Lindqvist

19,221
20
87
122

14

votes

4 answers

Extract words from PDF with golang?

I don't understand type conversion. I know this isn't right, all I get is a bunch of hieroglyphs. f, _ := os.Open("test.pdf") defer f.Close() io.Copy(os.Stdout, f) I want to work with the strings....

pdf go text-analysis

asked Oct 02 '16 at 04:33

omgj

1,369
3
12
18

13

votes

2 answers

How to combine TFIDF features with other features

I have a classic NLP problem: I have to classify a news item as fake or real. I have created two sets of features: A) Bigram Term Frequency-Inverse Document Frequency B) Approximately 20 features associated with each document obtained using pattern.en…

machine-learning nlp text-analysis

asked Feb 01 '18 at 23:02

Massifox

4,369
11
31

13

votes

2 answers

Wordcloud is cropping text

I am using the Twitter API to generate sentiments. I am trying to generate a word cloud based on tweets. Here is my code to generate a wordcloud: wordcloud(clean.tweets, random.order=F,max.words=80, col=rainbow(50), scale=c(3.5,1)) Result for this: I…

r text-analysis word-cloud sttwitterapi

asked Nov 28 '17 at 05:31

Harsh Shah

2,162
2
19
39

13

votes

3 answers

Java text analysis libraries

I'm looking for a Java-driven solution to a requirement for analysing sentences to log whether a key word was used positively or negatively. For example, the key word might be 'cabbages' and the sentence 'I like cabbages but not peas', and I'd like a Java…

java text analysis text-analysis

asked Sep 23 '10 at 12:33

jaseFace

1,415
5
22
34
