16

Using machine translation, can I obtain a very compressed version of a sentence? For example, "I would really like to have a delicious tasty cup of coffee" would be translated to "I want coffee". Do any of the NLP engines provide such functionality?

I found a few research papers that do paraphrase generation and sentence compression, but is there a library which has already implemented this?

Codevalley
  • I don't know of a tool that does this, but parsing followed by removal of adverbs in adjectival phrases and some other constructs might give you a decent baseline. – Fred Foo Oct 22 '11 at 14:59
  • You can remove adjectives/adverbs, but what you indicate in the above example is compressing verb forms, i.e. 'would really like to have' -> 'want'. Also, 'tasty cup of coffee' to 'coffee'? There are lots of situations where you want to get the root noun, say 'car dealership of the town'. I don't know of a tool to do this. – nflacco Oct 23 '11 at 06:12
  • I would post on metaoptimize.com/qa/, too. You can try to contact James Clarke at http://jamesclarke.net. – cyborg Oct 23 '11 at 08:12
  • I can't help thinking of the Suntory ad in Lost in Translation: "Turn to the camera...with intensity." http://www.youtube.com/watch?v=FiQnH450hPM – Iterator Feb 14 '12 at 04:04
  • According to our [on-topic](https://stackoverflow.com/help/on-topic) guidance, "**Some questions are still off-topic, even if they fit into one of the categories listed above:**...Questions asking us to *recommend or find a book, tool, software library, tutorial or other off-site resource* are off-topic..." – Robert Columbia Jul 20 '18 at 02:22

4 Answers

11

If your intention is to make a sentence brief without losing its important ideas, then you can do that by extracting the subject-predicate-object triplet.

As for tools/engines, I recommend Stanford NLP. Its dependency parser output already provides the subject and object (if any), but you will still need to do some tuning to get the desired result.

You can download Stanford NLP and find sample usage here.
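
For illustration, here is a rough sketch (my addition, not from the original answer) of pulling a subject-verb-object triplet out of the CoreNLP dependency parse. It assumes Stanford CoreNLP 3.9+ with the English models on the classpath; the relation names follow Universal Dependencies (older models emit dobj instead of obj), so treat it as a starting point rather than a finished compressor:

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphEdge;
import java.util.Properties;

public class TripletSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        CoreDocument doc = new CoreDocument("I would really like to have a delicious tasty cup of coffee.");
        pipeline.annotate(doc);

        CoreSentence sentence = doc.sentences().get(0);
        SemanticGraph graph = sentence.dependencyParse();

        // The root of the dependency graph is usually the main predicate.
        String predicate = graph.getFirstRoot().lemma();
        String subject = null;
        String object = null;

        for (SemanticGraphEdge edge : graph.edgeIterable()) {
            String relation = edge.getRelation().getShortName();
            if (relation.equals("nsubj")) {
                subject = edge.getDependent().word();
            } else if (relation.equals("obj") || relation.equals("dobj")) {
                object = edge.getDependent().word();
            }
        }

        // For chained verbs ("like to have") the object hangs off an xcomp of the root,
        // which is exactly the kind of tuning the answer mentions.
        System.out.println(subject + " " + predicate + " " + object);
    }
}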

I also found a paper related to your question. Have a look at "Text Simplification using Typed Dependencies: A Comparison of the Robustness of Different Generation Strategies".

Khairul
2

Here is what I found:

A modified implementation of the model described in Clarke and Lapata, 2008, "Global Inference for Sentence Compression: An Integer Linear Programming Approach".

Paper: https://www.jair.org/media/2433/live-2433-3731-jair.pdf

Source: https://github.com/cnap/sentence-compression (written in Java)

Input: At the camp , the rebel troops were welcomed with a banner that read 'Welcome home' .

Output: At camp , the troops were welcomed.

Update: Sequence-to-Sequence with Attention Model for Text Summarization.

https://github.com/tensorflow/models/tree/master/textsum

https://arxiv.org/abs/1509.00685

mwweb
1

To start with, try the Watson NaturalLanguageUnderstanding/Alchemy libraries, with which I was able to extract the important keywords from my statements. Example:

Input: Hey! I am having issues with my laptop screen

Output: laptop screen issues hardware.

Beyond rephrasing, with NLU you can get further details about your input statement. For the statement above, you can get details for the following categories:

Language (e.g. "en"); Entities; Concepts; Keywords (e.g. "laptop screen", "issues") with details like relevance, text, keyword emotion and sentiment; Categories with details like labels and relevance score; and SemanticRoles with details like the sentence, its subject, action and object.

Along with this, you can use the Tone Analyzer to get the prominent tone of the statement, such as fear, anger, happiness, and disgust.

The following is a code sample for the Watson libraries. Note: the Watson libraries are not free, but they come with a one-month trial, so you can start with them and, once you get hold of the concepts, switch to other open-source libraries and figure out the equivalent functions.

// Create the NLU client; credentials come from the author's own WatsonConfiguration helper
NaturalLanguageUnderstanding service = new NaturalLanguageUnderstanding(
    NaturalLanguageUnderstanding.VERSION_DATE_2017_02_27,
    WatsonConfiguration.getAlchemyUserName(),
    WatsonConfiguration.getAlchemyPassword());

//ConceptsOptions
ConceptsOptions conceptOptions = new ConceptsOptions.Builder()
    .limit(10)
    .build();

//CategoriesOptions
CategoriesOptions categoriesOptions = new CategoriesOptions();

//SemanticOptions
SemanticRolesOptions semanticRoleOptions = new SemanticRolesOptions.Builder()
    .entities(true)
    .keywords(true)
    .limit(10)
    .build();

EntitiesOptions entitiesOptions = new EntitiesOptions.Builder()
    .emotion(true)
    .sentiment(true)
    .limit(10)
    .build();

KeywordsOptions keywordsOptions = new KeywordsOptions.Builder()
    .emotion(true)
    .sentiment(true)
    .limit(10)
    .build();

Features features = new Features.Builder()
    .entities(entitiesOptions)
    .keywords(keywordsOptions)
    .concepts(conceptOptions)
    .categories(categoriesOptions)
    .semanticRoles(semanticRoleOptions)
    .build();

AnalyzeOptions parameters = new AnalyzeOptions.Builder()
    .text(inputText)
    .features(features)
    .build();

AnalysisResults response = service
    .analyze(parameters)
    .execute();
System.out.println(response);
chuha.billi
0

You can use a combination of stop-word removal and stemming/lemmatization. Stemming and lemmatization are processes that reduce every word in the text to its basic root; you can find a full explanation here. I am using the Porter stemmer; look it up on Google. After stemming and lemmatization, stop-word removal is very easy. Here is my stop-word removal method:

public static String[] stopwords ={"a", "about", "above", "across", "after", "afterwards", "again", "against", "all", "almost", 
    "alone", "along", "already", "also","although","always","am","among", "amongst", "amoungst", "amount",  "an", "and", 
    "another", "any","anyhow","anyone","anything","anyway", "anywhere", "are", "around", "as",  "at", "back","be","became", 
    "because","become","becomes", "becoming", "been", "before", "beforehand", "behind", "being", "below", "beside", "besides", 
    "between", "beyond", "bill", "both", "bottom","but", "by", "call", "can", "cannot", "cant", "co", "con", "could", "couldnt",
    "cry", "de", "describe", "detail", "do", "done", "down", "due", "during", "each", "eg", "eight", "either", "eleven","else",
    "elsewhere", "empty", "enough", "etc", "even", "ever", "every", "everyone", "everything", "everywhere", "except", "few", 
    "fifteen", "fify", "fill", "find", "fire", "first", "five", "for", "former", "formerly", "forty", "found", "four", "from", 
    "front", "full", "further", "get", "give", "go", "had", "has", "hasnt",
    "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein", "hereupon", "hers", "herself", 
    "him", "himself", "his", "how", "however", "hundred", "ie", "if", "in", "inc", "indeed", "interest", "into", 
    "is", "it", "its", "itself", "keep", "last", "latter", "latterly", "least", "less", "ltd", "made", "many", 
    "may", "me", "meanwhile", "might", "mill", "mine", "more", "moreover", "most", "mostly", "move", "much", "must", 
    "my", "myself", "name", "namely", "neither", "never", "nevertheless", "next", "nine", "no", "nobody", "none", 
    "noone", "nor", "not", "nothing", "now", "nowhere", "of", "off", "often", "on", "once", "one", "only", "onto", 
    "or", "other", "others", "otherwise", "our", "ours", "ourselves", "out", "over", "own","part", "per", "perhaps",
    "please", "put", "rather", "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious", "several", "she",
    "should", "show", "side", "since", "sincere", "six", "sixty", "so", "some", "somehow", "someone", "something", 
    "sometime", "sometimes", "somewhere", "still", "such", "system", "take", "ten", "than", "that", "the", "their", 
    "them", "themselves", "then", "thence", "there", "thereafter", "thereby", "therefore", "therein", "thereupon", 
    "these", "they", "thickv", "thin", "third", "this", "those", "though", "three", "through", "throughout", "thru", 
    "thus", "to", "together", "too", "top", "toward", "towards", "twelve", "twenty", "two", "un", "under", "until", 
    "up", "upon", "us", "very", "via", "was", "we", "well", "were", "what", "whatever", "when", "whence", "whenever",
    "where", "whereafter", "whereas", "whereby", "wherein", "whereupon", "wherever", "whether", "which", "while", 
    "whither", "who", "whoever", "whole", "whom", "whose", "why", "will", "with", "within", "without", "would", "yet",
    "you", "your", "yours", "yourself", "yourselves","1","2","3","4","5","6","7","8","9","10","1.","2.","3.","4.","5.","6.","11",
    "7.","8.","9.","12","13","14","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
    "terms","CONDITIONS","conditions","values","interested.","care","sure","!","@","#","$","%","^","&","*","(",")","{","}","[","]",":",";",",","<",">","/","?","_","-","+","=",
    "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
    "contact","grounds","buyers","tried","said,","plan","value","principle.","forces","sent:","is,","was","like",
    "discussion","tmus","diffrent.","layout","area.","thanks","thankyou","hello","bye","rise","fell","fall","psqft.","http://","km","miles"};

In my project I used a paragraph as my text input:

// Requires: import java.util.Map; import java.util.Scanner; import java.util.TreeMap;
public static String removeStopWords(String paragraph) {
    Scanner words = new Scanner(paragraph);
    StringBuilder newText = new StringBuilder();
    // Word-frequency counts, kept from the original project (not used by the return value)
    Map<String, Integer> frequencies = new TreeMap<>();

    while (words.hasNext()) {
        String word = words.next().toLowerCase();

        // Skip the word if it appears in the stop-word list
        boolean isStopWord = false;
        for (String stopword : stopwords) {
            if (word.equals(stopword)) {
                isStopWord = true;
                break;
            }
        }
        if (!isStopWord) {
            newText.append(word).append(' ');
        }

        // Track how often each (non-empty) word occurs
        if (!word.isEmpty()) {
            frequencies.merge(word, 1, Integer::sum);
        }
    }
    words.close();
    return newText.toString();
}

I have used the Stanford NLP library; you can download it from here. I hope that I have helped you in some way.
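
Since the answer relies on the Stanford NLP library, here is a rough sketch (my addition, assuming Stanford CoreNLP 3.9+ with the English models on the classpath) of the lemmatization step; its output can then be passed to removeStopWords above. Note that CoreNLP ships a lemmatizer rather than a Porter stemmer, so this shows the lemma route; the Porter stemmer itself would come from a separate library:

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import java.util.Properties;
import java.util.stream.Collectors;

public class LemmatizerSketch {
    public static String lemmatize(String paragraph) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma"); // lemma requires pos
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        CoreDocument doc = new CoreDocument(paragraph);
        pipeline.annotate(doc);

        // Rebuild the text from the lemma of every token.
        return doc.tokens().stream()
                .map(CoreLabel::lemma)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Produces roughly "I be have issue with my laptop screen",
        // which can then be passed to removeStopWords(...)
        System.out.println(lemmatize("I am having issues with my laptop screens"));
    }
}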

KhuzG
  • Links are dead at the moment. In addition, removing stop words is not sufficient, as @nflacco commented. Overall, this answer should be improved. – mins Mar 08 '15 at 19:29
  • I have checked the links and they are working, and I know my answer is not perfect (nothing is perfect in NLP), but it is a start. – KhuzG Mar 10 '15 at 03:32
  • What will you recommend me to improve in my answer? – KhuzG Mar 10 '15 at 03:40
  • Welcome to the site. Your answer is an interesting contribution and will likely get more votes after you fix a couple of aspects: for some reason the [first link is dead](http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html). As explained in [How do I write a good answer?](http://stackoverflow.com/help/how-to-answer), it is good practice to *provide context for links*. This will also balance the importance given to *stemming and lemmatization* against the stopwords removal step. – mins Mar 10 '15 at 06:49
  • You should define the purpose and use of the code instead of just posting it, and provide a solution in such a way that it remains language-independent. – Ravinder Payal Jul 25 '16 at 17:24
  • I think he needs some semantic analysis to extract the information, but this is just preprocessing. – Raymond Chen Mar 28 '17 at 17:39