Questions tagged [summarization]

Summarization is the process of identifying the most important information from a source, or a number of sources, in order to present it in a short form. Automatic Summarization is the process of producing summaries by means of automatic techniques, in order to overcome the cost and time required to manually produce summaries (e.g. by professional human abstractors).

The need for Automatic Summarization is motivated by the problem of information overload: the amount of available information is constantly increasing, while the time users can afford to spend scanning through information remains constant (or is even decreasing).

Basic definitions

Depending on the function, a summary can be:

Indicative: indicates whether reading the full text in depth is worthwhile;
Informative: covers all the salient aspects of the source;
Critical: provides a critique of the source and expresses opinions about the source material.

Depending on the user, a summary can be:

Static / Generic: not personalized for a particular user;
Dynamic / User-oriented: tailored for a specific user, depending on a user profile or a user query.

Depending on the source, the summarization process can be:

Single-document: the summary provides information from a single source;
Multi-document: the summary provides information from a number of sources which discuss a particular topic, possibly providing overlapping information.

Depending on the summarization technique, a summary can be called:

Abstract: if some material is not verbatim present in the original source, e.g. some rephrasing is involved;
Extract: if all the material is verbatim present in the original source.
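
The extract case above can be illustrated with a minimal sketch. It is not taken from the readings below: it assumes a simple word-frequency score, a naive regex sentence splitter, and a tiny hand-picked stopword list, and the names split_sentences, content_words, extract_summary and STOPWORDS are hypothetical. The only property it is meant to show is that every returned sentence appears verbatim in the source.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "on", "for", "that"}


def split_sentences(text):
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extract_summary(text, max_sentences=1):
    """Return the highest-scoring sentences, verbatim and in source order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole source act as a crude importance signal.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top ones, then restore source order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

Calling extract_summary(text, max_sentences=2) on a short document returns two of its sentences unchanged. Producing an abstract instead would require a generation or rephrasing step, which this sketch deliberately leaves out.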

Readings

Automatic Summarization by Ani Nenkova and Kathleen McKeown http://repository.upenn.edu/cgi/viewcontent.cgi?article=1749&context=cis_papers
http://en.wikipedia.org/wiki/Automatic_summarization

374 questions

3 answers

Return most frequent string value for each group

    a <- c(rep(1:2,3))
    b <- c("A","A","B","B","B","B")
    df <- data.frame(a,b)
    > str(b)
    chr [1:6] "A" "A" "B" "B" "B" "B"
      a b
    1 1 A
    2 2 A
    3 1 B
    4 2 B
    5 1 B
    6 2 B
I want to group by variable a and return the most frequent value of b. My desired result…

r summarization

asked Apr 28 '15 at 14:19

rmuc8

2 answers

How can I calculate the percentage change within a group for multiple columns in R?

I have a data frame with an ID column, a date column (12 months for each ID), and I have 23 numeric variables. I would like to obtain the percentage change by month within each ID. I am using the quantmod package in order to obtain the percent…

r dplyr summarization

asked Jul 11 '15 at 01:59

mmmmmmmmmm

2 answers

Summarizing a Wikipedia Article

I find myself having to learn new things all the time. I've been trying to think of ways I could expedite the process of learning new subjects. I thought it might be neat if I could write a program to parse a Wikipedia article and remove…

python statistics machine-learning wikipedia summarization

asked Jan 01 '12 at 02:21

Jesse Aldridge

4 answers

MySQL ON DUPLICATE KEY UPDATE with nullable column in unique key

Our MySQL web analytics database contains a summary table which is updated throughout the day as new activity is imported. We use ON DUPLICATE KEY UPDATE in order that the summarization overwrites earlier calculations, but are having difficulty…

mysql nullable summarization

asked Aug 19 '09 at 06:28

ryandenki

3 answers

Summarise over all columns

I have data of the following format:
    gen = function () sample.int(10, replace = TRUE)
    x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())
I would now like to attach, to each row, the total sum of all the elements in the row (my actual…

r dplyr summarization

asked Jan 22 '15 at 17:54

Konrad Rudolph

6 answers

Data in different resolutions

I have two tables; records are continuously inserted into them from an outside source. Let's say these tables keep statistics of user interactions. When a user clicks a button, the details of that click (the user, time of click…

database data-warehouse etl summarization

asked Jan 07 '10 at 16:43

nimcap

5 answers

Summing Multiple Groups of Columns

I have a situation where my data frame contains the results of image analysis where the columns are the proportion of a particular class present in the image, such that an example dataframe class_df would look like: id A B C D E F …

r group-by dplyr purrr summarization

asked May 22 '18 at 05:17

Syzorr

2 answers

How to count occurrences of combinations in data.table in R

I have two data.tables. I would like to count the number of rows matching a combination of a table in another table. I have checked the data.table documentation but I have not found my answer. I am using data.table 1.9.2. DT1 <- data.table(a=c(3,2),…

r data.table summarization

asked Sep 16 '14 at 13:02

poiuytrez

3 answers

tapply() function dependent on multiple columns in R

In R, I have a table with Location, sample_year and count. So,
    Location  sample_year  count
    A         1995         1
    A         1995         1
    A         2000         3
    B         2000         1
    B         2000         1
    B         2000         5
I want a…

r summarization

asked Mar 07 '11 at 05:03

DeLongTime

1 answer

Long Sequence In a seq2seq model with attention?

I am following this PyTorch tutorial and trying to apply the same principle to summarization, where the encoder input sequence would be around 1000 words and the decoder target 200 words. How do I apply seq2seq to this? I know it would be very expensive…

python lstm summarization pytorch

asked Jun 04 '17 at 05:45

vijendra rana

2 answers

Bin data by (x,y) and summarize

These are the first 10 lines of a huge file I have (note that there is only one user in these 10 lines, but I've got thousands of users): dput(testd) structure(list(user = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L ), otime = structure(c(10L, 9L, 8L,…

r dataframe plyr binning summarization

asked Feb 27 '13 at 20:24

unixsnob

3 answers

installing pyrouge gets error in ubuntu

I want to install pyrouge on Ubuntu for text summarization evaluation. I am using the instructions in this. First I ran pip install pyrouge, then I must run this command: pyrouge_set_rouge_path…

python ubuntu summarization rouge

asked Aug 26 '17 at 10:04

Mahsa

3 answers

obtaining 3 most common elements of groups, concatenating ties, and ignoring less common values

I am trying to get the 3 most common numbers per group of a dataframe, using a function, but ignoring the less common values (per group), and allowing a unique number if present. Accepted answer will have the lowest system.time #my current…

r dataframe ranking summarization

asked Mar 09 '17 at 14:53

Ferroao

2 answers

relative windowed running sum through data.table non-equi join

I have a data set (customerId, transactionDate, productId, purchaseQty) loaded into a data.table. For each row, I want to calculate the sum and mean of purchaseQty for the prior 45 days.
    productId customerID transactionDate purchaseQty
    1: …

r data.table summarization

asked Dec 06 '16 at 23:56

Ethan

2 answers

data.table: Using with=False and transforming function/summary function?

I want to summarise several variables in data.table, output in wide format, output possibly as a list per variable. Since several other approaches did not work, I tried to do an outer lapply, giving the names of the variables as character vectors. I…

r data.table lapply summarization group-summaries

asked Nov 10 '14 at 12:50

Julian
