Questions tagged [summarization]

Summarization is the process of identifying the most important information from a source, or a number of sources, in order to present it in a short form.

Automatic summarization is the process of producing summaries by automatic techniques, in order to avoid the cost and time required to produce them manually (e.g. by professional human abstractors).

The need for Automatic Summarization is motivated by the problem of information overload: the amount of available information is constantly increasing, while the time users can afford to spend scanning through information remains constant (or is even decreasing).

Basic definitions

Depending on the function, a summary can be:

  • Indicative: indicates whether reading the full text in depth is worthwhile;
  • Informative: covers all the salient aspects of the source;
  • Critical: provides a critique of the source and expresses opinions about the source material.

Depending on the user, a summary can be:

  • Static / Generic: not personalized for a particular user;
  • Dynamic / User-oriented: tailored for a specific user, depending on a user profile or a user query.

Depending on the source, the summarization process can be:

  • Single-document: the summary provides information from a single source;
  • Multi-document: the summary provides information from a number of sources which discuss a particular topic, possibly providing overlapping information.

Depending on the summarization technique, a summary can be called:

  • Abstract: if some material is not verbatim present in the original source, e.g. some rephrasing is involved;
  • Extract: if all the material is verbatim present in the original source.
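The extract case above can be illustrated with a minimal sketch of a generic, single-document extractive summarizer. It is only one simple way to do this, not a reference method: it assumes that word frequency is a usable proxy for importance, and the names `STOP_WORDS`, `split_sentences`, `tokenize` and `extract_summary` are hypothetical helpers introduced for this example. Every sentence it returns is copied verbatim from the source, which is what makes the result an extract rather than an abstract.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "it", "on", "for"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def extract_summary(text, max_sentences=2):
    """Return up to max_sentences sentences, verbatim and in source order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole source act as the importance signal.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average frequency of the sentence's words, so long sentences
        # are not favoured just for being long.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return [sentences[i] for i in chosen]
```

An abstract, by contrast, would need a step that rephrases or generates text, which this sketch does not attempt.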

374 questions
35
votes
3 answers

Return most frequent string value for each group

a <- c(rep(1:2,3))
b <- c("A","A","B","B","B","B")
df <- data.frame(a,b)
> str(b)
chr [1:6] "A" "A" "B" "B" "B" "B"

  a b
1 1 A
2 2 A
3 1 B
4 2 B
5 1 B
6 2 B

I want to group by variable a and return the most frequent value of b. My desired result…
rmuc8
19
votes
2 answers

How can I calculate the percentage change within a group for multiple columns in R?

I have a data frame with an ID column, a date column (12 months for each ID), and I have 23 numeric variables. I would like to obtain the percentage change by month within each ID. I am using the quantmod package in order to obtain the percent…
mmmmmmmmmm
18
votes
2 answers

Summarizing a Wikipedia Article

I find myself having to learn new things all the time. I've been trying to think of ways I could expedite the process of learning new subjects. I thought it might be neat if I could write a program to parse a wikipedia article and remove…
Jesse Aldridge
15
votes
4 answers

MySQL ON DUPLICATE KEY UPDATE with nullable column in unique key

Our MySQL web analytics database contains a summary table which is updated throughout the day as new activity is imported. We use ON DUPLICATE KEY UPDATE in order that the summarization overwrites earlier calculations, but are having difficulty…
ryandenki
10
votes
3 answers

Summarise over all columns

I have data of the following format:

gen = function () sample.int(10, replace = TRUE)
x = data.frame(A = gen(), C = gen(), G = gen(), T = gen())

I would now like to attach, to each row, the total sum of all the elements in the row (my actual…
Konrad Rudolph
10
votes
6 answers

Data in different resolutions

I have two tables; records are continuously inserted into them from an outside source. Let's say these tables keep statistics of user interactions. When a user clicks a button, the details of that click (the user, time of click…
nimcap
9
votes
5 answers

Summing Multiple Groups of Columns

I have a situation where my data frame contains the results of image analysis where the columns are the proportion of a particular class present in the image, such that an example dataframe class_df would look like: id A B C D E F …
Syzorr
8
votes
2 answers

How to count occurrences combinations in data.table in R

I have two data.tables. I would like to count the number of rows matching a combination of a table in another table. I have checked the data.table documentation but I have not found my answer. I am using data.table 1.9.2. DT1 <- data.table(a=c(3,2),…
poiuytrez
7
votes
3 answers

tapply() function dependent on multiple columns in R

In R, I have a table with Location, sample_year and count. So,

Location  sample_year  count
A         1995         1
A         1995         1
A         2000         3
B         2000         1
B         2000         1
B         2000         5

I want a…
DeLongTime
7
votes
1 answer

Long Sequence In a seq2seq model with attention?

I am following this PyTorch tutorial and trying to apply the same principle to summarization, where the encoding sequence would be around 1000 words and the decoder target 200 words. How do I apply seq2seq to this? I know it would be very expensive…
vijendra rana
7
votes
2 answers

Bin data by (x,y) and summarize

These are the first 10 lines of a huge file I have: (Note that there is only one user in these 10 lines but I've got thousands of users) dput(testd) structure(list(user = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L ), otime = structure(c(10L, 9L, 8L,…
unixsnob
6
votes
3 answers

installing pyrouge gets error in ubuntu

I want to install pyrouge on Ubuntu for text summarization evaluation. I am using the instructions in this. First I ran pip install pyrouge, then I must run this command: pyrouge_set_rouge_path…
Mahsa
6
votes
3 answers

obtaining 3 most common elements of groups, concatenating ties, and ignoring less common values

I am trying to get the 3 most common numbers per group of a dataframe, using a function, but ignoring the less common values (per group), and allowing a unique number if present. Accepted answer will have the lowest system.time #my current…
Ferroao
6
votes
2 answers

relative windowed running sum through data.table non-equi join

I have a data set (customerId, transactionDate, productId, purchaseQty) loaded into a data.table. For each row, I want to calculate the sum and mean of purchaseQty for the prior 45 days.

productId customerID transactionDate purchaseQty
1: …
Ethan
6
votes
2 answers

data.table: Using with=False and transforming function/summary function?

I want to summarise several variables in data.table, output in wide format, output possibly as a list per variable. Since several other approaches did not work, I tried to do an outer lapply, giving the names of the variables as character vectors. I…
Julian