
I have a small shiny app for annotating text files.

  1. The UI provides a fileInput to select .txt files. One file is loaded by default when the app is launched.
  2. The Next and Previous buttons let the user display the contents of the file, one sentence at a time.
  3. The user may select any text within a sentence and click the Add Markup button to annotate it. The action button triggers the JavaScript function addMarkup().
  4. The sentence is displayed after being marked up.

I am only posting the Shiny app code here. The complete code of the app is available in the GitHub repository.

library(shiny)
ui <- fluidPage(
  tags$head(tags$script(src="textselection.js")),
  titlePanel("Corpus Annotation Utility"),
  sidebarLayout(
    sidebarPanel(
      fileInput('fileInput', 'Select Corpus', accept = c('text', 'text','.txt')),
      actionButton("Previous", "Previous"),
      actionButton("Next", "Next"),
      actionButton("mark", "Add Markup")
    ),
    mainPanel(
      tags$h1("Sentence: "),
      htmlOutput("sentence"),
      tags$h1("Sentence marked up: "),
      htmlOutput("sentenceMarkedUp")
    )
  )
)
server <- function(input, output) {
  sourceData <- reactive({
    corpusFile <- input$fileInput
    if(is.null(corpusFile)){
      return(readCorpus('data/news.txt'))
    }
    readCorpus(corpusFile$datapath)
  })

  corpus <- reactive({sourceData()})
  values <- reactiveValues(current = 1)
  observeEvent(input$Next,{
    if(values$current >=1 & values$current < length(corpus())){
      values$current <- values$current + 1
    }
  })
  observeEvent(input$Previous,{
    if(values$current > 1 & values$current <= length(corpus())){
      values$current <- values$current - 1
    }
  })
  output$sentence <- renderText(corpus()[values$current])
}
shinyApp(ui = ui, server = server)

The readCorpus() function looks like this:

readCorpus <- function(pathToFile){
  con <- file(pathToFile) 
  sentences <- readLines(con, encoding = "UTF-8")
  close(con)
  return(sentences)
}

My question is how can I persist the sentences to a file after they have been annotated?

screenshot of the app

Update: I have gone through "Persistent data storage in Shiny apps" and expect to be able to follow that documentation for the storage part. However, I am still unsure how to capture the sentence after it has been marked up.

Imran Ali
  • 1) Neither in the code above nor on GitHub can I find the markup code; there is no function using `input$mark`. Am I missing something, or is it in `MyUtils.R`, which isn't shared? 2) Concerning the persistent data storage link: are you looking for a local file system solution? – Tonio Liebrand Mar 28 '17 at 17:57
  • @BigDataScientist 1) It's done in JS – HubertL Mar 28 '17 at 19:16
  • I have updated the GitHub repo to include `MyUtils.R`. Regarding persistent data storage, ideally the client could save the file on their own machine. – Imran Ali Mar 29 '17 at 00:28
  • The issue here is that your markup is just mimicking a user selection, so it's not hardcoded in the sentence itself, which makes it hard to save. You should probably try to use `span` like I suggested in your original question [here](http://stackoverflow.com/questions/42546819/select-text-in-htmloutput-based-on-datatable-in-shiny). This would hardcode the selection in the string and make it easy to save. – NicE Mar 29 '17 at 14:22
  • Do you need to capture multiple annotations per sentence? – TARehman Apr 03 '17 at 15:06
  • Yes, quite possibly multiple annotations per sentence. – Imran Ali Apr 03 '17 at 16:08
  • I'll push a change to your repo in a few minutes. – TARehman Apr 03 '17 at 17:49
  • @ImranAli I've submitted a PR on Github. – TARehman Apr 03 '17 at 18:20

1 Answer


You have two issues here - persisting the changes, and then saving the output. I solved the problem using a bit of JS and a bit of R code. I'll do a pull request on Github to submit the broader code. However, here's the core of it.

In the JavaScript that handles the selection, you can call Shiny.onInputChange() to set a value on Shiny's input object. With that in place, you can keep the corpus in a reactiveValues item and update it with inputs from your interface.

Below, you'll notice that I switched from using a text node to using just the inner HTML. Using a node and firstChild, as you had it before, you end up truncating the sentence after the first annotation (since it only picks up the text before <mark>). Doing it this way seems to work better.

// Attach the handler once the page has loaded.
window.onload = function(){
  document.getElementById('mark').addEventListener('click', addMarkup);
};

function addMarkup(){
  // Use innerHTML so that marks added earlier are kept.
  var sentence = document.getElementById("sentence").innerHTML,
      selection = "";
  if(window.getSelection){
    selection = window.getSelection().toString();
  }
  else if(document.selection && document.selection.type != "Control"){
    // Fallback for older versions of Internet Explorer.
    selection = document.selection.createRange().text;
  }
  if(selection.length === 0){
    return;
  }
  // Declare with var so these do not leak into the global scope.
  var marked = "<mark>".concat(selection).concat("</mark>");
  var result = sentence.replace(selection, marked);
  document.getElementById("sentence").innerHTML = result;
  // Send the marked-up sentence to the server as input$textresult.
  Shiny.onInputChange("textresult", result);
}
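
A caveat on `sentence.replace(selection, marked)` above: with a string as the first argument, `replace` only touches the first occurrence, and it interprets `$` sequences such as `$&` in the replacement. Also, innerHTML returns escaped text (`&` becomes `&amp;`), while the selection is raw text, so a selection containing `&`, `<` or `>` may not match. Below is a minimal sketch of how that could be handled. The helper names `escapeHtml` and `markSelection` are hypothetical and not part of the original textselection.js, and the sketch assumes the selection does not span an existing <mark>.

```javascript
// Hypothetical helpers, not part of the original textselection.js.

// Escape &, < and > the way innerHTML serializes them in text content.
// (Non-breaking spaces are not handled in this sketch.)
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap the first occurrence of `selection` in <mark> tags.
// `sentenceHtml` is expected to be the element's innerHTML.
function markSelection(sentenceHtml, selection) {
  if (selection.length === 0) {
    return sentenceHtml;
  }
  var needle = escapeHtml(selection);
  var marked = "<mark>" + needle + "</mark>";
  // A replacer function keeps "$" sequences in the selection literal.
  return sentenceHtml.replace(needle, function () { return marked; });
}
```

Inside addMarkup(), the call would then be something like `var result = markSelection(sentence, selection);`. Because this is a pure string function, it can be tried outside the browser.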

Next, I've tried to simplify your server.R code. You were using a reactive context to pull from another reactive context (sourceData into corpus), which seemed unnecessary. So, I tried to refactor it a bit.

library(shiny)
source("MyUtils.R")
ui <- fluidPage(
  tags$head(tags$script(src="textselection.js")),
  titlePanel("Corpus Annotation Utility"),
  sidebarLayout(
    sidebarPanel(
      fileInput('fileInput', 'Select Corpus', accept = c('text', 'text','.txt')),
      actionButton("Previous", "Previous"),
      actionButton("Next", "Next"),
      actionButton("mark", "Add Markup"),
      downloadButton(outputId = "save",label = "Download")),
    mainPanel(
      tags$h1("Sentence: "),
      htmlOutput("sentence"))
  )
)

server <- function(input, output) {
  corpus <- reactive({
    corpusFile <- input$fileInput
    if(is.null(corpusFile)) {
      return(readCorpus('data/news.txt'))
    } else {
      return(readCorpus(corpusFile$datapath))
    }
  })

  values <- reactiveValues(current = 1)
  observe({
    values$corpus <- corpus()
  })
  output$sentence <- renderText(values$corpus[values$current])

  observeEvent(input$Next,{
    if(values$current >=1 & values$current < length(corpus())) {
      values$current <- values$current + 1
    }
  })
  observeEvent(input$Previous,{
    if(values$current > 1 & values$current <= length(corpus())) {
      values$current <- values$current - 1
    }
  })
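  # React to the value sent from JavaScript rather than to the button itself,
  # so a click with no selection cannot write a stale or NULL result.
  observeEvent(input$textresult,{
    values$corpus[values$current] <- input$textresult
  })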
  output$save <- downloadHandler(filename = "marked_corpus.txt",
                                 content = function(file) {

                                   writeLines(text = values$corpus,
                                              con = file,
                                              sep = "\n")
                                 })
}

The code has a few changes. Loading from file is basically the same. I was skeptical about using isolate, and rightly so: isolate would only give you the initial load, whereas observe re-runs whenever a new file is selected. So we use observe to copy the corpus values into the reactiveValues object you created, which gives us a place to propagate changes to the data.

We keep the remaining logic for moving forward and backward. However, we change the way the output is rendered so that it looks at the reactiveValues object. Then, we create an observer that updates the reactiveValues object with the input from our updated JavaScript. When this happens, the marked-up sentence replaces the original in the reactiveValues object, so it persists as you move between sentences, and you can also mark more than one sequence in the string (though I have not done anything with nested marking or with removing marks). Finally, a save function is added: the resulting strings are written out with <mark> used to show the marked areas.

If you load a previously marked file, the marks will show up again.
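
Since the saved file is plain text with <mark> tags in it, pulling the annotations back out or removing them is a matter of string handling. Here is a minimal sketch of that, assuming marks are not nested and carry no attributes. The function names `extractMarks` and `stripMarks` are hypothetical and do not exist in the repository.

```javascript
// Hypothetical helpers for lines saved by the download handler.
// Assumes plain, non-nested <mark>...</mark> spans.

// Return the text of every <mark>...</mark> span in a line, in order.
function extractMarks(line) {
  var pattern = /<mark>(.*?)<\/mark>/g;
  var found = [];
  var match;
  while ((match = pattern.exec(line)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Remove the <mark> tags but keep the text they wrapped.
function stripMarks(line) {
  return line.replace(/<\/?mark>/g, "");
}
```

For example, `extractMarks` could be run over each line of marked_corpus.txt to list the annotated phrases, and `stripMarks` could back a "clear markup" button that sends the cleaned sentence to the server the same way addMarkup() does.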

TARehman