I am working on analyzing some text in R and have settled on (for the moment) Markov chains as part of my procedure. Here is an example of what I'm doing:
```r
# Required libraries
library(stringi)     # Input cleaning
library(tidyverse)   # dplyr, ggplot, etc. (also provides %>%)
library(hunspell)    # Spell checker
library(markovchain) # Markov chain calculation

# Input
shake <- c("To be, or not to be - that is the question: Whether 'tis nobler in the mind to suffer The slings and arrows of outrageous fortune Or to take arms against a sea of troubles And by opposing end them.")

# Process to clean input
miniclean <- function(x = "") {
  # x is a character string input
  words_i <- x %>%
    gsub(pattern = "[^[:alpha:][:space:]']", replacement = "") %>%
    #gsub(pattern = "[\n]+", replacement = "") %>% # Drop line breaks
    stri_trans_tolower() %>%
    # Split on runs of whitespace so " - " does not leave empty tokens
    strsplit(split = "[[:space:]]+") %>%
    unlist()
  # Keep only the words hunspell recognizes
  correct <- hunspell_check(words_i)
  words_o <- words_i[correct]
  return(words_o)
}

# Clean input
cleans <- miniclean(shake)

# Compute Markov chain using cleaned input
mark2 <- markovchainFit(cleans)

# Plot results
plot(mark2$estimate)
```
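For reference, the edge weights in that plot come from the estimated transition matrix stored in the fitted `markovchain` object. A minimal sketch for inspecting it, using a hypothetical toy word vector `words` in place of `cleans` so it runs without hunspell:

```r
library(markovchain)

# Hypothetical toy input standing in for the cleaned word vector `cleans`
words <- c("to", "be", "or", "not", "to", "be", "to")

# Fit the chain: each distinct word is a state, consecutive pairs are transitions
fit <- markovchainFit(data = words)

# Estimated transition probabilities (rows = current word, columns = next word)
tm <- fit$estimate@transitionMatrix
print(round(tm, 2))

# Each row of a transition matrix should sum to 1
print(rowSums(tm))
```

Here "to" is always followed by "be" (probability 1), while "be" is followed by "or" and "to" equally often (0.5 each); those are the numbers the plot draws on the arrows.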
The base `plot()` graphics produce this visualization:
I would like more control over the plot (e.g., longer arrows and a larger overall layout so the labels are readable), but I don't see how to get it.
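One possible route, sketched here under the assumption that working with igraph directly is acceptable: take the transition matrix out of the fitted chain, build a directed weighted graph with `igraph::graph_from_adjacency_matrix()`, and call igraph's `plot()` yourself, where layout, vertex size, arrow size, and curvature are all ordinary arguments. The toy `words` vector and the output file name are hypothetical stand-ins.

```r
library(markovchain)
library(igraph)

# Hypothetical toy input standing in for `cleans`
words <- c("to", "be", "or", "not", "to", "be", "to")
fit <- markovchainFit(data = words)
tm <- fit$estimate@transitionMatrix

# Directed, weighted graph: each nonzero probability becomes an edge
g <- graph_from_adjacency_matrix(tm, mode = "directed", weighted = TRUE)

# Fix the random seed so the force-directed layout is reproducible
set.seed(42)
lay <- layout_with_fr(g)

# A larger device gives the graph more room; sizes are in inches for pdf()
out_file <- file.path(tempdir(), "markov_chain.pdf")
pdf(out_file, width = 10, height = 10)
plot(
  g,
  layout = lay,
  vertex.size = 20,                    # smaller nodes leave longer visible edges
  vertex.color = "lightblue",
  vertex.label.cex = 1.2,              # word label size
  edge.arrow.size = 0.5,               # arrowhead size
  edge.curved = 0.2,                   # curve edges so reciprocal arrows don't overlap
  edge.label = round(E(g)$weight, 2),  # transition probabilities on the edges
  edge.label.cex = 1,
  margin = 0.05
)
dev.off()
```

Because igraph rescales the layout to the plotting region, the practical levers for "longer arrows" are a bigger device (or window) and a smaller `vertex.size`; other layout functions such as `layout_in_circle()` can also be swapped in for `layout_with_fr()`.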
Ideas?