I am new to R.
I am using tidytext::unnest_tokens
to break a long text down into individual sentences, using the code below:
library(dplyr)     # provides %>%
library(tidytext)  # provides unnest_tokens()

tidy_drugs <- drugstext.raw %>%
  unnest_tokens(sentence, Section, token = "sentences")
This gives me a data.frame with one sentence per row.
I would like to get the start and end positions for each sentence that is unnested from the long text.
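For illustration only, the kind of output described here can be sketched with stringi's sentence-boundary functions. This is not the tidytext pipeline above but a different route, and it rests on assumptions: `txt` is a made-up string standing in for one value of the Section column, and the column names `sentence`, `start` and `end` are hypothetical.

```r
library(stringi)

# Made-up two-sentence string standing in for one value of the Section column
txt <- "First sentence here. Second one follows."

# Locate sentence boundaries; the result is a list with one
# start/end matrix per input string, so take the first element
loc <- stri_locate_all_boundaries(txt, type = "sentence")[[1]]

# One row per sentence, with its character positions in the original text
positions <- data.frame(
  sentence = stri_sub(txt, loc[, "start"], loc[, "end"]),
  start    = loc[, "start"],
  end      = loc[, "end"]
)

print(positions)
```

Positions are 1-based character offsets into the original string. Note that unnest_tokens lowercases its output by default (to_lower = TRUE), so its sentences may not match the original text exactly if they are searched for afterwards.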
Here is a sample of the long text file. It is from a drug label.
<< *6.1 Clinical Trial Experience
Because clinical trials are conducted under widely varying conditions, adverse reaction rates observed in clinical trials of a drug cannot be directly compared to rates in the clinical trials of another drug and may not reflect the rates observed in practice.
The data below reflect exposure to ARDECRETRIS as monotherapy in 327 patients with classical Hodgkin lymphoma (HL) and systemic anaplastic large cell lymphoma (sALCL), including 160 patients in two uncontrolled single-arm trials (Studies 1 and 2) and 167 patients in one placebo-controlled randomized trial (Study 3).
In Studies 1 and 2, the most common adverse reactions were neutropenia, fatigue, nausea, anemia, cough, and vomiting.*
The desired result is a dataframe with three columns