Split string and add breaks before every number

Question

I have a character vector that contains a bibliography:

bibliography <- c("1. Cohen, A. C. (1955). Restriction and selection insamples from bivariate normal distributions. Journal
of the American Statistical Association, 50, 884–893.  2.Breslow, N. E. and Cain, K. C. (1988). Logistic regression for the two-stage case-control data.
Biometrika, 75, 11–20.  3.Arismendi, J. C. (2013). Multivariate truncated moments. Journal of Multivariate Analysis, 117, 41–75")

I would like to split the string and add a line break before every number that marks an entry, i.e. 1., 2. and 3. If I had, say, 50 bibliography entries, I would like to split all of them automatically and add a break before every such number.

So far I've tried this (which is not the best option, as the third bibliography entry is left out):

    bibliography <- unlist(strsplit(bibliography, "  "))
    bibliography <- bibliography[-length(bibliography)] <- paste0(bibliography[-length(bibliography)], ' \\\\ ')

And the output was this (WHICH IS MY DESIRED OUTPUT):

    [1] "1. Cohen, A. C. (1955). Restriction and selection in samples from bivariate normal distributions. Journal\nof the American Statistical Association, 50, 884–893. \\\\ "
    [2] "2.Breslow, N. E. and Cain, K. C. (1988). Logistic regression for the two-stage case-control data.\nBiometrika, 75, 11–20. \\\\ "

But this is time-consuming, as I had to manually add a double space before every number (i.e., 1. and 2.) for this code to work.
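The third entry goes missing because of the chained assignment in the second line: R evaluates `x <- x[-n] <- value` from right to left, so after the subset is updated, the shorter pasted vector is assigned to `bibliography` as a whole. A minimal sketch of the same step with a single assignment, using a hypothetical shortened vector `refs` (not the real data):

```r
# Hypothetical, shortened entries standing in for the split bibliography
refs <- c("1. Cohen (1955).", "2.Breslow (1988).", "3.Arismendi (2013).")

# Append the " \\ " break marker to every entry except the last.
# A single subset assignment keeps the last entry in the vector.
n <- length(refs)
refs[-n] <- paste0(refs[-n], " \\\\ ")

print(refs)
```

This only addresses the dropped entry; it still assumes the string has already been split into one element per entry.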

I've also looked at these questions:

Add new line before every number in a string

Inserting Newline character before every number occurring in a string?

thepule · Accepted Answer · 2016-09-02T10:18:01.050

2

This gets you pretty much where you want:

library(stringr)
library(dplyr)

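# gsub() inserts a "~" marker before each entry number (digit, period, optional spaces, capital letter)
# str_split() then splits on that marker; the first element is empty because the text starts with "1."
str_split(gsub("([1-9]\\.[ ]*[A-Z])", "~\\1", bibliography), "~") %>%
  unlist() %>%
  str_trim(side = "both") # Trim whitespace on both sides of each string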
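Note that `[1-9]` matches a single digit, so entry numbers such as 10. or 11. would not be handled by the pattern above. One possible variation, sketched here in base R under the assumption that every entry number is preceded by whitespace and followed by a period and a capital letter, is to split on that whitespace with a lookahead. The vector `refs` and its contents are hypothetical:

```r
# Hypothetical short input; entries 9 to 11 show the multi-digit case
refs <- "9. Adams, B. (2001). Title A. Journal X, 5, 1-9.  10.Brown, C. (2002). Title B. Journal Y, 6, 10-20.  11. Clark, D. (2003). Title C."

# Split at whitespace that is followed by "<digits>." and a capital letter.
# The lookahead (?= ... ) needs perl = TRUE; it is not consumed by the split,
# so each entry keeps its number.
parts <- strsplit(refs, "\\s+(?=[0-9]+\\.\\s*[A-Z])", perl = TRUE)[[1]]
parts <- trimws(parts)

print(parts)
```

Like the original pattern, this can split in the wrong place if a reference happens to contain a number followed by a period and a capital letter (for example a page range ending a sentence directly before a new title), so the result is worth checking by eye.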

edited Sep 02 '16 at 10:18

answered Sep 02 '16 at 10:14


score 1 · Answer 2 · answered Sep 02 '16 at 10:24

I tried a regex-based approach:

bibliography <- c("1. Cohen, A. C. (1955). Restriction and selection insamples from bivariate normal distributions. Journal of the American Statistical Association, 50, 884–893.  2.Breslow, N. E. and Cain, K. C. (1988). Logistic regression for the two-stage case-control data.
                  Biometrika, 75, 11–20.  3.Arismendi, J. C. (2013). Multivariate truncated moments. Journal of Multivariate Analysis, 117, 41–75")

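# Insert a tab before each single-digit number followed by a period
# (at the start of the string or after a non-digit character)
out <- gsub("([^0-9][0-9]{1}\\.|^[0-9]{1}\\.)", "\t\\1", bibliography)
# Split on the inserted tabs
out <- unlist(strsplit(out, "\t"))
# Trim leading and trailing whitespace
out <- gsub("^\\s+|\\s+$", "", out)
# Drop the empty first element left by the tab at the start of the string
out <- out[-1]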

You could probably give it a shot.
