R: extract a percentage of entries from a text file using readLines

Question

Hi, I have a very large text file (character data) from which I want to extract 10% of the entries and save them to another text file.

con1 <- file("ABC.txt", "rb")  # 2.36 million records
dfc1 <- readLines(con1, ???, skipNul = TRUE)

Instead of ???, I want something like <10% of all data>.

So if my ABC.txt looked like

" BBC Worldwide is a principle commercial arm and a wholly owned subsidiary of the British Broadcasting Corporation (BBC). The business exists to support the BBC public service mission and to maximise profits on its behalf..."

my new file should contain only a random 10% of the words, like:

" Worldwide business behalf..."

Is there a way to do that in R?

Thank you

Possible duplicate of [Importing and extracting a random sample from a large .CSV in R](https://stackoverflow.com/questions/27981460/importing-and-extracting-a-random-sample-from-a-large-csv-in-r) — pogibas, Mar 03 '18 at 16:47

score 1 · Accepted Answer · answered Mar 03 '18 at 17:20

If you read in the text file, you can then use the stringr package to get a 10% random sample of the words using the following code:

library(stringr)

text <- c("BBC Worldwide is a principle commercial arm and a wholly owned subsidiary of the British Broadcasting Corporation (BBC). The business exists to support the BBC public service mission and to maximise profits on its behalf...")
set.seed(9999)  # make the random sample reproducible
n_words <- str_count(text, " ") + 1  # number of words = number of spaces + 1
# Sample 10% of the word positions
selection <- sample.int(n_words, round(0.1 * n_words))
subset <- word(text, selection)  # extract the words at the sampled positions
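
To connect this back to the file in the question, here is a minimal base R sketch of the whole round trip: read the file, split it into words, sample 10% of them, and write the result out. The helper name `sample_words`, its arguments, and the output file name are hypothetical and not part of the original answer. It also assumes the file fits in memory and that words are separated by whitespace. Unlike the snippet above, it sorts the sampled positions so the words keep their original order, as in the example output in the question.

```r
# Hypothetical helper: write a random fraction of the words in `in_path`
# to `out_path`, keeping the sampled words in their original order.
sample_words <- function(in_path, out_path, fraction = 0.1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional, for reproducibility

  # Read every line; skipNul drops embedded NUL bytes, as in the question
  lines <- readLines(in_path, skipNul = TRUE, warn = FALSE)

  # Split on runs of whitespace and drop empty strings
  words <- unlist(strsplit(lines, "\\s+"))
  words <- words[nzchar(words)]

  # Number of words to keep, e.g. 10% of the total
  n_keep <- round(fraction * length(words))

  # Sample positions without replacement, then sort to preserve word order
  keep <- sort(sample.int(length(words), n_keep))

  # Write the sampled words as a single space-separated line
  writeLines(paste(words[keep], collapse = " "), out_path)
  invisible(words[keep])
}
```

It could be called as `sample_words("ABC.txt", "ABC_sample.txt", fraction = 0.1, seed = 9999)`. If an "entry" means a line rather than a word, the same idea applies to the vector returned by `readLines`: sample `round(0.1 * length(lines))` positions with `sample.int` and index `lines` with them.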
