
Hi, I have a very large text file (character data) from which I want to extract 10% of the entries and save them to another text file.

con1 <- file("ABC.txt", "rb")   # about 2.36 million records
dfc1 <- readLines(con1, ???, skipNul = TRUE)

Instead of ??? I want something like <10% of all data>.

So if my ABC.txt looked like this:

" BBC Worldwide is a principle commercial arm and a wholly owned subsidiary of the British Broadcasting Corporation (BBC). The business exists to support the BBC public service mission and to maximise profits on its behalf..."

my new file should contain only a random 10% of the words, for example:

" Worldwide business behalf..."

Is there a way to do that in R?

Thank you

Marco Sandri
user3443063
  • Possible duplicate of [Importing and extracting a random sample from a large .CSV in R](https://stackoverflow.com/questions/27981460/importing-and-extracting-a-random-sample-from-a-large-csv-in-r) – pogibas Mar 03 '18 at 16:47

1 Answer


If you read in the text file, you can then use the stringr package to get a 10% random sample of the words using the following code:

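text <- "BBC Worldwide is a principle commercial arm and a wholly owned subsidiary of the British Broadcasting Corporation (BBC). The business exists to support the BBC public service mission and to maximise profits on its behalf..."
set.seed(9999)  # make the random sample reproducible
library(stringr)
# Number of words, assuming words are separated by single spaces
n_words <- str_count(text, " ") + 1
# Draw 10% of the word positions (the original rounded 0.1 * spaces + 1 instead of 0.1 * words)
selection <- sample.int(n_words, round(0.1 * n_words))
# Extract the words at the sampled positions (returned in sampled order, not text order)
sampled_words <- word(text, selection)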
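
The snippet above works on a string that is already in memory. To cover the whole task from the question (read a file, keep a random 10% of its words, write them to a second file), a minimal base-R sketch could look like the following. The function name `sample_words`, its arguments, and the temporary demo files are hypothetical and only for illustration. It assumes the file fits in memory and that words are separated by whitespace.

```r
# Hypothetical helper (not from the original answer): keep a random
# fraction of the whitespace-separated words of a text file.
sample_words <- function(infile, outfile, fraction = 0.1) {
  lines <- readLines(infile, skipNul = TRUE, warn = FALSE)
  # Split every line on runs of whitespace and drop empty strings
  words <- unlist(strsplit(lines, "[[:space:]]+"))
  words <- words[nzchar(words)]
  n_keep <- round(fraction * length(words))
  # Sort the sampled positions so the kept words stay in text order
  keep <- sort(sample.int(length(words), n_keep))
  writeLines(paste(words[keep], collapse = " "), outfile)
  invisible(words[keep])
}

# Demo on temporary files (stand-ins for ABC.txt and the output file)
set.seed(9999)
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines("The business exists to support the BBC public service mission", infile)
sampled <- sample_words(infile, outfile)
print(sampled)
```

Sorting the sampled positions keeps the words in their original order, as in the example output in the question. If the random order is preferred, the `sort()` call can be dropped.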
Marko