
I want to read a text document in R and extract sentences based on certain keywords: whenever a sentence contains a keyword and ends with a full stop (.), only that sentence should be stored in a list.

Output: a list containing only the sentences that include the given keyword.

I tried the scan function like this:

b<-scan("cbt14-Short Stories For Children.txt",what = "char",sep = '.', nlines = 50)

The scan function has many parameters that I do not understand yet.

Can the above output be achieved using the scan function?

keyword = "ship"

input--

this article u can read from "www.google.com/ship". Illustrated by Subir Roy and Geeta Verma Man Overboard I stood on the deck of S.S. Rajula. As she slowly moved out of Madras harbour, I waved to my grandparents till I could see them no more. I was thrilled to be on board a ship. It was a new experience for me. "Are you travelling alone?" asked the person standing next to me. "Yes, Uncle, I'm going back to my parents in Singapore," I replied. "What's your name?" he asked. "Vasantha," I replied. I spent the day exploring the ship. It looked just like a big house. There were furnished rooms, a swimming pool, a room for indoor games, and a library. Yet, there was plenty of room to 11111 around. The next morning the passengers were seated in the dining hall, having breakfast. The loudspeaker spluttered noisily and then the captain's voice came loud and clear. "Friends we have just received a message that a storm is brewing in the Indian Ocean. I request all of you to keep calm. Do not panic. Those who are inclined to sea- 3

output list--

[1] this article u can read from "www.google.com/ship".

[2] I was thrilled to be on board a ship.

[3] I spent the day exploring the ship.

andy
  • Do you have mid-sentence new lines? It would help if you provided a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and the desired output. – MrFlick Apr 26 '17 at 21:53
  • @MrFlick, I edited the question with sample input and expected output. Thanks – andy Apr 26 '17 at 22:01

1 Answer

The difficult part of this problem is separating the sentences properly. Here I use a period followed by a space (". ") as the sentence delimiter. On this sample it does produce a one-word sentence, "Rajula", because "S.S. Rajula" is split after the abbreviation, but this may be acceptable depending on your final application.

# split the text into sentences on ". " (b is expected to be one character string)
sentences <- strsplit(b, "\\. ")
# keep only the sentences that contain the word "ship"
finallist <- sentences[[1]][grepl("ship", sentences[[1]])]

The above code uses base R. The stringi or stringr packages may offer a function that handles splitting text into sentences better.
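Splitting on ". " removes the full stop from each sentence, while the expected output in the question keeps it. A minimal base R sketch of one way around that, assuming the text is already held in a single string: split on the whitespace that follows a full stop, using a Perl-style lookbehind so the "." stays attached. The `txt` sample is a shortened excerpt of the question's input, and the `keyword` variable is introduced here only for illustration.

```r
# Shortened sample text standing in for the file contents (illustrative only)
txt <- paste(
  'this article u can read from "www.google.com/ship".',
  "I stood on the deck of S.S. Rajula.",
  "I was thrilled to be on board a ship.",
  "It was a new experience for me.",
  "I spent the day exploring the ship.",
  "It looked just like a big house."
)

# Split on whitespace that is preceded by a full stop.
# The lookbehind (?<=\\.) is not consumed, so each sentence keeps its ".".
sentences <- strsplit(txt, "(?<=\\.)\\s+", perl = TRUE)[[1]]

# Keep only the sentences that contain the keyword (plain substring match)
keyword <- "ship"
finallist <- sentences[grepl(keyword, sentences, fixed = TRUE)]

print(finallist)
```

As with the ". " split, "S.S. Rajula." is still broken after the abbreviation. The substring match would also pick up words such as "shipment"; a pattern like `"\\bship\\b"` without `fixed = TRUE` is one way to restrict it to the whole word.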

Dave2e
  • In this case you need first to read your text with something like `fileName <- 'yourfile.txt'; b <- readChar(fileName, file.info(fileName)$size)` and not with scan – JulioSergio Apr 27 '17 at 01:03
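The point about reading the file matters because `strsplit(b, ...)[[1]]` only looks at the first element of `b`, so the text should arrive as one string. A small sketch of two base R ways to get that, using a temporary file created here purely so the example is self-contained (the file contents and the `b2` name are assumptions for illustration):

```r
# Hypothetical input file, written only to make the example runnable
fileName <- tempfile(fileext = ".txt")
writeLines(c("I was thrilled to be on board a ship.",
             "It was a new experience for me."), fileName)

# Option 1: read the whole file as one string (line breaks are kept in it)
b <- readChar(fileName, file.info(fileName)$size)

# Option 2: read line by line, then join the lines with a space
b2 <- paste(readLines(fileName), collapse = " ")

# Either way the result is a single string, so [[1]] covers the full text
sentences <- strsplit(b2, "\\. ")[[1]]
print(sentences)
```

Option 2 also answers the earlier question about mid-sentence line breaks, since joining with a space turns each newline into ordinary whitespace before splitting.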