Using `strtok`, one can get each token in a paragraph individually.

I want to capture every sentence on the page individually so that I can process them separately.

One solution is to loop over each character and, when it is a `.`, treat the sentence as complete and store it in some data structure. I don't know which data structure is best suited to this: an array or a `vector`?

Is there a better way, or a C++ class available, to do this?

UPDATE

Later, I want to act on negations in each sentence, that is, handle keywords such as "not", "no", and "nope". If "not" is followed by a negative word, the pair should be treated as a positive word.

hippietrail
user123
  • 1
    I'd consider using `vector` or `list` to store your completed sentences. Which of `vector` or `list` makes more sense depends on what processing you intend to do. But you still have a little work to construct the strings properly before pushing them onto the `vector` or `list`. – Joe Z Dec 01 '13 at 06:59
  • @JoeZ: Thanks, I have edited my question. Any changes to your suggestion are welcome! – user123 Dec 01 '13 at 07:06
  • 1
    If you're not reorganizing the order of the sentences, then `vector` should do fine once you have the strings for each of the sentences. If you don't need to store all of the sentences to apply your predicates, then you don't need to even push them into a structure; rather you just need a function that returns the next sentence as a `string`. – Joe Z Dec 01 '13 at 07:26
  • @JoeZ: Right, even getting the next sentence as a string would be enough for me. But I don't know how that could be done with `vector`! – user123 Dec 01 '13 at 09:36

1 Answer

As you are using C++, the best data structure for storing a string is the `std::string` class. Store multiple strings in a `std::vector<std::string>`. By the way, don't use `strtok`; use `std::getline` instead.
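A minimal sketch of that suggestion, assuming a sentence simply ends at every `.` as described in the question. The helper name `split_sentences` is hypothetical, and the approach is deliberately naive: `?` and `!` are not treated as terminators, and abbreviations or decimals such as "e.g." or "3.14" get split too.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from any library): split `text` at every '.'
// using std::getline with a custom delimiter, and collect the pieces.
std::vector<std::string> split_sentences(const std::string& text) {
    std::vector<std::string> sentences;
    std::istringstream in(text);
    std::string piece;
    while (std::getline(in, piece, '.')) {
        // Trim the whitespace that surrounds each piece.
        const auto first = piece.find_first_not_of(" \t\r\n");
        if (first == std::string::npos) continue;  // skip empty or blank pieces
        const auto last = piece.find_last_not_of(" \t\r\n");
        sentences.push_back(piece.substr(first, last - first + 1));
    }
    return sentences;
}
```

Calling `split_sentences(paragraph)` returns the sentences in their original order without the terminating `.`. If the sentences only need to be processed one at a time, the body of the `while` loop can do that work directly instead of pushing into the vector.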

But as you are doing text manipulation, and perhaps international text manipulation, you should take a look at the ICU library, in particular `icu::BreakIterator::createSentenceInstance`.

dalle
  • Thanks, buddy. Please see my edited question, in case you have something else to add. – user123 Dec 01 '13 at 07:07
  • Is there any drawback to using an external library in a C++ program, such as slower execution? – user123 Dec 01 '13 at 07:10
  • 1
    @Karimkhan, That depends on the library's design. For example, a lot of Boost will not affect runtime performance much, but will make your project a PITA to rebuild. Maybe I shouldn't say a lot, but there are at least a few notable examples ;) – chris Dec 01 '13 at 07:11
  • @chris: What is your view on the `ICU` library? Or could the same thing be done better with core C++ classes? – user123 Dec 01 '13 at 07:18
  • @Karimkhan, I've barely heard of it. I can't speak for or against it. However, if you don't have mandatory procedures to follow for using a new library, chances are you probably won't have to worry about adopting it. In my eyes, it's a much bigger deal when an existing big software project decides to use a new library, and that's when the benefits and drawbacks have to really be weighed out. I doubt you'll find more than a few libraries that offer this sort of utility, and choosing one is way better than doing it yourself. – chris Dec 01 '13 at 07:20