I have the following code that reads the text from a small image using Tesseract.

char *answer = tess_api.GetUTF8Text();

I know beforehand that the result will always start with the character '+' and it's one word so I want to get rid of any junk it finds.

I get the result as "G+ABC S\n\n" and I need only +ABC. So basically I need to ignore anything before + and everything after the first space. I was thinking I should use rindex to find the position of + and spaces.

TobiMcNamobi
Crypto
  • You probably should be using `std::string`, it's much easier. – MSalters Jan 29 '14 at 08:38
  • If the result of `tess_api.GetUTF8Text()` is a string with UTF-8 characters, it is better to use `wchar_t*` instead of `char*`. – Narkha Jan 29 '14 at 08:41
  • I am expecting only uppercase letters. I am not expecting any UTF-8 characters, so I have used `tess_api.SetVariable("tessedit_char_whitelist", "+ABCDEFGHIJKLMNOPQRSTUVWXYZ");` – Crypto Jan 29 '14 at 08:45

2 Answers

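#include <iostream>
#include <string>

// Returns the word that starts at the first '+' and ends before the next
// space or newline. Returns an empty string if there is no '+'.
std::string ParseString(const std::string& s)
{
    size_t plus = s.find('+');
    if (plus == std::string::npos)
        return std::string();

    // If no space or newline follows, space is npos and substr() simply
    // returns everything from plus to the end of the string.
    size_t space = s.find_first_of(" \n", plus);

    return s.substr(plus, space - plus);
}

int main()
{
    std::cout << ParseString("G+ABC S\n\n") << std::endl;
    std::cout << ParseString("G +ABC\ne\n") << std::endl;

    return 0;
}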

Gives

+ABC
+ABC

If you really can't use strings then something like this might do

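#include <cstring>

// The caller owns the result and must release it with delete[].
// Returns NULL if s contains no '+'.
char *ParseString2(const char *s)
{
    size_t plus, end;
    // Stop at the terminating '\0' so the scan never runs past the string
    for (plus = 0 ; s[plus] != '\0' && s[plus] != '+' ; ++plus){}
    if (s[plus] == '\0')
        return NULL;
    for (end = plus ; s[end] != '\0' && s[end] != ' ' && s[end] != '\n' ; ++end){}
    char *result = new char[end - plus + 1];
    memcpy(result, s + plus, end - plus);
    result[end - plus] = 0;
    return result;
}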
David Sykes
  • Not sure if the string `"G +"` is a valid input, but it would cause problems for this algorithm. – Lundin Jan 29 '14 at 08:58
  • @Lundin Why do you say that? The space is searched for after the first + – David Sykes Jan 29 '14 at 08:59
  • Ah yeah, actually that won't be a problem, but rather the case where it doesn't find either of the two symbols. But the question doesn't mention that such error handling is required, so nevermind :) – Lundin Jan 29 '14 at 09:10
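
The case raised in the comments, where the `+` or the terminating space is missing, can be handled explicitly. The following is only a sketch, assuming C++17 is available for `std::optional`. The name `ExtractPlusWord` is made up for illustration and is not part of the answer above.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not from the answer above): returns the word that
// starts at the first '+', or std::nullopt when the input has no '+'.
std::optional<std::string> ExtractPlusWord(const std::string& s)
{
    const std::string::size_type plus = s.find('+');
    if (plus == std::string::npos)
        return std::nullopt;            // no '+' at all

    // First space or newline after the '+'; npos means the word runs
    // to the end of the input.
    const std::string::size_type stop = s.find_first_of(" \n", plus);
    if (stop == std::string::npos)
        return s.substr(plus);

    return s.substr(plus, stop - plus);
}
```

Returning an optional lets the caller tell "no `+` found" apart from a real result, which an empty string would not make obvious.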

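You can use:

// just scan "answer" to find out where to start and where to end
// (needs <cstring> for strchr/strcspn/memcpy and <cstdlib> for malloc/free)
char *plus = strchr(answer, '+');                // pointer to '+', NULL if there is none
if (plus != NULL)
{
    int indexStart = (int) (plus - answer);                      // index of '+'
    int indexEnd = indexStart + (int) strcspn(plus, " \n") - 1;  // index before space (or newline)

    int length = indexEnd-indexStart+1;
    char *dataYouWant = (char *) malloc(length+1);  // result will be stored here
    memcpy( dataYouWant, &answer[indexStart], length );
                                         // for example answer = "G+ABC S\n\n"
    dataYouWant[length] = '\0';          // dataYouWant will be "+ABC"
    // ... use dataYouWant here, then free(dataYouWant)
}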

You can check out "Strings in c, how to get subString" for other alternatives.

P.S. suggestion: use `std::string` instead in C++, it will be much easier (check out @DavidSykes's answer).
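
If copying is not wanted at all, a different technique from the `malloc`/`memcpy` approach is to cut the buffer in place. This is only a sketch, assuming the `char *` returned by `GetUTF8Text()` is writable and that the rest of the text is no longer needed. The name `TrimToPlusWord` is made up for illustration.

```cpp
#include <cstring>

// Hypothetical helper: returns a pointer into `buffer` at the first '+',
// after cutting the text off at the first space or newline that follows.
// Returns nullptr when there is no '+'. The buffer is modified in place
// and no new memory is allocated.
char *TrimToPlusWord(char *buffer)
{
    char *start = std::strchr(buffer, '+');
    if (start == nullptr)
        return nullptr;

    // strcspn counts the characters before the first ' ' or '\n'
    // (or up to the terminating '\0' if neither occurs).
    start[std::strcspn(start, " \n")] = '\0';
    return start;
}
```

The returned pointer points into the middle of the original buffer, so the buffer still has to be released through the original `answer` pointer, not through the returned one.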

Community
herohuyongtao