How do I get part of a char*?

Question

I have the following code that solves a small image using Tesseract.

char *answer = tess_api.GetUTF8Text();

I know beforehand that the result will always start with the character '+' and it's one word so I want to get rid of any junk it finds.

I get the result as "G+ABC S\n\n" and I need only +ABC. So basically I need to ignore anything before + and everything after the first space. I was thinking I should use rindex to find the position of + and spaces.

You probably should be using `std::string`, it's much easier. – MSalters Jan 29 '14 at 08:38
If the result of `tess_api.GetUTF8Text()` is a string with UTF-8 characters, it is better to use `wchar_t*` instead of `char*`. – Narkha Jan 29 '14 at 08:41
I am expecting only uppercase letters. I am not expecting any UTF-8 characters, so I have used tess_api.SetVariable("tessedit_char_whitelist", "+ABCDEFGHIJKLMNOPQRSTUVWXYZ"); – Crypto Jan 29 '14 at 08:45

David Sykes · Answer 1 · 2014-01-29T09:54:55.567

3

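#include <iostream>
#include <string>

// Returns the word that starts at the first '+' and ends before the next
// space or newline. Returns an empty string if there is no '+'.
std::string ParseString(const std::string& s)
{
    size_t plus = s.find('+');
    if (plus == std::string::npos)
        return std::string();

    // If no space or newline follows, space is npos and substr() takes the rest.
    size_t space = s.find_first_of(" \n", plus);

    return s.substr(plus, space-plus);
}

int main()
{
    std::cout << ParseString("G+ABC S\n\n") << std::endl;
    std::cout << ParseString("G +ABC\ne\n") << std::endl;

    return 0;
}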

Gives

+ABC
+ABC

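If you really can't use strings, then something like this might do:

#include <cstring>

// The caller owns the returned buffer and must release it with delete[].
// Returns an empty string if s contains no '+'.
char *ParseString2(const char *s)
{
    size_t plus,end;
    // Both loops also stop at the terminating '\0' so they never run past the end.
    for (plus = 0 ; s[plus] != '\0' && s[plus] != '+' ; ++plus){}
    for (end = plus ; s[end] != '\0' && s[end] != ' ' && s[end] != '\n' ; ++end){}
    char *result = new char[end - plus + 1];
    memcpy(result, s + plus, end - plus);
    result[end - plus] = 0;
    return result;
}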
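
To tie this back to the `char *` from the question, here is a minimal sketch that takes the raw pointer and hands back a `std::string`, so the result needs no manual allocation. `ExtractPlusWord` is a hypothetical helper name, not part of Tesseract. The Tesseract call appears only in a comment; it assumes, as the Tesseract documentation states, that the buffer returned by `GetUTF8Text()` has to be released with `delete []`.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: returns the word starting at the first '+' and
// ending before the next space or newline. Returns "" for a null pointer
// or when no '+' is present.
std::string ExtractPlusWord(const char *text)
{
    if (text == nullptr)
        return std::string();
    const char *plus = std::strchr(text, '+');
    if (plus == nullptr)
        return std::string();
    // strcspn counts the characters before the first space or newline
    // (or up to the terminating '\0' if neither occurs).
    return std::string(plus, std::strcspn(plus, " \n"));
}

// Intended use with the code from the question (not compiled here):
//   char *answer = tess_api.GetUTF8Text();
//   std::string word = ExtractPlusWord(answer);
//   delete [] answer;
```

Copying into a `std::string` before releasing the original buffer keeps the ownership rules simple: the only raw pointer left to manage is the one Tesseract returned.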

edited Jan 29 '14 at 09:54

answered Jan 29 '14 at 08:52

David Sykes


Not sure if the string `"G +"` is a valid input, but it would cause problems for this algorithm. – Lundin Jan 29 '14 at 08:58
@Lundin Why do you say that? The space is searched for after the first + – David Sykes Jan 29 '14 at 08:59
Ah yeah, actually that won't be a problem, but rather the case where it doesn't find either of the two symbols. But the question doesn't mention that such error handling is required, so nevermind :) – Lundin Jan 29 '14 at 09:10
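
For the missing-symbol case raised in these comments, a sketch of one way to make it explicit is shown here. `std::string::find` returns `std::string::npos` when nothing matches, and `substr` throws `std::out_of_range` if its start position is past the end of the string, so a missing '+' is the case that needs a check; a missing space or newline only means "take the rest". `TryParseString` is a hypothetical name, and `std::optional` assumes a C++17 compiler.

```cpp
#include <optional>
#include <string>

// Hypothetical variant of ParseString: reports a missing '+' explicitly
// instead of passing npos to substr(). Requires C++17.
std::optional<std::string> TryParseString(const std::string& s)
{
    const std::size_t plus = s.find('+');
    if (plus == std::string::npos)
        return std::nullopt;               // no '+' in the input

    const std::size_t end = s.find_first_of(" \n", plus);
    if (end == std::string::npos)
        return s.substr(plus);             // no terminator: take the rest
    return s.substr(plus, end - plus);
}
```

A caller can then test the result with `has_value()` and decide whether an unreadable image should be retried or reported.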

score 1 · Answer 2 · edited May 23 '17 at 11:56

1

You can use:

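// needs <cstring> (strchr, strcspn, memcpy) and <cstdlib> (malloc, free)
// just scan "answer" to find out where to start and where to end
const char *plus = strchr(answer, '+');     // assumes "answer" contains a '+'
int indexStart = (int)(plus - answer);      // the index of '+'
int indexEnd = indexStart + (int)strcspn(plus, " \n") - 1;  // the index before the space (or newline)

int length = indexEnd-indexStart+1;
char *dataYouWant = (char *) malloc(length+1);  // result will be stored here; free() it when done
memcpy( dataYouWant, &answer[indexStart], length );
                                     // for example answer = "G+ABC S\n\n"
dataYouWant[length] = '\0';          // dataYouWant will be "+ABC"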

You can check out "Strings in c, how to get subString" for other alternatives.

P.S. Suggestion: use `std::string` instead in C++; it will be much easier (check out @DavidSykes's answer).


answered Jan 29 '14 at 08:42

herohuyongtao


The result can have variable length. There can be more than 1 letter before the '+' or none at all so I need everything from + till the end of the word (newline or space). – Crypto Jan 29 '14 at 08:54
dataYouWant[length] = '\0'; writes beyond the end of the allocated memory – David Sykes Jan 29 '14 at 09:06
main.cpp:222:44: error: invalid conversion from ‘void*’ to ‘char*’ [-fpermissive] char *dataYouWant = malloc(length+1); – Crypto Jan 29 '14 at 09:11
How do I get the values of indexStart and indexEnd? Do I use rindex? – Crypto Jan 29 '14 at 09:17
@Crypto Just scan `answer` to find these two indexes, which is easy to do. – herohuyongtao Jan 29 '14 at 09:19
By scan, you mean iterate through each character? – Crypto Jan 29 '14 at 09:22
@Crypto Yes. It's linear time, same time compared to `string::find()`. – herohuyongtao Jan 29 '14 at 09:23
I used sizeof(answer) to get the length but some words seem to be cutting off in between. – Crypto Jan 29 '14 at 09:45
@Crypto You should use `strlen` to get the length. – herohuyongtao Jan 29 '14 at 09:54
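
The difference between the two can be seen in a small sketch. Applied to a `char *`, `sizeof` gives the size of the pointer itself (commonly 4 or 8 bytes), which is why longer words were cut off, while `strlen` counts the characters up to the terminating '\0'. The function names here are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <cstring>

// sizeof on a pointer yields the size of the pointer itself,
// regardless of how long the string it points to is.
std::size_t PointerSize(const char *text)
{
    return sizeof(text);
}

// strlen counts the characters before the terminating '\0'.
std::size_t TextLength(const char *text)
{
    return std::strlen(text);
}
```

For the sample result "G+ABC S\n\n", `TextLength` returns 9, whereas `PointerSize` returns the same platform-dependent value for every input.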
