I'm using libcurl to connect to a website and fetch its HTML, and LibTidy to extract the text from it. My goal is to check whether a sentence from a text file appears in that HTML.

Thanks to LibTidy I have all the text as a single char*. I'm using char *strstr(const char *one, const char *two) to compare the two strings. The first is the string produced by libcurl and the LibTidy parsing, and the second is a string read from a text file.

When I call strstr(..), the result is NULL. The debugger shows me that the two strings aren't encoded the same way.

I tried to find where the problem was in the string that comes from the Internet connection, and I tried several code samples to fix it.

The code given on the libcurl website gives me the same problem: the char *memory isn't encoded properly, so I can't compare it. https://curl.haxx.se/libcurl/c/getinmemory.html

I also tried the code here: https://stackoverflow.com/a/2329792/10160890, and the char *ptr has the same problem.

I expect to be able to compare the string from libcurl with the string from the text file.

axel7083
  • Have you tried dumping the text you get back as hex so you can see the values of the characters? Are you sure that `strlen(in_str)` is returning the right value? Seems like a good task for a debugger so you can examine what's going on. – Retired Ninja May 17 '19 at 20:07
  • Yes, the debugger helps me a lot. I think the problem comes from the function that does the tidy parsing. – axel7083 May 17 '19 at 21:51
  • You should revert the edit. The new text is not an answerable question. – R.. GitHub STOP HELPING ICE May 18 '19 at 14:00
  • What is the character encoding of the text file? (It appears not to be compatible with ASCII so why have you referenced ASCII?) – Tom Blodget May 18 '19 at 16:08
  • Note: Despite debuggers being incredibly powerful, you should not expect one to know the character encoding of data in `char` data types. – Tom Blodget May 18 '19 at 16:11

1 Answer

There is no need to convert. Any ASCII text is UTF-8 text, so you just search for it as-is using strstr. This is pretty much the whole point of UTF-8.

R.. GitHub STOP HELPING ICE
  • The text from the file is encoded properly (see the debugger), but the string from the tidy parsing isn't right; that's why strstr returns NULL. I would like to find a way to encode both the same way so I can compare them. – axel7083 May 17 '19 at 21:52
  • It looks to me like your debugger is just configured to think strings are latin1 or windows1252 or some other backwards encoding rather than UTF-8. – R.. GitHub STOP HELPING ICE May 17 '19 at 22:39
  • It is not only the debugger, because strstr returns NULL. I guess libcurl doesn't return the text in UTF-8. I tried the following code https://stackoverflow.com/questions/2329571/c-libcurl-get-output-into-a-string and the result is the same: in the debugger, the string isn't encoded properly. – axel7083 May 18 '19 at 10:46
  • I think you're misdiagnosing your problem. – R.. GitHub STOP HELPING ICE May 18 '19 at 14:00