I am trying to use a C++ scraper in my UI to sift through WSJ stock information and pull back some balance sheet data. Right now it searches the page source for specific text, e.g. "P/E Ratio", and I manually counted how many characters sit between that text and the actual number on the website.

Here is the code:

    // P/E Ratio
    size_t indexpeRatio = html.find("P/E Ratio ") + 116;
    string s_peRatio = html.substr(indexpeRatio, 5);
    peRatio = stod(s_peRatio);

After that, it simply stores the number and I output it to my UI. My issue is that the number of characters in between sometimes changes depending on which company I choose to evaluate. I am wondering if there is a way to use the .find() function to find "P/E Ratio" and then output the next float/int.

Here is what the HTML looks like on the site:

As of right now, my UI sometimes outputs parts of the HTML because I have to use a fixed number of characters.

This is an example of my UI output when given a smaller company to evaluate:

Do you all have any recommendations I can use to fix this issue? Thank you in advance!

Lane Floyd

1 Answer

You could iterate through the characters.

Say you have string html:

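#include <cctype>
#include <iostream>
#include <string>
using namespace std;

// Returns the first number (digits with an optional decimal point) found in text.
// Note: digits inside HTML tags or attributes count too, so this assumes the
// value is the first number that follows the label.
double find_ratios(const string& text){
    string output;
    bool inNumber=false;
    for(size_t i=0;i<text.size();i++){
        unsigned char c=text[i];
        if(isdigit(c)){
            inNumber=true;
            output+=text[i];
        }
        else if(inNumber&&c=='.')
            output+=text[i];
        else if(inNumber)
            break; // first non-numeric character after the number ends it
    }
    return stod(output); // throws std::invalid_argument if no digits were found
}

int main(){
    double peratio=0.0;
    string html;
    /*

    This is where you do your HTML scraping logic

    */
    const string label="P/E Ratio";
    size_t indexpeRatio=html.find(label);
    if(indexpeRatio!=string::npos){ // only parse if the label is on the page
        // Start scanning right after the label instead of at a fixed offset
        peratio=find_ratios(html.substr(indexpeRatio+label.size()));
        cout<<peratio<<'\n';
    }
}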
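One caveat with scanning for the first digit: if the markup between the label and the value contains digits (a class name, an id, a heading tag), those get picked up first. A rough sketch of one way around that is to skip everything between `<` and `>` and let `std::strtod` read the number. The function name `number_after_label` is hypothetical, and I'm assuming the value is plain visible text with no thousands separators or leading minus sign:

```cpp
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: returns the first number in the visible text after
// `label`, ignoring anything inside <...>. Requires C++17 for std::optional.
std::optional<double> number_after_label(const std::string& html,
                                         const std::string& label) {
    std::size_t pos = html.find(label);
    if (pos == std::string::npos) return std::nullopt;  // label not on the page

    bool inTag = false;
    for (std::size_t i = pos + label.size(); i < html.size(); ++i) {
        char c = html[i];
        if (c == '<') { inTag = true; continue; }
        if (c == '>') { inTag = false; continue; }
        if (inTag) continue;  // skip tag names and attribute values
        if (std::isdigit(static_cast<unsigned char>(c))) {
            // strtod consumes as many characters as form a valid number
            return std::strtod(html.c_str() + i, nullptr);
        }
    }
    return std::nullopt;  // no number found after the label
}
```

With a made-up fragment such as `<td>P/E Ratio</td><td class="col2">23.45</td>`, calling `number_after_label(html, "P/E Ratio")` yields 23.45 and skips the `2` in `col2`. Returning `std::optional` also lets the UI show something like "N/A" instead of throwing when the label or the number is missing.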
Evan Hendler
  • Okay, so the while loop runs while wasInt is not true and html is not equal to '\0'. Two questions: what does the \0 represent? And where are you getting [i] from? I know I've used this in a for loop, but where did you define i? – Lane Floyd Apr 23 '20 at 01:03
  • Sorry, forgot `i`. `'\0'` is the null character. Since C++11, indexing a `std::string` at its `size()` returns `'\0'`, so I put that check in to let the loop terminate when it hits the end of the string. Technically, you could scan the entire HTML string without having to use `.find()` at all. – Evan Hendler Apr 23 '20 at 01:05
  • Okay, I appreciate the help! But your code starts at the very top of the HTML document, correct? How could I start it directly after "P/E Ratio" in the code? Sorry, I am very new to all of this. – Lane Floyd Apr 23 '20 at 01:10
  • That's fine! I was very new at one point as well. I'll see if I can't give you something more complete. – Evan Hendler Apr 23 '20 at 01:13
  • Yep - again, this is not all inclusive, and I haven't compiled it. Just free balling here. Let me know if you have any issues. – Evan Hendler Apr 23 '20 at 01:23
  • The only issue I have with it right now is that it reports "invalid operands to binary expression" when using html!='\0'. Still working on fixing it. – Lane Floyd Apr 23 '20 at 01:44
  • Sorry, my bad. Should be `html[i]`. `'\0'` is a char, and `html` is a string, i.e. a sequence of chars. `html[i]` accesses that specific char, giving a char-to-char comparison. – Evan Hendler Apr 23 '20 at 01:45
  • Ahh, that actually makes perfect sense. That's why it's an invalid operand: html is the full string and '\0' is a single null char. Thank you again! – Lane Floyd Apr 23 '20 at 01:50
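
To illustrate the point from the comments: `html[i]` is a single `char`, so comparing it with another `char` compiles, while comparing the whole `std::string` with `'\0'` does not. A `std::string` can also contain `'\0'` in the middle, so bounding the loop by `size()` is generally safer than looking for a terminator. A minimal sketch (the helper name `count_digits` is made up for this example):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts digit characters in a string.
std::size_t count_digits(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {  // bounded by size(), not by '\0'
        if (s[i] >= '0' && s[i] <= '9') ++n;      // s[i] is a char: char-to-char comparison
    }
    return n;
}
```

A loop written as `while (s[i] != '\0')` would stop at the first embedded null character, whereas this version always visits every character.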