
Is there a way to automatically copy part of the text from a third-party web page? My boss asked me to enter a group of values (50, 100, 200) one by one into this website: http://fbatoolkit.com/chart_details?category=T2ZmaWNlIFByb2R1Y3Rz&rank=500 and copy the result, "3 (30 Days Avg)", into another file. The value I enter ends up in the "rank=500" part of the URL's query string. I also know where the information is in the HTML source code. It is here:

    <div style="margin: 20px">
        Estimate sales per day for the rank  
        <input type="text" name="rank" value="500" />
        in this category.

        <input type="submit" value="Estimate" />

            <table width="200">
                <tr>
                    <td>
                        3 (30 Days Avg)
                    </td> 
                </tr>
                <tr>
                    <td>
                        More than 2 (Last Day)
                    </td> 
                </tr>
            </table>

    </div>
    </form>

Is there a way to access the website repeatedly (once per value) and automatically copy that piece of text into another file? I know this is probably not the smartest way to do things, but please help, almighty Stack Overflow! I really appreciate it.
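For reference, a minimal sketch of generating one URL per rank, assuming the URL format above stays the same. The `category` value in the example URL appears to be the base64 encoding of "Office Products"; that is an observation from this single URL, not documented behavior of the site. `build_url` and `RANKS` are hypothetical names, and the rank values are placeholders.

```python
import base64
from urllib.parse import urlencode

BASE_URL = "http://fbatoolkit.com/chart_details"


def build_url(category_name, rank):
    """Build a chart_details URL for one category name and rank.

    Assumption: the site expects the category as the base64 encoding of its
    name, as in the example URL (base64 of "Office Products").
    """
    category = base64.b64encode(category_name.encode("utf-8")).decode("ascii")
    return BASE_URL + "?" + urlencode({"category": category, "rank": rank})


# Hypothetical ranks to look up (placeholder values)
RANKS = [500, 1000, 1500]

for rank in RANKS:
    print(build_url("Office Products", rank))
```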

Angelo
  • Copy and paste suggests, quite definitively, user action. What is the coding issue? – Popnoodles Jul 01 '14 at 17:25
  • @Popnoodles I was wondering if I could have code copy all that information into another file. In short, the code should access the URL for each value in a group and copy the "3 (30 Days Avg)" info into another file. There is a lot of copying and pasting to do, which is why I was hoping code could help. But yes, you are right: this is essentially mimicking user action. – Angelo Jul 01 '14 at 17:28
  • Will you be doing other categories? The question seems to indicate you'd only have to copy/paste 3 values. In any case, you're talking about web scraping. How much experience do you have and with what languages? – Ryan Jul 01 '14 at 17:32
  • @Ryan I can program in Python, Java, JavaScript, Scheme and PHP, though I am not very familiar with JavaScript. I have not done much web development; I developed one simple web game before. – Angelo Jul 01 '14 at 17:57
  • @Angelo OK, I'm kind of working on it. So will you be querying more than one category or just this single category? – Ryan Jul 01 '14 at 18:13
  • @Ryan Well, I will have to copy a lot of other values. That was just an example. – Angelo Jul 01 '14 at 18:31
  • @Ryan Thank you for the code. I will be querying more than one category but it is more arbitrary so I am not worrying about it for now. – Angelo Jul 01 '14 at 18:59
  • @Angelo You're welcome. Because you're looking to extract a number of values, let me point you to Rubular.com to build your RegEx queries. I've found it invaluable, and it seems like you'll need a lot of RegEx to do this kind of thing. – Ryan Jul 01 '14 at 19:04
  • @Angelo you're getting voted down by people for the quality of your question, I imagine, and for its content. You may want to check out http://stackoverflow.com/help to learn more about SO and what others expect of your questions, comments, answers and participation – Ryan Jul 01 '14 at 22:52
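Rubular tests Ruby regular expressions, and Python's `re` syntax is similar but not identical, so it can also help to check a pattern locally. A minimal sketch that runs an assumed pattern against the two table cells from the question's HTML snippet (`SNIPPET` and `PATTERN` are hypothetical names):

```python
import re

# Trimmed copy of the two table cells from the question's HTML snippet
SNIPPET = """
<td>
    3 (30 Days Avg)
</td>
<td>
    More than 2 (Last Day)
</td>
"""

# Per line: capture the text before a parenthesised label, and the label.
# Literal parentheses are escaped as \( and \); re.MULTILINE makes ^ and $
# match at the start and end of each line.
PATTERN = re.compile(r"^\s*(.+?) \((30 Days Avg|Last Day)\)\s*$", re.MULTILINE)

for value, label in PATTERN.findall(SNIPPET):
    print(f"{label}: {value}")
```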

1 Answer


I don't write much Python, but I'll give it a shot. Tasks like this are usually easy to accomplish in Python, so I'll give you the general language constructs I would use, with links.

General Steps

  1. Set up array of categories
  2. Set up array of ranks to use
  3. For loop through each category and then nested loop through each rank
  4. Within this inner loop, query the web page like this (see This Answer for more options for opening and reading URLs):

    import urllib.request
    page = urllib.request.urlopen("URL HERE").read().decode("utf-8")

  5. Then use RegEx to find the text you're interested in, with something like this (note: the RegEx below assumes "(30 Days Avg)" is a static string, which it appears to be on the page you supplied. You can re-append this text to the end of the grouped item if you'd like):

    import re
    match = re.search(r"(\w+) \(30 Days Avg\)", page)  # \( \) match literal parentheses
    extractedText = match.group(1) if match else None

  6. Append the extracted text to a file of your choice, per This Answer

  7. Close out your loops
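Putting the steps together, a minimal sketch of the whole script using only the standard library. It assumes the URL format from the question, UTF-8 pages, and that the number appears directly before the literal text "(30 Days Avg)". `CATEGORIES`, `RANKS`, `OUTPUT_FILE`, `extract_avg`, `fetch` and `main` are hypothetical names, and the values are placeholders.

```python
import re
import time
import urllib.request
from urllib.parse import urlencode

# Hypothetical settings; the URL format is copied from the question.
BASE_URL = "http://fbatoolkit.com/chart_details"
CATEGORIES = ["T2ZmaWNlIFByb2R1Y3Rz"]  # step 1: category values as they appear in the URL
RANKS = [500, 1000, 1500]             # step 2: ranks to look up (example values)
OUTPUT_FILE = "estimates.txt"         # hypothetical output file

# Step 5: capture the word/number right before the literal "(30 Days Avg)"
AVG_PATTERN = re.compile(r"(\w+) \(30 Days Avg\)")


def extract_avg(html):
    """Return the value before "(30 Days Avg)", or None if it is not found."""
    match = AVG_PATTERN.search(html)
    return match.group(1) if match else None


def fetch(category, rank):
    """Step 4: download one page and decode it as text (UTF-8 assumed)."""
    url = BASE_URL + "?" + urlencode({"category": category, "rank": rank})
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def main():
    # Step 6: open the output file once, in append mode
    with open(OUTPUT_FILE, "a", encoding="utf-8") as out:
        for category in CATEGORIES:      # step 3: outer loop over categories
            for rank in RANKS:           # nested loop over ranks
                value = extract_avg(fetch(category, rank))
                out.write(f"{category}\t{rank}\t{value}\n")
                time.sleep(1)            # pause between requests (arbitrary 1 s)
    # Step 7: the loops end with their indented blocks; "with" closes the file
```

Calling `main()` runs the loops; each line of `estimates.txt` then holds the category, the rank and the extracted value (or `None` if the pattern was not found on that page).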

Sorry this isn't more cut-and-paste-ready code. Also, the SO editor doesn't seem to handle code inside lists very well, so make sure each statement in the snippets above ends up on its own line.

Ryan
  • Thank you so much! I will write that this afternoon. – Angelo Jul 01 '14 at 19:09
  • I have another question. The ranks I am using increase in steps of 500, but I have other ranks that don't follow this pattern, and they are in Excel. Is there a way to read an Excel column as an array? – Angelo Jul 01 '14 at 20:57
  • Yes, there is. I haven't done much with Python, but you should be able to save the Excel file as CSV (note: only one sheet is kept), and I'm sure there are libraries like http://docs.python-tablib.org/en/latest/ that do this very well. This is a separate question, however, and should be researched and then posted as a new question if needed. – Ryan Jul 01 '14 at 22:45
  • While I thank you for the urllib part, parsing HTML with regex seems to be too big a project for me right now, especially as a new programmer. I am trying out a library for now. – Angelo Jul 02 '14 at 19:31
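Following the last two comments, a sketch that uses only the standard library: `csv` in place of tablib to read a column of ranks from a sheet saved as CSV, and `html.parser` in place of regex to collect the text of every `<td>` cell. The column name `rank`, and the helper names `read_column`, `CellTextParser` and `td_texts`, are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


def read_column(csv_file, column):
    """Return one named column of a CSV file (e.g. an Excel sheet saved as CSV) as strings."""
    return [row[column] for row in csv.DictReader(csv_file)]


class CellTextParser(HTMLParser):
    """Collect the raw text inside each <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def td_texts(html):
    """Return the stripped text of every <td> cell, in document order."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Example with inline data instead of a real file and page
ranks = read_column(io.StringIO("rank\n500\n1000\n"), "rank")
cells = td_texts("<tr><td>\n 3 (30 Days Avg)\n</td></tr><tr><td>More than 2 (Last Day)</td></tr>")
print(ranks, cells)
```

For a real file, open it with `open("ranks.csv", newline="")` (adjusting `encoding` to match how Excel saved it) and pass the file object to `read_column`; the values come back as strings, so convert them with `int()` if needed. Taking `cells[0]` as the 30-day average relies on the table layout shown in the question.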