I'm trying to download all the questions and answers from a user's profile, but there is a problem: if the user has a large number of questions, I have to click "Show more" to expand the list. If I try to download, for example, this person's questions and answers: http://ask.fm/UnaRamekic (random choice), I get only those that are shown initially; those displayed after I click "Show more" are not retrieved by the GET request. How can I get all the questions with ICS or Indy components? Thanks.

My code:

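procedure TForm1.sButton1Click(Sender: TObject);
begin
  HttpCli1.URL := sEdit1.Text;
  HttpCli1.RequestVer := '1.1';
  HttpCli1.RcvdStream := TMemoryStream.Create;
  try
    try
      HttpCli1.Get;
    except
      ShowMessage('There has been an error, check your internet connection!');
      Exit;
    end;
    // Rewind the stream before loading the response into the memo.
    HttpCli1.RcvdStream.Position := 0;
    Memo1.Lines.LoadFromStream(HttpCli1.RcvdStream);
  finally
    // Free the stream on every path and clear the property so the
    // component does not keep a dangling reference.
    HttpCli1.RcvdStream.Free;
    HttpCli1.RcvdStream := nil;
  end;
end;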
TLama
Daniel

2 Answers

You're not going to be able to do that with a single GET from Indy or ICS. What you initially see is exactly what is downloaded when you make the HTTP request.

If you look at the HTML source of the page, you'll see that the "Show more" button has a JavaScript event handler attached to it that makes an AJAX request to the server, pulls more data from it, and applies it to the page. If you want to do the same, your code needs to parse the page at least enough to get the right AJAX parameters, then make the request to the server from your Indy or ICS code like any other HTTP request, and deal with the data that comes back.

Mason Wheeler

Warning:

This approach is crude and fragile. It posts the form data the same way the Show more button does, but it uses a while loop to fetch all pages, and that loop ends only when a specific constant (LastPageResponse in the code) appears at the end of a response. If the site ever changes its response so that the constant is no longer there, the loop will never terminate.
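
One way to limit the infinite-loop risk described in the warning is to cap the number of requests. The following is a minimal sketch, not part of the original answer: MaxPages, TFetchPage, FetchAllPages and FakeFetch are hypothetical names, the cap of 500 is an arbitrary assumption, and the fetch callback stands in for the HTTPClient.Post call so the loop logic can be seen in isolation.

```pascal
program BoundedPaging;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical callback: returns the raw response for a given page number.
  TFetchPage = function(APage: Integer): string;

const
  LastPageResponse = '$("#more-container").hide();';
  MaxPages = 500; // assumed safety cap, not a value taken from the site

// Collects pages until the end marker is seen; raises an exception
// instead of looping forever if the marker never shows up.
function FetchAllPages(AFetch: TFetchPage; AStartPage: Integer): string;
var
  Page, Requests: Integer;
  Response, Trimmed: string;
  Done: Boolean;
begin
  Result := '';
  Page := AStartPage;
  Requests := 0;
  Done := False;
  while not Done do
  begin
    if Requests >= MaxPages then
      raise Exception.CreateFmt('No end marker after %d requests', [MaxPages]);
    Response := AFetch(Page);
    Inc(Requests);
    Result := Result + Response + sLineBreak + sLineBreak;
    // Ignore trailing whitespace when looking for the marker.
    Trimmed := TrimRight(Response);
    Done := Copy(Trimmed, Length(Trimmed) - Length(LastPageResponse) + 1,
      MaxInt) = LastPageResponse;
    Inc(Page);
  end;
end;

// Stand-in for the real HTTP POST: the third page carries the end marker.
function FakeFetch(APage: Integer): string;
begin
  Result := 'page ' + IntToStr(APage);
  if APage >= 3 then
    Result := Result + LastPageResponse + sLineBreak;
end;

begin
  WriteLn(FetchAllPages(FakeFetch, 1));
end.
```

In real code the callback would wrap the POST to the /more URL, and the caller would handle the exception the same way it handles any other failed download.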

In the GetAllQuestions function you can specify:

  • AUser - the user name, i.e. the part of the URL after the slash
  • AFromDate - the starting date and time from which you want to get results
  • AStartPage - the starting page, counted from the AFromDate date and time, from which you want to get results

The GetAllQuestions function returns the user's base page, followed by the content of every subsequent page, starting from the time and page you specify, with the pieces separated by blank lines. Note that the additional content needs to be parsed differently from the base page, since it is not HTML.

uses
  IdHTTP;

implementation

function GetAllQuestions(const AUser: string; AFromDate: TDateTime;
  AStartPage: Integer = 1): string;
var
  Response: string;
  LastPage: Integer;
  TimeString: string;
  HTTPClient: TIdHTTP;
  Parameters: TStrings;
const
  LineBreaks = sLineBreak + sLineBreak;
  LastPageResponse = '$("#more-container").hide();';
begin
  Result := '';
  HTTPClient := TIdHTTP.Create(nil);
  try
    Result := HTTPClient.Get('http://ask.fm/' + AUser) + LineBreaks;
    Parameters := TStringList.Create;
    try
      LastPage := AStartPage;
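      // "UTC" is quoted so FormatDateTime emits it as literal text;
      // unquoted, T and C would be read as format specifiers.
      TimeString := FormatDateTime('ddd mmm dd hh:nn:ss "UTC" yyyy', AFromDate);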
      Parameters.Add('time=' + TimeString);
      Parameters.Add('page=' + IntToStr(LastPage));
      while LastPage <> -1 do
      begin
        Parameters[1] := 'page=' + IntToStr(LastPage);
        Response := HTTPClient.Post('http://ask.fm/' + AUser + '/more',
          Parameters);
        if Copy(Response, Length(Response) - Length(LastPageResponse) + 1,
          MaxInt) = LastPageResponse
        then
          LastPage := -1
        else
          LastPage := LastPage + 1;
        Result := Result + Response + LineBreaks;
      end;
    finally
      Parameters.Free;
    end;
  finally
    HTTPClient.Free;
  end;
end;

And the usage:

procedure TForm1.Button1Click(Sender: TObject);
begin
  try
    Memo1.Text := GetAllQuestions('TLama', Now);
  except
    on E: Exception do
      ShowMessage(E.Message);
  end;
end;
TLama

Now thinking about it, looking at the content, this might be the right place to [`use regex for HTML parsing`](http://stackoverflow.com/a/1732454/960757), seriously. When you want to extract just the questions with answers, regex is IMHO the easiest way to do so in this case. – TLama Sep 01 '12 at 08:42