
First of all, I'm not asking for code; that would be rude. I just need to know the best way to achieve this. I'm trying to make a tool that downloads every photo answer from an Ask.Fm profile, given the URL of the profile.

I think the best solution would need one or two asynchronous threads, but I'm not sure yet.

First Thread

Option A: This thread gets the links from the profile page and pushes them into a List. When it finishes processing the page, it emulates a click on the "View more" button and goes on searching for more links, and so on (there is no page 2; an AJAX script adds elements to the page when you click that button).

Option B: This thread instead emulates a lot of clicks first, until the button disappears, which happens once you reach answers that are about a year old. Then, with a single foreach and a Regex filter, it would be easy to get all the links. But with this option I wouldn't have links as soon as possible. I would get them all at the end of the clicking job, and that would take time, because I think you have to wait some milliseconds between clicks to avoid bugs from invoking the button too quickly.

Making a custom List with an OnAdd event would let me process every link as it comes from the first thread, or maybe just checking a standard list every 5 seconds would be easier, I don't know. I don't even know if I should use arrays (I come from C++).

  1. Should I use a separate thread and this whole List approach to download the links that the first thread is collecting, or is that stupid, and I can just download right after I find a link? Wouldn't that be too memory expensive?
  2. I'm sure I need at least one asynchronous thread. I don't want the form to freeze until the script finishes. But I don't know what the best multithreading option is. What do you suggest?
  3. Should I use Lists? Custom lists with an OnAdd event? Arrays?
  4. Most important: do you know of better ways to achieve all this?

Thank you in advance, Neflux.

Kristof U.
Neflux
  • Have you looked at all into the [TPL Dataflow](http://msdn.microsoft.com/en-us/library/dd460717(v=vs.110).aspx)? It's good for parallelization and enumeration through collections. – Matthew Haugen Jul 22 '14 at 01:25
  • Sounds good. I learned about it relatively recently, within the last four or five months anyway, but I've found it makes a lot of tasks (pun intended?) much easier than they would otherwise have been. – Matthew Haugen Jul 22 '14 at 01:38
  • Check this: http://stackoverflow.com/a/22262976/1768303 – noseratio Jul 22 '14 at 04:53
  • Thank you very much @Noseratio, [this](http://stackoverflow.com/questions/20930414/how-to-dynamically-generate-html-code-using-nets-webbrowser-or-mshtml-htmldocu/20934538#20934538) will be useful. – Neflux Jul 22 '14 at 10:04
  • @Noseratio I'm trying your entire code in a new project, and everything works fine except one thing. The CancellationTokenSource constructor takes 0 arguments (in my definition), but you initialize it with a parameter: `new CancellationTokenSource((int)TimeSpan.FromMinutes(3).TotalMilliseconds);` – Neflux Jul 22 '14 at 11:25
  • @Neflux, my code targets .NET 4.5 version of [CancellationTokenSource](http://msdn.microsoft.com/en-us/library/hh139229(v=vs.110).aspx), while yours apparently targets .NET 4.0. – noseratio Jul 22 '14 at 11:29

1 Answer

  1. You should probably collect all the links as quickly as possible. If your target ever changes from ask.fm to some other site, the page may change while you are still processing links, which can cause duplicates and other headaches.

  2. You could use one or two background workers:
    http://msdn.microsoft.com/en-us/library/system.componentmodel.backgroundworker(v=vs.110).aspx

  3. I personally love `System.Collections.Generic.List<T>`. I would not necessarily add an event, but that is up to you.

  4. If you want something out of the box, you might look into Kimono, Portia, or import.io.
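To illustrate points 2 and 3, here is a minimal sketch of one worker collecting links while a second one downloads them as they arrive. Note that it swaps in `BlockingCollection<T>` and `Task` (both available since .NET 4.0) in place of a plain `List<T>` with a polling loop or an OnAdd event, and in place of `BackgroundWorker`. `FindPhotoLinks`, `Download`, and the example.com URLs are hypothetical placeholders for the real scraping and download code, not anything from ask.fm.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class LinkPipeline
{
    // Hypothetical stand-in for the scraping step. A real version would
    // parse the profile page and yield links after each "View more" click.
    static IEnumerable<string> FindPhotoLinks()
    {
        for (int i = 1; i <= 5; i++)
            yield return "http://example.com/photo" + i + ".jpg";
    }

    // Hypothetical stand-in for the download step.
    static string Download(string url)
    {
        return "downloaded " + url;
    }

    public static List<string> Run()
    {
        var results = new List<string>();

        // Bounded queue: the producer blocks if it gets 100 links ahead,
        // which keeps memory use predictable.
        using (var links = new BlockingCollection<string>(boundedCapacity: 100))
        {
            // Producer: pushes each link as soon as it is found.
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (string link in FindPhotoLinks())
                        links.Add(link);
                }
                finally
                {
                    // Tells the consumer that no more links are coming.
                    links.CompleteAdding();
                }
            });

            // Consumer: blocks until a link is available, then handles it.
            Task consumer = Task.Factory.StartNew(() =>
            {
                foreach (string link in links.GetConsumingEnumerable())
                    results.Add(Download(link));
            });

            Task.WaitAll(producer, consumer);
        }

        return results;
    }

    static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

In a WinForms app you would not call `Task.WaitAll` on the UI thread, since that would freeze the form. Starting `Run` from a `BackgroundWorker` or another task and reporting progress back to the form is one way around that.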

If you want to get really smart, you can emulate the requests made by their AJAX calls yourself instead of clicking the button. Use something like Wireshark to figure out what those calls look like.

Mainly, I have no reputation yet, and this is something I have some experience with, so I answered.

smagnan
Carter