3

I have started developing a webpage and recently hired someone to write code to display a customized feed (powered by an API) in the middle panel on http://farmball.com/. Note that this is not the RSS feed tied to the site blog. The feed is tied to my account on another site, and there is no RSS link an average user could use to subscribe to it. I've taken the site out of maintenance mode to ask anyone here with scraping/hacking experience how someone would most easily go about 'taking' the feed and displaying it on their own site. More importantly, what can I do to prevent it?

^Updated for re-wording

Adam
  • 9
    isn't the point of a feed to be read, parsed and displayed elsewhere? – Matthew Rapati Sep 24 '09 at 01:26
  • You should probably totally rewrite your question to be very clear that you're not talking about the RSS feed, or any kind of dedicated feed format. – Rex M Sep 24 '09 at 02:23
  • It sounds like a goal without merit--not a good expenditure of the effort required for any return you'd get. – Craig McQueen Sep 24 '09 at 02:33
  • So, to be clear, you are asking how easy is it for someone to scrape the content out of your page and redisplay it? – Rex M Sep 24 '09 at 02:33
  • Craig: Are you referring to my project or someone trying to get the feed? I realized that someone was able to do it in 1 minute. Rex M: Yes, that's exactly it. I am going to look into JSON a bit. Maybe there is a way to scramble/encrypt the JSON or javascript? – Adam Sep 24 '09 at 03:29
  • I guess it boils down to screen-scraping. Taking a look at this link now: http://stackoverflow.com/questions/396817/protection-from-screen-scraping – Adam Sep 24 '09 at 03:40
  • Sorry, I should clarify. Your web site is good and fine. But trying to stop someone scraping the content would be very tough. – Craig McQueen Sep 24 '09 at 04:21
  • A better strategy might be to make your site so excellent, nobody would want to visit those nasty rip-offs. – Craig McQueen Sep 24 '09 at 04:25

5 Answers

9

You can't.

If you are going to expose an RSS feed that you don't want others to be able to display on their site, then you are completely missing the point of RSS. The entire reason for Really Simple Syndication (RSS) is to make your content externally consumable, whether that's in an RSS reader or by someone simply printing its content on their own website.

Why are you including an RSS feed if you do not want someone to be able to consume it?

Nathan Taylor
  • The philosophy is dead-on of course, but it doesn't directly answer the question (or only does in a very roundabout way)... – Rex M Sep 24 '09 at 01:32
  • It answers it perfectly. The answer is you can't, because what he's asking is entirely against the design of the tool he's using. – Matthew Scharley Sep 24 '09 at 01:33
  • @Matthew it's a rhetorical question exposing how absurd the idea of trying to protect RSS is, but it doesn't actually say "you can't". – Rex M Sep 24 '09 at 01:34
  • 1
    I disagree with this. RSS readers are useful on the desktop as an endpoint. What is being asked here is if there's any way to enforce the fact that the reader should be an endpoint, and not redistributed further. There's plenty of examples of proprietary content you'd want to make available but not redistributable. Think of movie, game and music companies. – McPherrinM Sep 24 '09 at 01:46
  • @WaffleMatt in the movie, game and music examples, the content is distributed in formats designed specifically to be securable. RSS is not. – Rex M Sep 24 '09 at 01:57
  • Thank you, Matt. I didn't think it was that uncommon to see a "feed" up on a site where one is unable to grab it as RSS. I'm not trying to protect my own content here though. The links are all outgoing to other sites. – Adam Sep 24 '09 at 02:18
  • @WaffleMatt: Sure I understand that, but given the diversity of RSS readers out there, I for one cannot think of any certain way to guarantee that the endpoint is a reader and not a website. If this were possible then my answer would be different. – Nathan Taylor Sep 24 '09 at 03:14
6

what can I do to prevent...'taking' the feed and displaying it on their own site?

Nothing. Preventing reuse goes against the basic concept of RSS, which is to make it as easy as possible for anyone to do anything they want with it. It was designed from the ground up to be Really Simple to Syndicate, not Really Hard to Retransmit Without Permission.

You could restrict access to the feed itself to trusted users by requiring credentials or a key passed in to the feed (e.g. yoursite.rss?mykey=abc123). But that only controls access, not use: once someone has the content, they can do whatever they like with it.
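As a rough illustration of the key idea, here is a minimal Python sketch. The parameter name `mykey` comes from the example URL above; the key set and the `is_authorized` helper are made up for illustration and aren't tied to any particular framework.

```python
import hmac
from urllib.parse import urlparse, parse_qs

# Hypothetical keys handed out to trusted subscribers.
VALID_KEYS = {"abc123", "def456"}

def is_authorized(url, valid_keys=VALID_KEYS):
    """Return True if the URL's 'mykey' query parameter matches an issued key."""
    params = parse_qs(urlparse(url).query)
    supplied = params.get("mykey", [""])[0].encode("utf-8")
    # compare_digest avoids leaking key contents through timing differences.
    return any(hmac.compare_digest(supplied, key.encode("utf-8"))
               for key in valid_keys)

if __name__ == "__main__":
    print(is_authorized("http://example.com/yoursite.rss?mykey=abc123"))
    print(is_authorized("http://example.com/yoursite.rss?mykey=nope"))
```

Note that a key in the URL is easy to share or leak through logs and referrers, so this only gates who can fetch the feed; it does nothing about what they do with it afterwards.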

Rex M
1

Be explicit about your license. It isn't a technology solution; as others have mentioned, the technology is open, and this isn't DRM! But if you ask in each post that people who use the feed not repost it without giving credit, some people will respect the request.

Otherwise, you're better off putting your content behind a password and using a paid subscription model for distributing your content.

MatthewMartin
  • Sorry, I wasn't clear in the initial post. The feed is not the site feed for my own content; it is the feed in the middle panel, which I pull from another site's API. – Adam Sep 24 '09 at 02:22
0

This is essentially a DRM problem. If you had a technique for putting content on the web without it being redistributable, the music industry would love you.

It is possible to try to discourage redistribution. One technique is to require users to sign up, then embed a signature of some sort, unique to each user, into the feed they receive. If the content turns up elsewhere on the web, you can identify and ban the user who redistributed it.

This is avoidable too, by getting multiple accounts and normalizing the content to remove fingerprints. For the would-be pirate, this requires more effort than they may be willing to put in. Your signature could be a unique whitespace pattern, tiny variances in the timestamps on posts, misplaced pixels in videos, or any other thing you can vary slightly without end users noticing.
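To make the whitespace-pattern idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `SECRET` value, the function names, and the choice of encoding one bit per feed item as a trailing space (it also assumes the original items don't already end in whitespace).

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical; keep the real one private

def fingerprint_bits(user_id, n_bits):
    """Derive a repeatable bit pattern for a user (up to 256 bits)."""
    digest = hmac.new(SECRET, user_id.encode("utf-8"), hashlib.sha256).digest()
    value = int.from_bytes(digest, "big")
    return [(value >> i) & 1 for i in range(n_bits)]

def mark_items(items, user_id):
    """Append a trailing space to each item whose bit is 1."""
    bits = fingerprint_bits(user_id, len(items))
    return [item + (" " if bit else "") for item, bit in zip(items, bits)]

def identify(leaked_items, user_ids):
    """Return the users whose pattern matches the leaked copy."""
    observed = [1 if item.endswith(" ") else 0 for item in leaked_items]
    return [u for u in user_ids
            if fingerprint_bits(u, len(observed)) == observed]

if __name__ == "__main__":
    feed = ["Post %d" % i for i in range(32)]
    leaked = mark_items(feed, "alice")
    print(identify(leaked, ["alice", "bob", "carol"]))
```

As noted above, this is easy to defeat: stripping trailing whitespace from each item removes the mark entirely, so a real scheme would need something that survives normalization.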

McPherrinM
0

Use .htpasswd. Better yet, don't put something private in a public place where it's likely to get picked up automatically by software. As others have said, it's a pretty odd question; if you're trying to figure something else out, you're better off being explicit about what you want to know.

MonkeyMagic