
In my application users can submit links. I want to insert the text content of the link in my MySQL database so that I can do further indexing and searching.

I am considering using the file_get_contents function in PHP and then inserting the data into MySQL. What are the security pitfalls here? Or is this workflow wrong, and are there dedicated modules for this kind of work?

(I am already using PDO, but I may not have made clear earlier that I want only the main text content, excluding any CSS and JavaScript included in the HTML.)

Gumbo
spacemilkman

3 Answers


As BenM said, don't reinvent Google. But if you have decided to go ahead, here are some points:

  1. file_get_contents() is a reasonable way to fetch the content of a URL. You can pass it a context created with stream_context_create() to avoid indexing links that return 404 or 500 and to follow permanent redirects.

  2. Once you have the HTML, you must parse it, strip the unnecessary parts, and extract the text content. Here is a very informative question about that specific topic.

  3. You must consider the character set of the incoming content. You can very easily run into trouble even if you respect the character set declared by the source. Here is another informative link about that topic.

  4. Finally, your data will end up in the database safely only if you follow the recommendations on data escaping, using the built-in escaping functions and prepared statements (see PDO and MySQLi; do not use the old mysql interface). If you skip any of these, you are responsible for the mess.
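
Points 1 to 3 could be sketched roughly as follows. This is only an illustration under stated assumptions, not a complete crawler: the helper names fetchHtml() and extractText() are made up, the timeout and redirect limit are arbitrary, the scheme check is an extra precaution for user-submitted links, and the input is assumed to be UTF-8 (real pages need the charset handling from point 3).

```php
<?php
// Hypothetical helper: fetch a page over HTTP(S); returns null on failure.
function fetchHtml(string $url): ?string
{
    // Accept only http/https, so a submitted value such as
    // file:///etc/passwd is never handed to file_get_contents().
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }

    $context = stream_context_create([
        'http' => [
            'method'          => 'GET',
            'timeout'         => 10,    // seconds; illustrative value
            'follow_location' => 1,     // follow redirects
            'max_redirects'   => 5,     // illustrative limit
            'ignore_errors'   => false, // 4xx/5xx responses make the call fail
        ],
    ]);

    // @ suppresses the warning PHP emits on failure; we check for false instead.
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

// Hypothetical helper: return the visible text of the <body>,
// without <script>, <style> and <noscript> content.
function extractText(string $html): string
{
    $doc = new DOMDocument();
    // The XML prolog is a common workaround that makes libxml read the
    // input as UTF-8 (assumption: the input really is UTF-8).
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_NOERROR | LIBXML_NOWARNING);

    // Drop elements whose content is not readable text.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//script | //style | //noscript')) as $node) {
        $node->parentNode->removeChild($node);
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    $text = $body === null ? '' : $body->textContent;

    // Collapse runs of whitespace into single spaces.
    return trim((string) preg_replace('/\s+/u', ' ', $text));
}
```

A call such as extractText(fetchHtml($url) ?? '') would then give the string to store. Because textContent already returns plain text, a later strip_tags() pass should not be needed on this path.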

Hope this helps.

Community
Miloš Đakonović
  • Thanks a lot for your informative answer. I am already using PDO; is it true that I don't need to do any escaping any more? I am wondering whether functions like strip_tags() would be helpful here. – spacemilkman Mar 10 '13 at 14:24
  • No, as far as I know you still have to escape strings, since prepared statements do not escape anything. You need both. My practice is to use strip_tags() as the very last step before inserting the useful content into the database. It is not mandatory if you extract the text content with, for example, Simple HTML DOM Parser. I gave you a useful link under #2 about parsing the HTML DOM. – Miloš Đakonović Mar 10 '13 at 14:33

You'll primarily need to be wary of SQL injection attacks. To avoid them, use PHP's PDO extension with prepared statements. Take a look at the PHP documentation: http://php.net/manual/en/pdo.prepared-statements.php
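
A minimal sketch of such an insert, assuming a table named pages with url and body columns (both names are made up for illustration) and placeholder connection details:

```php
<?php
// Hypothetical helper: open a MySQL connection. Host, database name,
// user and password are placeholders.
function openConnection(): PDO
{
    return new PDO(
        'mysql:host=localhost;dbname=example;charset=utf8mb4',
        'user',
        'password',
        [
            PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION, // throw on errors
            PDO::ATTR_EMULATE_PREPARES => false,                  // use native prepares
        ]
    );
}

// Hypothetical helper: store the text of one fetched page.
// The table and column names (pages, url, body) are assumptions.
function savePage(PDO $pdo, string $url, string $text): void
{
    // The values are bound to named placeholders and are never
    // concatenated into the SQL string.
    $stmt = $pdo->prepare('INSERT INTO pages (url, body) VALUES (:url, :body)');
    $stmt->execute([':url' => $url, ':body' => $text]);
}
```

Usage would be along the lines of savePage(openConnection(), $url, $text). Values passed through placeholders this way should not need additional manual escaping for the SQL layer, even when the fetched text contains quotes or SQL fragments.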

Wes Cossick
  • That's more a comment than an answer. – BenM Mar 10 '13 at 13:59
  • I disagree; this is (at least part of) the answer to the question about the issues you get when inserting arbitrary content into your database, be it directly user-submitted or fetched via `get_contents`. – Nanne Mar 10 '13 at 14:01

There is no need to use the file_get_contents method, because you don't need to store the text in a file first and then retrieve it again.

The most suitable approach is to insert directly into the database (MySQL).
PDO would be the safest in this regard.

Split the text before saving it to the database for better indexing.

adeel iqbal