I need to parse Google Images for one or more specified keywords and retrieve image links. I can do this with the PHP Simple HTML DOM Parser, which retrieves a few image links per call. The Google Images API is currently limited to 100 calls. I therefore need to save the image links for each keyword, so that the next time I need images for that keyword my script first checks whether the keyword and its related image URLs are already stored in my system, and only calls Google if they are not.

What would be the most efficient way to store the keywords and image links (i.e. a database or simple text files)?

If it is MySQL, what would the schema look like?

Can I store image links in a text file where the file name is the keyword?

MrWhite
user3668629

2 Answers

I would suggest using MySQL. It gives you flexibility, speed, and easy access to your data. Just put the information about your images in one table, something like:

id | name | keyword | path | creTime | size | ext (and any other columns you need)

Then you can pull any number of images by keyword, such as "water" or "views".
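A minimal sketch of that single-table layout, written in Python with the standard-library `sqlite3` module standing in for MySQL so it runs without a server. The table name `images_data`, the index, the column types, the helper `images_for`, and the example URLs are all assumptions made for illustration; only the column names come from the list above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for a runnable sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE images_data (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        name    TEXT,
        keyword TEXT NOT NULL,
        path    TEXT NOT NULL,
        creTime TEXT DEFAULT CURRENT_TIMESTAMP,
        size    INTEGER,
        ext     TEXT
    )
    """
)
# An index on keyword keeps lookups by keyword cheap as the table grows.
conn.execute("CREATE INDEX idx_images_keyword ON images_data (keyword)")

# Hypothetical sample rows.
rows = [
    ("lake", "water", "http://example.com/lake.jpg", 1024, "jpg"),
    ("river", "water", "http://example.com/river.png", 2048, "png"),
    ("hill", "views", "http://example.com/hill.jpg", 512, "jpg"),
]
conn.executemany(
    "INSERT INTO images_data (name, keyword, path, size, ext) VALUES (?, ?, ?, ?, ?)",
    rows,
)


def images_for(keyword, limit=10):
    """Return up to `limit` stored image paths for a keyword, oldest first."""
    cur = conn.execute(
        "SELECT path FROM images_data WHERE keyword = ? ORDER BY id LIMIT ?",
        (keyword, limit),
    )
    return [path for (path,) in cur]


water_images = images_for("water")
```

The same `SELECT ... WHERE keyword = ?` query should carry over to MySQL with a parameterized statement in PHP; the column types would need adjusting (for example `VARCHAR` and `DATETIME`).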

I would probably split this into two tables, replacing keyword with keywordId and adding a foreign key to Keyword_data. That gives you a one-to-many relation: one row in Keyword_data (id) is referenced by many rows in Images_data (keywordId).

id | keyword | description | ( anything else)

That way each keyword is stored once and can carry its own data, which is more flexible.

fsn
  • Both of your posts are helpful. I have decided to use MySQL. Thank you for your input. – user3668629 Jul 30 '15 at 20:01
  • @user Have fun designing and coding. It's always good to use a tool such as Workbench to create the tables, so you can see everything more clearly ;) – fsn Jul 30 '15 at 20:13
I usually like working with relational databases because you can do more with them as you grow. I would also add timestamps to your data (as shown below) so you know when your links were cached (because caching is effectively what you are doing). Here is the schema I would use:

Images Table

id - Integer, primary index, autoincrement
keyword - Varchar(50), indexed
url - Varchar(2083)
created_at - Timestamp
updated_at - Timestamp
Any other data you want to store, like image type, size, etc.

The length of the URL was based off of this post.
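To illustrate how the timestamps support the "check the cache first, call Google only when needed" flow from the question, here is a hedged sketch in Python, with `sqlite3` standing in for MySQL. Several things are assumptions rather than part of the answer: timestamps are stored as Unix seconds instead of a `TIMESTAMP` column, `MAX_AGE_SECONDS` is an arbitrary freshness window, and `get_image_urls` and `fake_fetch` are hypothetical names (the latter is a stand-in for the real scraper or API call).

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.execute(
    """
    CREATE TABLE images (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        keyword    VARCHAR(50) NOT NULL,
        url        VARCHAR(2083) NOT NULL,
        created_at INTEGER NOT NULL,  -- Unix seconds (assumption)
        updated_at INTEGER NOT NULL   -- Unix seconds (assumption)
    )
    """
)
conn.execute("CREATE INDEX idx_images_keyword ON images (keyword)")

MAX_AGE_SECONDS = 7 * 24 * 3600  # hypothetical freshness window: one week


def get_image_urls(keyword, fetch_remote, now=None):
    """Return cached URLs for keyword; call fetch_remote only on a miss or stale cache."""
    now = int(time.time()) if now is None else now
    cached = conn.execute(
        "SELECT url FROM images WHERE keyword = ? AND updated_at >= ? ORDER BY id",
        (keyword, now - MAX_AGE_SECONDS),
    ).fetchall()
    if cached:
        return [url for (url,) in cached]

    # Cache miss or stale rows: fetch fresh links and replace the old ones.
    urls = fetch_remote(keyword)
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM images WHERE keyword = ?", (keyword,))
        conn.executemany(
            "INSERT INTO images (keyword, url, created_at, updated_at) VALUES (?, ?, ?, ?)",
            [(keyword, url, now, now) for url in urls],
        )
    return urls


calls = []  # records every time the (fake) remote source is hit


def fake_fetch(keyword):
    """Stand-in for the real Google Images lookup."""
    calls.append(keyword)
    return [f"http://example.com/{keyword}/1.jpg", f"http://example.com/{keyword}/2.jpg"]


first = get_image_urls("water", fake_fetch, now=1000)
second = get_image_urls("water", fake_fetch, now=2000)  # served from the cache
```

In this sketch the second call never reaches `fake_fetch`, which is the point of storing `updated_at`: the remote source is only hit again once the cached rows are older than the chosen window.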

If you wanted to normalize your data even further, you could do this:

Images Table

id - Integer, primary index, autoincrement
keyword_id - integer, indexed
url - Varchar(2083)
created_at - Timestamp
updated_at - Timestamp
Any other data you want to store, like image type, size, etc.

Keywords

id - Integer, primary index, autoincrement
word - Varchar(50)
created_at - Timestamp
updated_at - Timestamp
Any other data you want to store
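The normalized variant can be sketched the same way, again in Python with `sqlite3` standing in for MySQL. The helper names (`keyword_id`, `add_image`, `urls_for`), the `UNIQUE` constraint on `word`, and the example URLs are assumptions for illustration, and the timestamp columns are left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE keywords (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        word VARCHAR(50) NOT NULL UNIQUE
    );
    CREATE TABLE images (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        keyword_id INTEGER NOT NULL REFERENCES keywords (id),
        url        VARCHAR(2083) NOT NULL
    );
    CREATE INDEX idx_images_keyword_id ON images (keyword_id);
    """
)


def keyword_id(word):
    """Return the id for word, inserting the keyword row if it is new."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells this INSERT IGNORE.
    conn.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
    (kid,) = conn.execute("SELECT id FROM keywords WHERE word = ?", (word,)).fetchone()
    return kid


def add_image(word, url):
    """Store one image URL under the given keyword."""
    conn.execute(
        "INSERT INTO images (keyword_id, url) VALUES (?, ?)",
        (keyword_id(word), url),
    )


def urls_for(word):
    """Return all URLs stored for a keyword by joining the two tables."""
    cur = conn.execute(
        "SELECT i.url FROM images AS i "
        "JOIN keywords AS k ON k.id = i.keyword_id "
        "WHERE k.word = ? ORDER BY i.id",
        (word,),
    )
    return [url for (url,) in cur]


add_image("water", "http://example.com/lake.jpg")
add_image("water", "http://example.com/river.png")
add_image("views", "http://example.com/hill.jpg")
water_urls = urls_for("water")
```

Each keyword string is stored once in `keywords`, and every lookup costs one extra join compared with the single-table version.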

Personally, I would probably just go with the first option because it's simpler (depending on how big your data will get).

Community
JasonJensenDev