2

I'm creating my first web application - a really simplistic online text editor. What I need to do is find the best way to store text based files - a lot of them. These text files can be past 10,000 words in size (text words not computer words.) in essence I want the text documents to be limitless in size.

I was thinking about storing the text files in my MySQL database - but thought there was a better way. Instead I'm planing on storing the text files in XML based format in a directory on my server. The rows in the database define the name of the xml based text file and the user who created the text along with basic metadata. An ID is generated using a V4 GUID generator , which gives the text an id and stores the text in the "/store" directory on my server. The text definitions in my server contain this id, and the android app I'm developing gets the contents of the text file by retrieving the text definition and then downloading the text to the local device using the GUID in the text definition.

I just think this is a botch job? how can I improve this system?

There has been cases of GUID colliding. I don't want this to happen. A "slim" possibility isn't good enough - I need to make sure there is absolutely no chance in a GUID collision. I was planning on checking the database for texts that have the same id before storing the text with a particular id - I however believe with over 20,000 pieces of text in my database this would take an long time and produce unneeded stress on the server.

How can I make GUID safe?

What happens when a GUID collides?

The server backend is going to be written in PHP.

NullUserException
  • 83,810
  • 28
  • 209
  • 234
  • 2
    What is wrong with a MySQL database? I see people worried about 10k, 20k records in a database, but we work with tables which have millions of rows on a daily basis. It's not uncommon, and it's not a problem. They were designed to accommodate a lot of data. – NullUserException Oct 22 '12 at 22:49
  • 2
    And they were also designed to *find* data very quickly. I highly doubt you'll be able to beat a query on an indexed column with something like XML, flatfiles or whatever. What's worse, flat-files and XML scale very, very badly. Once you go past a certain size, things slow down to a halt. Writing anything to a substantially large file is a nightmare. Trust me, you *don't* want to go down that path. You've been warned. – NullUserException Oct 22 '12 at 22:56
  • I seem to under estimate the power of servers and RAM storage.I'll give it a go and see how well the system copes which storing such huge files. – Nathaniel Bennett Oct 22 '12 at 23:21

3 Answers3

4

You've got several questions here, so I'll try to answer them all.

Is XML with GUID the best way to do this?

"Best" is usually subjective. This is certainly one way to do it, but you're probably adding unneeded overhead. If it's just text you're storing, why not put it in the SQL with varchar(MAX)?

Are GUID collisions possible?

Yes, but the chance of that happening is small. Ridiculously small. There are much bigger things to worry about.

How can I make GUIDs safe?

Stop worrying about them.

What happens when a GUID collides?

This depends on how you're using them. In this case, the old data stored in the location indicated by the GUID would probably be overwritten by the new data.

Community
  • 1
  • 1
SomeKittens
  • 38,868
  • 19
  • 114
  • 143
  • it just seems weird to store such a huge amount of data on a table cell? I mean , the data doesn't exist on a file - it's potentially virtual. the idea with XML is that the user would download the file to there local disk , modify it and uploaded it back to the server automatically. – Nathaniel Bennett Oct 22 '12 at 23:17
  • I'll give MySQL holding the data a go - it's not a huge amount of code. – Nathaniel Bennett Oct 22 '12 at 23:18
  • MySQL tables are stored 'on disk' in files, it's just that they don't appear as individual files in the file system. The benefits far outweigh the disadvantages. Glad to hear you're going that route! Ping us back if you hit any further issues. – nickhar Oct 22 '12 at 23:20
2

Well i dont know if id use a guid i would probably just use the auto_increment key on the db table and name the files like that because unless you have deleted records from the db without cleaning up the filesystem they will always be unique. I dont know if the GUID is a requirement on the android side though.

prodigitalson
  • 60,050
  • 10
  • 100
  • 114
2

There's nothing wrong with using MySQL to store the documents!

What is storing them in XML going to provide you with? Adding an additional format layer will only increase the processing time when they are to be read and formatted.

Placing them as files on disk would be no different than storing them in an RDBMS and in the longer-term probably cause you further issues down the line. (File access, disk-seek, locking, race conditions come to mind).

nickhar
  • 19,981
  • 12
  • 60
  • 73