0

I'm interested in how Google Docs stores documents on the server side, because I need to create a similar application.

Does it use plain RTF/ODF files or its own database? How do they implement versioning and undo/redo?

If anybody knows anything about this, please share it with me.

Erik
  • I'd guess Google isn't open with this info, and they want to keep their tech a business secret to keep competitors from implementing the same thing. – MW. May 27 '14 at 08:43
  • OK, but are there any similar applications that share their approach? – Erik May 27 '14 at 08:47
  • If your users like Google Docs, a server can request them to authenticate with Google so it can create documents on their behalf. Even if you don't want to do that, skimming the google doc list API documentation might be enlightening: https://developers.google.com/google-apps/documents-list/?csw=1 – Paul May 27 '14 at 08:49
  • Thanks for the answer, but I'm interested in building my own service. – Erik May 27 '14 at 08:53
  • HTML is probably better for versioning than RTF/ODT. It's also easier to read and write on a webpage. – Sebastien C. May 30 '14 at 08:56
  • Does it require collaborative editing? And have you looked at http://etherpad.org/, which is open source? It's like a stripped-down version of Google Docs. At the very least, you can look at how they store state. I also remember seeing this question some time ago, though I didn't read it closely as it was a bit deep: http://stackoverflow.com/q/2043165/1480215 – mfirdaus May 30 '14 at 13:00
  • Yes. I need collaborative editing also – Erik May 30 '14 at 13:07
  • Google has its own very powerful operational transformation that can merge in all sorts of edge cases like intermittent connectivity, incorrect clocks, simultaneous edits, and more. The guy who came up with it as a doctoral project got scooped up by Google right away. You can get several less robust but still great OTs for formatted text that are free and open source, but don't expect to just drop one into an existing project and walk away: these things need heavy integration into the low-level parts of an application like an editor. – dandavis Jun 02 '14 at 21:46
  • Google does not use HTML or RTF for storing docs; they use their own format. They don't even use contentEditable: they reinvented the wheel from scratch, with their diff/merge routines fully integrated into low-level DOM events that make it look like you're typing and selecting. – dandavis Jun 02 '14 at 21:49
  • Please look at the PHP application at owncloud.org. They offer what you need and provide the source code for you to study. – crafter Jun 04 '14 at 06:29

3 Answers

7

To answer your question about how Google Docs works specifically: it uses a technology called Operational Transformation.

You may be able to use one of the operational transformation engines listed at: https://en.wikipedia.org/wiki/Operational_transform#OT_software

The basic idea is that every operation has a context, e.g. "delete the fourth word in the fifth paragraph" or "add an input box after the button". The clients all send each other operations through the server. The clients and the server each keep their own version of the document and apply operations as they arrive.

When operations have overlapping contexts, a set of rules kicks in to resolve conflicts. For example, you can't modify something that has been deleted, so the delete must come last in a sequence of concurrent operations on that context.

It's possible that the various clients and server will get out of sync, so you need a secondary algorithm to maintain consistency. One way would be to reload the data from the server whenever a conflict is detected.
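To make the idea concrete, here is a minimal sketch of transforming two concurrent single-character operations on plain text. It is not Google's algorithm: the `Insert`/`Delete` classes, `apply_op`, `transform`, and the tie-breaking flag are names I made up for illustration, and it resolves conflicts by shifting positions rather than by the "delete comes last" rule described above.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int   # index at which the character is inserted
    char: str  # a single character


@dataclass(frozen=True)
class Delete:
    pos: int   # index of the character to remove


Op = Union[Insert, Delete]


def apply_op(doc: str, op: Optional[Op]) -> str:
    """Apply one operation to a document; None means 'nothing to do'."""
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op: Op, other: Op, op_wins_ties: bool) -> Optional[Op]:
    """Rewrite `op` so it can be applied after `other`.

    Both operations must have been created against the same document state.
    `op_wins_ties` decides which insert ends up first when both target the
    same position (for example, the client with the lower id wins).
    """
    if isinstance(op, Insert):
        if isinstance(other, Insert):
            if other.pos < op.pos or (other.pos == op.pos and not op_wins_ties):
                return Insert(op.pos + 1, op.char)
            return op
        # `other` deleted a character before the insertion point.
        if other.pos < op.pos:
            return Insert(op.pos - 1, op.char)
        return op
    if isinstance(other, Insert):
        # `other` inserted at or before the character being deleted.
        if other.pos <= op.pos:
            return Delete(op.pos + 1)
        return op
    if other.pos < op.pos:
        return Delete(op.pos - 1)
    if other.pos == op.pos:
        return None  # both sides deleted the same character
    return op
```

With document `"abc"`, one client inserting `"X"` at index 1 and another deleting index 1 concurrently, each side applies its own operation first and then the transformed remote one, and both end up with the same text. A real engine also has to handle formatted content, multi-character operations, and long histories, which is where the heavy integration work comes in.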

--This is an answer I got from a professor when I asked the same thing a couple of years ago.

Ray
0

You should use a database. Perhaps a table storing each document revision. First, find a way to determine whether an update is significant or not. You can store minor changes client side for redo/undo, and then, either periodically or per some condition (e.g., user hits save), create a database entry per revision (you can store things like bytes changed, bytes added, bytes deleted, etc.).
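A rough sketch of what such a revisions table could look like, using Python's built-in `sqlite3` module. The table layout, the function names, and the "skip saves identical to the latest revision" rule are my own assumptions for illustration; a real significance check would likely be smarter.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) revisions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS revisions (
               id          INTEGER PRIMARY KEY AUTOINCREMENT,
               doc_id      TEXT    NOT NULL,
               content     TEXT    NOT NULL,
               bytes_delta INTEGER NOT NULL,
               created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def latest(conn: sqlite3.Connection, doc_id: str) -> Optional[str]:
    """Return the newest stored content for a document, or None."""
    row = conn.execute(
        "SELECT content FROM revisions WHERE doc_id = ? ORDER BY id DESC LIMIT 1",
        (doc_id,),
    ).fetchone()
    return row[0] if row else None


def save_revision(conn: sqlite3.Connection, doc_id: str, content: str) -> bool:
    """Append a new revision; return False if nothing changed."""
    previous = latest(conn, doc_id)
    if previous == content:
        return False  # not significant: identical to the latest revision
    delta = len(content.encode("utf-8")) - len((previous or "").encode("utf-8"))
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO revisions (doc_id, content, bytes_delta) VALUES (?, ?, ?)",
            (doc_id, content, delta),
        )
    return True


def history(conn: sqlite3.Connection, doc_id: str) -> List[Tuple[int, int]]:
    """List (revision id, size change in bytes) pairs, oldest first."""
    return conn.execute(
        "SELECT id, bytes_delta FROM revisions WHERE doc_id = ? ORDER BY id",
        (doc_id,),
    ).fetchall()
```

Rows are only ever inserted, never updated, so loading a document means reading the newest row and listing revisions means reading all of them. Storing full content per revision is the simplest option; storing diffs instead would save space at the cost of more work on load.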

Take a look at MediaWiki, which is open source, and essentially does what you're asking (i.e., take a look at their tables and code).

RTF/ODF would typically be generated, and served, when a user requests exporting the document.

  • Are you sure I need to store each document revision in a separate table? Maybe there are open-source databases that offer git-like features? – Erik May 30 '14 at 11:23
  • That wasn't so clear (insomnia :)) For each document, you could store a table of revisions. Or you could do this in one table, etc. A lot of factors would go into the specifics. – evanlikesstuff Jun 02 '14 at 10:06
  • I'm really looking for a way to store JSON-based documents with undo/redo and versioning features. – Erik Jun 02 '14 at 10:09
  • I don't understand what you mean by JSON based documents. You can generate JSON from something like SQL, or store it statically in a file. And then you can store revisions along the lines of http://stackoverflow.com/questions/3541383/undo-redo-implementation – evanlikesstuff Jun 02 '14 at 10:14
  • I use MongoDB as data storage and it works fine for me. Now I need to add undo/redo and revisioning features. I can drop MongoDB if I find a better solution for my purpose. – Erik Jun 02 '14 at 10:21
  • You can get versioning by simply never deleting or updating the data: always append the new version on save and always grab the newest copy on load in your app. Later, you can run a different query for all copies instead of just the most recent to get a list of revisions and edit dates. – dandavis Jun 02 '14 at 21:43
0

Possibly, you should consider utilizing Google Drive's public API. See link for details.

Zorayr
Ricardo Fiorani