Saturday, January 10, 2009

Data synchronization via an untrusted third-party


This is my first blog post ever, so... let's start gently :-)

Imagine you need to keep data sync'd between different locations, but you have nowhere safe to centralize it: how would you do it?

It's actually pretty simple, at least in theory. The straightforward answer is of course cryptography; the tricky part is getting the synchronization software to work with an encrypted repository (in addition to encrypting the connection).

In the case of pure data synchronization, this can be achieved with EncFS and rsync: create an encrypted directory with the former, work in its (cleartext) mount point, and synchronize the encrypted directory, never the mount point, with the repository using the latter.
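
A minimal sketch of one sync round, written in Python for illustration. It only assembles the command lines (and runs them only when asked); the helper names, the example paths and remote, and the choice of rsync flags are assumptions, and `fusermount -u` assumes a Linux FUSE setup. On the very first mount, EncFS asks interactively to create the volume and choose a password.

```python
import os
import shlex
import subprocess


def sync_round(encrypted_dir, mount_point, remote):
    """Hypothetical helper: build the commands for one EncFS + rsync round.

    Returns (before_work, after_work): run the first list, edit the
    cleartext files under mount_point, then run the second list.
    """
    enc = os.path.abspath(os.path.expanduser(encrypted_dir))
    mnt = os.path.abspath(os.path.expanduser(mount_point))
    # Trailing slashes make rsync copy directory *contents* on both sides.
    local = enc + "/"
    server = remote.rstrip("/") + "/"
    before_work = [
        # Fetch the latest ciphertext from the untrusted server.
        ["rsync", "-az", "--delete", server, local],
        # EncFS presents a decrypted view of `enc` at `mnt`.
        ["encfs", enc, mnt],
    ]
    after_work = [
        # Unmount (Linux FUSE) so every write has landed in the encrypted tree.
        ["fusermount", "-u", mnt],
        # Push the ciphertext back; the cleartext mount point is never sent.
        ["rsync", "-az", "--delete", local, server],
    ]
    return before_work, after_work


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    before, after = sync_round("~/.private-enc", "~/private",
                               "user@example.org:sync/private-enc")
    run(before)
    # ... work on the files under ~/private ...
    run(after)
```

This assumes a single user who always pushes at the end of a round; otherwise the `--delete` on the pull would discard unpushed local changes. Since EncFS keeps its volume configuration (`.encfs6.xml`, whose key is itself password-protected) inside the encrypted directory by default, the same rsync carries it to the other locations, which can then mount the tree with the same password.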

When it comes to more complex synchronization, such as version control (e.g. CVS), support for an encrypted repository has to be built into the synchronization software itself. I don't know of any version control system capable of that at the moment... which calls for a contribution! :-D Which one shall I tackle first: CVS, Subversion, Git or OpenCVS? I'll let you know ;-)


  1. What's wrong with just keeping the repository in an EncFS filesystem?

  2. Hey Nicolas! Small world, isn't it? ;)

    "What's wrong with just keeping the repository in an EncFS filesystem" on the central server, or on each desktop, with all those EncFS trees synchronized via rsync?

    Well, the former would mean mounting the EncFS volume on the repository server, thereby exposing the cleartext data to anyone else who gains access to that server (precisely the "untrusted third party" in question: think hosting company admins, insufficiently compartmentalized colocation centers, and hackers), which is exactly what I was trying to avoid.

    The latter is doable: create an EncFS volume, set up a local repository inside its cleartext tree, check out/commit/update from your working directory, and finally rsync the underlying encrypted tree with the server.
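
    That workflow can be sketched in Python, with Subversion standing in purely as an example of a version control system. As before, the helper only builds command lines; its name, parameters and the `repo` directory name are hypothetical, and `fusermount -u` assumes Linux.

```python
import os


def svn_in_encfs(encrypted_dir, mount_point, working_copy, remote, message):
    """Hypothetical helper: keep a Subversion repository inside an EncFS
    cleartext tree and sync only the ciphertext with the server.

    Returns (setup, before_edit, after_edit) as lists of argv lists.
    """
    enc = os.path.abspath(os.path.expanduser(encrypted_dir))
    mnt = os.path.abspath(os.path.expanduser(mount_point))
    wc = os.path.abspath(os.path.expanduser(working_copy))
    repo = os.path.join(mnt, "repo")  # hypothetical repository location
    url = "file://" + repo            # local repository access, no server process
    local = enc + "/"
    server = remote.rstrip("/") + "/"

    unmount = ["fusermount", "-u", mnt]          # Linux FUSE unmount
    push = ["rsync", "-az", "--delete", local, server]  # ciphertext only
    pull = ["rsync", "-az", "--delete", server, local]

    # One-time: create the repository in the cleartext tree, check it out
    # into a working copy elsewhere, then publish the encrypted tree.
    setup = [
        ["encfs", enc, mnt],
        ["svnadmin", "create", repo],
        ["svn", "checkout", url, wc],
        unmount,
        push,
    ]
    # Each session: fetch ciphertext, mount, bring the working copy up to date.
    before_edit = [pull, ["encfs", enc, mnt], ["svn", "update", wc]]
    # After editing: commit into the mounted repository, unmount, push.
    after_edit = [["svn", "commit", "-m", message, wc], unmount, push]
    return setup, before_edit, after_edit
```

    Since the working copy records the `file://` URL, this assumes the EncFS volume is mounted at the same path on every machine; and it does nothing to prevent two people from pushing at the same time.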

    I've been using that for quite a while now and it works, but I believe scaling this hack to more than one person would risk data corruption (concurrent rsyncs of the same encrypted tree can't be good: each one may overwrite the other's repository files), besides falling short of fine-grained Software Configuration Management.