Cloud Deduplication

Here’s a thought experiment.
Consider the problem of online storage of music libraries.  There are various free sites that do this, and Apple is rumored to be planning cloud storage for users’ iTunes libraries, making it possible for a user to stream his or her music from anywhere.
It is fairly obvious that there is no need to store every copy of a song separately.  In the enterprise market, the idea of storing only one copy of each file and keeping track of who has a copy is called “deduplication”.
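To make the idea concrete, here is a minimal sketch of file-level deduplication in Python. It assumes files are identified by a SHA-256 hash of their contents; the `DedupStore` class and its methods are hypothetical, not any real service’s API. The bytes are kept once, and each additional “copy” costs only an entry in an owner set.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """Hypothetical file-level deduplicating store (illustration only)."""

    def __init__(self):
        self.blobs = {}                 # content hash -> bytes, stored once
        self.owners = defaultdict(set)  # content hash -> users who "have" it

    def put(self, user, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data   # first copy: keep the bytes
        self.owners[digest].add(user)   # later copies: just note the owner
        return digest

    def get(self, user, digest):
        if user not in self.owners.get(digest, set()):
            raise PermissionError("user does not own this file")
        return self.blobs[digest]


store = DedupStore()
song = bytes(range(256)) * 100
h_alice = store.put("alice", song)
h_bob = store.put("bob", song)

print(h_alice == h_bob)                # True: same bytes, same hash
print(len(store.blobs))                # 1: only one copy kept
print(sorted(store.owners[h_alice]))   # ['alice', 'bob']
```

The store never compares file names or metadata; two uploads are “the same” exactly when their bytes hash to the same value.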
As an aside, this can work at the block level rather than the file level, and it can work even when common blocks of data sit at different offsets in different files, thanks to some fairly cool technology called rolling hashes.
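Here is a rough sketch of how a rolling hash finds block boundaries that survive misalignment. It uses a simple polynomial rolling hash; production systems typically use Rabin fingerprints or similar, and the window size, modulus, and mask below are arbitrary assumptions. A boundary is declared wherever the hash of the last 16 bytes has its low 10 bits equal to zero, so boundaries depend on content rather than on file offsets.

```python
import hashlib
import random

# Assumed parameters, chosen only for illustration.
WINDOW = 16               # bytes covered by the rolling hash
BASE = 257                # polynomial base
MOD = (1 << 61) - 1       # large prime modulus
MASK = (1 << 10) - 1      # cut when the low 10 bits are zero (~1 KiB average chunks)


def chunk(data: bytes) -> list:
    """Content-defined chunking: a boundary depends only on the last WINDOW bytes."""
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # slide: remove the oldest byte
        h = (h * BASE + byte) % MOD                   # slide: add the newest byte
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])          # boundary after byte i
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                   # trailing remainder
    return chunks


def fingerprints(chunks):
    """Hash each chunk; these are what a block-level dedup store would index."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


original = random.Random(42).randbytes(64 * 1024)   # needs Python 3.9+
shifted = b"x" * 100 + original                      # same content, misaligned by 100 bytes

fp_orig = fingerprints(chunk(original))
fp_shift = set(fingerprints(chunk(shifted)))
shared = sum(1 for fp in fp_orig if fp in fp_shift)
print(f"{shared} of {len(fp_orig)} original chunks reappear in the shifted file")
```

Because each boundary is decided by a 16-byte window of content, every original boundary reappears in the shifted file, so at most the first chunk (the one absorbing the inserted bytes) fails to deduplicate.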
Now, the upload phase of storing music in the cloud cannot identify songs by artist and title alone, because a work may have many separately recorded, distinct performances.  Classical music fans are especially devoted to particular recordings of their favorites.  Consequently, the upload is likely to work by sending a hash of your local file, to which the cloud server immediately replies, “Yup, got that one!”  The user says, “Man, that upload was fast!”
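The hash-first upload might look something like this sketch. The protocol, the `Server` class, and the `offer`/`upload`/`sync_file` names are all assumptions for illustration; this is not Apple’s or anyone else’s actual protocol. The client sends the hash and sends the bytes only if the server has never seen them.

```python
import hashlib


class Server:
    """Hypothetical dedup server speaking a hash-first upload protocol."""

    def __init__(self):
        self.blobs = {}   # sha256 hex -> bytes
        self.owners = {}  # sha256 hex -> set of users

    def offer(self, user, digest):
        """Client offers a hash; if the content is already stored, grant ownership."""
        if digest in self.blobs:
            self.owners.setdefault(digest, set()).add(user)
            return True   # "Yup, got that one!"
        return False      # unknown content: please send the bytes

    def upload(self, user, data):
        digest = hashlib.sha256(data).hexdigest()  # server hashes what it received
        self.blobs[digest] = data
        self.owners.setdefault(digest, set()).add(user)
        return digest


def sync_file(server, user, data):
    """Client side: return how many bytes actually had to be sent."""
    digest = hashlib.sha256(data).hexdigest()
    if server.offer(user, digest):
        return 0
    server.upload(user, data)
    return len(data)


server = Server()
song = bytes(range(256)) * 100              # 25600 bytes of stand-in "audio"
print(sync_file(server, "alice", song))     # 25600: first copy, bytes uploaded
print(sync_file(server, "bob", song))       # 0: "Man, that upload was fast!"
```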
Now it seems entirely possible to game the system by sharing not music, but <hashes of music>.  I go to the music hash warez site, grab a set of hashes, and then use my slightly hacked music uploader to say, “Here are the hashes of the music I want to upload,” and the cloud server says, “Yup, got those!”
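The weakness is easy to see in code. In this hypothetical sketch (the same assumed hash-first protocol, with SHA-256 assumed as the hash), the server treats knowledge of the hash as proof of possession, so a 64-character hex string copied from somewhere else is all it takes to be granted streaming access.

```python
import hashlib


class NaiveDedupServer:
    """Hypothetical server that trusts any client-supplied hash."""

    def __init__(self):
        self.blobs = {}   # sha256 hex -> bytes
        self.owners = {}  # sha256 hex -> set of users

    def upload(self, user, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        self.owners.setdefault(digest, set()).add(user)
        return digest

    def offer(self, user, digest):
        # The flaw: knowing the hash is treated as proof of having the file.
        if digest in self.blobs:
            self.owners.setdefault(digest, set()).add(user)
            return True
        return False

    def stream(self, user, digest):
        if user not in self.owners.get(digest, set()):
            raise PermissionError("not your file")
        return self.blobs[digest]


server = NaiveDedupServer()
song = bytes(range(256)) * 100
leaked = server.upload("honest_user", song)  # this hash later turns up on a warez site

# The "slightly hacked uploader": it holds only the hex string, not the music.
accepted = server.offer("mallory", leaked)
recovered = server.stream("mallory", leaked)
print(accepted, recovered == song)  # True True
```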
Then, later, I can stream all the music, or reload my local copy after my local library is “accidentally lost”.
So is it illegal to share hashes of music? Is the hash copyrighted?  Is this a bug in cloud deduplication? It would be a shame to require users to upload all the bits, just so the hashes can be computed in a trusted environment.
One possible solution is to compute keyed hashes. For example, the cloud server says “compute your music hashes using this personally customized algorithm”.  Of course, then the user can merely forward the instructions to a friend who <does> have the music.  Is it a crime to compute a hash function for someone?
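A keyed-hash challenge can be sketched with HMAC-SHA256, which is one standard way to build a keyed hash; a fresh random key per request plays the role of the “personally customized algorithm”. The `ChallengeServer` class and its methods are hypothetical. The sketch shows both halves of the argument: a leaked plain hash no longer works, but a challenge relayed to a friend who has the file still does.

```python
import hashlib
import hmac
import secrets


class ChallengeServer:
    """Hypothetical server that asks for a keyed hash (HMAC) instead of a plain hash."""

    def __init__(self):
        self.blobs = {}    # sha256 hex -> bytes
        self.owners = {}   # sha256 hex -> set of users
        self.pending = {}  # (user, sha256 hex) -> one-time challenge key

    def upload(self, user, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        self.owners.setdefault(digest, set()).add(user)
        return digest

    def challenge(self, user, digest):
        """Issue a fresh random key for this user and file."""
        if digest not in self.blobs:
            return None  # unknown content: the client must upload the bytes instead
        key = secrets.token_bytes(32)
        self.pending[(user, digest)] = key
        return key

    def prove(self, user, digest, tag):
        key = self.pending.pop((user, digest), None)  # each key is usable once
        if key is None:
            return False
        expected = hmac.new(key, self.blobs[digest], hashlib.sha256).digest()
        if hmac.compare_digest(expected, tag):
            self.owners.setdefault(digest, set()).add(user)
            return True
        return False


def respond(key, data):
    """Client side: keyed hash of the actual file bytes."""
    return hmac.new(key, data, hashlib.sha256).digest()


server = ChallengeServer()
song = bytes(range(256)) * 100
leaked = server.upload("honest_user", song)

# Mallory has only the leaked plain hash, so all she can key-hash is that string.
key = server.challenge("mallory", leaked)
without_file = server.prove("mallory", leaked, respond(key, leaked.encode()))
print(without_file)  # False

# But she can forward a fresh challenge to a friend who does have the music.
key = server.challenge("mallory", leaked)
friend_tag = respond(key, song)  # computed by the friend, sent back to Mallory
relayed = server.prove("mallory", leaked, friend_tag)
print(relayed)  # True
```

One cost of this design: the server must re-read the entire stored file to check each response, work that the plain-hash scheme never had to do.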
-L
