I have temporarily moved to Berkeley, California, to serve as a “Science Communicator in Residence” at the Simons Institute, the world's leading laboratory for collaborative research in theoretical computer science.
One nano-collaboration is today's puzzle. It was told to me by a computer scientist at Microsoft whom I befriended over tea. It is about data centers, the warehouses full of computers that store all of our data.
One of the problems faced by data centers is the unreliability of physical machines. Hard drives fail all the time, and when they fail, all your data can be lost. How can a company like Microsoft ensure that data can be recovered from a failed hard drive? The solutions to the puzzles below are essentially the answer to this question.
An obvious strategy that data centers can use to protect against random failures is to keep a replica of every machine. In this case, if your hard drive fails, you recover your data from the clone. However, this strategy is highly inefficient and is therefore not used: if you have 100 machines, you need 100 replicas. As you can guess, there is a better way.
The missing boxes
There are 100 boxes. Each box contains one number, and no two boxes have the same number.
1. You are told that one box will be removed at random. Before that happens, you are given one additional box, which can contain a single number. What number would you put in the additional box to ensure that you can recover the missing number, no matter which box is removed?
2. You are told that two boxes will be removed at random. Before that happens, you are given two additional boxes, each of which can contain one number. What (different) numbers would you put in these two boxes to ensure that you can recover the numbers in both removed boxes?
We'll give you the answer at 5pm UK time. No spoilers, please. Instead, feel free to discuss your favorite hard drives.
The analogy here is that each box is a hard drive, the number inside the box is the data, and removing a box corresponds to a hard drive failing. Adding one hard drive makes you safe against the random failure of any one hard drive, and adding two makes you safe against the failure of any two. It is magical that so much information can be protected from random failure with such minimal backup.
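To put rough numbers on that saving without giving away the answer, here is a minimal sketch in Python. It only counts drives; it says nothing about what to write on the extra ones. The function name `storage_overhead` and the scheme labels are my own illustration, not anything Microsoft uses.

```python
def storage_overhead(data_drives: int, extra_drives: int) -> float:
    """Return the extra drives as a fraction of the drives holding original data."""
    return extra_drives / data_drives


DATA_DRIVES = 100  # the 100 boxes in the puzzle

# Hypothetical labels for the three schemes mentioned in the article.
# Values are the number of extra drives each scheme needs.
schemes = {
    "clone every drive": DATA_DRIVES,
    "one extra drive (puzzle 1)": 1,
    "two extra drives (puzzle 2)": 2,
}

for name, extra in schemes.items():
    overhead = storage_overhead(DATA_DRIVES, extra)
    print(f"{name}: {extra} extra drives, {overhead:.0%} overhead")
```

Cloning every drive costs 100% extra storage, against 1% and 2% for the two puzzle schemes. Note too that cloning is only guaranteed to survive a single failure: if a drive and its own clone both fail, that data is gone.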
The field of “error-correcting codes” is full of beautiful theory that answers questions such as how to minimize the number of extra machines needed to protect against random hard drive failures. And the theory works! In a data center, no data is lost due to mechanical failure.
My tea partner was Mr. Sivakant Gopi, a principal scientist at Microsoft. He said: “The magic of error-correcting codes allows us to build reliable systems using noisy and defective components. Thanks to them, we can communicate with someone far away, to the ends of our solar system, and billions of terabytes of data can be safely stored in the cloud. We can forget about the hustle and bustle of this world and enjoy its beauty instead.”
I have been posting puzzles here every other Monday since 2015. I am always on the lookout for great puzzles. If you would like to suggest one, please email me.