^{n}different bit strings of n bits. On the other hand, there are at most 2

^{n}-1 different bit strings of n-1 or less bits. The pigeonhole principle implies it is not possible to compress

*all*n bits strings to a smaller number of bits.

Based on these arguments, Mike Goldman proposed the following challenge (excerpted from comp.compression faq):

Mike Goldman <whig@by.net> makes another offer: I will attach a prize of $5,000 to anyone who successfully meets this challenge. First, the contestant will tell me HOW LONG of a data file to generate. Second, I will generate the data file, and send it to the contestant. Last, the contestant will send me a decompressor and a compressed file, which will together total in size less than the original data file, and which will be able to restore the compressed file to the original state. With this offer, you can tune your algorithm to my data. You tell me the parameters of size in advance. All I get to do is arrange the bits within my file according to the dictates of my whim. As a processing fee, I will require an advance deposit of $100 from any contestant. This deposit is 100% refundable if you meet the challenge.

This appears to be an easier challenge since the compression and decompression algorithm can be tuned to the data generated. Incidentally, in the following exchange, an attempt is made to win the challenge by exploiting how data is stored in a computer disk system and using the filename as additional storage by splitting the data into many files.

There are several aspects that makes Mike Goldman's challenge possibly solvable and not subject to the constraints of the pigeonhole principle. First of all, the compression algorithm can be tuned to the data. In other words, the compression algorithm only needs to compress only one single file. It doesn't need to compress all files and thus the pigeonhole principle does not apply. Secondly, the challenge did not prescribe what computer language the decompressor can be written in. Can it be written in C, Perl, Python or Matlab? Using a high level language essentially provides the decompressor with complex subroutines that are callable with simple calls, and thus some of the compression can be "hidden" in this way.

## No comments:

## Post a Comment