New England Data Services Blog


Compressed Files: How They Work


Have you ever downloaded a file with a .ZIP extension from a co-worker's email? When you open it, WinZip takes over and creates a folder containing the collection of text files he has been working on. You also notice that the .ZIP file itself was smaller than the files that came out of it. In the physical world, a large object cannot fit in a box smaller than itself. The trick that makes this possible for files is compression.

 
Here I will take a quick phrase and compress it. 

“Try not to become a man of success but rather to become a man of value”

This phrase contains 16 words, 55 letters, and 15 spaces. Counting letters and spaces together, we will say the phrase takes up 70 units of memory. Several words are repeated, so we can give each of them a short common value. Using the following key, we can assign those words to numbers.

                          1 - to     
                          2 - become  
                          3 - a      
                          4 - man       
                          5 - of

 Now our sentence reads:

            “Try not 1 2 3 4 5 success but rather 1 2 3 4 5 value”

Once we substitute the common values for those words, we come out with fewer characters. There are still 16 items (words and numbers) and 15 spaces, but now we have only 37 characters. Together, the new phrase can be saved in 52 units of memory. If you know the key we used to compress the original phrase, you can easily translate the compressed sentence back into the original. Essentially, this is what your compression software, WinZip, does when you double-click on a .ZIP file.
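The substitution above can be sketched in a few lines of Python. This is only an illustration of the idea, not how WinZip or the .ZIP format actually works; the names KEY, compress, and decompress are made up for this example.

```python
# Hypothetical word-to-number key, copied from the example above.
KEY = {"to": "1", "become": "2", "a": "3", "man": "4", "of": "5"}
REVERSE_KEY = {number: word for word, number in KEY.items()}

def compress(phrase):
    """Replace every word found in KEY with its number."""
    return " ".join(KEY.get(word, word) for word in phrase.split(" "))

def decompress(phrase):
    """Replace every number found in REVERSE_KEY with its word."""
    return " ".join(REVERSE_KEY.get(token, token) for token in phrase.split(" "))

original = "Try not to become a man of success but rather to become a man of value"
packed = compress(original)

print(packed)                          # Try not 1 2 3 4 5 success but rather 1 2 3 4 5 value
print(len(original))                   # 70
print(len(packed))                     # 52
print(decompress(packed) == original)  # True
```

One caveat of this toy scheme: if the original text already contained a standalone "1" through "5", decompressing would turn it into a word, so a real tool has to pick codes that cannot collide with the data.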

 

Searching for even more patterns, we can see that the string “1 2 3 4 5” appears twice. Assigning that whole string to the number 6 means we no longer have to store “1 2 3 4 5” in memory twice.

 

            “Try not 6 success but rather 6 value”

 

Now we have an even more compressed sentence with 8 words, 29 characters, and 7 spaces, making 36 units in total. To recap, we have taken the original phrase of 70 units of memory and compressed it down to 36 units. The computer now takes those 36 units, together with the key needed to reverse the substitutions, and zips it all up into a .ZIP file.
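To check the arithmetic, here is a small Python sketch of both passes. The names WORD_KEY and RUN_KEY, and the "number=word" layout used to store the key, are assumptions made for this illustration only.

```python
# Hypothetical two-level key: words first, then the repeated run of numbers.
WORD_KEY = {"to": "1", "become": "2", "a": "3", "man": "4", "of": "5"}
RUN_KEY = {"1 2 3 4 5": "6"}

original = "Try not to become a man of success but rather to become a man of value"

# Pass 1: swap each keyed word for its number.
first_pass = " ".join(WORD_KEY.get(w, w) for w in original.split(" "))

# Pass 2: swap the repeated run of numbers for a single number.
second_pass = first_pass
for run, number in RUN_KEY.items():
    second_pass = second_pass.replace(run, number)

print(len(original), len(first_pass), len(second_pass))  # 70 52 36

# The key has to travel with the compressed text, so measure it too.
key_text = ";".join(f"{n}={w}" for w, n in {**WORD_KEY, **RUN_KEY}.items())
print(len(key_text))  # 40
```

Stored this way, the key itself takes 40 units, so for a phrase this short the key costs more than the substitutions save. The payoff comes with longer text, where the same key entries are reused many times.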


Even though I looked for patterns of words in this sentence, patterns of characters can also be grouped and assigned a common value. For example, “cc” or “at” could appear elsewhere in the file, before or after this phrase. There are countless ways your software can look for redundancy in a text file. This works very well on text files containing long, repetitive character strings. The larger the file, the greater the chance of redundancy. File types such as video and audio usually shrink very little, because formats like MP3 and MP4 are already compressed and have little redundancy left.
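To see how much redundancy matters, here is a rough experiment using Python's standard zlib module, which implements DEFLATE, the method most commonly used inside .ZIP files. The sample data is made up for this illustration: one block of highly repetitive text and one block of pseudo-random bytes standing in for data with no patterns left to find.

```python
import random
import zlib

# Highly repetitive text: lots of patterns to find.
repetitive = b"the cat sat on the mat " * 200

# Pseudo-random bytes of the same length: almost no patterns to find.
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

for label, data in [("repetitive", repetitive), ("noisy", noisy)]:
    packed = zlib.compress(data)
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

The repetitive block should shrink to a small fraction of its size, while the noisy block should come out about as large as it went in.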

 

Our example above shows the type of compression called lossless compression. With this type of compression, the file you get when you decompress is identical to the file you compressed. It is like physically taking a big object of volume X, breaking it down into pieces, and squeezing it all into a box of smaller volume along with instructions on how to piece it back together. When you open the box, you read the instructions and rebuild the object. The other type of compression, lossy compression, is very different. When breaking down the same object, it is like keeping only the pieces that fit in the box and throwing out the rest. With fewer pieces than you started with, you cannot rebuild the original object exactly. Applied to audio and image files, lossy compression results in lost quality. In an extreme case, an image of grass with a lot of color and texture may be compressed into a solid green image.
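The difference between the two types can be shown with a toy Python example. The pixel values are invented for this illustration, and the "lossy" step here (replacing every value with the average) is a deliberately crude stand-in; real lossy formats such as JPEG or MP3 are far more sophisticated.

```python
import zlib

# A made-up row of green pixel intensities (0-255) from a grassy image.
grass = [88, 131, 97, 142, 105, 120, 93, 137]

# Lossless: compress the bytes, decompress them, and get the exact same values.
packed = zlib.compress(bytes(grass))
restored = list(zlib.decompress(packed))
print(restored == grass)   # True

# Lossy (toy version): keep only the average and throw the detail away.
average = round(sum(grass) / len(grass))
flattened = [average] * len(grass)
print(flattened == grass)  # False
print(flattened)
```

The flattened row needs only one number to describe it, but the texture is gone for good: nothing in it tells you what the original values were.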

 
After reading this blog, you will never look at a compressed file the same way. We can now see how useful compression is when sending large files as an email attachment or through a file-sharing website. I hope you don’t go hacking that $40,000 vase apart when you have to send it cross country in a physically smaller box.
