Community Sites
Create your own community website and start earning today !
It's Free !
 
Communities Members BookmarksPolls Fresher Jobs Funny Photos B.Tech Projects New Member FAQ  



My Profile
Active Members
TodayLast 7 Days more...



Awards & Gifts
Online Exams

Fresher Jobs


Our fresher job section is exclusively for fresh graduates! Find jobs for freshers in major Indian cities including Bangalore, Chennai, Hyderabad, Pune or Kochi

Resources


Find educational articles, blogs, discussion threads and other resources.

Colleges


Find details about any college in India or search for courses.

website counter




DATA COMPRESSION TECHNIQUES


Posted Date: 20 Jan 2008    Resource Type: Articles/Knowledge Sharing    Category: Computer & Technology

Posted By: syam s kurup       Member Level: Silver
Rating:     Points: 5



Data compression is the process of converting an input data stream or the source stream or the original raw data into another data stream that has a smaller size.

BASIC TYPES OF DATA COMPRESSION
There are two basic types of data compression.
1. Lossy compression
2. Lossless compression

LOSSY COMPRESSION
In lossy compression some information is lost during the processing, where the image data is stored into important and unimportant data. The system then discards the unimportant data
It provides much higher compression rates but there will be some loss of information compared to the original source file. The main advantage is that the loss cannot be visible to eye or it is visually lossless. Visually lossless compression is based on knowledge about colour images and human perception.

LOSSLESS COMPRESSION
In this type of compression no information is lost during the compression and the decompression process. Here the reconstructed image is mathematically and visually identical to the original one. It achieves only about a 2:1 compression ratio. This type of compression technique looks for patterns in strings of bits and then expresses them more concisely.


TECHNIQUES OF DATA COMPRESSION
There are three important techniques of data compression.
1) basic technique
2) statistical technique
3) dictionary method

BASIC TECHNIQUES
These are the techniques, which have been used only in the past. The important basic techniques are run length encoding and move to front encoding.

STATISTICAL TECHNIQUES
They are based on the statistical model of the data. Under this statistical techniques there comes three important techniques
• Shannon Fano coding
• Huffman coding
• Arithmetic coding

DICTIONARY METHODS
This method select strings of symbols and encodes each string as a token using a dictionary. The important dictionary methods are
• LZ77 (sliding window)
• LZRW1

BASIC TECHNIQUES
1
RUN LENGTH ENCODING

The basic idea behind this approach to data compression is this: if a data item occurs n consecutive times in the input stream replace the n occurences with a single pair . the n consecutive occurences of a data item are called run length of n and this approach is called run length encoding or RLE.

RLE IMAGE COMPRESSION
RLE is a natural candidate for compressing graphical data. A digital image consists of small dots called pixels. Each pixel can be either one bit indicating a black or white dot or several bits indicating one of several colours or shades of gray. We assume that this pixels are stored in an array called bitmap in the memory. Pixels are normally arranged in the bit map in scan lines. So the first bit map pixel is the dot at the top left corner of the image and the last pixel is the one at the bottom right corner. Compressing an image using RLE is based on the observation that if we select a pixel in the image at random there is a good chance that its neighbours will have the same colour. The compressor thus scans the bit map row by row looking for runs of pixels of same colour.
Consider the grayscale bitmap -
12,12,12, 12, 12, 12, 12, 12, 12, 35,76,112,67,87,8787,5, 5, 5, 5, 5, 5,1- - - - - -
Compressed Form --

9, 12, 35, 76, 112, 67, 3, 87, 6, 5, 1- - - - - - - - - - -

MOVE TO FRONT CODING

The basic idea of this method is to maintain the alphabet A of symbols as a list where frequently occuring symbols are located near the front. A symbol 'a' is encoded as the no of symbols that precede it in this list. Thus if A=('t','h','e','s') and the next symbol in the input stream to be encoded is 'e', it will be encoded as '2' since it is preceded by two symbols. The next step is that after encoding 'e' the alphabet is modified to A=('e','t','h','s') . This move to front step reflects the hope that once 'e' has been read from the input stream it will read many more times and will at least for a while be a common symbol.
Let A = (“t”, “h”, “e”, “s” )
After encoding the symbol “e”, A is modified.
Modified Form:-
A = (“e”, “t”, “h”, “s” )

ADVANTAGE
This method is locally adaptive since it adapts itself to the frequencies of the symbol in the local areas of input stream. This method produces good results if the input stream satisfies this hope that is if the local frequency of symbols changes significantly from area to area in the input stream.


STATISTICAL TECHNIQUES

SHANNON FANO CODING
Shannon fano coding was the first method developed for finding good variable size codes. We start with a set of n symbols with known probabilities of occurences. The symbols are first arranged in the descending order of the probabilities. The set of symbols is then divided into two subsets that have the same probabilities. All symbols of one subset are assigned codes that start with a zero while the codes of the symbols in the other subset start with a one. Each subset is then recursively divided into two. The second bit of all codes is determined in a similar way. When a subset contains just two symbols their codes are distinguished by adding one more bit to each. The process continues until no subset remains.

HUFFMAN CODING

A commonly used method for data compression is huffman coding. The method starts by building a list of all the alphabet symbols in descending order of their probabilities. It then constructs a tree with a symbol at every leaf from the bottom up. This is done in steps where at each step the two symbols with smallest probabilities are selected, added to the top of partial tree, deleted from the list and replaced with an auxiliary symbol representing both of them. When the list is reduced to just one auxiliary symbol the tree is complete. The tree is then traversed to determine the codes of the symbols.
The huffman method is somewhat similar to shannon fano method. The main difference between the two methods is that shannon fano constructs its codes from top to bottom while huffman constructs a code tree from bottom up.


ARITHMETIC CODING

In this method the input stream is read symbol by symbol and appends more to the code each time a symbol is input and processed. To understand this method it is useful to imagine the resulting code as a number in the range [0,1) that is the range of real numbers from 0 to 1 not including one. The first step is to calculate or at least to estimate the frequency of occurrence of each symbol.
The foresaid techniques that is the Huffman and the Shannon Fano techniques rarely produces the best variable size code. The arithmetic coding overcomes this problem by assigning one code to the entire input stream instead of assigning codes to the individual bits.


DICTIONARY METHODS
Dictionary methods select strings of symbols and encode each string as a token using a dictionary. The dictionary holds all the strings of symbols.

LZ77 (SLIDING WINDOW)

The main idea of this method is to use part of previously seen input stream as the dictionary. The encoder maintains a window to the input stream and shifts the input in that window from right to left as strings of symbols are being encoded. The method is thus based on "sliding window". The window is divided into two parts that is search buffer ,which is the current dictionary and lookahead buffer ,containing text yet to be encoded.

LZRW1 TECHNIQUE

The main idea is to find a match in one step using a hash table. The method uses the entire available memory as a buffer and encodes the input string in blocks. A block is read into the buffer and is completely encoded. Then the next block is read and encoded and so on. These two buffers slide along the input block in memory from left to right. It is necessary to maintain only one pointer pointing to the start of look ahead buffer. This pointer is initialized to one and is incremented after each phrase is encoded. The leftmost three characters of the look ahead buffer are hashed into a 12 bit number 'I', which is used to index an array of 212 ie 4096 pointers. A pointer p is retrieved and is immediately replaced in the array by 'I'. if p points outside the search buffer there is no match, the first character in the look ahead buffer is output as literal and pointer is advanced by one. the same thing is done if p points outside the search buffer but to a string that does not match with the one in look ahead buffer . If p points to a match of at least three characters the encoder finds the longest match, outputs a match item and advances the pointer by the length of the match.


Data compression is a developing field. It has become so popular that researches are going on in this field to improve the compression rate and speed. Modifying an algorithm by 1% will improve the right time by 10%. Of course the aim of data compression branch is to develop better and better compression techniques.






Responses


No responses found. Be the first to respond and make money from revenue sharing program.

Feedbacks      
Popular Tags   What are tags ?   Search Tags  
(No tags found.)

Post Feedback


This is a strictly moderated forum. Only approved messages will appear in the site. Please use 'Spell Check' in Google toolbar before you submit.
You must Sign In to post a response.
Next Resource: we can lock a folder without any software
Previous Resource: MPEG
Return to Discussion Resource Index
Post New Resource
Category: Computer & Technology


Post resources and earn money!
 
Related Resources



Watch TV Channels
  • Watch Asianet TV online
  • Kairali TV in Internet
  • Surya TV online
  • Amritha TV Channel

  • Contact Us    Privacy Policy    Terms Of Use   

    SpiderWorks Technologies Pvt Ltd. 2006 - 2007 All Rights Reserved.