Изменения

Алгоритмы LZ77 и LZ78

1783 байта убрано, 05:28, 29 октября 2010

Нет описания правки

Это означает, что каждый символ можно однозначно определить, переместившись назад в буфере — даже если данный символ еще отсутствует в буфере на момент декодирования текущей пары ''длина-смещение''.

=== Описания алгоритма===

LZ77 использует "скользящее" по сообщению окно. Допустим на текущей итерации окно зафиксировано. С правой стороны окна наращиваем подстроку пока она есть в строке скользящее окно+наращиваемая строка и начинается в скользящем окне. Назовем наращиваемую строку буфером. После наращивания алгоритм выдает код состоящий из трех элементов:

* смещение в окне;

* длина ~~подстроки~~буфера;* ~~первый~~ последний символ буфера~~, следующий за подстрокой~~.В конце итерации алгоритм сдвигает окно на длину ~~подстроки~~равную длине буфера+1.=== Пример "~~kabababk~~kabababababz" ===

{| border="1"

!~~Содержымое~~ Содержимое окна(длина равна 4) !~~Наращиваемая строка~~Содержимое буфера

!КОД

|-

|'.....'( |~~Ячейка 2*2~~'k' |~~Ячейка 3*2~~<0,0,'k'>

|-

|~~Ячейка 1*3~~'....'( |~~Ячейка~~ 'a' |<0,0,'a'> |- |'...a' |'b' |<0,0,'b'> |- |'..ab' |'aba' |<2*3,2,a> |~~Ячейка 3*3~~- |'baba' |'bababz' |<2,5,'z'>

|}

~~Поз. Длина Симв.~~

~~''mississippi '' 0 0 a~~

~~a ''bracadabra'' 0 0 b~~

~~ab ''racadabra'' 0 0 r~~

~~'''a'''br ''acadabra'' 3 1 c~~

~~abr'''a'''c ''adabra'' 2 1 d~~

~~'''abra'''cad ''abra'' 7 4 abra~~

~~{{скрытый текст}}~~

~~<!--~~

~~Статья еще не содержит этой информации:~~

Even though all LZ77 algorithms work by definition on the same basic principle, they can vary widely in how they output their encoded data — what values are possible for lengths and distances, for example, and how length-distance pairs are distinguished from ''literals'' (single characters encoded as themselves, rather than as part of a length-distance pair.) A few examples:

* The algorithm illustrated in Lempel and Ziv’s original 1977 paper output all its data three values at a time: the length and distance of the longest match found in the buffer, and the literal which followed that match. If two successive characters in the input stream could only be encoded as literals, the length would be 0.

* In the [[PalmDoc]] format, a length-distance pair is always encoded by a two-byte sequence. Of the 16 bits that make up these two bytes, 11 bits go to encoding the distance, 3 go to encoding the length, and the remaining two are used to make sure the decoder can identify the first byte as the beginning of such a two-byte sequence.

* [[As of 2004]], the most popular LZ77 based compression method is called [[DEFLATE (algorithm)|DEFLATE]]; it combines LZ77 with [[Huffman coding]]. Literals, lengths, and a symbol to indicate the end of the current block of data are all placed together into one alphabet. Distances can be safely placed into a separate alphabet; since a distance only occurs just after a length, it cannot be mistaken for another kind of symbol or vice-versa.

~~-->~~

== LZ78 ==

Chavit

96

правок

Изменения

Алгоритмы LZ77 и LZ78

Навигация

Персональные инструменты

Пространства имён

Варианты

Просмотры

Ещё

Поиск

Навигация

Инструменты