Code (information theory)


August 19, 2022

Code, in information theory, and specifically in telecommunications, electronics and computer science, is a system of signals, signs or symbols conventionally designated to represent information.


The term code is used with two meanings: as a coding procedure, it concerns the rule followed to assign, unambiguously, to each element of the set to be represented a string that represents it; in this sense, the code is the predetermined form that a message assumes when it is transmitted. As a set of encodings, it denotes the set of representative strings (this is the meaning used in the branch of mathematics called code theory; see the MSC classification 94-XX). A code is said to be efficient when it uses only the number of symbols strictly necessary to encode the information; conversely, it is said to be redundant when it uses more symbols than necessary, which can nevertheless be useful for simplifying the generation and interpretation of the information.
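The contrast between efficient and redundant codes can be illustrated with a small sketch (the messages, code words and parity-bit scheme below are illustrative choices, not taken from the text): four messages need at least ceil(log2(4)) = 2 binary symbols each, while a redundant code spends an extra symbol per word to make errors detectable.

```python
import math

# Four messages to encode: an efficient binary code needs ceil(log2(4)) = 2 bits.
messages = [0, 1, 2, 3]
min_bits = math.ceil(math.log2(len(messages)))  # 2

efficient = {m: format(m, f"0{min_bits}b") for m in messages}
# A redundant code appends one parity bit per word: more symbols than strictly
# necessary, but a receiver can now detect any single flipped bit.
redundant = {m: w + str(w.count("1") % 2) for m, w in efficient.items()}

print(efficient)  # {0: '00', 1: '01', 2: '10', 3: '11'}
print(redundant)  # {0: '000', 1: '011', 2: '101', 3: '110'}
```

The parity bit is only one of many possible redundancy schemes; it trades one extra symbol per word for simple error detection.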

Formal definition

Let S be a finite set of elements called the code alphabet, such as the two faces of a coin (T, C). A set A of sequences constructed by juxtaposing one or more elements of S is a code. Each element of A is a code word, and the number of alphabet elements used to construct it is its length. For a code to be useful and meaningful, however, it must be associated, through some controllable mechanism (a formula, an algorithm, a well-defined list, ...), with a set of possible data that it must faithfully represent, and whose cardinality it must therefore match. For example, the set {T, C, TC, TT} is a code and can be used as an encoding of the numbers 0, 1, 2, 3.
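The example from the text can be written as a small mapping (the choice of a Python dictionary is just one convenient way to express the association):

```python
# The example code from the text: each of the four numbers 0..3 is
# assigned one code word built from the alphabet S = {T, C}.
encoding = {0: "T", 1: "C", 2: "TC", 3: "TT"}

# The association is faithful: no two numbers share a code word, so the
# set of data and the set of code words have the same cardinality.
assert len(set(encoding.values())) == len(encoding)

# The length of a word is the number of alphabet symbols used to build it.
lengths = {n: len(w) for n, w in encoding.items()}
print(lengths)  # {0: 1, 1: 1, 2: 2, 3: 2}
```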


Juxtaposing several code words results in a message built on this code, such as TTC or TCTC. Depending on whether every message can be decomposed into code words in exactly one way, the code is said to be uniquely decodable or not. The code above is not uniquely decodable, since the message TT can be decomposed either as the word T repeated twice or as the code word TT itself. By contrast, {C, TC, TTC, TTTC} is a uniquely decodable code. A code in which all the words have the same length is called a block code; otherwise it is a variable-length code. Other properties of a code are the ability to correct errors, to compress messages, to be linear or not, to be usable in cryptography, or to be instantaneous (decodable without looking ahead). The systematic study of codes as fundamental elements of information and transmission theory began in 1948 with the work of Claude Shannon.
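Unique decodability can be checked by brute force on short messages: enumerate every way a message splits into code words. This is a minimal sketch (the recursive enumeration is an illustrative approach, not an algorithm named in the text):

```python
def decompositions(message, words, prefix=()):
    """Enumerate every way to split `message` into code words."""
    if not message:
        yield prefix
    for w in words:
        if message.startswith(w):
            yield from decompositions(message[len(w):], words, prefix + (w,))

ambiguous = {"T", "C", "TC", "TT"}
prefix_free = {"C", "TC", "TTC", "TTTC"}

# "TT" splits as ("T", "T") or as ("TT",): the first code is not
# uniquely decodable.
print(len(list(decompositions("TT", ambiguous))))  # 2
# In the second code no word is a prefix of another, so every message
# has exactly one decomposition.
print(list(decompositions("TTCTC", prefix_free)))  # [('TTC', 'TC')]
```

For longer codes, the Sardinas–Patterson algorithm decides unique decodability without enumerating messages; the brute-force version above is only meant to make the definition concrete.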


A traditional example of a code is Morse code, a character encoding used in the early days of telegraphy (1840): each letter of the Latin alphabet (the set of information to be represented) is assigned a sequence of dots and dashes (the elements of the alphabet used for coding). Other examples of encoding are the digital encoding of an analog signal (analog-to-digital conversion), source encoding and channel encoding.
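A fragment of the Morse assignment can be sketched as a lookup table (only a handful of letters are shown; the dot-and-dash sequences are the standard International Morse ones):

```python
# A small excerpt of the International Morse alphabet: each Latin letter
# maps to a sequence of dots and dashes.
MORSE = {"S": "...", "O": "---", "E": ".", "T": "-", "A": ".-"}

def to_morse(text):
    # Letters are separated by spaces: Morse is not prefix-free
    # (E "." is a prefix of S "..."), so real transmissions rely on
    # pauses between letters to keep the decomposition unique.
    return " ".join(MORSE[c] for c in text.upper())

print(to_morse("SOS"))  # ... --- ...
```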


The uniqueness of the representation plays a crucial role in all applications of encoding (the process that carries the elements from their initial representation to the one defined by the code) and of decoding (the reverse process). Codes are useful when ordinary verbal communication is insufficient or impracticable. With an appropriate coding it is possible to describe realities far more complex than the lexicon of natural language allows, such as an image or a sequence of sounds. With the advent of information technology and telecommunications, codes have taken on a fundamental role.
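The encode/decode round trip can be sketched with the uniquely decodable code {C, TC, TTC, TTTC} from the formal-definition section (the greedy decoder below is an illustrative choice; it works here because no code word is a prefix of another):

```python
code = {0: "C", 1: "TC", 2: "TTC", 3: "TTTC"}
inverse = {w: n for n, w in code.items()}

def encode(data):
    # Encoding: juxtapose the code word of each element.
    return "".join(code[n] for n in data)

def decode(message):
    # Decoding: because the code is prefix-free, reading symbols greedily
    # until a code word matches recovers the unique decomposition.
    data, word = [], ""
    for symbol in message:
        word += symbol
        if word in inverse:
            data.append(inverse[word])
            word = ""
    return data

data = [3, 0, 2, 1]
print(encode(data))            # TTTCCTTCTC
print(decode(encode(data)))    # [3, 0, 2, 1]
```

The round trip returning the original data is exactly the uniqueness property in action: every valid message has one and only one reading.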