Information Representation
In order to understand what happens inside a computer we must understand how a computer represents information internally. For example, the text you are reading is stored on a computer disk. It was entered into a computer system using a word processing program. Inside the computer’s memory and on the disk this text (and all other information) can be thought of as being represented by a very long sequence of ones and zeros, i.e. in binary form. It appears as text only on a computer’s monitor or on printed output. A computer monitor’s hardware receives binary information and transforms it into the symbols that are displayed. Similarly, a printer’s hardware converts binary information to the text (or graphics) that is printed. Inside the CPU and the computer’s memory, information is stored and transmitted in electrical form, while on disk it is stored in magnetic form. A particular electrical signal represents a one and another signal represents a zero. Similarly, a specific magnetic orientation represents a one and another orientation represents a zero. A group of these binary digits (bits) forms a binary code, and information is encoded as a sequence of units, each of which corresponds to a distinct binary code.
Fundamental Principle:
All information is represented by binary codes in a computer system.
A2.1 Classifying Information
From a computing perspective, we can broadly classify the information we wish to store as either numeric or textual. Numeric information refers to information with which we may wish to do arithmetic, i.e. numbers such as 23, 404, 3.14159, 0.96 and so on. Numbers without a decimal point are called integers and those that have a decimal point are called reals. Computer scientists often refer to real numbers as floating-point numbers. It is important to note that different techniques are used to represent integers and reals inside a computer system. Textual information is made up of individual characters (e.g. a, b, c, .., A, B, C, .., -, +, ;, ", &, %, #, ', £, 0, 1, 2, .., 9 etc.). Examples include a person’s name, address, phone number, an essay, or the information on the page you are reading. The term alphanumeric refers to text containing both alphabetic characters and numbers, such as an address, e.g. London W12. Fortunately, international standards for information representation have been developed to make life easier for computer manufacturers and users. This means that we can easily transfer information from one make of computer to another. One such standard is the ASCII (American Standard Code for Information Interchange) standard. This standard is used for representing textual information and is described later.
A2.2 Representing Numbers
A2.2.1 Integers
We are so familiar with the decimal number system that we might think that everything that uses numbers would, by default, use the decimal system. However, computers, as we have said, use a number system called binary, which involves only two digits: 0 and 1. In the decimal system, we represent numbers using 10 digits based on powers of 10. For example, the number 2376 may also be written as:

2*10³ + 3*10² + 7*10¹ + 6*10⁰

More formally, we say that the digits in a positive decimal number are weighted by increasing powers of 10. We say that they use the base 10. To illustrate this further, we could write the above number in the following form:

weighting:  10³  10²  10¹  10⁰
digits:       2    3    7    6

decimal 2376 = 2*10³ + 3*10² + 7*10¹ + 6*10⁰
The leftmost digit, 2 in this example, is called the most significant digit. The rightmost digit, 6 in this example, is called the least significant digit. The digits on the left hand side are called the high-order digits (higher powers of 10) and the digits on the right hand side are called the low-order digits (lower powers of 10). The above system can be used to write numbers in any base m as follows, assuming we are dealing with a 4 digit number:
weighting:  m³  m²  m¹  m⁰
digits:     d₃  d₂  d₁  d₀

base m: d₃d₂d₁d₀ = d₃*m³ + d₂*m² + d₁*m¹ + d₀*m⁰
We refer to the digits making up a number by their position, starting from the right hand side with position 0 (i.e. we count from zero). As we have seen earlier, when the base is 10, we replace m by 10. In the binary number system the base is 2 so we can write the 8-bit binary number 0101 1100 as:
weighting:  2⁷  2⁶  2⁵  2⁴  2³  2²  2¹  2⁰
bits:        0   1   0   1   1   1   0   0

0101 1100₂ = 0*2⁷ + 1*2⁶ + 0*2⁵ + 1*2⁴ + 1*2³ + 1*2² + 0*2¹ + 0*2⁰
           = 0 + 64 + 0 + 16 + 8 + 4 + 0 + 0
           = 92₁₀
The leftmost bit is called the most significant bit (MSB). The leftmost bits in a binary number are referred to as the high-order bits. The rightmost bit in a binary number is called the least significant bit (LSB). The rightmost bits in a binary number are referred to as the low-order bits. In the case of a 16-bit number, we use the above scheme with the powers of 2 ranging from 0 to 15; with a 32-bit number, the powers of 2 range from 0 to 31.
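The weighted-sum expansion described above translates directly into a few lines of code. The following Python sketch (purely illustrative; the programs in this text are written in assembly language) evaluates a binary string by its positional weightings:

```python
# Evaluate a binary string by its positional weightings: the bit at
# position n (counting from 0 on the right) is weighted by 2**n.
def binary_to_decimal(bits: str) -> int:
    bits = bits.replace(" ", "")           # accept grouped input like "0101 1100"
    value = 0
    for position, bit in enumerate(reversed(bits)):
        value += int(bit) * 2 ** position
    return value

print(binary_to_decimal("0101 1100"))      # 92, as in the worked example
```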
Exercises
A2.1. Convert the following binary numbers to decimal:
(i) 0000 1000 (ii) 0000 1001 (iii) 0000 0111
(iv) 0100 0001 (v) 0111 1111 (vi) 0110 0001
Converting from Decimal to Binary
To convert a decimal number to another base, you repeatedly divide it by the new base; the remainder of the division at each stage becomes a digit in the new base, and you stop when the result of the division is 0. So to convert decimal 35 to binary we do the following:
             Remainder
35 / 2 = 17      1
17 / 2 =  8      1
 8 / 2 =  4      0
 4 / 2 =  2      0
 2 / 2 =  1      0
 1 / 2 =  0      1
The result is read upwards giving 35₁₀ = 100011₂. The number 100011₂ is an unsigned binary number. We can convert any positive decimal number to binary using the above method. Signed numbers are represented differently, as described later.
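The repeated-division method can be sketched in Python (an illustrative aside; any language would do). The remainders are collected and then read upwards, i.e. reversed:

```python
# Convert a non-negative decimal integer to another base by repeated
# division: each remainder becomes a digit, read upwards at the end.
def decimal_to_base(n: int, base: int = 2) -> str:
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))   # remainder of this division stage
        n //= base                     # quotient carries on to the next stage
    return "".join(reversed(digits))   # read the remainders upwards

print(decimal_to_base(35))             # 100011, matching the worked example
```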
Shortcuts
To convert any decimal number which is a power of 2 to binary, you simply write 1 followed by the number of zeros given by the power. For example, 32 is 2⁵, so we write it as 1 followed by 5 zeros, i.e. 10 0000; 128 is 2⁷, so we write it as 1 followed by 7 zeros, i.e. 1000 0000. Another thing worth remembering is that the largest binary number that can be stored in a given number of bits is made up of all 1s. An easy way to convert this to decimal is to note that this value is one less than 2 to the power of the number of bits. For example, if we are using 4-bit numbers, the largest value we can represent is 1111, which is 2⁴ - 1, i.e. 15; with a 6-bit number the largest value we can represent is 11 1111, whose decimal equivalent is 2⁶ - 1, i.e. 63. The same technique applies to decimal numbers, e.g. the largest value that a 3-digit decimal number can represent is 999, which is 10³ - 1.
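Both shortcuts are easy to check mechanically. This Python fragment (illustrative only) prints a power of 2 in binary and the largest value of a given bit width:

```python
# A power of 2 is written in binary as 1 followed by that many zeros ...
print(format(2 ** 5, "b"))             # 100000
print(format(2 ** 7, "b"))             # 10000000

# ... and the largest n-bit value is n ones, i.e. 2**n - 1.
n = 6
largest = 2 ** n - 1
print(format(largest, "b"), largest)   # 111111 63
```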
Exercises
A2.2. Convert the following decimal numbers to binary, writing them as 8-bit binary numbers. You may pad with 0’s on the left hand side to make up 8 bits when necessary:
(i) 3 (ii) 15 (iii) 16
(iv) 63 (v) 64 (vi) 255
Hexadecimal Numbers
One difficulty with binary numbers is that they tend to be composed of long sequences of bits and so it is easy to err in working with them. Thus, in writing them down we might arrange them in groups of 3 or 4. A 16-bit binary number such as 0110100011001010 could be written as 0110 1000 1100 1010 which is easier to read. However, it is still tedious to work with such numbers. For this reason we use other number systems. Because 10 is not an exact power of 2, it is not as convenient for working with binary numbers as a system using a base which is a power of 2. Two commonly used computer number systems are the hexadecimal (base 16) and octal (base 8) systems. It is easier to convert a binary number to hexadecimal or octal (and vice versa) than it is to convert a binary number to decimal.
With the hexadecimal system, we require 16 distinct digits (representing the numbers 0 to 15). Using the number digits alone only gives us ten digits, so we use letters as well. The decimal numbers 0 to 15 are represented by the hexadecimal digits 0 to 9 and the letters A to F. The hexadecimal digits correspond to their decimal equivalents. The letter A represents 10, B represents 11, C represents 12, D represents 13, E represents 14 and F represents 15. Hexadecimal numbers are weighted in powers of 16, so the number 2FA can be converted to decimal as follows:
weighting:  16²  16¹  16⁰
digits:       2    F    A

2FA = 2*16² + F*16¹ + A*16⁰
    = 2*256 + 15*16 + 10*1
    = 512 + 240 + 10
    = 762₁₀
A hexadecimal digit is represented by 4 bits (since 2⁴ = 16, and we require 16 bit patterns for 16 digits). Thus, all 32-bit numbers can be written as 8-digit hexadecimal numbers, 16-bit binary numbers can be written as 4-digit hexadecimal numbers and 8-bit binary numbers can be represented by 2-digit hexadecimal numbers. This is the major advantage of using hexadecimal numbers, since computers almost always use 32-bit, 16-bit or 8-bit numbers to represent information. So, if we wish to write down the contents of a processor register which can store 16 bits, we can write it as a 4-digit hexadecimal number. If we were to use decimal, we would have to write it as a number between 0 and 65,535, which is not nearly so convenient or useful.
To convert a binary number to hexadecimal, you break the number into groups of 4 bits from the right hand side. Then you convert each group of 4 bits into its equivalent hexadecimal digit. This process gives you the hexadecimal equivalent of the original binary number. If there are fewer than 4 bits in the leftmost group, you still convert them to their hexadecimal equivalent (pad on the left with 0’s). For example, the 16-bit binary number 0110100011001010 can be divided into the groups: 0110 1000 1100 1010. These are converted to the hexadecimal digits 68CA. Assembly language programmers make extensive use of the hexadecimal numbering system, so it is important to be familiar with it. For example, the letter ‘A’ is represented inside a computer by the binary code 0100 0001 which can be written in hexadecimal as 41. To convert from hexadecimal to binary, we simply convert each hexadecimal digit to its 4-bit equivalent, padding with 0’s on the left, if necessary to make up the 4 bits. For example, hexadecimal digit 4 is written as 0100 in binary, hexadecimal digit 1 is written as 0001 in binary, thus, hexadecimal number 41 is 0100 0001 in binary.
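Grouping into 4-bit chunks makes binary-hexadecimal conversion purely mechanical, as this Python sketch (illustrative, not part of the assembly material) shows:

```python
# Binary to hexadecimal: pad to a multiple of 4 bits, then translate
# each 4-bit group into one hexadecimal digit.
def binary_to_hex(bits: str) -> str:
    bits = bits.replace(" ", "")
    bits = bits.zfill((len(bits) + 3) // 4 * 4)       # pad leftmost group
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(g, 2), "X") for g in groups)

# Hexadecimal to binary: expand each hex digit into its 4-bit form.
def hex_to_binary(hexdigits: str) -> str:
    return " ".join(format(int(d, 16), "04b") for d in hexdigits)

print(binary_to_hex("0110 1000 1100 1010"))   # 68CA
print(hex_to_binary("41"))                    # 0100 0001, the code for 'A'
```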
When we write a number, we must indicate its base so that we know which number we are referring to. For example, the number 10 in binary represents two; in decimal it represents ten and in hexadecimal it represents sixteen. There are two common methods for indicating the base when writing a number in an assembly language program. One method (used in 8086 programs) is to add a character to the end of the number to indicate the base, e.g. H for hexadecimal, D for decimal and B for binary. Thus, we can easily distinguish between the numbers 10H, 10D and 10B. Lowercase letters may also be used. The second method (used in M68000 programs) is to begin the number with a character which indicates the base. For example, hexadecimal numbers may be written so as to begin with the ‘$’ character and binary numbers written so as to begin with the ‘%’ character. Using this notation, the number $10 represents the hexadecimal number 10 and %10 represents the binary number 10. The absence of a leading character indicates a decimal number in an M68000 program. Table A2.1 displays the numbers 0 to 15 in the binary, hexadecimal and decimal bases.
Binary  Hexadecimal  Decimal
0000         0          0
0001         1          1
0010         2          2
0011         3          3
0100         4          4
0101         5          5
0110         6          6
0111         7          7
1000         8          8
1001         9          9
1010         A         10
1011         B         11
1100         C         12
1101         D         13
1110         E         14
1111         F         15

Table A2.1: Representing numbers in binary, hexadecimal and decimal bases
Exercises
A2.3. Convert the following:
(a) From hexadecimal to 16-bit binary numbers : 1B87h, AF33h, 713Ch, 80EFh.
(b) From binary to hexadecimal: 1111 1100 1000 0111 and 1010 1110 1001 1100.
Negative Numbers
So far the only numbers we have looked at were unsigned numbers. It is important to be able to represent both unsigned and signed numbers, i.e. both positive and negative numbers. This raises the question of how to represent negative numbers. In our everyday representation of negative numbers, we use a minus sign ‘-’ (hyphen on a keyboard) to indicate that a number is negative. Inside a computer we must represent the sign in binary, since all information is stored in binary form. There are a number of methods for representing negative numbers, two of which are the signed magnitude and two’s complement representations.
Signed Magnitude Numbers
In a signed magnitude number, the most significant bit is used as a sign bit to indicate whether the number is positive or negative. Thus for an 8-bit number, the most significant bit, bit number 7, acts as the sign bit. The value 1 is used to indicate a negative number and the value 0 indicates a positive number. The remaining bits are used to represent the magnitude of the number.
Example A2.1: The numbers 45 and -45 are represented in signed magnitude as:
+45 = 00101101B
-45 = 10101101B
If we are not interested in negative numbers we can represent numbers ranging from 0 to 255 using 8 bits. Such numbers are called unsigned numbers. Using signed magnitude, the range that can be represented using 8 bits is from -127 to +127, and we should note that 0 is represented twice (as 1000 0000B and 0000 0000B, i.e. a positive and a negative zero). This dual representation of zero causes problems when we wish to compare a variable with 0. In addition, the hardware to do arithmetic using signed magnitude numbers is complex and slow when compared with using the two’s complement representation of signed numbers. As a result, the signed magnitude representation is rarely used in computers.
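A signed magnitude pattern is simply a sign bit glued onto the 7-bit magnitude. The following Python sketch (illustrative only) produces the 8-bit patterns used in Example A2.1:

```python
# 8-bit signed magnitude: bit 7 is the sign (1 = negative), bits 6..0
# hold the magnitude of the number.
def signed_magnitude_8bit(n: int) -> str:
    assert -127 <= n <= 127, "outside the 8-bit signed magnitude range"
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), "07b")

print(signed_magnitude_8bit(45))    # 00101101
print(signed_magnitude_8bit(-45))   # 10101101
print(signed_magnitude_8bit(0))     # 00000000 (1000 0000 also denotes zero)
```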
Complementary Number Representation
We are used to the concept of a sign and a magnitude when dealing with decimal numbers. A positive sign is implicit (if not written) and negative numbers are written by preceding the positive number by a minus sign. The idea of a complementary number system is that each number has a unique representation. Two’s complement is a complementary number system used in computers and is the most commonly used method for representing signed numbers.
Two's Complement Numbers
We still use a sign bit as an indicator of the sign of the number. In the case of positive numbers, the representation is identical to that of signed magnitude, i.e. the sign bit is 0 and the remaining bits represent the positive number. In the case of negative numbers, the sign bit is 1 but the bits to the right of the sign bit do not directly indicate the magnitude of the number. The number -1 for example is represented by 1111 if we use 4-bit two’s complement numbers while it would represent the number -7 using a signed magnitude representation. Figure A2.1 shows the range of numbers that can be represented using a 4-bit two’s complement representation.
Figure A2.1: 4-bit two’s complement numbers with their decimal equivalents
In terms of digit weightings we can interpret two’s complement numbers as using a negative weight for the most significant bit. In the case of a 4-bit two’s complement number, the sign bit can be interpreted as having a weight of -2³, i.e. -8, and the remaining bits have their usual positive weightings. In the case of an 8-bit two’s complement number, the sign bit has a weight of -2⁷, i.e. -128. Thus the weightings of an 8-bit two’s complement number may be written as:

Bit position:         7    6    5    4    3    2    1    0
Weighting:          -2⁷   2⁶   2⁵   2⁴   2³   2²   2¹   2⁰
Signed equivalent: -128  +64  +32  +16   +8   +4   +2   +1
The two’s complement number 1000 0000 (i.e. -128) is the most negative number that can be represented using 8 bits. The two’s complement number 0111 1111 (i.e. +127) is the largest positive number that can be represented using 8 bits. In other words, a total of 256 numbers can be represented using 8-bit two’s complement numbers, ranging from -128 to +127. There is only one representation for zero. Table A2.2 lists the decimal equivalents of some 8-bit two’s complement and unsigned binary numbers.
Table A2.2: Some 8-bit unsigned and two’s complement numbers
To convert a two’s complement number such as 1000 0001 to decimal, we can use the weightings given above to compute the decimal equivalent, in this case -128 + 1, i.e. -127. However, there is an easier method for carrying out two’s complement conversions. To convert a binary number such as 0110 0011 (99₁₀) to its negative two’s complement form, we carry out two operations. First we complement the number by changing all the one bits to zeros and the zero bits to ones (this is called the one's complement representation of the number). This step is also described as flipping the bits. The one's complement of 0110 0011 is 1001 1100. The second step in converting to two's complement is to add 1 to the one's complement representation. The one's complement was 1001 1100, so the two's complement of this number is 1001 1100 + 0000 0001, giving 1001 1101 (-99₁₀). In summary, to convert any unsigned binary number to its two’s complement form, we flip the bits of the number and add 1.
Similarly, to convert a negative two’s complement number to its positive form we carry out the same steps, i.e. we flip the bits and add one. For example, converting 1001 1101 from two’s complement yields 0110 0010 + 1, which gives 0110 0011 (99₁₀), and because the sign bit of the original number was 1 we know that it represented -99₁₀.
Example A2.2: Convert -8910 to two’s complement:
                  89₁₀ =  0101 1001
One's complement of 89₁₀ = 1010 0110
Add one                          + 1
Two's complement  -89₁₀ =  1010 0111
Two’s Complement Conversion Rule:
To negate a number to two’s complement: Flip the bits and add 1.
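The flip-and-add-1 rule can be sketched directly on 8-bit patterns (Python, illustrative only):

```python
# Negate an 8-bit two's complement pattern: flip the bits (one's
# complement), then add 1, discarding any carry out of bit 7.
def twos_complement_negate(bits: str) -> str:
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    return format((int(flipped, 2) + 1) % 256, "08b")

print(twos_complement_negate("01100011"))   # 10011101, i.e. -99
print(twos_complement_negate("10011101"))   # 01100011, back to +99
```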
A nice property of two's complement numbers is that we can add them together without concern for the sign. For example, adding +89 to -89 is carried out as:

  0101 1001₂     89₁₀
+ 1010 0111₂    -89₁₀
  ----------
  0000 0000₂      0₁₀

(the carry out of the most significant bit is simply discarded).
This means that to subtract one number from another number, we just negate the number to be subtracted and then add the two numbers. This operation is performed whether the number to be subtracted is positive or negative. It means that, using two's complement numbers, the processor can add and subtract using the same hardware circuit, i.e. a separate circuit for subtraction is not required. This simplifies the design of the ALU and is one of the reasons why two’s complement is the most commonly used method for representing negative numbers in microprocessors.
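The negate-then-add scheme is easy to demonstrate with 8-bit arithmetic. In this Python sketch (illustrative only), the `% 256` models the hardware discarding the carry out of bit 7:

```python
# 8-bit addition: keep only the low 8 bits, as the hardware does.
def add_8bit(a: int, b: int) -> int:
    return (a + b) % 256

plus_89 = 0b01011001                      # +89
minus_89 = 0b10100111                     # -89 in two's complement
print(format(add_8bit(plus_89, minus_89), "08b"))   # 00000000

# Subtraction as negate-then-add: 100 - 89
minus = (256 - 89) % 256                  # two's complement of 89
print(add_8bit(100, minus))               # 11
```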
Number Range and Overflow
The range of numbers (called the number range) that can be stored in a given number of bits is important. Given an 8-bit number, we can represent unsigned numbers in the range 0 to 255 (2⁸ - 1) and two’s complement numbers in the range -128 to +127 (-2⁷ to 2⁷ - 1). Given a 16-bit number, we can represent unsigned numbers in the range 0 to 65,535 (2¹⁶ - 1) and two’s complement numbers in the range -32,768 to 32,767 (-2¹⁵ to 2¹⁵ - 1). In general, given an n-bit number, we can represent unsigned numbers in the range 0 to 2ⁿ - 1 and two’s complement numbers in the range -2ⁿ⁻¹ to 2ⁿ⁻¹ - 1.
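These ranges follow directly from the formulas, as this small Python sketch (illustrative only) shows:

```python
# Unsigned and two's complement ranges for an n-bit number.
def number_ranges(n: int):
    unsigned = (0, 2 ** n - 1)
    signed = (-(2 ** (n - 1)), 2 ** (n - 1) - 1)
    return unsigned, signed

print(number_ranges(8))    # ((0, 255), (-128, 127))
print(number_ranges(16))   # ((0, 65535), (-32768, 32767))
```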
When we wish to know the maximum amount of memory that a processor can access, we look at the number of bits that the processor uses to represent a memory address. This determines the maximum memory address that can be accessed. For example, a processor that uses 16-bit addresses will be able to access up to 65,536 memory locations (64Kb), with addresses from 0 to 65,535. A 20-bit address allows up to 2²⁰ (1Mb) memory locations to be accessed, a 24-bit address allows up to 16Mb (2²⁴ bytes) of RAM to be accessed, and a 32-bit address allows up to 4Gb (2³² bytes) of RAM to be accessed.
What happens if we attempt to store a larger unsigned value than 255 (or a more negative signed value than -128) in an 8-bit register? For example, if we attempt the calculation 70 + 75 using 8-bit two’s complement numbers, the result, 145 (1001 0001B), is a negative number in two's complement! This situation is called overflow. It occurs when we attempt to represent a number outside the range of numbers that can be stored in a particular register or memory variable. Overflow is detected by the hardware of the CPU and its occurrence is recorded in the CPU’s status register. This register uses one flag, called the overflow flag or O-flag, to record the occurrence of an overflow. This allows the programmer to test for this condition in an assembly language program and deal with it appropriately. The 8086 provides the jo/jno instructions to branch (or not to branch) if overflow occurs. Figure A2.2 illustrates the relationship between number range and overflow for 8-bit two’s complement numbers.
Figure A2.2: Number range and overflow regions for 8-bit two’s complement numbers
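The overflow condition itself can be modelled in a few lines. This Python sketch (illustrative; the operand patterns are held as unsigned 0-255 values) applies the usual rule: adding two numbers of the same sign must not produce a result of the opposite sign:

```python
# Add two 8-bit patterns and report the signed result, plus whether
# signed overflow occurred (same-sign operands, opposite-sign result).
def add_with_overflow(a: int, b: int):
    result = (a + b) % 256
    signed = result - 256 if result >= 128 else result   # two's complement view
    overflow = (a < 128) == (b < 128) and (a < 128) != (result < 128)
    return signed, overflow

print(add_with_overflow(70, 75))   # (-111, True): 145 is out of range
print(add_with_overflow(70, 50))   # (120, False)
```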
Note: In our programs we may use two’s complement or unsigned binary numbers. However, this raises a very important question: how can we tell by looking at a number whether it is a two’s complement number or an unsigned number? Does 1111 1111B represent the decimal number 255 or the number -1? The answer is that we cannot tell, by looking at a number, how it should be interpreted. It is the responsibility of the programmer to use the number correctly. It is important to remember that you can never tell how any byte (word or long word) is to be interpreted by looking at its value alone. It could represent a signed or unsigned number, an ASCII code, a machine code instruction and so on. The context (in which the information stored in the byte is used) will determine how it is to be interpreted. Assembly languages provide separate conditional jump instructions for handling comparisons involving unsigned or signed numbers. It is the programmer's responsibility to use the correct instructions.
Exercises
A2.4 Convert the decimal numbers -64,-127,-15,-16,-1,+32 and +8 to 8-bit signed magnitude and two’s complement numbers.
A2.5 What is the range of unsigned numbers that can be represented by 20-bit, 24-bit and 32-bit numbers?
A2.6 What is the range of numbers that can be represented using 32-bit two’s complement numbers?
A2.7 What problem arises in representing zero in signed magnitude and one's complement?
A2.8 What is overflow and how might it occur?
A2.2.2 Floating-point Numbers
So far we have described how to represent integers in a computer system, which is sufficient for the purposes of this text. We now briefly introduce methods for representing real numbers. A different representation is required for real (usually called floating-point) numbers. We sometimes write such numbers in scientific notation, so that the number 562.42 can be written as 0.56242 x 10³. We can express any floating-point number as ± m x r^exp, where m is the mantissa, r is the radix and exp is the exponent. For decimal numbers the radix is 10 and for binary numbers the radix is 2. Since we use binary numbers in a computer system, we do not have to store the radix explicitly when representing floating-point numbers. This means that we only need to store the mantissa and the exponent of the number to be represented. Thus, for the number 0.11011011 x 2³ only the value 11011011 and the exponent 3 (converted to binary) need to be stored. The binary point and radix are implicit.
A floating-point number is normalised if the most significant digit of the mantissa is non-zero, as in the above example. Any floating-point number can be normalised by adjusting the exponent appropriately. For example, 0.0001011 is normalised to 0.1011 x 2⁻³. To represent 0 as a floating-point number, both the mantissa and the exponent are represented as zero.
There are various standards (IEEE, ANSI etc.) that define how the mantissa and exponent of a floating-point number should be stored. Most standards use a 32-bit format for storing single precision floating-point numbers and a 64-bit format for storing double precision floating-point numbers. A possible format for a 32-bit floating-point number is the following, which uses a sign bit, 23 bits to represent the mantissa and the remaining 8 bits to represent the exponent. The mantissa could be represented using either signed magnitude or 2’s complement. The exponent could be in 2’s complement (but another form called excess notation is also used). Thus a 32-bit floating-point number could be represented as follows, where S is the sign bit:

S | exponent (8 bits) | mantissa (23 bits)
Using this format, the number 0.1101 1011 x 2³ could be stored as:

0 | 0000 0011 | 1101 1011 0000 0000 0000 000
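For the IEEE standard mentioned above, the fields of a 32-bit floating-point number can be inspected directly. This Python sketch (illustrative; note that IEEE 754 additionally biases the stored exponent by 127 and uses an implicit leading 1 in the mantissa, refinements not covered by the simple format above) unpacks the three fields:

```python
import struct

# Extract the sign, exponent and mantissa fields of an IEEE 754
# single precision (32-bit) floating-point number.
def float_fields(x: float):
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                   # 1 bit
    exponent = (bits >> 23) & 0xFF      # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF          # low 23 bits, implicit leading 1
    return sign, exponent - 127, mantissa

print(float_fields(6.5))   # (0, 2, 5242880): 6.5 = +1.625 x 2^2
```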
A2.2.3 Binary Coded Decimal (BCD) Numbers
This form of number representation allows us to represent numbers in their decimal form: each digit of the decimal number is translated to binary, and the binary representations of the digits making up the decimal number are stored. This makes I/O operations for BCD numbers easier than if we represent decimal numbers as pure binary numbers as described earlier. For example, when we read a number like 254 from the keyboard, we must convert it to binary if we wish to do arithmetic with it. This involves converting the ‘2’ to its binary form and multiplying it by 100, converting the ‘5’ to its binary form, multiplying it by 10 and adding it to the previous result (giving 250), and finally reading the ‘4’, converting it to its binary form and adding it to the previous sum, giving the number 254. The BCD method of representing numbers gets around this conversion step. Each digit is simply encoded into its equivalent 4-bit form and stored separately, instead of storing the binary equivalent of the whole number. Thus 254D would be stored as
0000 0010   0000 0101   0000 0100
    2           5           4
in BCD form, using 8 bits per digit (unpacked), as opposed to representing it as
1111 1110
in unsigned binary form. The above BCD number can be packed so as to store two digits per byte:
0000 0010   0101 0100
  0   2       5   4
The 8086 provides special instructions that allow addition, subtraction, multiplication and division to be carried out on BCD numbers.
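The unpacked and packed encodings above can be reproduced with a short Python sketch (illustrative only):

```python
# Unpacked BCD: one byte (8 bits) per decimal digit.
def to_unpacked_bcd(n: int) -> str:
    return " ".join(format(int(d), "08b") for d in str(n))

# Packed BCD: two 4-bit digits per byte, padding with a leading 0
# digit when the number of digits is odd.
def to_packed_bcd(n: int) -> str:
    digits = str(n)
    if len(digits) % 2:
        digits = "0" + digits
    return " ".join(format(int(digits[i]), "04b") + format(int(digits[i + 1]), "04b")
                    for i in range(0, len(digits), 2))

print(to_unpacked_bcd(254))   # 00000010 00000101 00000100
print(to_packed_bcd(254))     # 00000010 01010100
```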
A2.3 Representing Characters: ASCII Codes
ASCII codes are used to represent characters in a computer system. Each character we wish to use must be assigned a unique binary code or number to distinguish it from all other characters. There are over 100 characters to be represented, when we count the uppercase letters (A to Z), the lowercase letters (a to z), the digits (0 to 9), the punctuation marks (, ; . " ? !) and other characters. Using ASCII codes, for example, the letter A is represented by the binary code 100 0001, the letter B by 100 0010 and so on. Because it is awkward to write these codes in binary form, we usually convert them to their decimal (or hexadecimal, i.e. base 16) equivalents. So the letter A may be represented by code 65 in decimal and B by code 66. Inside the computer they are always represented as binary numbers.
The standard ASCII code uses 7 bits, i.e. each character is represented by 7 bits. As we have seen, the letter A is represented by the 7 bits 100 0001. Because standard ASCII codes use 7 bits, a total of 128 different characters may be represented (2⁷ = 128, the number of combinations of 7 bits). There are codes for the uppercase letters, lowercase letters, digits, punctuation and other symbols (e.g. , ; : " ? ’ ! * & % $ # + - < > / [ ] { } \ ~ ( ) etc.). In addition there are ASCII codes for a number of special characters called control characters. Control characters are used to control devices attached to the computer and to control communications between the computer and these devices. For example, one such character causes your computer to beep: the BEL character (ASCII code 7). The Line-feed character (ASCII code 10) causes a print head or screen cursor to go onto a new line. The Carriage Return character (ASCII code 13) causes the print head or screen cursor to go to the start of a line. The Form-feed character (ASCII code 12) causes a printer to skip to the top of the next page. Other control characters are used to control communication between devices and the computer. Table A2.3 is a list of some commonly used ASCII codes and the full list of ASCII characters is given in Table A2.4. Remember, inside the computer it is the binary form of the ASCII code that is used. The decimal and hexadecimal values are useful for people to refer to a particular ASCII code.
Char  Binary    Hex  Decimal     Char  Binary    Hex  Decimal
NUL   000 0000  00     0         A     100 0001  41    65
BEL   000 0111  07     7         B     100 0010  42    66
LF    000 1010  0A    10         C     100 0011  43    67
FF    000 1100  0C    12         D     100 0100  44    68
CR    000 1101  0D    13         E     100 0101  45    69
SP    010 0000  20    32         F     100 0110  46    70
                                 G     100 0111  47    71
                                 H     100 1000  48    72
...                              ...
*     010 1010  2A    42         Y     101 1001  59    89
+     010 1011  2B    43         Z     101 1010  5A    90
,     010 1100  2C    44         [     101 1011  5B    91
-     010 1101  2D    45         \     101 1100  5C    92
.     010 1110  2E    46
/     010 1111  2F    47         a     110 0001  61    97
0     011 0000  30    48         b     110 0010  62    98
1     011 0001  31    49         c     110 0011  63    99
2     011 0010  32    50         d     110 0100  64   100
3     011 0011  33    51         e     110 0101  65   101
4     011 0100  34    52         f     110 0110  66   102
5     011 0101  35    53         g     110 0111  67   103
6     011 0110  36    54         h     110 1000  68   104
7     011 0111  37    55         ...
8     011 1000  38    56         y     111 1001  79   121
9     011 1001  39    57         z     111 1010  7A   122

Table A2.3: Some commonly used ASCII codes
A non-standard 8-bit version of the ASCII codes is also used, which means that 256 characters may be represented. This allows Greek letters, card suits, line drawing and graphic characters to be represented. The 8-bit version is usually referred to as extended ASCII. When standard 7-bit ASCII codes are used, the 8th bit is available for other uses. One such use is as a parity bit. A parity bit is used for error detection. When data is transmitted over long distances, perhaps using telephone lines, there is a possibility that the data will be corrupted due to electrical noise. This means that a bit may be flipped, i.e. a 1 bit gets changed to a 0 bit or vice versa. A parity bit gives some protection by allowing you to detect that such corruption has occurred. There are two parity schemes, called even parity and odd parity. In an even parity scheme, the parity bit is used to ensure that the code for each character contains an even number of 1s. Thus, the parity bit would be set to 0 in the case of the letter ‘A’, whose ASCII code 100 0001 already contains an even number of 1s, and the character would be transmitted as 0100 0001. The parity bit would be set to 1 in the case of the letter ‘C’, whose ASCII code is 100 0011, in order to make the number of 1s even, and the character would be transmitted as 1100 0011. In an odd parity scheme, the parity bit is used in the same fashion, except that it is set to 0 or 1 in order to make the number of 1s transmitted odd. Thus ‘A’ would be transmitted as 1100 0001 and ‘C’ as 0100 0011, if using an odd parity scheme.
When using parity bits, the receiver of a character can detect a single bit error, by computing the parity bit for the other 7 data bits and comparing it with the actual parity bit transmitted. If they are not the same, then an error has occurred. Parity checking cannot detect the corruption of a number of bits in the same byte, but this is relatively rare. Other methods may be used to detect such errors and they are studied in the field of data communications. The 8086 has conditional jump instructions for testing parity (jpo to jump on odd parity and jpe to jump on even parity).
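The parity computation described above can be sketched as follows (Python, illustrative only):

```python
# Even parity: set the 8th bit so that the transmitted byte contains
# an even number of 1s; the receiver re-counts the 1s to check.
def with_even_parity(code: int) -> int:
    parity = bin(code).count("1") % 2   # 1 if the 7-bit code has an odd count
    return (parity << 7) | code

def parity_ok(byte: int) -> bool:
    return bin(byte).count("1") % 2 == 0

a = with_even_parity(0b1000001)          # 'A': even count already, parity 0
print(format(a, "08b"), parity_ok(a))    # 01000001 True
print(parity_ok(a ^ 0b0000100))          # False: a single flipped bit is caught
```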
ASCII is not the only such standard for representing information. IBM mainframe computers use EBCDIC (Extended Binary Coded Decimal Interchange Code) codes which are 8-bit codes, different from those used in the ASCII standard. In addition, work is in progress to provide a 16-bit standard code (referred to as Unicode) for representing characters. The problem with ASCII codes is that a maximum of 256 characters can be represented. While this is fine for handling text in the English language, it is useless for handling other languages such as Chinese or Japanese where there are literally thousands of individual characters making up the language alphabet. A 16-bit code allows in excess of 65,000 characters to be represented and so is sufficient for the alphabet of almost any language.
Exercises
A2.9 Look up the ASCII codes for the digits 0 - 9. What do you notice about the rightmost (low-order) 4 bits and the leftmost (high-order) 4 bits of each code?
A2.10 What is the numeric difference between the ASCII codes for any uppercase letter (e.g. ‘A’) and the corresponding lowercase letter (e.g. ‘a’)?
A2.4 Summary
In this appendix we have described how information is represented inside a computer system. We described how signed and unsigned numbers can be represented and we also discussed the use of ASCII codes for representing characters. Table A2.4 is a full listing of the ASCII codes.
A2.5 Reading List
As for Chapters 2 and 5.
Megarry, J. (1985) Inside Information: Computers, Communications and People, BBC, London.
Standard ASCII Codes

Char | Binary   | Hex | Decimal | Char | Binary   | Hex | Decimal
-----|----------|-----|---------|------|----------|-----|--------
NUL  | 000 0000 | 00  | 0       | SP   | 010 0000 | 20  | 32
SOH  | 000 0001 | 01  | 1       | !    | 010 0001 | 21  | 33
STX  | 000 0010 | 02  | 2       | "    | 010 0010 | 22  | 34
ETX  | 000 0011 | 03  | 3       | #    | 010 0011 | 23  | 35
EOT  | 000 0100 | 04  | 4       | $    | 010 0100 | 24  | 36
ENQ  | 000 0101 | 05  | 5       | %    | 010 0101 | 25  | 37
ACK  | 000 0110 | 06  | 6       | &    | 010 0110 | 26  | 38
BEL  | 000 0111 | 07  | 7       | '    | 010 0111 | 27  | 39
BS   | 000 1000 | 08  | 8       | (    | 010 1000 | 28  | 40
HT   | 000 1001 | 09  | 9       | )    | 010 1001 | 29  | 41
LF   | 000 1010 | 0A  | 10      | *    | 010 1010 | 2A  | 42
VT   | 000 1011 | 0B  | 11      | +    | 010 1011 | 2B  | 43
FF   | 000 1100 | 0C  | 12      | ,    | 010 1100 | 2C  | 44
CR   | 000 1101 | 0D  | 13      | -    | 010 1101 | 2D  | 45
SO   | 000 1110 | 0E  | 14      | .    | 010 1110 | 2E  | 46
SI   | 000 1111 | 0F  | 15      | /    | 010 1111 | 2F  | 47
DLE  | 001 0000 | 10  | 16      | 0    | 011 0000 | 30  | 48
DC1  | 001 0001 | 11  | 17      | 1    | 011 0001 | 31  | 49
DC2  | 001 0010 | 12  | 18      | 2    | 011 0010 | 32  | 50
DC3  | 001 0011 | 13  | 19      | 3    | 011 0011 | 33  | 51
DC4  | 001 0100 | 14  | 20      | 4    | 011 0100 | 34  | 52
NAK  | 001 0101 | 15  | 21      | 5    | 011 0101 | 35  | 53
SYN  | 001 0110 | 16  | 22      | 6    | 011 0110 | 36  | 54
ETB  | 001 0111 | 17  | 23      | 7    | 011 0111 | 37  | 55
CAN  | 001 1000 | 18  | 24      | 8    | 011 1000 | 38  | 56
EM   | 001 1001 | 19  | 25      | 9    | 011 1001 | 39  | 57
SUB  | 001 1010 | 1A  | 26      | :    | 011 1010 | 3A  | 58
ESC  | 001 1011 | 1B  | 27      | ;    | 011 1011 | 3B  | 59
FS   | 001 1100 | 1C  | 28      | <    | 011 1100 | 3C  | 60
GS   | 001 1101 | 1D  | 29      | =    | 011 1101 | 3D  | 61
RS   | 001 1110 | 1E  | 30      | >    | 011 1110 | 3E  | 62
US   | 001 1111 | 1F  | 31      | ?    | 011 1111 | 3F  | 63

Char | Binary   | Hex | Decimal | Char | Binary   | Hex | Decimal
-----|----------|-----|---------|------|----------|-----|--------
@    | 100 0000 | 40  | 64      | `    | 110 0000 | 60  | 96
A    | 100 0001 | 41  | 65      | a    | 110 0001 | 61  | 97
B    | 100 0010 | 42  | 66      | b    | 110 0010 | 62  | 98
C    | 100 0011 | 43  | 67      | c    | 110 0011 | 63  | 99
D    | 100 0100 | 44  | 68      | d    | 110 0100 | 64  | 100
E    | 100 0101 | 45  | 69      | e    | 110 0101 | 65  | 101
F    | 100 0110 | 46  | 70      | f    | 110 0110 | 66  | 102
G    | 100 0111 | 47  | 71      | g    | 110 0111 | 67  | 103
H    | 100 1000 | 48  | 72      | h    | 110 1000 | 68  | 104
I    | 100 1001 | 49  | 73      | i    | 110 1001 | 69  | 105
J    | 100 1010 | 4A  | 74      | j    | 110 1010 | 6A  | 106
K    | 100 1011 | 4B  | 75      | k    | 110 1011 | 6B  | 107
L    | 100 1100 | 4C  | 76      | l    | 110 1100 | 6C  | 108
M    | 100 1101 | 4D  | 77      | m    | 110 1101 | 6D  | 109
N    | 100 1110 | 4E  | 78      | n    | 110 1110 | 6E  | 110
O    | 100 1111 | 4F  | 79      | o    | 110 1111 | 6F  | 111
P    | 101 0000 | 50  | 80      | p    | 111 0000 | 70  | 112
Q    | 101 0001 | 51  | 81      | q    | 111 0001 | 71  | 113
R    | 101 0010 | 52  | 82      | r    | 111 0010 | 72  | 114
S    | 101 0011 | 53  | 83      | s    | 111 0011 | 73  | 115
T    | 101 0100 | 54  | 84      | t    | 111 0100 | 74  | 116
U    | 101 0101 | 55  | 85      | u    | 111 0101 | 75  | 117
V    | 101 0110 | 56  | 86      | v    | 111 0110 | 76  | 118
W    | 101 0111 | 57  | 87      | w    | 111 0111 | 77  | 119
X    | 101 1000 | 58  | 88      | x    | 111 1000 | 78  | 120
Y    | 101 1001 | 59  | 89      | y    | 111 1001 | 79  | 121
Z    | 101 1010 | 5A  | 90      | z    | 111 1010 | 7A  | 122
[    | 101 1011 | 5B  | 91      | {    | 111 1011 | 7B  | 123
\    | 101 1100 | 5C  | 92      | |    | 111 1100 | 7C  | 124
]    | 101 1101 | 5D  | 93      | }    | 111 1101 | 7D  | 125
^    | 101 1110 | 5E  | 94      | ~    | 111 1110 | 7E  | 126
_    | 101 1111 | 5F  | 95      | DEL  | 111 1111 | 7F  | 127

Table A2.4: Standard ASCII Codes