Information Representation

In order to understand what happens inside a computer we must understand how a computer represents information internally. For example, the text you are reading is stored on a computer disk. It was entered into a computer system using a word processing program. Inside the computer’s memory and on the disk, this text (and all other information) can be thought of as being represented by a very long sequence of ones and zeros, i.e. in binary form. It appears as text only on a computer’s monitor or on printed output. A computer monitor’s hardware receives binary information and transforms it into the symbols that are displayed. Similarly, a printer’s hardware converts binary information to the text (or graphics) that is printed. Inside the CPU and the computer’s memory, information is stored and transmitted in electrical form, while on disk it is stored in magnetic form. A particular electrical signal represents a one and another signal represents a zero. Similarly, a specific magnetic orientation represents a one and another orientation represents a zero. A group of these binary digits (bits) forms a binary code, and information is encoded as a sequence of individual units, each of which corresponds to a distinct binary code.

Fundamental Principle:

All information is represented by binary codes in a computer system.

A2.1 Classifying Information

From a computing perspective, we can broadly classify the information we wish to store as either numeric or textual. Numeric information refers to information with which we may wish to do arithmetic, i.e. numbers such as 23, 404, 3.14159, 0.96 and so on. Numbers without a decimal point are called integers and those with a decimal point are called reals. Computer scientists often refer to real numbers as floating-point numbers. It is important to note that different techniques are used to represent integers and reals inside a computer system. Textual information is made up of individual characters (e.g. a, b, c, ..., A, B, C, ..., -, +, ;, ", &, %, #, ', 0, 1, 2, ..., 9). Examples include a person’s name, address or phone number, an essay, or the information on the page you are reading. The term alphanumeric refers to text containing both alphabetic characters and numbers, such as an address, e.g. London W12. Fortunately, international standards for information representation have been developed to make life easier for computer manufacturers and users. They mean that we can easily transfer information from one make of computer to another. One such standard is the ASCII (American Standard Code for Information Interchange) standard. This standard is used for representing textual information and is described later.

A2.2 Representing Numbers

A2.2.1 Integers

We are so familiar with the decimal number system that we might think that everything that uses numbers would, by default, use the decimal system. However, computers, as we have said, use a number system called binary, which involves only two digits: 0 and 1. In the decimal system, we represent numbers using 10 digits based on powers of 10. For example, the number 2376 may also be written as:

2*10^3 + 3*10^2 + 7*10^1 + 6*10^0

More formally, we say that the digits in a positive decimal number are weighted by increasing powers of 10 (we write 10^3 for 10 to the power of 3); such numbers use the base 10. To illustrate this further, we could write the above number in the following form:

weighting:  10^3  10^2  10^1  10^0
digits:        2     3     7     6

decimal 2376 = 2*10^3 + 3*10^2 + 7*10^1 + 6*10^0

The leftmost digit, 2 in this example, is called the most significant digit. The rightmost digit, 6 in this example, is called the least significant digit. The digits on the left hand side are called the high-order digits (higher powers of 10) and the digits on the right hand side are called the low-order digits (lower powers of 10). The above system can be used to write numbers in any base m as follows, assuming we are dealing with a 4-digit number:

weighting:  m^3  m^2  m^1  m^0
digits:      d3   d2   d1   d0

base m: d3 d2 d1 d0 = d3*m^3 + d2*m^2 + d1*m^1 + d0*m^0

We refer to the digits making up a number by their position, starting from the right hand side with position 0 (i.e. we count from zero). As we have seen earlier, when the base is 10, we replace m by 10. In the binary number system the base is 2 so we can write the 8-bit binary number 0101 1100 as:

weighting:  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
bits:         0    1    0    1    1    1    0    0

0101 1100 = 0*2^7 + 1*2^6 + 0*2^5 + 1*2^4 + 1*2^3 + 1*2^2 + 0*2^1 + 0*2^0
          = 0 + 64 + 0 + 16 + 8 + 4 + 0 + 0
          = 92_10 (i.e. 92 decimal)

The leftmost bit is called the most significant bit (MSB) and the leftmost bits in a binary number are referred to as the high-order bits. The rightmost bit in a binary number is called the least significant bit (LSB) and the rightmost bits are referred to as the low-order bits. In the case of a 16-bit number, we use the above scheme with the powers of 2 ranging from 0 to 15; with a 32-bit number, the powers of 2 range from 0 to 31.
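To make the weighting scheme concrete, here is a minimal Python sketch (our illustration, not part of the original text; the name to_decimal is our own) that converts a bit string to decimal by summing the weighted bits:

    # Sum each bit multiplied by its weight (2 to the power of its position).
    def to_decimal(bits):
        value = 0
        for position, bit in enumerate(reversed(bits.replace(" ", ""))):
            value += int(bit) * 2 ** position   # bit at position i has weight 2^i
        return value

    print(to_decimal("0101 1100"))   # prints 92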

Exercises

A2.1. Convert the following binary numbers to decimal:

(i) 0000 1000 (ii) 0000 1001 (iii) 0000 0111

(iv) 0100 0001 (v) 0111 1111 (vi) 0110 0001

Converting from Decimal to Binary

To convert a number from one base to another, you repeatedly divide the number by the new base; the remainder at each stage becomes a digit of the result, and you stop when the quotient reaches 0. So to convert decimal 35 to binary we do the following:

                Remainder
35 / 2 = 17         1
17 / 2 =  8         1
 8 / 2 =  4         0
 4 / 2 =  2         0
 2 / 2 =  1         0
 1 / 2 =  0         1

The result is read upwards, giving 35_10 = 100011_2 (the subscripts denote the base). The number 100011_2 is an unsigned binary number. We can convert any positive decimal number to binary using the above method. Signed numbers are represented differently, as described later.
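The repeated-division method translates directly into code. A short Python sketch (illustrative only; the name to_binary is our own):

    # Repeatedly divide by 2; the remainders, read in reverse order, give the bits.
    def to_binary(n):
        if n == 0:
            return "0"
        remainders = []
        while n > 0:
            remainders.append(str(n % 2))   # remainder becomes the next bit
            n = n // 2
        return "".join(reversed(remainders))   # read the remainders upwards

    print(to_binary(35))   # prints 100011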

Shortcuts

To convert any decimal number which is a power of 2 to binary, you simply write 1 followed by the number of zeros given by the power of 2. For example, 32 is 2^5, so we write it as 1 followed by 5 zeros, i.e. 10 0000; 128 is 2^7, so we write it as 1 followed by 7 zeros, i.e. 1000 0000. Another thing worth remembering is that the largest binary number that can be stored in a given number of bits is made up of all 1’s. An easy way to convert this to decimal is to note that this value is one less than 2 to the power of the number of bits. For example, if we are using 4-bit numbers, the largest value we can represent is 1111, which is 2^4 - 1, i.e. 15; with a 6-bit number the largest value we can represent is 11 1111, whose decimal equivalent is 2^6 - 1, i.e. 63. The same technique applies to decimal numbers, e.g. the largest value that a 3-digit decimal number can represent is 999, i.e. 10^3 - 1.
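These shortcuts correspond to shift operations, as this brief Python aside of ours illustrates:

    # 1 shifted left n places is 2^n: a 1 followed by n zeros in binary.
    print(bin(1 << 5))         # 0b100000, i.e. 32
    # 2^n - 1 is n ones: the largest value an n-bit unsigned number can hold.
    print(bin((1 << 4) - 1))   # 0b1111, i.e. 15
    print((1 << 6) - 1)        # 63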

Exercises

A2.2. Convert the following decimal numbers to binary, writing them as 8-bit binary numbers. You may pad with 0’s on the left hand side to make up 8 bits when necessary:

(i) 3 (ii) 15 (iii) 16

(iv) 63 (v) 64 (vi) 255

Hexadecimal Numbers

One difficulty with binary numbers is that they tend to be composed of long sequences of bits, so it is easy to make mistakes when working with them. Thus, in writing them down, we might arrange them in groups of 3 or 4. A 16-bit binary number such as 0110100011001010 could be written as 0110 1000 1100 1010, which is easier to read. However, it is still tedious to work with such numbers. For this reason we use other number systems. Because 10 is not an exact power of 2, the decimal system is not as convenient for working with binary numbers as a system whose base is a power of 2. Two commonly used computer number systems are the hexadecimal (base 16) and octal (base 8) systems. It is easier to convert a binary number to hexadecimal or octal (and vice versa) than it is to convert a binary number to decimal.

With the hexadecimal system, we require 16 distinct digits (representing the numbers 0 to 15). The decimal digits alone give us only ten, so we use letters as well. The decimal numbers 0 to 15 are represented by the hexadecimal digits 0 to 9 and the letters A to F. The hexadecimal digits 0 to 9 correspond to their decimal equivalents; the letter A represents 10, B represents 11, C represents 12, D represents 13, E represents 14 and F represents 15. Hexadecimal numbers are weighted in powers of 16, so the number 2FA can be converted to decimal as follows:

weighting:  16^2  16^1  16^0
digits:        2     F     A

2FA = 2 * 16^2 + F * 16^1 + A * 16^0
    = 2 * 16^2 + 15 * 16^1 + 10 * 16^0
    = 512 + 240 + 10
    = 762_10

A hexadecimal digit is represented by 4 bits (since 2^4 = 16 and we require 16 bit patterns for 16 digits). Thus, all 32-bit numbers can be written as 8-digit hexadecimal numbers, 16-bit binary numbers can be written as 4-digit hexadecimal numbers and 8-bit binary numbers can be represented by 2-digit hexadecimal numbers. This is the major advantage of using hexadecimal numbers, since computers almost always use 32-bit, 16-bit or 8-bit numbers to represent information. So, if we wish to write down the contents of a processor register which can store 16 bits, we can write it as a 4-digit hexadecimal number. If we were to use decimal, we would have to write it as a number between 0 and 65,535, which is not nearly so convenient or useful.

To convert a binary number to hexadecimal, you break the number into groups of 4 bits from the right hand side. Then you convert each group of 4 bits into its equivalent hexadecimal digit. This process gives you the hexadecimal equivalent of the original binary number. If there are fewer than 4 bits in the leftmost group, you still convert them to their hexadecimal equivalent (pad on the left with 0’s). For example, the 16-bit binary number 0110100011001010 can be divided into the groups: 0110 1000 1100 1010. These are converted to the hexadecimal digits 68CA. Assembly language programmers make extensive use of the hexadecimal numbering system, so it is important to be familiar with it. For example, the letter ‘A’ is represented inside a computer by the binary code 0100 0001 which can be written in hexadecimal as 41. To convert from hexadecimal to binary, we simply convert each hexadecimal digit to its 4-bit equivalent, padding with 0’s on the left, if necessary to make up the 4 bits. For example, hexadecimal digit 4 is written as 0100 in binary, hexadecimal digit 1 is written as 0001 in binary, thus, hexadecimal number 41 is 0100 0001 in binary.
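The grouping technique is easy to automate. A Python sketch of ours (bin_to_hex is an invented name, not a standard function):

    # Convert a bit string to hexadecimal by translating each group of 4 bits.
    def bin_to_hex(bits):
        bits = bits.replace(" ", "")
        bits = bits.zfill((len(bits) + 3) // 4 * 4)   # pad on the left to a multiple of 4
        groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
        return "".join(format(int(group, 2), "X") for group in groups)

    print(bin_to_hex("0110 1000 1100 1010"))   # prints 68CA
    print(format(int("41", 16), "08b"))        # hex 41 back to 8-bit binary: 01000001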

When we write a number, we must indicate its base so that we know which number we are referring to. For example, the number 10 in binary represents two; in decimal it represents ten and in hexadecimal it represents sixteen. There are two common methods for indicating the base when writing a number in an assembly language program. One method (used in 8086 programs) is to add a character to the end of the number to indicate the base, e.g. H for hexadecimal, D for decimal and B for binary. Thus, we can easily distinguish between the numbers 10H, 10D and 10B. Lowercase letters may also be used. The second method (used in M68000 programs) is to begin the number with a character which indicates the base. For example, hexadecimal numbers may be written so as to begin with the ‘$’ character and binary numbers so as to begin with the ‘%’ character. Using this notation, the number $10 represents the hexadecimal number 10 and %10 represents the binary number 10. The absence of a leading character indicates a decimal number in an M68000 program. Table A2.1 displays the numbers 0 to 15 in the binary, hexadecimal and decimal bases.

Binary   Hexadecimal   Decimal
0000     0              0
0001     1              1
0010     2              2
0011     3              3
0100     4              4
0101     5              5
0110     6              6
0111     7              7
1000     8              8
1001     9              9
1010     A             10
1011     B             11
1100     C             12
1101     D             13
1110     E             14
1111     F             15

Table A2.1: Representing numbers in binary, hexadecimal and decimal bases

Exercises

A2.3. Convert the following:

(a) From hexadecimal to 16-bit binary numbers : 1B87h, AF33h, 713Ch, 80EFh.

(b) From binary to hexadecimal: 1111 1100 1000 0111 and 1010 1110 1001 1100.

Negative Numbers

So far, the only numbers we have looked at were unsigned numbers. It is important to be able to represent both unsigned and signed numbers, i.e. both positive and negative numbers. This raises the question of how to represent negative numbers. In our everyday representation of negative numbers, we use a minus sign ‘-’ (a hyphen on a keyboard) to indicate that a number is negative. Inside a computer we must represent the sign in binary, since all information is stored in binary form. There are a number of methods for representing negative numbers, two of which are the signed magnitude and two’s complement representations.

Signed Magnitude Numbers

In a signed magnitude number, the most significant bit is used as a sign bit to indicate whether the number is positive or negative. Thus for an 8-bit number, the most significant bit, bit number 7, acts as the sign bit. The value 1 is used to indicate a negative number and the value 0 indicates a positive number. The remaining bits are used to represent the magnitude of the number.

Example A2.1: The numbers 45 and -45 are represented in signed magnitude as:

+45 = 0010 1101B

-45 = 1010 1101B

 

If we are not interested in negative numbers, we can represent numbers ranging from 0 to 255 using 8 bits. Such numbers are called unsigned numbers. Using signed magnitude, the range that can be represented using 8 bits is from -127 to +127, and we should note that 0 is represented twice (as 1000 0000B and 0000 0000B, i.e. a positive and a negative zero). This dual representation of zero causes problems when we wish to compare a variable with 0. In addition, the hardware to do arithmetic using signed magnitude numbers is complex and slow when compared with using the two’s complement representation of signed numbers. As a result, the signed magnitude representation is rarely used in computers.

Complementary Number Representation

We are used to the concept of a sign and a magnitude when dealing with decimal numbers. A positive sign is implicit (if not written) and negative numbers are written by preceding the positive number by a minus sign. In a complementary number system, by contrast, a negative number is represented by the complement of the corresponding positive number, and each number has a unique representation. Two’s complement is a complementary number system used in computers and is the most commonly used method for representing signed numbers.

Two's Complement Numbers

We still use a sign bit as an indicator of the sign of the number. In the case of positive numbers, the representation is identical to that of signed magnitude, i.e. the sign bit is 0 and the remaining bits represent the positive number. In the case of negative numbers, the sign bit is 1 but the bits to the right of the sign bit do not directly indicate the magnitude of the number. The number -1, for example, is represented by 1111 if we use 4-bit two’s complement numbers, whereas 1111 would represent the number -7 in a signed magnitude representation. Figure A2.1 shows the range of numbers that can be represented using a 4-bit two’s complement representation.

Figure A2.1: 4-bit two’s complement numbers with their decimal equivalents

In terms of digit weightings, we can interpret two’s complement numbers as using a negative weight for the most significant bit. In the case of a 4-bit two’s complement number, the sign bit can be interpreted as having a weight of -2^3, i.e. -8, and the remaining bits have their usual positive weightings. In the case of an 8-bit two’s complement number, the sign bit has a weight of -2^7, i.e. -128. Thus the weightings of an 8-bit two’s complement number may be written as:

Bit position:         7    6    5    4    3    2    1    0
Weighting:         -2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Signed equivalent: -128  +64  +32  +16   +8   +4   +2   +1

The two’s complement number 1000 0000 (i.e. -128) is the most negative number that can be represented using 8 bits. The two’s complement number 0111 1111 (i.e. 127) is the largest positive number that can be represented using 8 bits. In other words, a total of 256 numbers can be represented using 8-bit two’s complement numbers, ranging from -128 to 127. There is only one representation for zero. Table A2.2 lists the decimal equivalents of some 8-bit two’s complement and unsigned binary numbers.

Table A2.2: Some 8-bit unsigned and two’s complement numbers

To convert a two’s complement number such as 1000 0001 to decimal, we can use the weightings given above to compute the decimal equivalent, in this case -128 + 1, i.e. -127. However, there is an easier method for carrying out two’s complement conversions. To convert a binary number such as 0110 0011 (99_10) to its negative two’s complement form, we carry out two operations. First we complement the number by changing all the one bits to zeros and the zero bits to ones (this is called the one's complement representation of the number). This step is also described as flipping the bits. The one's complement of 0110 0011 is 1001 1100. The second step in converting to two's complement is to add 1 to the one's complement representation. The one's complement was 1001 1100, so the two's complement of this number is 1001 1100 + 0000 0001, giving 1001 1101 (-99_10). In summary, to convert any unsigned binary number to its two’s complement form, we flip the bits of the number and add 1.

Equally, to convert a two’s complement number back to its positive form, we carry out the same steps, i.e. we flip the bits and add one. For example, negating 1001 1101 yields 0110 0010 + 1, which gives 0110 0011 (99_10), and because the sign bit of the original number was 1 we know that it represented -99_10.

Example A2.2: Convert -8910 to two’s complement:

89_10                     = 0101 1001
One's complement of 89_10 = 1010 0110
Add one                   +         1
Two's complement: -89_10  = 1010 0111

 

Two’s Complement Conversion Rule:

To negate a number to two’s complement: Flip the bits and add 1.
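The rule can be expressed in a couple of lines of Python, provided we remember to keep the result to 8 bits (a sketch of ours; negate8 is an invented name):

    # Two's complement negation: flip the bits (XOR with 1111 1111) and add 1.
    def negate8(value):
        return ((value ^ 0xFF) + 1) & 0xFF   # & 0xFF keeps only the low 8 bits

    print(format(negate8(0b01100011), "08b"))   # 99  -> 10011101, i.e. -99
    print(format(negate8(0b10011101), "08b"))   # -99 -> 01100011, i.e. +99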

A nice property of two's complement numbers is that we can add them together without concern for the sign. For example, adding +89 to -89 is carried out as:

  0101 1001_2      89_10
+ 1010 0111_2     -89_10
  0000 0000_2       0_10

(the carry out of the most significant bit is simply discarded).

This means that to subtract one number from another, we just negate the number to be subtracted and then add the two numbers. This operation is performed whether the number to be subtracted is positive or negative. Thus, using two's complement numbers, the processor can add and subtract using the same hardware circuit, i.e. a separate circuit for subtraction is not required. This simplifies the design of the ALU and is one of the reasons why two’s complement is the most commonly used method for representing negative numbers in microprocessors.
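Subtraction as negate-and-add can be sketched in Python as follows (again an illustration of ours, not processor code):

    # a - b computed as a + (-b), where -b is the two's complement of b;
    # & 0xFF discards the carry out of the most significant bit.
    def sub8(a, b):
        neg_b = ((b ^ 0xFF) + 1) & 0xFF   # two's complement negation of b
        return (a + neg_b) & 0xFF

    print(sub8(89, 89))   # 0
    print(sub8(5, 7))     # 254, i.e. 1111 1110, which is -2 in two's complement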

Number Range and Overflow

The range of numbers (called the number range) that can be stored in a given number of bits is important. Given an 8-bit number, we can represent unsigned numbers in the range 0 to 255 (2^8 - 1) and two’s complement numbers in the range -128 to +127 (-2^7 to 2^7 - 1). Given a 16-bit number, we can represent unsigned numbers in the range 0 to 65,535 (2^16 - 1) and two’s complement numbers in the range -32,768 to 32,767 (-2^15 to 2^15 - 1). In general, given an n-bit number, we can represent unsigned numbers in the range 0 to 2^n - 1 and two’s complement numbers in the range -2^(n-1) to 2^(n-1) - 1.

When we wish to know the maximum amount of memory that a processor can access, we look at the number of bits that the processor uses to represent a memory address. This determines the maximum memory address that can be accessed. For example, a processor that uses 16-bit addresses will be able to access up to 65,536 memory locations (64KB), with addresses from 0 to 65,535. A 20-bit address allows up to 2^20 bytes (1MB) of memory to be accessed, a 24-bit address allows up to 16MB (2^24 bytes) of RAM to be accessed and a 32-bit address allows up to 4GB (2^32 bytes) of RAM to be accessed.

What happens if we attempt to store a larger unsigned value than 255 (or a more negative signed value than -128) in an 8-bit register? For example, if we attempt the calculation 70 + 75 using 8-bit two’s complement numbers, the result of 145 (1001 0001B) is a negative number in two's complement! This situation, when it arises, is called overflow. It occurs when we attempt to represent a number outside the range of numbers that can be stored in a particular register or memory variable. Overflow is detected by the hardware of the CPU and its occurrence is recorded in the CPU’s status register. This register uses one flag, called the overflow flag or O-flag, to record the occurrence of an overflow. This allows the programmer to test for this condition in an assembly language program and deal with it appropriately. The 8086 provides the jo/jno instructions to branch (or not to branch) if overflow occurs. Figure A2.2 illustrates the relationship between number range and overflow for 8-bit two’s complement numbers.

Figure A2.2: Number range and overflow regions for 8-bit two’s complement numbers
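The overflow test itself can be sketched in Python: two operands with the same sign bit that produce a result with the opposite sign bit have overflowed (our illustration of the rule; real CPUs set the overflow flag in hardware):

    # 8-bit addition with a software overflow check.
    def add8_with_overflow(a, b):
        result = (a + b) & 0xFF
        same_sign = (a & 0x80) == (b & 0x80)
        overflow = same_sign and (result & 0x80) != (a & 0x80)
        return result, overflow

    print(add8_with_overflow(70, 75))   # (145, True): 1001 0001 looks negative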

 

Note: In our programs, we may use two’s complement or unsigned binary numbers. However, this raises a very important question: how can we tell, by looking at a number, whether it is a two’s complement number or an unsigned number? Does 1111 1111B represent the decimal number 255 or the number -1? The answer is that we cannot tell by looking at a number how it should be interpreted. It is the responsibility of the programmer to use the number correctly. It is important to remember that you can never tell how any byte (word or long word) is to be interpreted by looking at its value alone. It could represent a signed or unsigned number, an ASCII code, a machine code instruction and so on. The context (in which the information stored in the byte is used) will determine how it is to be interpreted. Assembly languages provide separate conditional jump instructions for handling comparisons involving unsigned or signed numbers. It is the programmer’s responsibility to use the correct instructions.

Exercises

A2.4 Convert the decimal numbers -64, -127, -15, -16, -1, +32 and +8 to 8-bit signed magnitude and two’s complement numbers.

A2.5 What is the range of unsigned numbers that can be represented by 20-bit, 24-bit and 32-bit numbers?

A2.6 What is the range of numbers that can be represented using 32-bit two’s complement numbers?

A2.7 What problem arises in representing zero in signed magnitude and one's complement?

A2.8 What is overflow and how might it occur?

A2.2.2 Floating-point Numbers

So far we have described how to represent integers in a computer system, which is sufficient for the purposes of this text. We now briefly introduce methods for representing real numbers. A different representation is required for real (usually called floating-point) numbers. We sometimes write such numbers in scientific notation, so that the number 562.42 can be written as 0.56242 x 10^3. We can express any floating-point number as m x r^exp, where m is the mantissa, r is the radix and exp is the exponent. For decimal numbers the radix is 10 and for binary numbers the radix is 2. Since we use binary numbers in a computer system, we do not have to store the radix explicitly when representing floating-point numbers. This means that we only need to store the mantissa and the exponent of the number to be represented. Thus, for the number 0.11011011 x 2^3 only the mantissa 11011011 and the exponent 3 (converted to binary) need to be stored. The binary point and radix are implicit.

A floating-point number is normalised if the most significant digit of the mantissa is non-zero, as in the above example. Any floating-point number can be normalised by adjusting the exponent appropriately. For example, 0.0001011 is normalised to 0.1011 x 2^-3. To represent 0 as a floating-point number, both the mantissa and the exponent are represented as zero.

There are various standards (IEEE, ANSI etc.) that define how the mantissa and exponent of a floating-point number should be stored. Most standards use a 32-bit format for storing single precision floating-point numbers and a 64-bit format for storing double precision floating-point numbers. A possible format for a 32-bit floating-point number is the following, which uses a sign bit, 23 bits to represent the mantissa and the remaining 8 bits to represent the exponent. The mantissa could be represented using either signed magnitude or two’s complement. The exponent could be in two’s complement (but another form called excess notation is also used). Thus a 32-bit floating-point number could be represented as follows, where S is the sign bit:

S | exponent (8 bits) | mantissa (23 bits)

Using this format, the number 0.1101 1011 x 2^3 could be stored with S = 0, an exponent field of 0000 0011 (i.e. 3) and a mantissa field of 1101 1011 followed by fifteen 0’s.
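In practice, most machines follow the IEEE 754 standard, whose single precision layout (a sign bit, an 8-bit biased exponent and a 23-bit fraction with a hidden leading 1) differs in detail from the simplified format sketched above. As an aside of ours, Python’s struct module can reveal the bits actually stored:

    import struct

    # Show the 32 bits IEEE 754 single precision stores for a given float.
    def float_bits(x):
        (as_int,) = struct.unpack(">I", struct.pack(">f", x))
        return format(as_int, "032b")

    bits = float_bits(562.42)
    print(bits[0], bits[1:9], bits[9:])   # sign, exponent, fraction fields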

 

A2.2.3 Binary Coded Decimal (BCD) Numbers

This form of number representation allows us to represent numbers in their decimal form, in that each digit of the decimal number is translated to binary and the binary representations of the digits making up the decimal number are stored. This makes I/O operations for BCD numbers easier than if we represent decimal numbers as pure binary numbers as described earlier. For example, when we read a number like 254 from the keyboard, we convert it to binary if we wish to do arithmetic with it. This involves converting the ‘2’ to its binary form and multiplying it by 100, then converting the ‘5’ to its binary form, multiplying it by 10 and adding it to the previous number (giving 250), and finally reading the ‘4’, converting it to its binary form and adding it to the previous sum, giving the number 254. The BCD method of representing numbers is used to get around this problem. Each digit is simply encoded into its equivalent 4-bit form and each digit is stored separately, instead of storing the binary equivalent of the whole number. Thus 254D would be stored as

0000 0010   0000 0101   0000 0100
    2           5           4

in BCD form, using 8 bits per digit (unpacked), as opposed to representing it as

1111 1110

in unsigned binary form. The above BCD number can be packed so as to store two digits per byte:

0000 0010   0101 0100
   0    2      5    4

The 8086 provides special instructions that allow addition, subtraction, multiplication and division to be carried out on BCD numbers.
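Packing decimal digits two to a byte is a simple shift-and-OR operation, as this Python sketch of ours shows (pack_bcd is an invented helper name):

    # Pack a string of decimal digits into packed BCD, two digits per byte.
    def pack_bcd(digits):
        if len(digits) % 2:
            digits = "0" + digits   # pad to an even number of digits
        return bytes(int(digits[i]) << 4 | int(digits[i + 1])
                     for i in range(0, len(digits), 2))

    print(pack_bcd("254").hex())   # prints 0254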

 

A2.3 Representing Characters: ASCII Codes

ASCII codes are used to represent characters in a computer system. Each character we wish to use must be assigned a unique binary code or number to distinguish it from all other characters. There are over 100 characters to be represented, when we count the uppercase letters (A to Z), the lowercase letters (a to z), the digits (0 to 9), punctuation (,;."?!) and other characters. Using ASCII codes, for example, the letter A is represented by the binary code 100 0001, the letter B by 100 0010 and so on. Because it is awkward to write these codes in binary form, we usually convert them to their decimal (or hexadecimal, i.e. base 16) equivalents. So the letter A may be represented by code 65 in decimal and B by code 66. Inside the computer they are always represented as binary numbers.

The standard ASCII code uses 7 bits, i.e. each character is represented by 7 bits. As we have seen, the letter A is represented by the 7 bits 100 0001. Because standard ASCII codes use 7 bits, a total of 128 different characters may be represented (2^7 = 128, the number of combinations of 7 bits). There are codes for the uppercase letters, lowercase letters, digits, punctuation and other symbols (e.g. ,;:"?’! *&%$#+-<>/[]{}\~() etc.). In addition, there are ASCII codes for a number of special characters called control characters. Control characters are used to control devices attached to the computer and to control communications between the computer and these devices. For example, one such character causes your computer to beep: the BEL character (ASCII code 7). The Line-feed character (ASCII code 10) causes a print head or screen cursor to go onto a new line. The Carriage Return character (ASCII code 13) causes the print head or screen cursor to go to the start of a line. The Form-feed character (ASCII code 12) causes a printer to skip to the top of the next page. Other control characters are used to control communication between devices and the computer. Table A2.3 is a list of some commonly used ASCII codes and the full list of ASCII characters is given in Table A2.4. Remember, inside the computer it is the binary form of the ASCII code that is used. The decimal and hexadecimal values are useful for people to refer to a particular ASCII code.

Char  Binary    Hex  Decimal     Char  Binary    Hex  Decimal
NUL   000 0000  00   0           A     100 0001  41   65
BEL   000 0111  07   7           B     100 0010  42   66
LF    000 1010  0A   10          C     100 0011  43   67
FF    000 1100  0C   12          D     100 0100  44   68
CR    000 1101  0D   13          E     100 0101  45   69
SP    010 0000  20   32          F     100 0110  46   70
                                 G     100 0111  47   71
                                 H     100 1000  48   72

*     010 1010  2A   42          Y     101 1001  59   89
+     010 1011  2B   43          Z     101 1010  5A   90
,     010 1100  2C   44          [     101 1011  5B   91
-     010 1101  2D   45          \     101 1100  5C   92
.     010 1110  2E   46
/     010 1111  2F   47          a     110 0001  61   97
0     011 0000  30   48          b     110 0010  62   98
1     011 0001  31   49          c     110 0011  63   99
2     011 0010  32   50          d     110 0100  64   100
3     011 0011  33   51          e     110 0101  65   101
4     011 0100  34   52          f     110 0110  66   102
5     011 0101  35   53          g     110 0111  67   103
6     011 0110  36   54          h     110 1000  68   104
7     011 0111  37   55
8     011 1000  38   56          y     111 1001  79   121
9     011 1001  39   57          z     111 1010  7A   122

Table A2.3: Some commonly used ASCII codes
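The codes in Table A2.3 are easy to explore interactively. In Python, the built-in ord and chr functions map between characters and their codes (Python uses Unicode, which agrees with ASCII for the first 128 codes):

    print(ord("A"), hex(ord("A")))   # 65 0x41
    print(chr(66))                   # B
    print(ord("0"))                  # 48: the code for the digit character '0'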

A non-standard 8-bit version of the ASCII codes is also used, which means that 256 characters may be represented. This allows Greek letters, card suits, line drawing and graphic characters to be represented. The 8-bit version is usually referred to as extended ASCII. When standard 7-bit ASCII codes are used, the 8th bit is available for other uses. One such use is as a parity bit. A parity bit is used for error detection. When data is transmitted over long distances, perhaps using telephone lines, there is a possibility that the data will be corrupted due to electrical noise. This means that a bit may be flipped, i.e. a 1 bit gets changed to a 0 bit or vice versa. A parity bit can give some protection by allowing you to detect that such corruption has occurred, i.e. that a bit has been changed from 1 to 0 or 0 to 1. There are two parity schemes, called even parity and odd parity. In an even parity scheme, the parity bit is used to ensure that the code for each character contains an even number of 1’s. Thus, the parity bit would be set to 0 in the case of the letter ‘A’, whose ASCII code 100 0001 already contains an even number of 1’s, and the character would be transmitted as 0100 0001. The parity bit would be set to 1 in the case of the letter ‘C’, whose ASCII code is 100 0011, in order to make the number of 1’s even, and the character would be transmitted as 1100 0011. In an odd parity scheme, the parity bit is used in the same fashion, except that it is set to 0 or 1 in order to make the number of 1’s transmitted odd. Thus ‘A’ would be transmitted as 1100 0001 and ‘C’ as 0100 0011, if using an odd parity scheme.

When using parity bits, the receiver of a character can detect a single-bit error by computing the parity bit for the other 7 data bits and comparing it with the actual parity bit transmitted. If they are not the same, then an error has occurred. Parity checking cannot detect the corruption of an even number of bits in the same byte, but this is relatively rare. Other methods may be used to detect such errors and they are studied in the field of data communications. The 8086 has conditional jump instructions for testing parity (jpo to jump on odd parity and jpe to jump on even parity).
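A parity bit is computed by counting 1 bits. A minimal Python sketch of the even parity scheme described above (even_parity_byte is our own name):

    # Place an even-parity bit in the spare 8th bit of a 7-bit ASCII code.
    def even_parity_byte(code):
        parity = bin(code).count("1") & 1   # 1 if the count of 1 bits is odd
        return (parity << 7) | code

    print(format(even_parity_byte(ord("A")), "08b"))   # 01000001 (already even)
    print(format(even_parity_byte(ord("C")), "08b"))   # 11000011 (parity bit set)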

ASCII is not the only such standard for representing information. IBM mainframe computers use EBCDIC (Extended Binary Coded Decimal Interchange Code) codes which are 8-bit codes, different from those used in the ASCII standard. In addition, work is in progress to provide a 16-bit standard code (referred to as Unicode) for representing characters. The problem with ASCII codes is that a maximum of 256 characters can be represented. While this is fine for handling text in the English language, it is useless for handling other languages such as Chinese or Japanese where there are literally thousands of individual characters making up the language alphabet. A 16-bit code allows in excess of 65,000 characters to be represented and so is sufficient for the alphabet of almost any language.

Exercises

A2.9 Look up the ASCII codes for the digits 0 - 9. What do you notice about the rightmost (low-order) 4 bits and the leftmost (high-order) 4 bits of each code?

A2.10 What is the numeric difference between the ASCII codes for any uppercase letter (e.g. ‘A’) and the corresponding lowercase letter (e.g. ‘a’)?

A2.4 Summary

In this appendix we have described how information is represented inside a computer system. We described how signed and unsigned numbers can be represented and we also discussed the use of ASCII codes for representing characters. Table A2.4 is a full listing of the ASCII codes.

A2.5 Reading List

As for Chapters 2 and 5.

Megarry, J. (1985) Inside Information: Computers, Communications and People, BBC, London.

Standard ASCII Codes

Char  Binary    Hex  Decimal     Char  Binary    Hex  Decimal
NUL   000 0000  00   0           SP    010 0000  20   32
SOH   000 0001  01   1           !     010 0001  21   33
STX   000 0010  02   2           "     010 0010  22   34
ETX   000 0011  03   3           #     010 0011  23   35
EOT   000 0100  04   4           $     010 0100  24   36
ENQ   000 0101  05   5           %     010 0101  25   37
ACK   000 0110  06   6           &     010 0110  26   38
BEL   000 0111  07   7           '     010 0111  27   39
BS    000 1000  08   8           (     010 1000  28   40
HT    000 1001  09   9           )     010 1001  29   41
LF    000 1010  0A   10          *     010 1010  2A   42
VT    000 1011  0B   11          +     010 1011  2B   43
FF    000 1100  0C   12          ,     010 1100  2C   44
CR    000 1101  0D   13          -     010 1101  2D   45
SO    000 1110  0E   14          .     010 1110  2E   46
SI    000 1111  0F   15          /     010 1111  2F   47
DLE   001 0000  10   16          0     011 0000  30   48
DC1   001 0001  11   17          1     011 0001  31   49
DC2   001 0010  12   18          2     011 0010  32   50
DC3   001 0011  13   19          3     011 0011  33   51
DC4   001 0100  14   20          4     011 0100  34   52
NAK   001 0101  15   21          5     011 0101  35   53
SYN   001 0110  16   22          6     011 0110  36   54
ETB   001 0111  17   23          7     011 0111  37   55
CAN   001 1000  18   24          8     011 1000  38   56
EM    001 1001  19   25          9     011 1001  39   57
SUB   001 1010  1A   26          :     011 1010  3A   58
ESC   001 1011  1B   27          ;     011 1011  3B   59
FS    001 1100  1C   28          <     011 1100  3C   60
GS    001 1101  1D   29          =     011 1101  3D   61
RS    001 1110  1E   30          >     011 1110  3E   62
US    001 1111  1F   31          ?     011 1111  3F   63

Char  Binary    Hex  Decimal     Char  Binary    Hex  Decimal
@     100 0000  40   64          `     110 0000  60   96
A     100 0001  41   65          a     110 0001  61   97
B     100 0010  42   66          b     110 0010  62   98
C     100 0011  43   67          c     110 0011  63   99
D     100 0100  44   68          d     110 0100  64   100
E     100 0101  45   69          e     110 0101  65   101
F     100 0110  46   70          f     110 0110  66   102
G     100 0111  47   71          g     110 0111  67   103
H     100 1000  48   72          h     110 1000  68   104
I     100 1001  49   73          i     110 1001  69   105
J     100 1010  4A   74          j     110 1010  6A   106
K     100 1011  4B   75          k     110 1011  6B   107
L     100 1100  4C   76          l     110 1100  6C   108
M     100 1101  4D   77          m     110 1101  6D   109
N     100 1110  4E   78          n     110 1110  6E   110
O     100 1111  4F   79          o     110 1111  6F   111
P     101 0000  50   80          p     111 0000  70   112
Q     101 0001  51   81          q     111 0001  71   113
R     101 0010  52   82          r     111 0010  72   114
S     101 0011  53   83          s     111 0011  73   115
T     101 0100  54   84          t     111 0100  74   116
U     101 0101  55   85          u     111 0101  75   117
V     101 0110  56   86          v     111 0110  76   118
W     101 0111  57   87          w     111 0111  77   119
X     101 1000  58   88          x     111 1000  78   120
Y     101 1001  59   89          y     111 1001  79   121
Z     101 1010  5A   90          z     111 1010  7A   122
[     101 1011  5B   91          {     111 1011  7B   123
\     101 1100  5C   92          |     111 1100  7C   124
]     101 1101  5D   93          }     111 1101  7D   125
^     101 1110  5E   94          ~     111 1110  7E   126
_     101 1111  5F   95          DEL   111 1111  7F   127

Table A2.4: Standard ASCII codes