Using Unicode with Vector Pascal
Paul Cockshott, Pascal Developer
Gang Gao, Pascal Developer





by Paul Cockshott
10/1/02

Briefing

ISO Pascal is defined using an alphabet of symbols all of which can be represented with ASCII. Vector Pascal uses Unicode to permit a wider range of symbols to be used in programs.
 
Programs should be submitted to the compiler in UTF-8 encoded Unicode. Since 7-bit ASCII is a subset of UTF-8, every valid ASCII-encoded Vector Pascal program is also a valid UTF-8 program.
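To see why an ASCII program is already valid UTF-8, it helps to look at how many bytes UTF-8 spends on each code point. The sketch below is written in Turbo/Free Pascal-style Pascal (hex literals such as $80) rather than Vector Pascal, and Utf8Length is a hypothetical helper, not part of any Vector Pascal library. It only covers code points up to U+FFFF, the range a 16-bit char can hold.

```pascal
program Utf8LengthDemo;
{ Sketch only: number of UTF-8 bytes needed for a code point in the
  range U+0000..U+FFFF.  Utf8Length is a hypothetical helper. }

function Utf8Length(cp: longint): integer;
begin
  if cp < $80 then
    Utf8Length := 1   { 0xxxxxxx: 7-bit ASCII, byte value unchanged }
  else if cp < $800 then
    Utf8Length := 2   { 110xxxxx 10xxxxxx: includes Greek and Cyrillic }
  else
    Utf8Length := 3   { 1110xxxx 10xxxxxx 10xxxxxx: e.g. arrows, kana, CJK }
end;

begin
  writeln(Utf8Length(ord('A')));   { ASCII letter }
  writeln(Utf8Length($03B1));      { Greek small alpha }
  writeln(Utf8Length($2190))       { leftwards arrow }
end.
```

Run as written it should print 1, 2 and 3: an ASCII character is encoded as the same single byte, which is exactly the subset property described above.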

Letter based Identifiers

ISO Pascal allows the Latin letters A-Z to be used in identifiers. Vector Pascal extends this by allowing letters from the Greek, Cyrillic, Katakana and Hiragana character sets.

Alphabet    Low (hex)   High (hex)
Greek       0391        03A9
Cyrillic    0410        042F
Katakana    30A0        30FA
Hiragana    3041        3094


Treatment of identifiers is case indifferent, in that upper case and lower case versions of a given letter are treated as equivalent. Identifiers drawn from these alphabets can be strings of letters or digits starting with a letter.
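The letter rules and case indifference above can be sketched as two predicates over 16-bit code points. IsIdentLetter and FoldCase are hypothetical names and this is ordinary Turbo/Free Pascal, not the compiler's own code. The sketch takes the Hiragana upper bound as U+3094, and it assumes that lower case Greek and Cyrillic letters are accepted too; in Unicode they sit $20 above their capitals, except Greek final sigma (U+03C2), which is mapped to capital sigma explicitly.

```pascal
program IdentLetterDemo;
{ Sketch only: hypothetical helpers mirroring the alphabet table.
  Assumes lower case Greek and Cyrillic letters are accepted as well. }

function IsIdentLetter(cp: longint): boolean;
begin
  IsIdentLetter :=
    ((cp >= ord('A')) and (cp <= ord('Z'))) or
    ((cp >= ord('a')) and (cp <= ord('z'))) or
    ((cp >= $0391) and (cp <= $03A9)) or   { Greek capitals }
    ((cp >= $03B1) and (cp <= $03C9)) or   { Greek small letters }
    ((cp >= $0410) and (cp <= $042F)) or   { Cyrillic capitals }
    ((cp >= $0430) and (cp <= $044F)) or   { Cyrillic small letters }
    ((cp >= $30A0) and (cp <= $30FA)) or   { Katakana }
    ((cp >= $3041) and (cp <= $3094))      { Hiragana }
end;

{ Map a small letter to its capital so identifiers can be compared
  case-indifferently; other code points are returned unchanged. }
function FoldCase(cp: longint): longint;
begin
  if cp = $03C2 then
    FoldCase := $03A3                      { final sigma -> capital sigma }
  else if ((cp >= ord('a')) and (cp <= ord('z'))) or
          ((cp >= $03B1) and (cp <= $03C9)) or
          ((cp >= $0430) and (cp <= $044F)) then
    FoldCase := cp - $20                   { small letter -> capital }
  else
    FoldCase := cp
end;

begin
  writeln(IsIdentLetter($03B1));              { Greek small alpha }
  writeln(IsIdentLetter($2190));              { arrow: not a letter }
  writeln(FoldCase($0430) = FoldCase($0410))  { Cyrillic a versus A }
end.
```

Under these assumptions, comparing two identifiers amounts to comparing them code point by code point after applying FoldCase.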

Ideographic Identifiers

Vector Pascal also allows ideographs drawn from the unified Chinese, Japanese and Korean (CJK) set, Unicode range 4E00 to 9FFF, to be used as identifiers.

Special Symbols

When using Unicode, certain mathematical operations that are encoded as a sequence of ASCII characters can be represented as a single Unicode character.

Operation               ASCII form   Symbol   Unicode (hex)
Set membership          in           ∈        2208
Assignment              :=           ←        2190
Integer division        div          ÷        00F7
N-ary summation         \+           ∑        2211
N-ary product           \*           ∏        220F
Square root             sqrt         √        221A
Less than or equal      <=           ≤        2264
Greater than or equal   >=           ≥        2265
Not equal               <>           ≠        2260
Negation                not          ¬        00AC
Logical and             and          ∧        2227
Logical or              or           ∨        2228
Multiplication          *            ✕        2715
Index generation        iota         ⍳        2373
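Because each symbol is only an alternative spelling of an existing operator, a lexer can treat the two forms as the same token. The function below is a hypothetical sketch of that correspondence in Turbo/Free Pascal; AsciiForm is an illustrative name and the code is not taken from the Vector Pascal compiler.

```pascal
program AsciiFormDemo;
{ Sketch only: hypothetical table mapping each single-character
  operator to the ASCII spelling it stands for. }

function AsciiForm(cp: longint): string;
begin
  case cp of
    $2208: AsciiForm := 'in';
    $2190: AsciiForm := ':=';
    $00F7: AsciiForm := 'div';
    $2211: AsciiForm := '\+';
    $220F: AsciiForm := '\*';
    $221A: AsciiForm := 'sqrt';
    $2264: AsciiForm := '<=';
    $2265: AsciiForm := '>=';
    $2260: AsciiForm := '<>';
    $00AC: AsciiForm := 'not';
    $2227: AsciiForm := 'and';
    $2228: AsciiForm := 'or';
    $2715: AsciiForm := '*';
    $2373: AsciiForm := 'iota';
  else
    AsciiForm := ''                  { not one of the special symbols }
  end
end;

begin
  writeln(AsciiForm($2190));   { assignment }
  writeln(AsciiForm($220F))    { n-ary product }
end.
```

Run as written it should print := and then \*.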

Example

The following program shows the use of Unicode operators in place of the ASCII ones used in previous releases of Vector Pascal.

Program proddemo;
{ prints the integers 1..5, their product and the square root of the product }
Var 
     a:array[1..5] of Integer; 
     Y:integer;
Begin
{ Unicode version }
a ← ⍳ 0; { form integers from 1 to 5 }
Writeln(a);
y ← ∏ a; { get their product }
Writeln(y, √y);
{ now using ascii }
a:= iota 0;
writeln(a);
y:= \* a;
writeln (y, sqrt(y));
End.
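For readers without Vector Pascal, the following sketch spells out with ordinary loops what iota and the product reduction compute in proddemo. It is an illustrative equivalent in Turbo/Free Pascal, assuming (as the program comment indicates) that iota 0 gives each element its index; it is not how the compiler implements these operators.

```pascal
program ProdDemoLoops;
{ Loop-based equivalent of proddemo, for comparison only. }
var
  a: array[1..5] of integer;
  i: integer;
  y: longint;
begin
  for i := 1 to 5 do
    a[i] := i;                 { a := iota 0: each element gets its index }
  y := 1;
  for i := 1 to 5 do
    y := y * a[i];             { y := \* a: multiply all elements together }
  writeln(y, ' ', sqrt(y):0:3)
end.
```

It should print 120 10.954, that is 5! and its square root to three decimal places.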

Characters

The built-in char type is represented with 16 bits in Vector Pascal. This allows any character in the Unicode Basic Multilingual Plane (0000 to FFFF) to be handled, which includes all of the alphabets and symbols listed above.


Strings

Strings are held as arrays of char whose first element holds the length. This is a simple extension of the mechanism used in Turbo Pascal, which keeps an 8-bit length in element 0; because a Vector Pascal char is 16 bits wide, the length word allows strings of up to 65535 characters. The type STRING written without a length specification stands for a string whose maximum length is MAXSTRING.
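The layout can be modelled with an array of 16-bit code units whose element 0 is the length. This is a Turbo/Free Pascal sketch, not Vector Pascal's own string type: VPString, CodeUnit, MaxLen and Append are hypothetical names, and MaxLen stands in for a declared maximum length such as MAXSTRING.

```pascal
program LengthWordDemo;
{ Sketch only: a string as 16-bit code units, element 0 = length. }
const
  MaxLen = 20;
type
  CodeUnit = 0..65535;
  VPString = array[0..MaxLen] of CodeUnit;

{ Append one character if there is room; s[0] is the length word. }
procedure Append(var s: VPString; c: CodeUnit);
begin
  if s[0] < MaxLen then
  begin
    s[0] := s[0] + 1;
    s[s[0]] := c
  end
end;

var
  s: VPString;
  i: integer;
begin
  s[0] := 0;                  { empty string }
  Append(s, $03B1);           { Greek small alpha }
  Append(s, ord('x'));
  write('length ', s[0], ':');
  for i := 1 to s[0] do
    write(' ', s[i]);
  writeln
end.
```

It should print length 2: 945 120. Because the length word is itself one 16-bit element, it can count at most 65535 characters, which is where the limit above comes from.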

Read

When strings or characters are read from a text file, they are automatically converted from UTF-8 to the internal 16-bit Unicode representation.
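The read-side conversion can be sketched as a decoder for UTF-8 sequences of one to three bytes, which is enough for any 16-bit char. DecodeUtf8 and ByteBuf are hypothetical names, the code is Turbo/Free Pascal rather than the Vector Pascal runtime, and it does not check for malformed input.

```pascal
program Utf8DecodeDemo;
{ Sketch only: decode 1- to 3-byte UTF-8 sequences, no validation. }
type
  ByteBuf = array[1..16] of byte;

{ Decode the sequence starting at buf[pos] and advance pos past it. }
function DecodeUtf8(var buf: ByteBuf; var pos: integer): longint;
var
  b0, b1, b2: longint;
begin
  b0 := buf[pos];
  if b0 < $80 then                      { 0xxxxxxx }
  begin
    DecodeUtf8 := b0;
    pos := pos + 1
  end
  else if b0 < $E0 then                 { 110xxxxx 10xxxxxx }
  begin
    b1 := buf[pos + 1];
    DecodeUtf8 := ((b0 and $1F) shl 6) or (b1 and $3F);
    pos := pos + 2
  end
  else                                  { 1110xxxx 10xxxxxx 10xxxxxx }
  begin
    b1 := buf[pos + 1];
    b2 := buf[pos + 2];
    DecodeUtf8 := ((b0 and $0F) shl 12) or ((b1 and $3F) shl 6)
                  or (b2 and $3F);
    pos := pos + 3
  end
end;

var
  buf: ByteBuf;
  pos: integer;
begin
  { 'A', Greek small alpha (CE B1), leftwards arrow (E2 86 90) }
  buf[1] := $41;
  buf[2] := $CE; buf[3] := $B1;
  buf[4] := $E2; buf[5] := $86; buf[6] := $90;
  pos := 1;
  while pos <= 6 do
    writeln(DecodeUtf8(buf, pos))
end.
```

It should print 65, 945 and 8592, the code points of A, α and ←.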

Write

When characters or strings are written to a text file, they are converted from the internal 16-bit Unicode representation to UTF-8.
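The write side is the inverse: each 16-bit code point becomes one to three bytes. EncodeUtf8 and WriteHex are hypothetical helpers in Turbo/Free Pascal, not the Vector Pascal runtime, and surrogate code points are not treated specially in this sketch.

```pascal
program Utf8EncodeDemo;
{ Sketch only: encode one code point in U+0000..U+FFFF as UTF-8. }
type
  Utf8Bytes = array[1..3] of byte;

procedure EncodeUtf8(cp: longint; var b: Utf8Bytes; var n: integer);
begin
  if cp < $80 then
  begin
    b[1] := cp;                           { ASCII passes through }
    n := 1
  end
  else if cp < $800 then
  begin
    b[1] := $C0 or (cp shr 6);            { top 5 bits }
    b[2] := $80 or (cp and $3F);          { low 6 bits }
    n := 2
  end
  else
  begin
    b[1] := $E0 or (cp shr 12);           { top 4 bits }
    b[2] := $80 or ((cp shr 6) and $3F);  { middle 6 bits }
    b[3] := $80 or (cp and $3F);          { low 6 bits }
    n := 3
  end
end;

{ Print a byte as two hex digits followed by a space. }
procedure WriteHex(x: integer);
const
  digits = '0123456789ABCDEF';
begin
  write(digits[x div 16 + 1], digits[x mod 16 + 1], ' ')
end;

var
  b: Utf8Bytes;
  n, i: integer;
begin
  EncodeUtf8($03B1, b, n);                { Greek small alpha }
  for i := 1 to n do WriteHex(b[i]);
  writeln;
  EncodeUtf8($2190, b, n);                { leftwards arrow }
  for i := 1 to n do WriteHex(b[i]);
  writeln
end.
```

It should print CE B1 on the first line and E2 86 90 on the second, the UTF-8 encodings of α and ←.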

Example

Here is an example program to print out a page of the Unicode character set. It illustrates the use of Unicode for variable names, Unicode within strings, and the manipulation of 16-bit characters.

Program printUnicode;
Procedure printpage(p: integer);
Var 
     α:string; { Greek variable name (α is illustrative) } 
     c:char;
     i,j: integer;
Begin
writeln(p);
for i := 0 to 15 do
     begin
     α := '- → -';  { Unicode arrow symbol (here U+2192) in a string }
     for j := 0 to 15 do  { concatenate 16-bit char to string }
           α := α + chr(j + 16*i + 256*p);
     writeln(α);
     end;
End;
Var
     page: integer;

Begin
Write(' Unicode page: ');
Readln(page);
Printpage(page);
End.
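In printpage the character at row i, column j of page p is chr(j + 16*i + 256*p), so each page is a block of 256 consecutive code points whose high byte is p. The hypothetical helper CodePoint below, in Turbo/Free Pascal, makes the arithmetic explicit; for instance, page 3 holds the Greek capitals and page 4 the Cyrillic capitals listed in the identifier table.

```pascal
program PageIndexDemo;
{ Sketch only: the code point printpage shows at row i, column j of
  page p.  CodePoint is a hypothetical helper. }
function CodePoint(p, i, j: longint): longint;
begin
  CodePoint := 256 * p + 16 * i + j   { hex: high byte p, then i, then j }
end;

begin
  writeln(CodePoint(3, 9, 1));   { 0391, Greek capital alpha }
  writeln(CodePoint(4, 1, 0))    { 0410, Cyrillic capital A }
end.
```

It should print 913 and 1040, which are 0391 and 0410 in hexadecimal.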



Copyright © 2002 Paul Cockshott and Gang Gao. All Rights Reserved.