This introduction to StrongForth is written for readers who already have some experience with Forth. Although StrongForth is as close to ANS Forth as possible, the reader is not required to have worked with an ANS-compliant Forth system.
The basic idea behind StrongForth is to add strong static type checking to a Forth system. Previous Forth systems and standards (including ANS) are typeless or untyped, which means they do not do any type checking at all. The interpreter and the compiler generally allow any word to be applied to the operands on the data and return stacks. This behaviour grants total freedom to the programmer, but on the other hand it is quite often a source of type errors, which frequently cause system crashes and other more or less strange behaviour throughout the whole development phase.
StrongForth does not guarantee bug-free programs. It does not even guarantee the absence of crashes. But type errors will be greatly reduced. Furthermore, since interpreter and compiler know the data types of the operands on the stack, they are able to choose the appropriate version of a word if the dictionary contains several words with the same name but different input parameter types. This is called operator overloading. As will be shown in this introduction, operator overloading allows a much more comfortable way of programming. Additionally, it relieves you of inventing individual names for words with the same semantics but different data types.
Of course, strong static typing has some drawbacks, which might keep traditional Forth programmers from using it. First, it requires a higher degree of discipline, because all words that have stack effects have to be provided with precise stack diagrams. Second, interpreter and compiler will prohibit not only dirty tricks, but sometimes also operations that are merely unusual. For example, adding a flag to an address is not possible, although it might be useful in some cases. And third, relying on a system that does all the type checking itself might lead to more careless programming.
Nevertheless, the advantages and disadvantages of strong static type checking have already been discussed in the Forth community. The availability of StrongForth will certainly put more practical aspects into the previously rather theoretical discussion, allowing you to simply try it out by yourself.
Let's begin with a few examples out of the first chapter of Leo Brodie's famous textbook Starting Forth:
15 SPACES OK
When interpreting the number 15, the interpreter pushes this value on the data stack and remembers that it is an unsigned single integer. SPACES is a word that requires an unsigned single integer as input parameter. Here's a possible definition of SPACES:
: SPACES ( UNSIGNED -- ) 0 ?DO SPACE LOOP ;
Well, this is not very exciting. At first glance, the only more or less interesting thing about it is the stack diagram. Standard Forth systems use ( n -- ), which is nothing but a comment. In StrongForth, it is interpreted source code, which compiles the stack diagram of SPACES into the dictionary. Additionally, it tells the compiler that the definition starts with an item of data type UNSIGNED on the data stack, and is expected to remove this item on exiting. Generally, each word in the dictionary includes full information about its stack effect.
So let us now try a second example:
42 EMIT * OK
EMIT is a word that expects a number on the stack and displays the ASCII character associated with this number. We can also write
CHAR * EMIT * OK
instead, because a character is some kind of a number. Even the following code works well:
CHAR * . * OK
But wait ... Isn't . supposed to display a number, and not a character? Let's see:
42 . 42 OK
Yes, this still works. But how does . know whether it should print a number or an ASCII character? StrongForth actually provides more than one version of .. There's one version for displaying numbers, and there's one version for displaying characters. The interpreter and the compiler take care of selecting the version that is suited best for the purpose. In this case, a number is displayed as a number, and a character is displayed as a character. When we write 42, the interpreter pushes 42 onto the data stack and keeps in mind that this is a number. When we write CHAR *, the interpreter pushes exactly the same value onto the stack, but this time it makes a note that the item on top of the stack is a character. This note later allows the interpreter to select the correct version of .. EMIT doesn't make this difference. It displays each and every parameter as an ASCII character.
There are several other versions of . in StrongForth's dictionary. Just have a look at these:
3 4 = . FALSE OK
-16 . -16 OK
In this example, = takes the two items of data type UNSIGNED and returns an item of data type FLAG. A special version of . for flags delivers the appropriate result. The second example seems to be straightforward, but it is not. Remember that 15, 42, 3 and 4 produced items of data type UNSIGNED. -16 produces an item of data type SIGNED, and the interpreter finds a version of . suited for signed numbers. To enter a positive signed number, you have to precede it with a sign, for example +16. The advantage of distinguishing between signed and unsigned numeric literals becomes obvious when we try larger numbers:
60000 . 60000 OK
+60000 . -5536 OK
A standard 16-bit Forth system would always display -5536, because it cannot distinguish between signed and unsigned numbers.
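The two results can be checked by hand. The following Python sketch is only a model of a 16-bit cell and is not part of StrongForth; the names CELL_BITS, as_unsigned16 and as_signed16 are invented for this illustration.

```python
CELL_BITS = 16                      # assumption: a 16-bit cell, as in the example above
MASK = (1 << CELL_BITS) - 1         # 0xFFFF
SIGN_BIT = 1 << (CELL_BITS - 1)     # 0x8000

def as_unsigned16(value):
    """Interpret the low 16 bits of value as an unsigned number."""
    return value & MASK

def as_signed16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    cell = value & MASK
    return cell - (1 << CELL_BITS) if cell & SIGN_BIT else cell

print(as_unsigned16(60000))  # 60000
print(as_signed16(60000))    # -5536
```

The bit pattern is the same in both cases. Only the remembered data type decides which interpretation is displayed.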
With the knowledge obtained so far, let's try out the compiler, still sticking to the examples in Leo Brodie's Starting Forth:
: STAR [CHAR] * . ; OK
STAR * OK
CR
OK
CR STAR CR STAR CR STAR
*
*
* OK
: STARS 0 DO STAR LOOP ;
(DO) ? undefined word
UNSIGNED
Oops. What's that? DO tried to compile (DO), which expects two numbers of the same data type on the stack, but there was only one. Thus, the compiler could not find an appropriate version of (DO) in the dictionary, and throws an exception. Yes, we have to supply a stack diagram to STARS:
: STARS ( UNSIGNED -- ) 0 DO STAR LOOP ; OK
5 STARS ***** OK
STARS
STARS ? undefined word
So, the compiler starts with an UNSIGNED on the stack, adds another one (0), and now (DO) gets its input parameters. The last line just shows that STARS will itself not be found in the dictionary, if the stack is empty.
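To make the compiler's bookkeeping more tangible, here is a small Python model of it. It is only a sketch under assumptions: the table WORDS, the two listed versions of (DO) and the functions apply_word and check are hypothetical and do not reflect StrongForth's real dictionary layout. The model tracks nothing but data types, just as the compiler does at compile time.

```python
# Hypothetical word table: name -> list of (input types, output types).
# Assumption: (DO) exists in one version per integer type, each taking
# two items of the same data type.
WORDS = {
    "0":    [((), ("UNSIGNED",))],
    "(DO)": [(("UNSIGNED", "UNSIGNED"), ()),
             (("SIGNED", "SIGNED"), ())],
}

def apply_word(stack, name):
    """Return the new type stack after applying the first matching version."""
    for inputs, outputs in WORDS[name]:
        n = len(inputs)
        if n <= len(stack) and tuple(stack[len(stack) - n:]) == inputs:
            return stack[:len(stack) - n] + list(outputs)
    raise LookupError(f"{name} ? undefined word, stack: {' '.join(stack)}")

def check(start, words):
    """Run a sequence of words over a stack of data types."""
    stack = list(start)
    for name in words:
        stack = apply_word(stack, name)
    return stack

print(check(["UNSIGNED"], ["0", "(DO)"]))   # []
try:
    check([], ["0", "(DO)"])                # no stack diagram: stack starts empty
except LookupError as error:
    print(error)                            # (DO) ? undefined word, stack: UNSIGNED
```

Without the stack diagram, only the UNSIGNED pushed by 0 is on the type stack when (DO) is looked up, which is exactly what the error message reports.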
Finally, let's complete Leo Brodie's example:
: MARGIN CR 30 SPACES ; OK
: BLIP MARGIN STAR ; OK
: BAR MARGIN 5 STARS ; OK
: F BAR BLIP BAR BLIP BLIP CR ; OK
F
                              *****
                              *
                              *****
                              *
                              *
OK
In the previous section, we have introduced four data types: UNSIGNED, SIGNED, CHARACTER and FLAG. Actually, StrongForth knows a lot more data types, and it is even possible to define new, application-specific data types.
Having several different data types is certainly useful, but a large, unstructured quantity of data types would cause a serious problem. Since it should be possible to apply words like DUP and DROP to every data type, it would be necessary to supply a separate version of these words for each data type. Words with two input parameters, like SWAP, would have to be defined for each possible combination of data types, which makes already 1444 versions for 38 data types! ROT would be even worse.
To solve this problem, StrongForth arranges all data types in a hierarchical structure. There are two data types at the root of this hierarchy, SINGLE and DOUBLE. All other data types are subtypes of SINGLE or DOUBLE, or of other subtypes. The complete data type structure looks like this:
SINGLE
|
+-- INTEGER
|   |
|   +-- UNSIGNED
|   |
|   +-- SIGNED
|   |
|   +-- CHARACTER
|
+-- ADDRESS
|   |
|   +-- DATA
|   |
|   +-- CONST
|   |
|   +-- CODE
|   |
|   +-- PORT
|   |
|   +-- CADDRESS
|       |
|       +-- CDATA
|       |
|       +-- CCONST
|       |
|       +-- CCODE
|       |
|       +-- CPORT
|
+-- LOGICAL
|   |
|   +-- FLAG
|
+-- TOKEN
|
+-- MEMORY-SPACE
|
+-- FILE
|
+-- WID
|
+-- R-SIZE

DOUBLE
|
+-- INTEGER-DOUBLE
|   |
|   +-- UNSIGNED-DOUBLE
|   |   |
|   |   +-- NUMBER-DOUBLE
|   |
|   +-- SIGNED-DOUBLE
|
+-- CONTROL-FLOW
|   |
|   +-- ORIGIN
|   |   |
|   |   +-- LOOP-ORIGIN
|   |   |
|   |   +-- OF-ORIGIN
|   |   |
|   |   +-- ENDOF-ORIGIN
|   |
|   +-- DESTINATION
|
+-- DATA-TYPE
|   |
|   +-- STACK-DIAGRAM
|
+-- FAR-ADDRESS
|   |
|   +-- CFAR-ADDRESS
|
+-- DEFINITION
    |
    +-- COLON-DEFINITION

TUPLE
|
+-- INPUT-SOURCE
Whenever the interpreter or compiler tries to find a word in the dictionary, it accepts not only a word whose input parameters match the data types of the items on the stack exactly, but also a word whose input parameters are parents of those. Thus, only two versions of DUP and DROP are required: one for SINGLE and one for DOUBLE. If, for example, the item on top of the data stack has data type UNSIGNED, DUP for SINGLE would match, because UNSIGNED is a (second-generation) subtype of SINGLE. Similarly, four versions of SWAP and eight versions of ROT (instead of 38³ = 54872) are enough:
SWAP ( SINGLE SINGLE -- )
SWAP ( SINGLE DOUBLE -- )
SWAP ( DOUBLE SINGLE -- )
SWAP ( DOUBLE DOUBLE -- )
ROT ( SINGLE SINGLE SINGLE -- )
ROT ( SINGLE SINGLE DOUBLE -- )
ROT ( SINGLE DOUBLE SINGLE -- )
ROT ( SINGLE DOUBLE DOUBLE -- )
ROT ( DOUBLE SINGLE SINGLE -- )
ROT ( DOUBLE SINGLE DOUBLE -- )
ROT ( DOUBLE DOUBLE SINGLE -- )
ROT ( DOUBLE DOUBLE DOUBLE -- )
Well, these are already a lot of versions for ROT, but remember that only two of these eight versions, the first and the last, are defined in ANS Forth (the last one is actually 2ROT). And finally, ROT is one of very few words in StrongForth having so many different versions.
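The matching rule can be modelled in a few lines of Python. This is a sketch only: the PARENT table holds just an excerpt of the hierarchy, and the names is_a, matches and find_version are invented for the illustration and say nothing about how StrongForth searches its dictionary internally.

```python
from itertools import product

# Parent links for a small excerpt of the data type hierarchy.
PARENT = {
    "INTEGER": "SINGLE", "UNSIGNED": "INTEGER", "SIGNED": "INTEGER",
    "CHARACTER": "INTEGER", "LOGICAL": "SINGLE", "FLAG": "LOGICAL",
    "INTEGER-DOUBLE": "DOUBLE", "UNSIGNED-DOUBLE": "INTEGER-DOUBLE",
}

def is_a(data_type, ancestor):
    """True if data_type equals ancestor or is one of its descendants."""
    while data_type is not None:
        if data_type == ancestor:
            return True
        data_type = PARENT.get(data_type)
    return False

def matches(stack_types, input_types):
    """True if the top of the stack satisfies a word's input parameters."""
    n = len(input_types)
    if n > len(stack_types):
        return False
    top = stack_types[len(stack_types) - n:]
    return all(is_a(have, want) for have, want in zip(top, input_types))

def find_version(stack_types, versions):
    """Return the first version whose inputs match, or None."""
    return next((v for v in versions if matches(stack_types, v)), None)

# One version per combination of the two root types SINGLE and DOUBLE.
swap_versions = list(product(("SINGLE", "DOUBLE"), repeat=2))
rot_versions = list(product(("SINGLE", "DOUBLE"), repeat=3))

print(len(swap_versions), len(rot_versions))   # 4 8
print(38 ** 2, 38 ** 3)                        # 1444 54872
print(find_version(["FLAG", "UNSIGNED-DOUBLE"], swap_versions))
# ('SINGLE', 'DOUBLE')
```

A FLAG below an UNSIGNED-DOUBLE is served by the SINGLE DOUBLE version of SWAP, although no version mentions either of the two data types by name.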
Now, let's have a closer look at the data type structure. Some of the data types correspond to those explicitly specified in ANS Forth: UNSIGNED is u, SIGNED is n and CHARACTER is char. These three data types are subtypes of data type INTEGER, and INTEGER itself is a direct subtype of SINGLE. INTEGER is rarely used explicitly, but it is most useful as a common parent to the three data types. For example,
ALLOT ( INTEGER -- )
can be applied to items of all three data types, without having to define separate versions, but it may not be directly applied to addresses or flags. An even better example might be
+ ( INTEGER INTEGER -- INTEGER )
but this is actually defined in a different way, as will be explained later.
An ADDRESS is not the same as an INTEGER, because an address may not be added to another address (only subtracted, giving an INTEGER). There are several other restrictions, like multiplication, but also some special features that only apply to addresses.
StrongForth has been designed for use in embedded systems. Unlike a modern PC, embedded systems are usually not equipped with 32- or 64-bit processors, gigabytes of RAM, and a mass storage device to load programs and data from. Typical embedded controllers operate with 8- or 16-bit processors, and do not have more than a few kilobytes of RAM. Because they lack a hard disk, program code and constant data are stored in ROM, EPROM, EEPROM, OTP or flash memory. Thus, the whole memory space is divided into areas with different physical properties. Furthermore, many 16-bit microcontrollers have a banking or segmenting mechanism to be able to deal with an address range of more than 65536 bytes, although still using 16-bit addresses. This is the main reason why StrongForth supports different memory areas.
ADDRESS has five subtypes: DATA, CONST, CODE, PORT and CADDRESS. The last one has itself several subtypes, which will be described later. DATA is assumed to be an address within RAM. A microcontroller with a banking or segmenting mechanism will always use its data bank or data segment register when accessing a DATA address. CONST is an address that points to a memory area that is assumed to be read-only by the application. Nevertheless, it might be possible to write into this memory area during the development phase, when it is emulated by RAM, or by writing once into EPROM, OTP or flash memory. In StrongForth, all virtual machine code and all data constants lie in the CONST memory area. But the CONST memory area does not include the whole read-only or write-once memory. Other parts of read-only memory are reserved for the CODE memory area and for the name space of the dictionary. An address of data type CODE always points to machine code. When accessing machine code, the microcontroller uses its code bank or code segment register. The name space is not located in any of the memory areas. It can only be accessed by words that handle definitions, like CREATE and >BODY. Finally, PORT addresses point to the microcontroller's I/O register area.
That was a lot of theory. So let's try some examples now. One of the many ANS Forth words returning an address is BASE:
DECIMAL BASE @ . 10 OK
Okay, that worked as expected. Of course, the address returned by BASE has to be of data type DATA, because it obviously has to reside in RAM. But ... how does @ know that the address on top of the data stack is the address of an unsigned single number? Obviously, the interpreter chose the correct version of . to display an unsigned number. Let's try something else:
STATE @ . FALSE OK
Same question: How does @ know ... ? The easiest way to get an answer is to get acquainted with StrongForth's version of .S:
45 -3 TRUE CHAR B .S UNSIGNED SIGNED FLAG CHARACTER OK
Surprise, surprise! Instead of displaying the data values, .S shows the data types of the items on the stack. Well, what else did you expect from a strongly typed system? The information on data types is in many cases more useful than the actual numerical values. Now things are getting exciting:
QUIT \ CLEAN STACK
BASE .S DATA -> UNSIGNED OK
What does that mean? UNSIGNED, FLAG, CHARACTER, DATA and so on are so-called basic data types. DATA -> UNSIGNED is a compound data type, meaning a data address pointing to an unsigned single number. Since StrongForth is strongly typed, addresses have to be specific in the sense that addresses to different data types have to be distinguishable. The rest is easy to understand:
@ .S UNSIGNED OK
. 10 OK
When @ is supplied with an item of data type DATA -> UNSIGNED, it knows that it has to fetch a single cell from RAM and return it as UNSIGNED.
But what about variables? An ANS Forth variable is typeless. It can store anything from a signed number to an execution token. In StrongForth, the word VARIABLE has to be supplied with information about the data type that is supposed to be stored in it. This information can easily be supplied by doing a small modification to the semantics of VARIABLE. In StrongForth, VARIABLE initializes the just created variable with the value of the item on top of the data stack, while simultaneously taking over its data type:
CHAR C VARIABLE X OK
X .S @ . DATA -> CHARACTER C OK
An item to be stored into a variable must always have exactly the same data type as the one with which the variable had been initialized:
CHAR D X ! OK
X @ . D OK
-13 X !
-13 X ! ? undefined word
SIGNED DATA -> CHARACTER
The error message means that the interpreter cannot find a word with the name ! that accepts the two input parameters SIGNED and DATA -> CHARACTER. Note that the second line of an error message always displays the data types of the items on the data stack at the time the error was detected.
To show how powerful the concept of compound data types is, let's continue playing with variables:
X VARIABLE Y OK
Y .S DATA -> DATA -> CHARACTER OK
@ .S DATA -> CHARACTER OK
@ .S CHARACTER OK
. D OK
Thus, a compound data type can consist of an arbitrary number of basic data types chained by ->. It is therefore possible to store addresses of specific items in variables and generally operate with addresses of addresses of addresses and so on.
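The type effect of @ and ! on compound data types can be sketched in Python by representing a compound data type as a list of basic data type names. This is a model under assumptions, restricted to DATA addresses; the functions fetch and store_ok are invented for the illustration and are not StrongForth words.

```python
def fetch(address_type):
    """Type effect of @ : DATA -> X yields X."""
    head, *rest = address_type
    if head != "DATA" or not rest:
        raise TypeError("@ ? undefined word " + " -> ".join(address_type))
    return rest

def store_ok(value_type, address_type):
    """Type rule of ! : the value must have exactly the type the address points to."""
    return address_type[0] == "DATA" and list(address_type[1:]) == list(value_type)

x = ["DATA", "CHARACTER"]   # CHAR C VARIABLE X
y = ["DATA"] + x            # X VARIABLE Y

print(" -> ".join(y))                 # DATA -> DATA -> CHARACTER
print(" -> ".join(fetch(y)))          # DATA -> CHARACTER
print(" -> ".join(fetch(fetch(y))))   # CHARACTER
print(store_ok(["CHARACTER"], x))     # True
print(store_ok(["SIGNED"], x))        # False
```

Each @ strips one leading DATA from the chain, which is why two fetches are needed to get from Y to the character itself.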
Everything shown here for DATA addresses can also be applied to CONST and CODE addresses, without using uniquely named words. @, !, +! and several other words are available in different versions for all subtypes of ADDRESS, except for data type PORT. Of course, VARIABLE always creates a DATA address. Register I/O can be accomplished after defining @ and ! for addresses of data type PORT. These versions are not included in StrongForth, because simple register access is prohibited by most multitasking operating systems. For embedded systems, appropriate definitions can easily be added. Port addresses can be defined as constants, although this requires a type cast:
HEX 2C DECIMAL CAST PORT -> SIGNED CONSTANT A/D OK
There is still one direct subtype of ADDRESS left, which has not been explained so far. CADDRESS has itself four subtypes, whose names are similar to the four direct subtypes of ADDRESS: CDATA, CCONST, CCODE and CPORT. These are all addresses that point to items of character size. DATA -> SINGLE and DATA -> DOUBLE are data addresses of single and double cell items, respectively. SINGLE and DOUBLE stand here for themselves as well as for any of their respective subtypes. To allow fetching and storing characters from and to character sized memory locations, without having to remember each time to use the ANS Forth words C@ and C! for this purpose, an address of data type CDATA tells the interpreter and the compiler that it points to a character sized item. C@ and C! do not exist in StrongForth. Instead, special versions of @ and ! dealing with CDATA addresses are provided.
CDATA is not only useful in connection with characters, but also with other integers or with logical values. A large array of integers can thus efficiently be stored in memory as character-size values, provided all elements of the array can be represented by 8 bits. With addresses, this would obviously not work. Furthermore, special versions of @ for the data types CDATA -> SIGNED and CDATA -> FLAG are supplied, which do the proper sign extension.
A good example of CDATA addresses can be constructed with PAD and TYPE:
CHAR F PAD ! OK
CHAR O PAD 1+ ! OK
CHAR H PAD 4 + ! OK
CHAR T PAD 3 + ! OK
CHAR R PAD 2 + ! OK
PAD .S CDATA -> CHARACTER OK
5 TYPE FORTH OK
Similarly, CCONST, CCODE and CPORT allow fetching and storing character size values in the CONST and CODE memory areas and in the I/O register area, respectively. Especially CPORT can be very useful, because many peripheral registers on 16- and 32-bit machines are still only 8 bits long.
ADDRESS and CADDRESS will normally not be used as data types of real items, but they are required in the stack diagrams of words doing address arithmetic, which can be applied to different kinds of addresses. 1+ is a good example. It can be applied to items of data type INTEGER and to items of data type ADDRESS, including all subtypes.
An item of data type LOGICAL is just a collection of individual bits in a single cell. Naturally, such items may not be involved in arithmetic operations like +, -, *, / or NEGATE. On the other hand, logical operations like AND, OR, XOR and INVERT may only be applied if the top item on the stack is of data type LOGICAL:
HEX 1234 55AA OR .
HEX 1234 55AA OR ? undefined word
UNSIGNED UNSIGNED
HEX 1234 55AA CAST LOGICAL OR . 57BE OK
Performing a logical operation on an integer is not necessarily dangerous, but you have to pay attention, because it is a programming trick. Doing the same operation with /, MOD and /MOD is in most cases clearer, because these operations fit the nature of integers better. For example, let's assume >IN contains an offset into a block of 16 lines, each 64 characters long. Now we want to return to the beginning of the current line. Here are two alternatives that both work well in StrongForth:
: RETURN >IN @ -64 CAST LOGICAL AND >IN ! ; OK
: RETURN >IN @ DUP 64 MOD - >IN ! ; OK
Considering that the type cast does not compile any code, the first version is probably faster. Now let's assume the line length is an unsigned constant with the value 64:
: RETURN >IN @ C/L NEGATE CAST LOGICAL AND >IN ! ; OK
: RETURN >IN @ DUP C/L MOD - >IN ! ; OK
This will still work perfectly in both cases. Unless we decide to set the line length to 72, or any other number that is not a power of 2 ...
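The difference is easy to verify numerically. The Python sketch below mirrors the two definitions on plain integers; line_start_mask and line_start_mod are names invented for this check, and Python's integers stand in for two's complement cells.

```python
def line_start_mask(offset, line_length):
    """Mirror of  C/L NEGATE CAST LOGICAL AND  on Python integers."""
    return offset & -line_length

def line_start_mod(offset, line_length):
    """Mirror of  DUP C/L MOD -  ."""
    return offset - offset % line_length

# Compare both methods for every offset in a block of 16 lines of 64 characters.
same_64 = all(line_start_mask(o, 64) == line_start_mod(o, 64) for o in range(1024))
same_72 = all(line_start_mask(o, 72) == line_start_mod(o, 72) for o in range(1024))

print(same_64, same_72)                                   # True False
print(line_start_mask(100, 72), line_start_mod(100, 72))  # 32 72
```

With a line length of 72, the mask clears unrelated bits, so offset 100 lands at 32 instead of at 72, the true beginning of the second line. Only the MOD version states what is actually meant.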
Of course, LSHIFT and RSHIFT are also words that can only be applied to items of data type LOGICAL. Integers should use *, /, 2* and 2/. Again: logical operations are for logicals, arithmetic operations are for integers.
Finally, FLAG is a subtype of LOGICAL. This ensures that all logical operations can be applied directly to flags, i.e., without the ugly type casts. A FLAG is just a LOGICAL with all bits set to the same value.
Items of data type TOKEN are actually execution tokens, which are abbreviated with xt in ANS Forth. This data type is a direct subtype of SINGLE, in order to prevent any operations other than the few allowed ones to be applied to execution tokens. Although execution tokens are implemented as CONST addresses, it would certainly not be portable to fetch the contents of that address or to add something to the address, because in other implementations execution tokens could be code addresses, or even indexes to a table.
Since the only word that takes an item of data type TOKEN as input parameter is (EXECUTE), an internal word that is not supposed to be used directly, TOKEN can be considered an abstract data type. Further explanations on how to deal with execution tokens will be given later in connection with the description of EXECUTE.
As already explained, the physical memory of a typical embedded system is divided into several areas: RAM, ROM, EPROM, EEPROM, OTP, flash memory and the I/O area (if applicable). To be able to run on embedded systems, StrongForth's logical memory is additionally divided into so-called spaces. The I/O area is not considered memory, although it can be addressed by PORT addresses. There are five memory spaces:
DATA-SPACE
CONST-SPACE
CODE-SPACE
NAME-SPACE
LOCAL-SPACE
Three of these memory spaces are already known in ANS Forth: DATA-SPACE, CODE-SPACE and NAME-SPACE. CONST-SPACE is an alternative data space. The DATA-SPACE is always allocated in RAM, while the CONST-SPACE maps into non-volatile memory, which typically cannot be written to at all, or only once. Virtual machine code and data constants are always compiled into CONST-SPACE.
LOCAL-SPACE is an alternative name space. The NAME-SPACE contains the names of the definitions, their stack diagrams and the dictionary links. It is allocated in non-volatile memory. LOCAL-SPACE is allocated in RAM, because it contains local variables and other things that last only as long as the current definition is being compiled. Actually, local variables build themselves a small dictionary in the LOCAL-SPACE. This dictionary is emptied when ; is executed.
The five words listed above are normal StrongForth words. They do not have any stack effect. Each of these words makes the corresponding memory space the current memory space, to which the words HERE, ALIGN, ALLOT, UNUSED, , and C, are applied:
DATA-SPACE HERE . 1234 , HERE . 2894 2896 OK
CONST-SPACE HERE . 10 ALLOT HERE . 6590 6600 OK
To change the current memory space and restore the original status later, two additional words are provided. SPACE@ returns the current memory space as an item of data type MEMORY-SPACE on the data stack. SPACE! restores the memory-space supplied as its input parameter. A typical sequence within a StrongForth definition that temporarily switches memory spaces would therefore be:
... SPACE@ CONST-SPACE ... SPACE! ...
It is simply the same technique that is used in connection with the number-conversion radix, if a word requires a specific numeric base, but should not change it permanently.
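For readers who prefer to see the pattern outside of Forth, here is a minimal Python model of it. Everything in it is hypothetical: the dictionary state and the functions space_fetch, space_store and compile_constant_table merely imitate SPACE@, SPACE! and a definition that temporarily switches to the constant space.

```python
# Hypothetical model of the system status that holds the current memory space.
state = {"space": "DATA-SPACE"}

def space_fetch():
    """Model of SPACE@ : return the current memory space."""
    return state["space"]

def space_store(space):
    """Model of SPACE! : make the given memory space the current one."""
    state["space"] = space

def compile_constant_table():
    saved = space_fetch()          # SPACE@
    space_store("CONST-SPACE")     # CONST-SPACE
    used = space_fetch()           # ... work that needs the constant space ...
    space_store(saved)             # SPACE!
    return used

print(compile_constant_table())    # CONST-SPACE
print(space_fetch())               # DATA-SPACE
```

The caller's memory space is untouched afterwards, whatever it was before the call.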
A small problem arises each time HERE is used. Since the current memory space is part of the system status, it is not possible to determine it at compile time. This results in HERE returning an item of data type ADDRESS instead of an item of one of the more specific data types DATA, CONST and CODE. ADDRESS cannot be used directly as input parameter to @, ! and +!, because the physical memory area is not specified. The consequence is that ADDRESS often has to be cast manually to one of the more specific address data types, provided you know better than the compiler what kind of address you are using. Note that this is not a general problem of strong static typing. It is a problem that comes with supporting several different physical memory areas. On the other hand, a type cast will be necessary anyway in most cases, because @, ! and +! all require compound data types as input parameters. So be careful when using HERE and ALLOT, because they are weak points in an otherwise strongly typed system.
Data type FILE is used for file handles. It is directly derived from data type SINGLE, because this excludes arithmetic and logical operations to be applied to file handles. The value returned by SOURCE-ID has data type FILE as well. Two special values of this data type, 0 and -1, indicate that the input source is the user input device or a character string, respectively. All other values are real DOS file handles.
WID is an abbreviation for word list identifier. Although this term is first introduced in the ANS Forth Search-Order word set, items of data type WID are already used in StrongForth's Core word set. Even without the Search-Order word set, StrongForth knows three different word lists:
R-SIZE is a special data type which is only used by >R and R>. A detailed description follows later in connection with the explanation of the return stack and local variables.
All members of the DOUBLE branch of the data type structure occupy two cells on the data stack and in memory. This is not new to Forth. The big difference between StrongForth and ANS Forth regarding double numbers is the fact that ANS Forth requires special names for those words that deal with double numbers, while StrongForth simply overloads the corresponding single number words. To duplicate a double number, one has to write 2DUP in ANS Forth and DUP in StrongForth. Adding two double numbers is done with D+ in ANS Forth and + in StrongForth, as can be seen in this example:
1000000. DUP + . 2000000 OK
. is overloaded as well. In ANS Forth, we would have to write D.. Overloading makes programming a lot easier. Actually, the complete StrongForth Double-Number word set consists of overloaded words. Since interpreter and compiler know about the data types of the items on the data stack, they will always select the proper words.
In analogy to single numbers, StrongForth provides the predefined data types INTEGER-DOUBLE, SIGNED-DOUBLE and UNSIGNED-DOUBLE. The number 1000000. in the above example is an UNSIGNED-DOUBLE. When prefixed with a positive or negative sign, it will be interpreted as SIGNED-DOUBLE.
A new data type is NUMBER-DOUBLE. It is only used between <# and #>, i. e., <# creates an item of this data type, while #> consumes it:
<# ( UNSIGNED-DOUBLE -- NUMBER-DOUBLE )
#> ( NUMBER-DOUBLE -- CDATA -> CHARACTER UNSIGNED )
This is an easy way to ensure that these two words are always paired. Since # and #S also work with items of data type NUMBER-DOUBLE, syntax violations will immediately be detected by the compiler. As an example, here is the definition of . for signed double numbers:
: . ( SIGNED-DOUBLE -- ) DUP 0< SWAP ABS <# #S SWAP SIGN #> TYPE SPACE ;
Note that SIGN, unlike in ANS Forth, requires an item of data type FLAG as input parameter.
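The conversion performed between <# and #> can be followed step by step in a Python model of the definition above. It is a sketch only: format_signed_double is an invented name, a Python list stands in for the pictured numeric output buffer, and the base is passed as a parameter instead of being read from BASE.

```python
def format_signed_double(value, base=10):
    """Model of  DUP 0< SWAP ABS <# #S SWAP SIGN #> TYPE SPACE  ."""
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    negative = value < 0            # DUP 0<  leaves a flag
    magnitude = abs(value)          # SWAP ABS
    buffer = []                     # <#  starts an empty output buffer
    while True:                     # #S  converts at least one digit
        magnitude, remainder = divmod(magnitude, base)
        buffer.insert(0, digits[remainder])   # digits are produced right to left
        if magnitude == 0:
            break
    if negative:                    # SWAP SIGN  consumes the flag
        buffer.insert(0, "-")
    return "".join(buffer) + " "    # #> TYPE SPACE

print(repr(format_signed_double(-1234)))     # '-1234 '
print(repr(format_signed_double(255, 16)))   # 'FF '
```

The flag is computed before the number is made positive, which is why the definition starts with DUP 0< and hands the flag to SIGN only after all digits have been converted.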
Using a special data type to enforce the proper syntax is a common technique in StrongForth. The subtypes of data type CONTROL-FLOW, which is itself a subtype of DOUBLE, are other examples. An item of data type ORIGIN is created by IF and consumed by THEN. BEGIN creates an item of data type DESTINATION, which is later consumed by UNTIL or REPEAT. ELSE and WHILE may be used in exactly the same way as specified in ANS Forth. The subtypes of ORIGIN, namely LOOP-ORIGIN, OF-ORIGIN and ENDOF-ORIGIN, enforce the proper syntax of DO ... LOOP and CASE ... OF ... ENDOF ... ENDCASE constructions.
Another subtype of DOUBLE is DATA-TYPE. An item of data type DATA-TYPE is, well, a data type. Words using items of this data type as input or output parameters are extensively used by the interpreter and the compiler. One of the most obvious applications of DATA-TYPE is NUMBER. Although NUMBER is not a word specified in ANS Forth, most Forth systems have more or less identical versions of it. In StrongForth, NUMBER looks like this:
NUMBER ( CDATA -> CHARACTER UNSIGNED -- INTEGER-DOUBLE DATA-TYPE )
The input parameters of NUMBER are the memory address of a character string and a character count, which together represent a character string. Provided the character string contains a valid number, NUMBER returns its numerical value in an item of data type INTEGER-DOUBLE, and its data type in an item of data type DATA-TYPE. Depending on the character string, the latter can be one of these: SIGNED, UNSIGNED, SIGNED-DOUBLE and UNSIGNED-DOUBLE. Remember that numbers consisting only of digits are UNSIGNED, while prefixing them with a sign makes them SIGNED. Appending a period makes a number either UNSIGNED-DOUBLE or SIGNED-DOUBLE, depending on the presence of a sign. Let's try it:
PARSE-WORD -5 NUMBER . . SIGNED -5 OK
PARSE-WORD 123456789. NUMBER . . UNSIGNED-DOUBLE 123456789 OK
PARSE-WORD simply gets the next space-delimited word from the input source and returns its address and character count, so NUMBER can directly be applied to its output parameters. But there's another interesting detail hidden in this example. Since the data types are displayed in a user-readable form, there must be an overloaded version of . that takes an item of data type DATA-TYPE as input parameter. This word is also used within the definition of .S.
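The classification rule that NUMBER applies to a literal is simple enough to restate as a Python sketch. The function classify_number is hypothetical and handles decimal literals only; it ignores the current number-conversion radix and cell-size overflow, both of which a real implementation has to consider.

```python
def classify_number(text):
    """Return (value, data type name) for a decimal literal, or raise ValueError."""
    body = text
    is_double = body.endswith(".")          # a trailing period means double
    if is_double:
        body = body[:-1]
    is_signed = body[:1] in ("+", "-")      # an explicit sign means signed
    digits = body[1:] if is_signed else body
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a number: {text!r}")
    value = int(digits)
    if body.startswith("-"):
        value = -value
    name = ("SIGNED" if is_signed else "UNSIGNED") + ("-DOUBLE" if is_double else "")
    return value, name

print(classify_number("-5"))            # (-5, 'SIGNED')
print(classify_number("123456789."))    # (123456789, 'UNSIGNED-DOUBLE')
print(classify_number("+16"))           # (16, 'SIGNED')
```

Note that the sign decides the data type even for positive numbers, so +16 and 16 have the same value but different types.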
DATA-TYPE has itself a subtype called STACK-DIAGRAM. Items of data type STACK-DIAGRAM pass on some tracking information during the creation of a stack diagram between ( and ), for example, the number of basic data types that have so far been added to the stack diagram, and whether input or output parameters are being generated. Although a stack diagram is not a data type, STACK-DIAGRAM has still been made a subtype of DATA-TYPE, because the internal structure of items of these two data types is quite similar.
Data types FAR-ADDRESS and CFAR-ADDRESS are addresses that include the memory bank or segment. They are not bound to predefined banks or segments, like addresses that are subtypes of ADDRESS. With items of data types FAR-ADDRESS and CFAR-ADDRESS, the whole address range of the processor can be accessed. In StrongForth, these kinds of addresses are required to access the name space, because the name space is not located in one of the predefined banks or segments, i. e., the DATA, CONST and CODE memory areas.
The final subtype of DOUBLE is DEFINITION. An item of data type DEFINITION is the identifier of a word in StrongForth's dictionary. It is produced by words like ' and :NONAME, while words like >BODY expect one as input parameter. Unlike in ANS Forth, ' and :NONAME do not directly deliver execution tokens. There are several reasons for this. The most important one is that an execution token cannot be executed directly, because it carries no information about the stack diagram of the word associated with it. More information about this rather complicated subject will be supplied later in connection with a detailed explanation of EXECUTE. An item of data type COLON-DEFINITION identifies a colon definition.
For now, the most interesting thing about DEFINITION is that the Programming-Tools word set provides another overloaded version of . for it. No, this is not the StrongForth synonym for SEE, but it displays the name and the complete stack diagram of a definition. Here are some examples:
' PARSE-WORD . PARSE-WORD ( -- CDATA -> CHARACTER UNSIGNED ) OK
' NUMBER . NUMBER ( CDATA -> CHARACTER UNSIGNED -- INTEGER-DOUBLE DATA-TYPE ) OK
' >BODY . >BODY ( DEFINITION -- CONST ) OK
According to the diagram of data types, data type TUPLE is not a subtype of SINGLE or DOUBLE. Does this mean that tuples are neither one nor two cells long? Correct. So, what size does a tuple have? Well, it depends. A tuple is a data type whose size may vary at runtime, and can thus not be determined at compile time. In fact, you can combine an arbitrary number of single-cell and double-cell items in one item of data type TUPLE, and handle them as an entity. The size of a tuple is an attribute depending on the number of cells that have been added to it.
There are only a small number of words that can be applied to tuples. You can create an empty tuple, add items to it, extract items from it, query its size, and delete it. Other operations may optionally be defined, but they are not included in StrongForth.
What are tuples good for? Primarily, they are required for implementing the ANS Forth words SAVE-INPUT, RESTORE-INPUT, GET-ORDER and SET-ORDER. The stack diagrams of these words contain the following sequence:
x1 ... xn n
StrongForth cannot deal with stack diagrams that consist of an arbitrary number of parameters. Using a tuple instead of this sequence resolves the problem, because a tuple can be represented by a single well-defined data type. The image of a tuple on the data stack is exactly the same as the above sequence, with an arbitrary number of cells and the count on top of the stack. Thus, the stack diagram of SAVE-INPUT and RESTORE-INPUT can easily be expressed in StrongForth:
SAVE-INPUT ( -- INPUT-SOURCE ) \ ANS: ( -- xn ... x1 n )
RESTORE-INPUT ( INPUT-SOURCE -- FLAG ) \ ANS: ( xn ... x1 n -- flag )
INPUT-SOURCE is a direct subtype of TUPLE. Using INPUT-SOURCE instead of TUPLE enforces that SAVE-INPUT and RESTORE-INPUT are always used in pairs.
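The stack image of a tuple described above (the cells first, then the count on top) can be modelled in a few lines of Python. This is only an illustrative sketch, not StrongForth code: the function names push_tuple and pop_tuple are invented for this example, and the data stack is represented by a plain Python list.

```python
# Toy model of a tuple's image on the data stack: x1 ... xn n.
# push_tuple and pop_tuple are hypothetical helper names, not StrongForth words.

def push_tuple(stack, items):
    """Push the cells of a tuple, followed by the cell count on top."""
    stack.extend(items)
    stack.append(len(items))

def pop_tuple(stack):
    """Remove a tuple image from the stack and return its cells in order."""
    count = stack[-1]
    if count > len(stack) - 1:
        raise ValueError("count on top of stack exceeds the stack depth")
    stack.pop()                      # drop the count
    start = len(stack) - count
    items = stack[start:]
    del stack[start:]
    return items

stack = [42]                         # some unrelated item below the tuple
push_tuple(stack, [7, 8, 9])
print(stack)                         # [42, 7, 8, 9, 3]
print(pop_tuple(stack))              # [7, 8, 9]
print(stack)                         # [42]
```

Because the count travels with the cells, the whole group can be treated as one item of a single data type, which is the property the stack diagrams of SAVE-INPUT and RESTORE-INPUT rely on.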
When experimenting with displaying stack diagrams by using . for definitions, you might find out that ' always finds the most recent definition in the dictionary that matches the given name. Since many StrongForth words are overloaded, there typically exist multiple occurrences of a name in the dictionary. This differs from ANS Forth. You can use WORDS for finding all overloaded versions of a name:
WORDS .
. ( DEFINITION -- )
. ( DATA-TYPE -- )
. ( FLAG -- )
. ( CHARACTER -- )
. ( SIGNED -- )
. ( SINGLE -- )
. ( SIGNED-DOUBLE -- )
. ( DOUBLE -- ) OK
When experimenting with WORDS, you will almost certainly run into some rather strange stack diagrams that look like these:
WORDS DUP
DUP ( DOUBLE -- 1ST 1ST )
DUP ( SINGLE -- 1ST 1ST ) OK
Looking again at the data type structure, you'll find out that 1ST is not one of the predefined data types, and neither are 2ND, 3RD and TH in the following examples:
' >NUMBER .
>NUMBER ( INTEGER-DOUBLE CDATA -> CHARACTER UNSIGNED -- 1ST 2ND 4 TH ) OK
' ACCEPT .
ACCEPT ( CDATA -> CHARACTER INTEGER -- 3RD ) OK
So these words obviously must have a special meaning. Let's assume we define XDUP as follows and try it out on an unsigned single number:
: XDUP ( SINGLE -- SINGLE SINGLE ) DUP ; OK
4 XDUP .S
SINGLE SINGLE OK
Now we have two items of data type SINGLE on the stack instead of two items of data type UNSIGNED. Trying, for example, to add those two items will fail, because + is only defined on INTEGER and ADDRESS, not on SINGLE. That's why we have to use 1ST in the stack diagram of DUP. When interpreting or compiling a word with 1ST as an output parameter, the data type of this parameter will be replaced by the data type of the first actual input parameter:
4 DUP .S . .
UNSIGNED UNSIGNED 4 4 OK
CHAR J DUP .S . .
CHARACTER CHARACTER JJ OK
BASE DUP .S . .
DATA -> UNSIGNED DATA -> UNSIGNED 1656 1656 OK
Now it works as expected. As can be seen in the last line of the example, 1ST also works correctly if the first input parameter has a compound data type. 2ND and 3RD work in a similar way, but reference the second or third basic data type in the input parameter list, respectively. To reference the fourth, fifth, sixth basic data type (and so on), an unsigned number followed by TH has to be used, as in the stack diagram of >NUMBER. This feature is perhaps one of the most important keys to strong static typing in StrongForth. Many words use 1ST, 2ND, 3RD and TH in their stack diagrams.
You might have noticed a small but important detail in the explanation of 2ND, 3RD and TH. They do not reference the second (or third ...) input parameter, but the second (or third ...) basic data type in the input parameter list of a stack diagram. The need for this distinction becomes clear when having a closer look at the stack diagrams of @:
WORDS @
@ ( CFAR-ADDRESS -> FLAG -- 2ND )
@ ( CFAR-ADDRESS -> SIGNED -- 2ND )
@ ( CFAR-ADDRESS -> SINGLE -- 2ND )
@ ( FAR-ADDRESS -> DOUBLE -- 2ND )
@ ( FAR-ADDRESS -> SINGLE -- 2ND )
@ ( CCODE -> FLAG -- 2ND )
@ ( CCODE -> SIGNED -- 2ND )
@ ( CCODE -> SINGLE -- 2ND )
@ ( CODE -> DOUBLE -- 2ND )
@ ( CODE -> SINGLE -- 2ND )
@ ( CCONST -> FLAG -- 2ND )
@ ( CCONST -> SIGNED -- 2ND )
@ ( CCONST -> SINGLE -- 2ND )
@ ( CONST -> DOUBLE -- 2ND )
@ ( CONST -> SINGLE -- 2ND )
@ ( CDATA -> FLAG -- 2ND )
@ ( CDATA -> SIGNED -- 2ND )
@ ( CDATA -> SINGLE -- 2ND )
@ ( DATA -> DOUBLE -- 2ND )
@ ( DATA -> SINGLE -- 2ND ) OK
Oops, that's quite a lot. Let's only look at the last one in the list. Although @ has only one input parameter, 2ND references SINGLE, or, more generally, the tail of the compound data type standing for the first input parameter. Thus, when @ is applied to a data address of an unsigned single number, the data type of the output parameter is really that of an unsigned single number. As has been shown in the previous examples with VARIABLE X and VARIABLE Y, it works as expected even if the tail of the referenced input parameter is itself a compound data type.
Another good example is >NUMBER, because this word has quite a lot of parameters:
' >NUMBER .
>NUMBER ( INTEGER-DOUBLE CDATA -> CHARACTER UNSIGNED -- 1ST 2ND 4 TH ) OK
The first input parameter is of data type INTEGER-DOUBLE, the second is of data type CDATA -> CHARACTER and the third is of data type UNSIGNED. Only the second input parameter has a compound data type. When the input parameter list is decomposed into basic data types, we get four of them, in this order: INTEGER-DOUBLE, CDATA, CHARACTER and UNSIGNED.
1ST references the first basic data type, which is INTEGER-DOUBLE and nothing else. 2ND references CDATA. But since the basic data type CDATA in this input parameter list is the head of a compound data type, 2ND actually references the whole compound data type, namely CDATA -> CHARACTER. 3RD would reference the third basic data type, CHARACTER, which is the tail of the second input parameter. Finally, 4 TH references UNSIGNED. UNSIGNED is both the third input parameter and the fourth basic data type within the input parameter list.
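The resolution rule just described can be sketched as a small Python model. This is not how StrongForth implements data type references internally; it merely restates the rule in executable form. The function name resolve_reference and the list-of-lists representation of an input parameter list are assumptions made for this example.

```python
# Toy model of StrongForth data type references (1ST, 2ND, 3RD, n TH).
# An input parameter list is a list of parameters; each parameter is a list
# of basic data types, so a compound data type has more than one element.

def resolve_reference(params, n):
    """Return the data type referenced by the n-th basic data type (1-based).

    The result starts at the referenced basic data type and runs to the end
    of the parameter that contains it: the whole compound data type if the
    reference hits its head, the tail otherwise.
    """
    position = 0
    for param in params:
        for offset in range(len(param)):
            position += 1
            if position == n:
                return param[offset:]
    raise IndexError(f"input list has fewer than {n} basic data types")

def show(data_type):
    """Format a data type the way the text prints it."""
    return " -> ".join(data_type)

# Input parameter list of >NUMBER as given in the text.
to_number_inputs = [["INTEGER-DOUBLE"], ["CDATA", "CHARACTER"], ["UNSIGNED"]]
for n in (1, 2, 3, 4):
    print(n, show(resolve_reference(to_number_inputs, n)))
# 1 INTEGER-DOUBLE
# 2 CDATA -> CHARACTER
# 3 CHARACTER
# 4 UNSIGNED

# Input parameter list of the last version of @ in the listing.
fetch_inputs = [["DATA", "SINGLE"]]
print(show(resolve_reference(fetch_inputs, 2)))   # SINGLE
```

The model works on the formal stack diagram only. In the real system, the reference is replaced by the corresponding part of the actual parameter's data type, which is why @ applied to DATA -> UNSIGNED yields UNSIGNED rather than SINGLE.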
Now it should be clear how several other words are defined. Have a look at the common arithmetic operators. As a general rule, the data type of the output parameter is the same as that of the first input parameter. This allows, for example, adding an integer to a character and still having a character on the stack afterwards. It also answers the question why + is not defined as
+ ( INTEGER INTEGER -- INTEGER ) \ wrong!
but as
+ ( INTEGER INTEGER -- 1ST )
The most common application for data type references is in the output parameter list of stack diagrams. But data type references may also be used in the input parameter list, where they have a different meaning. Look at the stack diagrams of the various versions of !:
WORDS !
! ( SINGLE CCONST -> 1ST -- )
! ( DOUBLE CONST -> 1ST -- )
! ( SINGLE CONST -> 1ST -- )
! ( SINGLE CDATA -> 1ST -- )
! ( DOUBLE DATA -> 1ST -- )
! ( SINGLE DATA -> 1ST -- ) OK
Again, it's only the last line we shall investigate. Here, 1ST means that the second input parameter is a data address that points to an item of exactly the same data type as the first input parameter. This is actually a restriction on the interpreter or compiler when trying to find a suitable version of ! in the dictionary. It prevents you from storing something into a memory address where it doesn't belong. A simple example might clarify this:
CHAR C VARIABLE X OK
CHAR D X .S
CHARACTER DATA -> CHARACTER OK
! OK
34 X .S
UNSIGNED DATA -> CHARACTER OK
!
! ? undefined word
UNSIGNED DATA -> CHARACTER
The second ! failed to match, because an unsigned single number may not be stored into a character variable.
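The constraint that 1ST imposes in an input parameter list can be expressed as a one-line check. The following Python sketch models only that constraint; it deliberately ignores the other conditions of a real match (that the value is a subtype of SINGLE or DOUBLE and that the address belongs to the right memory area). The function name store_matches is hypothetical.

```python
# Toy check for the 1ST constraint in ! ( SINGLE DATA -> 1ST -- ).
# Data types are lists of basic data types, e.g. ["DATA", "CHARACTER"].

def store_matches(value_type, address_type):
    """True if the address points to exactly the data type of the value."""
    target = address_type[1:]          # what the address points to
    return len(target) > 0 and target == value_type

print(store_matches(["CHARACTER"], ["DATA", "CHARACTER"]))   # True
print(store_matches(["UNSIGNED"], ["DATA", "CHARACTER"]))    # False
```

The second call corresponds to the failing attempt to store 34 into the character variable X.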
To keep track of the data types on the data stack, StrongForth has two data type heaps. Why two? Isn't there just one data stack? Yes, but we need one data type heap for the interpreter and one for the compiler.
The contents of the interpreter's data type heap can be displayed with .S. That has already been explained in detail. The items on the data type heap are mapped one to one to the items on the data stack. If we have three items on the data stack, we also have three data types on the data type heap, which can be either basic or compound data types. Note that having three items on the data stack does not necessarily mean that they occupy three cells. One or more of them can be double numbers, so three items on the data stack can occupy between three and six cells. On the data type heap, three data types occupy a minimum of six cells, because DATA-TYPE is a subtype of DOUBLE. If one or more of them is a compound data type, the data type heap can even be higher. There is no fixed limit except the size allocated for the data type heap by the system.
The interpreter's data type heap is only used by the interpreter. There is no explicit type checking at runtime, because this would cause a tremendous performance penalty. That's the main difference between systems with static and dynamic type checking. Instead of doing dynamic type checking at runtime, StrongForth's compiler does static type checking at compile time. The compiler has its own data type heap, where it keeps the data types of the items that will be on the data stack at runtime. Thus, the compiler data type heap at compile time maps to the data stack at runtime.
Since the interpreter is constantly present during compilation, having two separate data type heaps during compilation is a necessity. Immediate words generally use the interpreter data type heap, because they are immediately executed. All other words are compiled, and use the compiler data type heap. Let's view an example:
: TEST 3 4 .S
UNSIGNED UNSIGNED
+ .S
UNSIGNED
. .S
; OK
.S is an immediate word. In interpretation state, it displays the contents of the interpreter data type heap. In compilation state, it displays the contents of the compiler data type heap, as in this example. After having compiled two numeric literals, the compiler data type heap contains two times the data type UNSIGNED. + is not immediate. The compiler finds a version of + that accepts two unsigned single integers, and compiles it. It also updates the compiler data type heap by replacing the data types corresponding to the input parameters of + with the data type that corresponds to +'s output parameter, which is UNSIGNED. . is also non-immediate. The compiler finds a version suitable for an unsigned single number and removes the data type of its input parameter from the compiler data type heap. Since . has no output parameter, the compiler data type heap is now left empty. ; is immediate. Before compiling EXIT, it checks that the contents of the compiler data type heap matches the assumed output parameter list of TEST. Both are empty, so there is no error.
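The walk through TEST can be mimicked with a short Python simulation of the compiler data type heap. It is a deliberately reduced sketch: only basic data types are handled, each word has a single version instead of several overloaded ones, and the fragment of the data type hierarchy in PARENT is an assumption based on the data type diagram referred to earlier. All names in the code are invented for the example.

```python
# Toy simulation of the compiler data type heap while compiling TEST.

# Assumed fragment of the data type hierarchy: child -> parent.
PARENT = {"UNSIGNED": "INTEGER", "SIGNED": "INTEGER", "INTEGER": "SINGLE"}

def is_subtype(actual, formal):
    """True if actual equals formal or is one of its descendants."""
    while actual is not None:
        if actual == formal:
            return True
        actual = PARENT.get(actual)
    return False

# Simplified stack diagrams: (inputs, outputs). An integer n in the outputs
# is a reference to the n-th input, so 1 stands for 1ST.
DIAGRAMS = {
    "+": (["INTEGER", "INTEGER"], [1]),
    ".": (["SINGLE"], []),
}

def compile_word(heap, word):
    """Check the inputs of word against the heap and apply its stack effect."""
    inputs, outputs = DIAGRAMS[word]
    start = len(heap) - len(inputs)
    if start < 0:
        raise TypeError(f"{word} ? undefined word")
    actual = heap[start:]
    if not all(is_subtype(a, f) for a, f in zip(actual, inputs)):
        raise TypeError(f"{word} ? undefined word")
    del heap[start:]
    for out in outputs:
        heap.append(actual[out - 1] if isinstance(out, int) else out)

def compile_definition(tokens, declared_outputs=()):
    """Return the heap snapshots taken at each .S; check the heap at the end."""
    heap, snapshots = [], []
    for token in tokens:
        if token.isdigit():
            heap.append("UNSIGNED")        # numeric literal, as in the text
        elif token == ".S":
            snapshots.append(list(heap))   # immediate: inspect, compile nothing
        else:
            compile_word(heap, token)
    if heap != list(declared_outputs):     # the check performed by ;
        raise TypeError("heap does not match the output parameter list")
    return snapshots

print(compile_definition("3 4 .S + .S . .S".split()))
# [['UNSIGNED', 'UNSIGNED'], ['UNSIGNED'], []]
```

The three snapshots correspond to the three uses of .S inside TEST. Leaving out the final . would make the check at the end fail, because a definition without a stack diagram is assumed to have no stack effect.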
Here's a second example:
: COUNTER ( UNSIGNED -- ) 0 DO I . LOOP ; OK
10 COUNTER
0 1 2 3 4 5 6 7 8 9 OK
By default, a new definition is assumed to have no stack effect. This time, we have specified an explicit stack diagram. ) initializes the compiler data type heap with one item UNSIGNED, so the compilation starts with this item. Compiling 0 adds another UNSIGNED, and DO consumes both by compiling (DO). I pushes UNSIGNED on the data type heap, and . consumes it. LOOP checks that the contents of the compiler data type heap is the same as it was after DO was executed, before compiling its own runtime semantics. Finally ; checks the congruence between the compiler data type heap and the output parameter list of COUNTER.
That's what happens on the compiler data type heap. But what about the interpreter data type heap? We can easily watch it with .S by temporarily switching to execute mode:
: COUNTER [ .S ]
COLON-DEFINITION
( UNSIGNED -- ) [ .S ]
COLON-DEFINITION
0 DO [ .S ]
COLON-DEFINITION LOOP-ORIGIN
I . LOOP [ .S ]
COLON-DEFINITION
; OK
COLON-DEFINITION, which : pushes onto the interpreter data type heap, is the equivalent of what the ANS standard calls colon-sys. It identifies the current definition. DO pushes another item onto the data stack and the interpreter data type heap, which is supposed to contain information for LOOP or +LOOP. LOOP-ORIGIN is consumed by LOOP, and ; consumes COLON-DEFINITION. If we had tried to execute ; before LOOP, the interpreter would not have found it in the dictionary, because ; requires its input parameter COLON-DEFINITION to be on top of the stack.
This section is a summary of the main differences between ANS Forth and StrongForth, as far as they have not been mentioned yet. These are generally things a beginner should keep in mind when writing applications in StrongForth or when porting programs from ANS Forth to StrongForth.
Since the ANS Forth word ( is now used to initiate a stack diagram, it is no longer available for starting a comment. Operator overloading doesn't help here, because the ANS version of ( does not have any parameters that could distinguish the two words. But we are lucky. The ANS Forth Core Extensions word set specifies the word \ to skip the rest of the line.
\ has the same semantics in StrongForth, with one small extension: the comment ends before the end of the line if a second \ is parsed. Of course, this is not exactly the same as the ANS Forth semantics, but it is pretty close. Using words other than ( and ) for stack diagrams, for example braces, would look even more unusual. Thus you can write:
3 4 + . \ this does not require a comment
' BASE >BODY -> CODE 1- \ get code field \ @ ,
In ANS Forth, several words exist in a standard version for signed numbers and in a modified version for unsigned numbers. The unsigned version is generally prefixed with a U. Since StrongForth allows operator overloading, this prefix is no longer required. The versions for signed and unsigned numbers have the same name, as in this example:
/ ( SIGNED SIGNED -- 1ST )
/ ( UNSIGNED UNSIGNED -- 1ST )
The following words have been replaced by overloading already existing words: U., U.R, U<, U>, and UM*. Nevertheless, overloaded words for unsigned numbers are provided for several other words as well, like * and MOD, that do not have unsigned versions in ANS Forth. Have a look at StrongForth's dictionary to find out about that.
Address arithmetic is strongly restricted in StrongForth. It is generally not possible to perform multiplication, division or negation on addresses, because that almost always yields meaningless results. Neither is it allowed to add two addresses, or to add an address to an integer. So, the only kind of allowed addition in conjunction with addresses is to add an integer to an address.
Subtraction is somewhat different. Of course, it is allowed to subtract an integer from an address, but it is also possible to subtract one address from another, giving an item of data type INTEGER:
- ( ADDRESS 1ST -- INTEGER )
To make sure that two addresses from different memory spaces are never subtracted from each other, the second address has to have exactly the same data type as the first one.
Another interesting feature of StrongForth is the operand size adaption. Whenever an integer is added to an address that explicitly points to a single or double cell, the integer is automatically multiplied by the number of address units per single or double cell before the real addition takes place. It is for example no longer necessary to explicitly use CELLS before using an integer as an offset to an array of single numbers, as the following example demonstrates:
DATA-SPACE 3 VARIABLE ARRAY 9 , 27 , 81 , 243 , OK
ARRAY @ .
3 OK
ARRAY 1+ @ .
9 OK
ARRAY 3 + @ .
81 OK
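The scaling behind this example can be illustrated with a small Python model. It is a sketch under stated assumptions: a cell size of 2 address units is assumed purely for illustration (the text does not give the value), memory is modelled as a dictionary, and the names add_to_address, CELL_SIZE and ELEMENT_SIZE are invented here.

```python
# Toy model of operand size adaption when adding an integer to an address.

CELL_SIZE = 2   # assumed number of address units per single cell

# Scale factor by what the address explicitly points to.
ELEMENT_SIZE = {"SINGLE": CELL_SIZE, "DOUBLE": 2 * CELL_SIZE}

def add_to_address(address, target, n):
    """Add n items to address; target is "SINGLE", "DOUBLE" or None.

    A plain address (target None) is not scaled at all.
    """
    return address + n * ELEMENT_SIZE.get(target, 1)

# Lay out the array from the example: 3 9 27 81 243.
base = 100      # arbitrary start address for the model
memory = {}
for index, value in enumerate([3, 9, 27, 81, 243]):
    memory[base + index * CELL_SIZE] = value

print(memory[add_to_address(base, "SINGLE", 0)])   # 3
print(memory[add_to_address(base, "SINGLE", 1)])   # 9
print(memory[add_to_address(base, "SINGLE", 3)])   # 81
print(add_to_address(base, None, 3))               # 103
```

The first three lines reproduce the results of ARRAY @, ARRAY 1+ @ and ARRAY 3 + @. The last line shows that the same offset applied to a plain address advances by address units instead of cells.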
This feature does not only work with + and -, but with all words doing address arithmetic, like +!, +LOOP, 1+, 1-, and so on. It can easily be seen by having a look at all variants of a word:
WORDS +
+ ( CFAR-ADDRESS INTEGER -- 1ST )
+ ( FAR-ADDRESS -> DOUBLE INTEGER -- 1ST )
+ ( FAR-ADDRESS -> SINGLE INTEGER -- 1ST )
+ ( FAR-ADDRESS INTEGER -- 1ST )
+ ( INTEGER-DOUBLE SIGNED -- 1ST )
+ ( INTEGER-DOUBLE UNSIGNED -- 1ST )
+ ( INTEGER-DOUBLE INTEGER-DOUBLE -- 1ST )
+ ( CADDRESS INTEGER -- 1ST )
+ ( ADDRESS -> DOUBLE INTEGER -- 1ST )
+ ( ADDRESS -> SINGLE INTEGER -- 1ST )
+ ( ADDRESS INTEGER -- 1ST )
+ ( INTEGER INTEGER -- 1ST )
Here, the different versions of + for data types ADDRESS, ADDRESS -> SINGLE and ADDRESS -> DOUBLE show that these data types are not treated the same way. Note also that, of the three, the version for plain addresses is listed last. It is the default, which only applies when the versions listed before it (those that were compiled later) fail to match the data types of the items on the data stack.
By the way, with this address arithmetic feature at hand, CELL+ and CHAR+ are no longer required. They have both been replaced by proper versions of 1+. A nice side effect is that the semantics of 1- corresponds to CELL- or CHAR-. These words are not even specified in ANS Forth, but might be useful anyway.
The ANS Forth words 2!, 2@, 2DROP, 2DUP, 2OVER, 2SWAP and 2ROT have two different semantics. First, they perform the semantics of !, @, DROP, DUP, OVER, SWAP and ROT on double numbers. Second, they apply these words on pairs of single numbers. In StrongForth, double numbers are not just a pair of single numbers. Instead, double numbers are separate data types, which can generally not be interpreted as two single numbers.
The double number semantics of the above mentioned words is implemented in StrongForth by overloading the respective words for single numbers. That's obvious. Keeping 2!, 2@, 2DROP, 2DUP, 2OVER, 2SWAP and 2ROT for pairs of single numbers has been considered, but has finally been dismissed. They can easily be defined anyway.
Static typing requires that the compiler knows the precise stack effect of each word it compiles. This requirement collides with some ANS Forth words, namely ?DUP, PICK, ROLL, ENVIRONMENT?, FIND, SAVE-INPUT and RESTORE-INPUT, which are discussed in turn below.
The ANS Forth stack diagram of ?DUP is
( x -- 0 | x x )
The stack effect of ?DUP depends on its input parameter at runtime. But since the value of this input parameter is generally not known at compile time, the compiler wouldn't be able to continue after compiling ?DUP. As a result, ?DUP is not available in StrongForth. The most frequent application of ?DUP is immediately before IF, thus avoiding the ELSE branch when the input parameter is zero. So, the only penalty in StrongForth consists of having to add the ELSE branch, in which the superfluous item is dropped. Another possible solution is to use ?IF as a direct replacement for ?DUP IF. It can be defined as follows:
: ?IF ( -- ORIGIN ) POSTPONE DUP POSTPONE 0= POSTPONE IF POSTPONE DROP POSTPONE ELSE ; IMMEDIATE
The case of PICK and ROLL is different. But again, since the compiler does not know the value of the input parameter at compile time, it has no means to determine the stack effect of these two words. Like ?DUP, these two words are not available in StrongForth. This is not considered especially regrettable, because the need to access items buried deep down in the stack almost always arises as a result of bad factoring. If this cannot be avoided, using local variables is a good replacement for messing around with PICK and ROLL.
In ANS Forth, the data types returned by ENVIRONMENT? depend on the contents of the query string. For a strongly typed system, ENVIRONMENT? definitely is not the ideal means to do an environment query. Nevertheless, ENVIRONMENT? has been implemented in StrongForth. This is the stack diagram:
ENVIRONMENT? ( CDATA -> CHARACTER UNSIGNED -- CONST FLAG )
If the query string exists, FLAG is TRUE and CONST is the address of the value of the queried environmental parameter. After casting CONST to a more specific data type, like CONST -> UNSIGNED for the /PAD environmental parameter, the actual value of the parameter can be fetched. If the query string does not exist, FLAG is FALSE and CONST is undefined. Here's a small example:
PARSE-WORD /PAD ENVIRONMENT? . -> UNSIGNED @ .
TRUE 84 OK
PARSE-WORD /XYZ ENVIRONMENT? . .
FALSE 11392 OK
FIND has been replaced by SEARCH-ALL, which has an unambiguous stack diagram. For details please see the separate section on FIND and SEARCH-ALL below.
According to the ANS Forth specification, SAVE-INPUT should create and RESTORE-INPUT should consume a variable-length sequence of stack cells that contains information about its length and about the parsing position within the current input source. Again, this is incompatible with the idea of strong static typing. As long as variable-length data types have not been introduced in StrongForth, a less flexible solution will be used. SAVE-INPUT and RESTORE-INPUT pass the information on the parsing position between each other as a fixed number of stack items, one of which has the double-cell data type INPUT-SOURCE.
Due to the potential dangers of direct return stack manipulations, the usage of >R, R@ and R> is strongly restricted in StrongForth. R@ actually is a local variable that is created by >R and removed by R>. Thus, >R and R> are immediate words, to be used only during compilation.
Furthermore, >R and R> have to be used pairwise with respect to control-flow structures like IF ... ELSE ... THEN, DO ... LOOP and BEGIN ... UNTIL. For example,
... IF ... >R ... THEN ... R> ...
is not possible, while
... IF ... >R ... R> ... THEN ...
as well as
... >R ... IF ... THEN ... R> ...
are allowed. To implement this restriction, >R leaves an item of a special data type, R-SIZE, on the stack, which is consumed by R>. A disadvantage of prohibiting any direct access to the return stack is that some special techniques like backtracking cannot be implemented in the usual easy (and non-portable) way.
Nevertheless, although >R, R@ and R> are present in StrongForth, their usage is discouraged in favour of local variables. Local variables are a much more powerful tool, especially since StrongForth provides single-cell as well as double-cell local variables. Local variables can be supplied with a name and do not need to be explicitly removed before exiting the definition. Furthermore, the contents of local variables can be changed any time by using TO. The only drawback is that local variables have to be defined in one block near the beginning of a definition. Performance and code size are approximately the same as that of >R ... R@ ... R>.
The advantages of local variables over direct return stack access have led to the decision to implement not only R@, but also the loop indices I and J as local variables. This is the reason why they cannot be found in StrongForth's dictionary. They are simply created dynamically as local variables and are automatically removed by R>, LOOP or +LOOP. By the way, this allows applying TO to R@, I and J.
Furthermore, implementing R@, I and J as local variables makes it possible to use >R, R@ and R> inside DO-loops:
... DO ... >R ... I ... R@ ... R> ... LOOP ...
as well as
... >R ... DO ... I ... R@ ... LOOP ... R> ...
work as expected, i. e., I always returns the loop index, while R@ always returns the item that was supplied to >R.
Finally, another rather nice consequence of implementing I and J as local variables has to be mentioned: Since all local variables are automatically removed from the return stack when exiting the currently compiled definition with EXIT or ;, it is no longer necessary to manually clean up the return stack before using EXIT within a DO loop. Thus, UNLOOP can be abandoned. Anyone who thinks it should be provided for compatibility reasons can easily define a dummy version:
: UNLOOP ;
ANS Forth supports two kinds of character strings: strings represented by the address of their first character and the character count, and so-called counted strings. Since the use of counted strings is discouraged by the ANS standard, the decision has been made to get rid of them entirely in StrongForth. Counted strings are still a part of ANS Forth for historical reasons. StrongForth has no history at all and is therefore not bound to existing programming techniques. Porting existing programs to StrongForth will require several modifications to the source code anyway, so the additional effort of replacing counted strings seems tolerable.
Only a few words are affected:
However, StrongForth still knows two different types of strings: CDATA -> CHARACTER UNSIGNED and CCONST -> CHARACTER UNSIGNED. Several words, like for example EVALUATE, TYPE and MOVE, are supplied in two versions to be able to deal with both variable and constant strings.
FIND is not available in StrongForth. As a replacement, you have to use SEARCH-ALL:
SEARCH-ALL ( CDATA -> CHARACTER UNSIGNED SINGLE CODE -- DEFINITION SIGNED )
It is actually not easy to compare the stack diagram of SEARCH-ALL with the original ANS Forth stack diagram of FIND:
( c-addr -- c-addr 0 | xt 1 | xt -1 )
These are the differences:
EXECUTE is one of those words that produce stack effects which are normally not known at compile time. Simply removing EXECUTE is not an option, because this would deprive StrongForth of one of the most powerful features of Forth.
The StrongForth interpreter uses an internal word named (EXECUTE) to execute words. (EXECUTE) expects an execution token as input parameter. It does exactly what EXECUTE does in ANS Forth, i. e., it executes a word without regard to possible stack effects. But directly using (EXECUTE) in StrongForth would corrupt the data type system. What is required is a version of EXECUTE that has the runtime semantics of (EXECUTE) and takes care of the stack effects already at compile time. But this is difficult, because the runtime value of an execution token is generally not known at compile time. How can it be accomplished?
As usual, the data type system provides a solution. Although the compiler does not know the runtime value of the execution token, it should know the stack effect of the word associated with it. For each stack effect of the words being executed, a separate subtype of TOKEN and a separate version of EXECUTE has to be created. This is what )PROCREATES does:
( UNSIGNED -- FLAG )PROCREATES (U--F) OK
LATEST PREV .
(U--F) ( STACK-DIAGRAM -- 1ST ) OK
LATEST .
EXECUTE ( UNSIGNED (U--F) -- FLAG ) OK
In order to create execution tokens of data type (U--F), which can be executed with the specialized version of EXECUTE, you can use ?TOKEN:
DT (U--F) ?TOKEN 0= .S
TOKEN OK
5 SWAP CAST (U--F) EXECUTE .
FALSE OK
In this example, ?TOKEN tries to find an (overloaded) version of 0= that can be applied to the stack diagram associated with (U--F). ?TOKEN throws an exception if it doesn't find a word with the given name and a suitable stack diagram in the dictionary. After casting this token to the data type of the qualified token, it can be executed by EXECUTE. Note that the specialized versions of EXECUTE can be used during compilation as well.
The StrongForth word MOVE exists in several overloaded versions:
MOVE ( CFAR-ADDRESS -> SINGLE CDATA -> 2ND UNSIGNED -- )
MOVE ( FAR-ADDRESS -> DOUBLE DATA -> 2ND UNSIGNED -- )
MOVE ( FAR-ADDRESS -> SINGLE DATA -> 2ND UNSIGNED -- )
MOVE ( CONST -> DOUBLE DATA -> 2ND UNSIGNED -- )
MOVE ( DATA -> DOUBLE DATA -> 2ND UNSIGNED -- )
MOVE ( CCONST -> SINGLE CDATA -> 2ND UNSIGNED -- )
MOVE ( CONST -> SINGLE DATA -> 2ND UNSIGNED -- )
MOVE ( CDATA -> SINGLE CDATA -> 2ND UNSIGNED -- )
MOVE ( DATA -> SINGLE DATA -> 2ND UNSIGNED -- )
MOVE can copy single-cell, double-cell and character size items from the DATA or CONST memory area or from a full segment-and-offset memory address to an address in the DATA memory area. Applying 2* to the count value when moving double cells is not required, because MOVE knows the size of the items to be moved. Since MOVE also allows moving character size items, it can replace CMOVE and CMOVE> from the ANS Forth String word set in most cases. These two words are therefore not included in StrongForth.
Unlike in ANS Forth, DEPTH delivers the height of the data type heap. More accurately, it delivers the number of basic data types on the data type heap, with compound data types counting as two or more basic data types. STATE determines whether the interpreter or the compiler data type heap is meant. Information about the data type heap is considered to be more important than information about the data stack, because it is possible to calculate the depth of the data stack from the contents of the data type heap, but not the other way around. Since the interpreter updates the data type heap before executing a word, the number DEPTH returns also counts itself, if DEPTH is actually executed by the interpreter. Here's an example:
7 BASE DEPTH .S
UNSIGNED DATA -> UNSIGNED UNSIGNED OK
.
4 OK
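The count of 4 in this example can be retraced with a few lines of Python. This is only a model of the counting rule stated above, with the data type heap represented as a list of data types and each data type as a list of basic data types. The function name depth is chosen for the example.

```python
# Toy model of what DEPTH counts: basic data types on the data type heap.

def depth(heap):
    """Number of basic data types; a compound data type counts once per part."""
    return sum(len(data_type) for data_type in heap)

# Data type heap after "7 BASE".
heap = [["UNSIGNED"], ["DATA", "UNSIGNED"]]

# The interpreter updates the heap with DEPTH's own output parameter
# before DEPTH is executed, so the result includes that item.
heap.append(["UNSIGNED"])

print(depth(heap))   # 4
print(len(heap))     # 3
```

The second number is the count of items on the data stack, which shows the direction in which the calculation works: the number of stack items follows from the heap contents, while the heap contents cannot be recovered from the stack.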
Several words from the ANS Forth Core Extension word set have not been implemented in StrongForth, because their use is discouraged. They are included in ANS Forth for compatibility reasons only. Compatibility with older versions is obviously not an issue in StrongForth. Thus, the following words do not exist in StrongForth:
Dr. Stephan Becher - October 8th, 2007