Chapter 5: Input and Output

Basic I/O

What is basic I/O? Well, on a typical system, input and output are based on services provided by the operating system or by some library functions. These functions usually don't handle any formatting. They just read and write characters and strings from designated devices like a keyboard, a display or even a serial interface device. ANS Forth specifies the following words for these purposes:

I/O function	Word(s)
character input	`KEY`
string input	`ACCEPT` and `EXPECT`
character output	`EMIT`
string output	`TYPE`

Other I/O words, like REFILL, ., or SPACES, are defined in terms of the basic I/O words and are thus considered higher-level I/O. All of these basic I/O words are also available in StrongForth, except for EXPECT, which the ANS Forth specification declares obsolete.

Obviously, KEY returns an item of data type character:

KEY ( -- CHARACTER )

String input can be done by using ACCEPT, which works exactly as in ANS Forth:

ACCEPT ( CDATA -> CHARACTER INTEGER -- 3RD )

Note that the second parameter, the maximum length of the character string to be received, is of data type INTEGER, although you might have expected it to be UNSIGNED. ANS Forth clearly specifies that this parameter is a signed positive integer, because this allows certain Forth implementations to play tricks when zero or a negative number is provided here. Using INTEGER instead of UNSIGNED is just a compatibility issue. It's still possible to provide an unsigned number as the parameter for ACCEPT, because UNSIGNED is a subtype of INTEGER.

The semantics of StrongForth's version of EMIT are also the same as in ANS Forth:

WORDS EMIT
EMIT ( INTEGER -- )
 OK
42 EMIT
* OK

However, EMIT is rarely used. This is because StrongForth provides an overloaded version of . for displaying items of data type CHARACTER:

: . ( CHARACTER -- )
  EMIT ;

In ANS Forth, . can only be used for pictured numeric output. Overloading makes it possible to provide individual versions of . for each data type. You will see later in this chapter that StrongForth indeed has a rather comprehensive set of output words named . for all kinds of data types. Whenever . is applied to an item of data type CHARACTER, it simply performs the semantics of EMIT:

5148 .S .
UNSIGNED 5148  OK
CHAR % .S .
CHARACTER % OK

However, this rule does not apply to character strings. The reason is that character strings are constituted by two items, an address and a character count:

PARSE-WORD StrongForth .S
CDATA -> CHARACTER UNSIGNED  OK
TYPE
StrongForth OK

If . were used instead of TYPE, it would just look at the item on top of the stack and print an unsigned number, leaving the character address on the stack. StrongForth is not able to distinguish two items that constitute a character string from two separate items, one being a character address and the other being an unsigned number.

Actually, StrongForth provides more than one version of TYPE:

WORDS TYPE
TYPE ( CFAR-ADDRESS -> CHARACTER UNSIGNED -- )
TYPE ( CCONST -> CHARACTER UNSIGNED -- )
TYPE ( CDATA -> CHARACTER UNSIGNED -- )
 OK

Character strings may be located in the DATA memory area, in the CONST memory area and in any non-predefined memory area, which is accessed by full addresses. It is assumed that character strings are usually not located in the CODE memory area.

Other Output Words

Based on the basic output words EMIT (for characters) and TYPE, some higher-level output words are defined. CR advances the cursor to the next line, printing a carriage return and a line feed. SPACE prints a space character:

: CR ( -- )
  13 EMIT 10 EMIT ;

: SPACE ( -- )
  BL EMIT ;

BL is the space character, defined as a character constant:

32 CAST CHARACTER CONSTANT BL

Note that the ASCII value of the space character, 32, needs to be cast to an item of data type CHARACTER. If this cast were omitted, BL would leave an unsigned number instead of a character on the stack. With EMIT, of course, this makes no difference, but with the overloaded word ., it does:

32 CONSTANT NO-BL
 OK
NO-BL EMIT
  OK
NO-BL .
32  OK

SPACES expects an item of data type INTEGER on the stack, because this data type covers both signed and unsigned numbers:

: SPACES ( INTEGER -- )
  CAST SIGNED +0 MAX +0 ?DO SPACE LOOP ;

However, the value of the operand is interpreted as a signed number. Note the use of +0 instead of just 0 in the definition of SPACES. +0 is of data type SIGNED, while 0 is of data type UNSIGNED. MAX expects two items with identical data types on the stack, i.e., it could not be found in the dictionary if the two items on top of the stack had the data types UNSIGNED and SIGNED.
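The clamping behaviour of SPACES can be tried out interactively. The following transcript is a sketch that assumes the definitions shown above and the overloaded version of . for items of data type CHARACTER. A negative count is cast to SIGNED, clamped to zero by MAX, and therefore prints nothing:

```forth
CHAR A . 3 SPACES CHAR B .
A   B OK
CHAR A . -3 SPACES CHAR B .
AB OK
```

Both the unsigned literal 3 and the signed literal -3 are accepted, because UNSIGNED and SIGNED are subtypes of INTEGER.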

Finally, here are the definitions of the two state-smart immediate words ." and .(, which work both in interpretation state and in compilation state:

: ." ( -- )
  STATE @
  IF POSTPONE " POSTPONE TYPE
  ELSE [CHAR] " PARSE TYPE
  THEN ; IMMEDIATE

: .( ( -- )
  STATE @
  IF [CHAR] ) PARSE POSTPONE SLITERAL POSTPONE TYPE
  ELSE [CHAR] ) PARSE TYPE
  THEN ; IMMEDIATE

With one exception, these two definitions are identical to the respective ANS Forth definitions. The exception is the use of " instead of the ANS Forth word S". This reflects the fact that counted strings are not supported by StrongForth, thus eliminating the need to distinguish between C" and S". In StrongForth, " has the semantics of the ANS Forth word S". More details will be given later.
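To illustrate the relationship between " and .", here is a small sketch. The word GREETING is a hypothetical name chosen for this example only. Inside a definition, " leaves a string in the CONST memory area on the stack, so TYPE has to be applied explicitly, while ." compiles both steps at once:

```forth
: GREETING ( -- )
  " Hello" TYPE SPACE ." StrongForth" ;

GREETING
Hello StrongForth OK
```

Since the string left by " has an address of data type CCONST -> CHARACTER, the compiler is expected to select the corresponding overloaded version of TYPE.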

Pictured Numeric Output

The Transient Area

The transient area is the central data structure used for pictured numeric output. Numbers to be printed are converted digit by digit into a character string, which is located in the transient area. Finally, this string is displayed.

This is a very common technique in Forth. In StrongForth, the transient area is a dedicated, 34-byte field within the DATA memory area. As a consequence, pictured numeric output conversions will normally not be affected by other operations.

For accessing the transient area, StrongForth provides two constants:

TRANS-BOTTOM ( -- CDATA -> CHARACTER )
TRANS-TOP ( -- CDATA -> CHARACTER )

These constants mark the lowest address and the highest address plus 1 of the transient area, respectively. You can easily store strings into the transient area:

PARSE-WORD StrongForth TRANS-BOTTOM SWAP MOVE
 OK
TRANS-BOTTOM 11 TYPE
StrongForth OK

But since the main purpose of the transient area is to do pictured numeric output conversion, its use as a long-term string buffer is discouraged. Any strings might be overwritten during the next pictured numeric output conversion. As a short-term string buffer, the transient area may safely be used, provided that the length of the string does not exceed 34 characters and no pictured numeric output is done during the lifetime of the string.

The only application of the transient area apart from pictured numeric output is implemented in the word TRANSIENT. TRANSIENT copies a string containing up to 34 characters from a location in the CONST memory area into the transient area. This is necessary if a string needs to be modified, or if another word expects a string in the DATA memory area as its input parameter. These strings are typically names of words, which are always less than 32 characters long. Here's the definition of TRANSIENT:

: TRANSIENT ( CCONST -> CHARACTER UNSIGNED -- CDATA -> 2ND 3RD )
  TUCK TRANS-BOTTOM SWAP MOVE TRANS-BOTTOM SWAP ;

For example, if SEARCH-ALL (StrongForth's replacement for the ANS Forth word FIND) is to be used to find a word with a given name in the dictionary, you need to write something like the following, because " stores strings in the CONST memory area, whereas SEARCH-ALL expects the name of the word in the DATA memory area:

: ... " THIS-NAME" TRANSIENT 0 CODE-FIELD SEARCH-ALL ... ;

There's actually a second version of TRANSIENT for strings located at addresses of data type CFAR-ADDRESS:

: TRANSIENT ( CFAR-ADDRESS -> CHARACTER UNSIGNED
  -- CDATA -> 2ND 3RD )
  TUCK TRANS-BOTTOM SWAP MOVE TRANS-BOTTOM SWAP ;

Now let's get back to pictured numeric output. The variable

>TRANS ( -- DATA -> CDATA -> CHARACTER )

contains a pointer to the current position within the transient area. At the beginning of a pictured numeric output conversion, this pointer is initialised with TRANS-TOP. Subsequently, it moves character by character towards TRANS-BOTTOM until number conversion is finished. Since the longest pictured numeric output is a 32-character binary number plus a sign character, there's normally no risk that the pointer runs out of the 34-byte transient area. Nevertheless, StrongForth's version of HOLD checks whether >TRANS runs out of the transient area:

: HOLD ( CHARACTER -- )
  >TRANS @ TRANS-BOTTOM >
  IF -1 >TRANS +! >TRANS @ !
  ELSE DROP -17 THROW
  THEN ;

An interesting detail is the use of DROP to remove the CHARACTER item immediately after ELSE. Isn't THROW supposed to clear the stack contents anyway? Sure, but from the compiler's point of view, THROW is just an ordinary word that consumes an item of data type SIGNED. If DROP were not present, the compiler would complain that the two branches of the conditional clause do not have the same stack effect, which prevents joining the two branches after THEN.

The Number-Conversion Radix

Before we get engaged with pictured numeric output itself, let's have a quick glance at an important system variable. As specified in ANS Forth, the number-conversion radix is kept in the variable BASE:

10 VARIABLE BASE

Note that BASE contains an unsigned number:

BASE ( -- DATA -> UNSIGNED )

The two words DECIMAL and HEX, which directly set the number-conversion radix, are defined as expected:

: DECIMAL ( -- )
  10 BASE ! ;

: HEX ( -- )
  16 BASE ! ;
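Other radices can be selected in the same way. As a sketch, a word for binary output might look like this. BINARY is a hypothetical name here and is not claimed to be part of StrongForth:

```forth
: BINARY ( -- )
  2 BASE ! ;

5 BINARY . DECIMAL
101  OK
```

Note that the number 5 is entered before BINARY is executed, because the interpreter also uses BASE for converting numbers in the input stream. DECIMAL restores the default radix afterwards.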

Painting A Picture

Pictured numeric output conversion always starts with <# and ends with #>. In StrongForth, <# expects any item of data type DOUBLE on the stack, while #> leaves a character string, consisting of an address and a character count.

<# ( DOUBLE -- NUMBER-DOUBLE )
#> ( NUMBER-DOUBLE -- CDATA -> CHARACTER UNSIGNED )

During the conversion, a special subtype of UNSIGNED-DOUBLE, called NUMBER-DOUBLE, remains on the stack. Items of data type NUMBER-DOUBLE can only be produced by <# and can only be consumed by #>, which ensures <# and #> are always used in pairs. It is a common technique in StrongForth to introduce a special data type with the sole purpose of forcing the programmer to stick to a given syntax. Any violation of the syntax rules requires using a type cast. ANS Forth, on the other hand, allows using any word in any place as long as the stack does not run empty. Thus, syntax violations are normally not detected during compilation. They usually lead to runtime errors or crashes.

Now, here are the definitions of <# and #>:

: <# ( DOUBLE -- NUMBER-DOUBLE )
  TRANS-TOP >TRANS ! CAST NUMBER-DOUBLE ;

: #> ( NUMBER-DOUBLE -- CDATA -> CHARACTER UNSIGNED )
  DROP >TRANS @ TRANS-TOP OVER - CAST UNSIGNED ;

Apart from the type casts, these definitions do not differ from their respective ANS Forth equivalents. The type cast at the end of the definition of #> is required because -, when applied to two addresses, leaves an item of data type INTEGER on the stack. Since it's certain that TRANS-TOP is not less than the content of >TRANS, the difference may safely be cast to an unsigned number.

The next step is to convert the double number of data type NUMBER-DOUBLE into a sequence of digits. Like in ANS Forth, this is done by # and #S:

: # ( NUMBER-DOUBLE -- 1ST )
  BASE @ /MOD OVER 10 <
  IF [CHAR] 0 ELSE [ CHAR A 10 - ] LITERAL THEN ROT + HOLD ;

: #S ( NUMBER-DOUBLE -- 1ST )
  BEGIN # DUP 0= UNTIL ;

Remember that NUMBER-DOUBLE is a subtype of UNSIGNED-DOUBLE, which means that all words expecting an item of data type UNSIGNED-DOUBLE also accept an item of data type NUMBER-DOUBLE.
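These words can be combined with HOLD to paint custom pictures. The following sketch prints a double number with two decimal places. The name .CENTS is hypothetical, and the example assumes that the radix is decimal and that 12345. is accepted as a double number by <#:

```forth
: .CENTS ( DOUBLE -- )
  <# # # [CHAR] . HOLD #S #> TYPE SPACE ;

12345. .CENTS
123.45  OK
```

Because the string is built from right to left, the two least significant digits are converted first, followed by the decimal point and the remaining digits.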

ANS Forth provides three words for displaying signed single-precision numbers, unsigned single-precision numbers and signed double-precision numbers in free field format:

. ( n -- )
U. ( u -- )
D. ( d -- )

StrongForth overloads . to provide all of the above words plus one additional word for unsigned double-precision numbers:

. ( SIGNED -- )
. ( SINGLE -- )
. ( SIGNED-DOUBLE -- )
. ( DOUBLE -- )

The most trivial one is the version for unsigned double-precision numbers. Since it's the first one defined in the dictionary, it will be found only after all other overloaded versions of . have been checked. As a consequence, it matches all double-cell items that are not caught by a specialised version. Here's the definition:

: . ( DOUBLE -- )
  <# #S #> TYPE SPACE ;

. for unsigned single-precision numbers is directly derived from this version. Again, this word catches all single-cell items that are not handled by other overloaded versions. Note that the conversion from a single number to a double number cannot be accomplished by just padding a single-cell zero, because this would result in two single numbers instead of one double number on the stack. But S>D does the required job:

: . ( SINGLE -- )
  S>D . ;

Displaying signed numbers is a little bit more complicated. Since . for signed single numbers is derived from . for signed double numbers, let's start with the double number version:

: . ( SIGNED-DOUBLE -- )
  DUP 0< SWAP ABS <# #S SWAP SIGN #> TYPE SPACE ;

A small but important difference to the usual ANS Forth definition of D. is the usage of SIGN. The ANS Forth version of SIGN expects a signed single-precision number on the stack and stores a minus sign in the transient area if this number is negative. The signed single-precision number is usually identical to the most significant part of the double-precision number to be displayed. But playing tricks like that is definitely discouraged by the philosophy of StrongForth. In StrongForth, SIGN expects an item of data type FLAG on the stack. It stores a minus sign in the transient area if and only if the flag is TRUE:

: SIGN ( FLAG -- )
  IF [CHAR] - HOLD THEN ;

In the definition of . for signed double numbers, this flag is calculated in a clean and direct way by 0<. Remember that 0< is overloaded for single-precision and double-precision numbers. If you still want an overloaded version of SIGN that takes a signed single-precision number, don't hesitate to define it:

: SIGN ( SIGNED -- )
  0< SIGN ;

Similar to unsigned numbers, the definition of . for signed single-precision numbers is derived from the version for signed double-precision numbers:

: . ( SIGNED -- )
  S>D . ;

As mentioned before, the versions of . for items of data types SINGLE and DOUBLE catch all data types that are not handled by specialised, overloaded versions. To make sure they are not found before all other versions of . have had a chance to be matched by the interpreter or compiler, they have to be defined first. For example, . for items of data type CHARACTER must be defined after . for items of data type SINGLE. Another overloaded version of . accepts only items of data type FLAG and its subtypes:

: . ( FLAG -- )
  IF ." TRUE " ELSE ." FALSE " THEN ;

Here are some examples that demonstrate overloading of .:

716 30 + .
746  OK
CHAR E 3 - .
B OK
-400000. +100000. - .
-500000  OK
6 8 > .
FALSE  OK
BASE .S .
DATA -> UNSIGNED 396  OK

Since there is no specialised version of . for addresses, applying . to addresses like BASE defaults to printing an unsigned single-precision number. It's up to you to define overloaded versions of . for different kinds of addresses that fit your requirements.
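As a sketch of such a user-defined version, the following definition makes . display the content of a variable that holds an unsigned number, instead of its address. This is just one hypothetical choice of behaviour, assuming the compound data type DATA -> UNSIGNED can be used as an input parameter like the compound data types in the definitions above:

```forth
: . ( DATA -> UNSIGNED -- )
  @ . ;

BASE .
10  OK
```

Within the definition, . still refers to the previously defined versions, so the fetched number is printed by the version for items of data type SINGLE. Addresses of other data types are not affected by this definition.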

Finally, let's have a quick glance at the four overloaded versions of .R. These definitions look very similar to the definitions of . for single and double numbers. The versions for items of data types SINGLE and DOUBLE are defined before those for signed numbers, because they serve as catch-all.

: .R ( DOUBLE INTEGER -- )
  CAST SIGNED SWAP <# #S #>
  ROT OVER - SPACES TYPE ;

: .R ( SIGNED-DOUBLE INTEGER -- )
  CAST SIGNED SWAP DUP 0< SWAP ABS <# #S SWAP SIGN #>
  ROT OVER - SPACES TYPE ;

: .R ( SINGLE INTEGER -- )
  SWAP S>D SWAP .R ;

: .R ( SIGNED INTEGER -- )
  SWAP S>D SWAP .R ;

Note that the second parameter of all four versions of .R has data type INTEGER, although it is interpreted as a signed number. If it were declared as SIGNED in the first place, the type cast at the beginning of the definition could have been omitted. But because it is an INTEGER, you can provide an unsigned number as the field width as well, which is actually the typical use:

536 7138 M* 10 .R \ instead of ... +10 .R
   3825968 OK
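Since .R does not append a trailing space, several numbers can be lined up in columns of fixed width. The following transcript is a sketch based on the definitions above, assuming a decimal radix:

```forth
12 6 .R -345 6 .R
    12  -345 OK
```

The first number is handled by the version for items of data type SINGLE, the second one by the version for items of data type SIGNED, which places the minus sign inside the field.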

Dr. Stephan Becher - October 8th, 2007