Chapter 17: Blocks

Block Files

StrongForth provides the complete Block and Block Extension word sets as specified by ANS Forth. All blocks are contained in a single file with the name forth.blk, which is located in the same directory as the executable forth.exe. However, you can choose to start StrongForth with a different block file by supplying its path and filename as a command-line parameter. To generate a new block file, just copy forth.blk and rename it. Then start StrongForth with the new block file as command-line parameter.

Block 2 of each block file has a special meaning. This block is automatically LOADed immediately after StrongForth has been started. It normally contains source code to display a startup message and to LOAD other blocks that contain the source code of libraries, additional word sets and useful utilities. Executing QUIT at the end of block 2 starts the interpreter loop. If QUIT is replaced with BYE, StrongForth simply exits after the interpretation of block 2. This feature is useful for running applications implemented in StrongForth:

C:\>copy forth.blk app.blk
        1 file(s) copied.

C:\>forth app.blk
8086 StrongForth 1.0
2 CLEAR
 OK

\ ... edit block 2 ...

 OK
2 LIST
 0 \ Dummy application
 1
 2 800 830 THRU \ library
 3
 4 ." StrongForth application start" CR
 5
 6 \ ... LOAD ...
 7
 8 ." StrongForth application end" BYE
 9
10
11
12
13
14
15
 OK
BYE

C:\>forth app.blk
StrongForth application start
StrongForth application end
C:\>

Reading And Writing Blocks

Physical transfer of blocks to and from the block file is performed by the low-level word (BLOCK):

(BLOCK) ( CDATA -> CHARACTER UNSIGNED FLAG -- SIGNED )

(BLOCK) expects a buffer address CDATA -> CHARACTER in the DATA memory area, the number UNSIGNED of the block to be transferred, and a FLAG indicating the direction of the transfer. If FLAG is FALSE, (BLOCK) reads the block with number UNSIGNED from the block file and stores it starting at address CDATA -> CHARACTER. If FLAG is TRUE, (BLOCK) writes the contents of the buffer located at address CDATA -> CHARACTER to the block with number UNSIGNED of the block file. Note that the buffer address is a character address, which indicates that blocks are primarily used for storing characters. However, it is also possible to store any other data in blocks, e. g., arrays of numbers. In these cases, an explicit type cast is required. The output parameter SIGNED is the DOS error code, which is zero if the transfer succeeded and non-zero if something went wrong. (BLOCK) is a low-level word that you will probably never use directly, because ANS Forth specifies more convenient words that provide read and write access to blocks.

#BLOCKS is a value containing the size of the block file in blocks. This means that valid block numbers range from 1 to #BLOCKS. But note that (BLOCK) does not verify whether its parameter UNSIGNED is a valid block number. This check has to be done manually by executing ?BLOCK. ?BLOCK throws an exception if its parameter is either zero or greater than #BLOCKS:

1000 VALUE #BLOCKS

: ?BLOCK ( UNSIGNED -- )
  #BLOCKS 1+ 1 WITHIN IF -35 THROW THEN ;
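
The range test in ?BLOCK passes a lower limit (#BLOCKS 1+) that is greater than the upper limit (1), so it relies on the wrap-around case of WITHIN. The following Python sketch illustrates that logic. It assumes that WITHIN behaves as specified by ANS Forth; the names within and is_invalid_block are hypothetical and exist only for this illustration.

```python
NUM_BLOCKS = 1000  # value of #BLOCKS in the listing above


def within(n, lo, hi):
    """ANS Forth WITHIN for unsigned operands (assumed behaviour)."""
    if lo < hi:
        return lo <= n < hi
    if lo > hi:
        # Wrap-around case: the range runs from lo up to the largest
        # number and continues from zero up to, but excluding, hi.
        return n >= lo or n < hi
    return False


def is_invalid_block(n):
    """Mirror of the phrase  #BLOCKS 1+ 1 WITHIN  in ?BLOCK."""
    return within(n, NUM_BLOCKS + 1, 1)
```

With these definitions, is_invalid_block is true for 0 and for every number above 1000, and false for 1 to 1000, which is exactly the set of numbers for which ?BLOCK executes -35 THROW.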

Although (BLOCK) allows specifying any buffer address, the current version of StrongForth uses only one buffer for all transfers. BLK-BUFF is the (constant) address of this buffer:

BLK-BUFF ( -- CDATA -> CHARACTER )

The block buffer can be in either of the following states:

It is not assigned to any block.
It is assigned to a specific block. The contents of the buffer have not been modified with respect to the contents of the block in the block file.
It is assigned to a specific block, and its contents have been modified.

Variable BLK-STAT contains a number of data type SIGNED that indicates the state of the block buffer.

BLK-STAT ( -- DATA -> SIGNED )

As long as the block buffer is unassigned, BLK-STAT contains zero. Any positive value between 1 and #BLOCKS is the number of an unmodified block that has been assigned to the block buffer. A negative value indicates a modified block:

`BLK-STAT`	assigned to	modified
`n = 0`	none	n/a
`n > 0`	block `n`	no
`n < 0`	block `|n|`	yes

Once the block that has been assigned to the block buffer is modified, and the modification is intended to be permanent, you need to mark the block as modified by negating the contents of variable BLK-STAT. This can simply be done by executing UPDATE:

: UPDATE ( -- )
  BLK-STAT @ ABS NEGATE BLK-STAT ! ;

Note that UPDATE does not write the contents of the block buffer back to the block file. It just marks the block buffer as modified. The actual write operation is performed by SAVE-BUFFERS:

: SAVE-BUFFERS ( -- )
  BLK-STAT @ 0<
  IF BLK-BUFF BLK-STAT @ ABS DUP BLK-STAT !
     CAST UNSIGNED DUP ?BLOCK TRUE (BLOCK) IF -34 THROW THEN
  THEN ;

SAVE-BUFFERS first checks whether the block buffer is assigned to a block that has been modified. If this is true, it writes the block back to the block file and marks the block buffer as unmodified. Note that SAVE-BUFFERS executes ?BLOCK before (BLOCK) to make sure BLK-STAT contains a valid block number. If (BLOCK) fails for another reason, an exception is thrown.

EMPTY-BUFFERS, if executed before SAVE-BUFFERS, discards all modifications and unassigns the block buffer. FLUSH also unassigns the block buffer, but it first saves the contents of the block buffer to the block file if they have been modified. Just like UPDATE and SAVE-BUFFERS, these two words are part of the ANS Forth Block and Block Extension word sets:

: EMPTY-BUFFERS ( -- )
  +0 BLK-STAT ! ;

: FLUSH ( -- )
  SAVE-BUFFERS EMPTY-BUFFERS ;

But how does the block buffer get assigned to a block? One possibility is to create a new block from scratch with BUFFER:

: BUFFER ( UNSIGNED -- CDATA -> CHARACTER )
  DUP BLK-STAT @ ABS CAST UNSIGNED =
  IF DROP
  ELSE SAVE-BUFFERS CAST SIGNED BLK-STAT !
  THEN BLK-BUFF ;

BUFFER is an ANS Forth word that expects a block number on the data stack and returns the address of a block buffer. It does not read an existing block from the block file. If the block buffer is already assigned to a different block and has been modified, BUFFER saves the block buffer and then reassigns it to the new block. Since StrongForth has only one block buffer, the output parameter of BUFFER is always the same.

The second possibility to assign the block buffer to a block is to access an already existing block. The definition of BLOCK is actually very similar to the definition of BUFFER:

: BLOCK ( UNSIGNED -- CDATA -> CHARACTER )
  DUP BLK-STAT @ ABS CAST UNSIGNED =
  IF DROP
  ELSE SAVE-BUFFERS BLK-BUFF OVER DUP ?BLOCK
     FALSE (BLOCK) IF -33 THROW THEN CAST SIGNED BLK-STAT !
  THEN BLK-BUFF ;

The difference is that BLOCK performs a physical read operation after saving the old contents of the block buffer. Of course, this is only necessary if the new block is not the same as the old one. BLOCK throws an exception if an attempt is made to read an invalid block, or if the read operation fails for any other reason:

#BLOCKS .
1000  OK
1000 BLOCK DROP
 OK
0 BLOCK

0 BLOCK ? invalid block number
CDATA -> CHARACTER
1001 BLOCK

1001 BLOCK ? invalid block number
CDATA -> CHARACTER
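
The interplay of BLK-STAT, UPDATE, SAVE-BUFFERS, EMPTY-BUFFERS, BUFFER and BLOCK can be followed in a small Python model. This is only a sketch of the definitions above, not StrongForth code: the class name SingleBufferCache, the in-memory list that stands in for the block file, and the reads and writes counters are assumptions made for illustration, and ValueError stands in for -35 THROW.

```python
BLOCK_SIZE = 1024


class SingleBufferCache:
    """Model of a block file that is accessed through one buffer."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks            # #BLOCKS
        # Stand-in for the block file. Index 0 is unused because
        # block numbers start at 1.
        self.file = [bytearray(b" " * BLOCK_SIZE)
                     for _ in range(num_blocks + 1)]
        self.buffer = bytearray(BLOCK_SIZE)     # BLK-BUFF
        self.stat = 0                           # BLK-STAT
        self.reads = 0                          # physical reads (illustration)
        self.writes = 0                         # physical writes (illustration)

    def check_block(self, n):                   # ?BLOCK
        if not 1 <= n <= self.num_blocks:
            raise ValueError("invalid block number")

    def update(self):                           # UPDATE
        self.stat = -abs(self.stat)

    def save_buffers(self):                     # SAVE-BUFFERS
        if self.stat < 0:
            n = -self.stat
            self.stat = n                       # mark as unmodified
            self.check_block(n)
            self.file[n][:] = self.buffer       # physical write
            self.writes += 1

    def empty_buffers(self):                    # EMPTY-BUFFERS
        self.stat = 0

    def flush(self):                            # FLUSH
        self.save_buffers()
        self.empty_buffers()

    def buffer_for(self, n):                    # BUFFER
        if n != abs(self.stat):
            self.save_buffers()
            self.stat = n                       # assign without reading
        return self.buffer

    def block(self, n):                         # BLOCK
        if n != abs(self.stat):
            self.save_buffers()
            self.check_block(n)
            self.buffer[:] = self.file[n]       # physical read
            self.reads += 1
            self.stat = n
        return self.buffer
```

In this model, modifying block 1, calling update() and then requesting block 2 causes exactly one write, while requesting the same block twice in a row causes only one read. Calling empty_buffers() after update() drops the modification without touching the file.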

Block Structure

According to the ANS Forth specification, a block is just 1024 characters of data on mass storage, designated by a block number. However, it is usually interpreted as being divided into 16 lines of 64 characters each. StrongForth defines two constants for the number of characters per block and per line:

1024 CONSTANT C/B
64 CONSTANT C/L

The contents of a block can be displayed with the ANS Forth word LIST. LIST stores the block number in variable SCR for later reference. An example of the usage of LIST can be seen in the first section of this chapter.

0 VARIABLE SCR

: LIST ( UNSIGNED -- )
  SCR ! BASE @ DECIMAL SCR @ BLOCK [ C/B C/L / ] LITERAL 0
  DO I 2 .R SPACE DUP I C/L * + C/L -TRAILING TYPE CR
  LOOP DROP BASE ! ;
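
The layout that LIST produces can be reproduced outside of StrongForth. The following Python sketch splits 1024 characters into 16 lines of 64 characters, strips trailing spaces as -TRAILING does, and prefixes each line with a right-justified line number. The function name list_block is hypothetical, and the sketch assumes that the block contains plain ASCII text.

```python
C_PER_BLOCK = 1024  # C/B
C_PER_LINE = 64     # C/L


def list_block(data):
    """Return the 16 display lines of a block, formatted like LIST."""
    lines = []
    for i in range(C_PER_BLOCK // C_PER_LINE):
        raw = data[i * C_PER_LINE:(i + 1) * C_PER_LINE]
        text = raw.decode("ascii").rstrip(" ")   # -TRAILING
        lines.append(f"{i:2d} {text}")           # I 2 .R SPACE ... TYPE
    return lines
```

Like the Forth definition, the sketch always emits the separating space after the line number, even if the line itself is empty.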

Blocks As Input Source

A block can be made the input source by storing the block number in the system variable BLK. Together with >IN and SOURCE-ID, BLK is actually a part of the input source specification. This means that the semantics of SOURCE needs to be extended. The extended version additionally considers the content of BLK:

0 VARIABLE BLK

:NONAME ( -- CDATA -> CHARACTER UNSIGNED )
  BLK @
  IF BLK @ BLOCK C/B
  ELSE SOURCE-ID
     IF SOURCE-SPEC SPLIT
        ( SINGLE SINGLE -- CDATA -> CHARACTER UNSIGNED )CAST
     ELSE TIB #TIB @
     THEN
  THEN ; IS SOURCE

If BLK contains a block number, SOURCE returns a character string with the address of the block buffer and the number of characters in a block. Otherwise, i. e., if BLK is zero, the input source is determined in the same way as by the non-block version of SOURCE as described in chapter 11. The non-block version considers only the system variable SOURCE-ID in order to decide whether the user input device or a string is the input source. The following table provides a quick overview on all input sources:

`BLK`	`SOURCE-ID`	Input Source
`0`	`0`	user input device
`0`	`-1`	string
`n = 1 ... #BLOCKS`	don't care	block `n`

The introduction of blocks also has consequences for other StrongForth words that deal with the input source specification. Since QUIT is supposed to make the user input device the input source, its semantics is extended by initializing BLK:

: QUIT ( -- )
  0 BLK ! QUIT ;

REFILL needs to be modified as well. Refilling the input source with a block means that the succeeding block becomes the new input source. REFILL returns FALSE if the current block has no successor:

:NONAME ( -- FLAG )
  BLK @
  IF BLK @ #BLOCKS < DUP
     IF 1 BLK +! 0 >IN !
     THEN
  ELSE SOURCE-ID 0= DUP
     IF TIB 80 ACCEPT #TIB ! 0 >IN !
     THEN
  THEN ; IS REFILL
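
The block branch of REFILL only changes BLK and >IN. A Python sketch of that branch makes the boundary case at the last block explicit. The function name refill_block and the convention of returning the new values as a tuple are assumptions made for illustration.

```python
def refill_block(blk, to_in, num_blocks):
    """Model of REFILL while a block is the input source.

    Returns (flag, blk, to_in) as they are after REFILL has executed.
    """
    if blk < num_blocks:
        # The succeeding block becomes the input source and
        # parsing restarts at its first character.
        return True, blk + 1, 0
    # The last block has no successor: nothing changes.
    return False, blk, to_in
```

For a block file with 1000 blocks, refill_block(999, 500, 1000) advances to block 1000, while refill_block(1000, 500, 1000) reports failure and leaves both values untouched.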

Whenever the current input source specification is saved and restored, the content of BLK has to be included. This rule applies to SAVE-INPUT, RESTORE-INPUT and EVALUATE. Consequently, the non-block versions of these words have to be either replaced by corresponding block versions, or their semantics has to be extended:

: SAVE-INPUT ( -- INPUT-SOURCE )
  NEW-TUPLE BLK @
  IF -> UNSIGNED BLK @ >T CAST TUPLE
  ELSE SOURCE-ID
     IF -> DOUBLE SOURCE-SPEC >T CAST TUPLE
     THEN
  THEN -> UNSIGNED >IN @ >T CAST INPUT-SOURCE ;

: RESTORE-INPUT ( INPUT-SOURCE -- FLAG )
  CAST TUPLE -> UNSIGNED SIZE
  CASE 1 OF T> >IN ! DROP BLK @ 0<> SOURCE-ID 0<> OR ENDOF
       2 OF T> >IN ! T> BLK ! DROP FALSE ENDOF
       3 OF T> >IN ! CAST TUPLE -> DOUBLE T> SOURCE-SPEC <> 
            BLK @ 0<> OR SOURCE-ID 0= OR >R DROP R> ENDOF
       >R DROP TRUE R> 
  ENDCASE ;

: EVALUATE ( CDATA -> CHARACTER UNSIGNED -- )
  BLK @ LOCALS| B | 0 BLK ! EVALUATE B BLK ! ;

The non-block version of SAVE-INPUT handles only the user input device and strings as input sources. The block version is extended to handle blocks as well. This requires a third format for the tuple INPUT-SOURCE:

Input source	Tuple contents
User input device	`>IN`
Block	`BLK`, `>IN`
String	`SOURCE-SPEC`, `>IN`

If the input source is a block, SAVE-INPUT saves the contents of variables BLK and >IN. The resulting tuple of data type INPUT-SOURCE has a size of two cells, which differs from the size of the tuple created for the other two input sources, the user input device and strings. RESTORE-INPUT considers the new format as an additional case.
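
How the size of the saved tuple selects the case in RESTORE-INPUT can be sketched in Python. This is a simplified model under stated assumptions: Python tuples replace StrongForth tuples, SOURCE-SPEC is represented as an (address, length) pair that occupies two items, and the class InputState as well as the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InputState:
    blk: int = 0                  # BLK
    source_id: int = 0            # SOURCE-ID
    source_spec: tuple = (0, 0)   # SOURCE-SPEC as (address, length)
    to_in: int = 0                # >IN


def save_input(s):
    """Return the saved input source; its length identifies the kind."""
    if s.blk != 0:
        return (s.blk, s.to_in)             # block: 2 items
    if s.source_id != 0:
        return (*s.source_spec, s.to_in)    # string: 3 items
    return (s.to_in,)                       # user input device: 1 item


def restore_input(saved, s):
    """Restore what can be restored; True means the restore failed."""
    if len(saved) == 1:
        s.to_in = saved[0]
        return s.blk != 0 or s.source_id != 0
    if len(saved) == 2:
        s.blk, s.to_in = saved
        return False
    if len(saved) == 3:
        s.to_in = saved[2]
        return (saved[:2] != s.source_spec
                or s.blk != 0 or s.source_id == 0)
    return True
```

Only the block case actually changes the input source, by writing BLK. The other two cases merely verify that the current input source is still of the saved kind.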

EVALUATE temporarily makes a string the input source. If the current input source is a block, EVALUATE has to store zero in BLK. At the end, the previous value of BLK is restored.

If the input source is the user input device or a string, \ discards the text up to and including the next occurrence of the character \, or the remainder of the parse area if it doesn't contain another backslash. If the input source is a block, the semantics of \ is slightly different. Only the remainder of the current line of text is considered. Here's the definition of the extended version of \:

: \ ( -- )
  BLK @
  IF >IN @ C/L / 1+ C/L * POSTPONE \ >IN @ MIN >IN !
  ELSE POSTPONE \
  THEN ; IMMEDIATE
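
The arithmetic in the block branch of \ first computes the offset of the next line and then takes the minimum of that offset and the value of >IN left by the non-block version. A Python sketch of this calculation follows. The function name and the second parameter, which stands for the value of >IN after the non-block version of \ has executed, are assumptions made for illustration.

```python
C_PER_LINE = 64  # C/L


def to_in_after_block_comment(to_in, to_in_after_plain_backslash):
    """New value of >IN after executing \\ inside a block.

    to_in is >IN before the non-block version runs;
    to_in_after_plain_backslash is >IN after it has run.
    """
    # >IN @ C/L / 1+ C/L *  : start of the next 64-character line
    next_line = (to_in // C_PER_LINE + 1) * C_PER_LINE
    # ... >IN @ MIN >IN !   : never skip beyond the current line
    return min(next_line, to_in_after_plain_backslash)
```

If the non-block version skips to the end of the block (offset 1024) while >IN was 70, the result is 128, the start of the third line. If it stops earlier on the same line, for example at offset 100, that earlier position is kept.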

Since StrongForth uses the ANS Forth word ( for a different purpose, the only means for entering comments over multiple lines is with FALSE [IF] ... [THEN]. If you don't like this technique, you have to insert a \ at the beginning of every comment line.

The final word that needs to be modified in order to support blocks as input source is ERROR (see chapter 16):

: ERROR ( SIGNED -- )
  DUP
  IF CR SOURCE DROP >IN @ DECIMAL
     BLK @ IF DUP DUP C/L MOD - /STRING THEN
     -TRAILING TYPE DUP -399 -0 WITHIN
     IF ."  ? " ABS CAST UNSIGNED C/L C/B */MOD 5 + BLOCK
        SWAP + C/L -TRAILING TYPE SPACE
     ELSE ."  ? ERROR " .
     THEN
     BLK @ IF BLK @ . >IN @ C/L / . THEN CR POSTPONE .S ABORT
  ELSE DROP
  THEN ;

Out of the three parts of the error message,

the contents of the input buffer up to and including the most recently parsed word,
the error message in a narrow sense, and
the contents of the data type heap,

the first two are affected. The phrase BLK @ IF DUP DUP C/L MOD - /STRING THEN ensures that only one line of the block is printed to indicate the error's location within the parse area. The error message in a narrow sense is much more elaborate. Instead of just printing "ERROR" and the error number, the block version of ERROR prints a descriptive message that is obtained from blocks 5 to 29. Each line of these blocks contains one error message. E. g., block 5 contains the error messages for error codes -3 to -15:

5 LIST
 0 \ StrongForth error messages
 1 
 2 
 3 stack overflow
 4 stack underflow
 5 return stack overflow
 6 return stack underflow
 7 do-loops nested too deeply during execution
 8 dictionary overflow
 9 invalid memory address
10 division by zero
11 result out of range
12 argument type mismatch
13 undefined word
14 interpreting a compile-only word
15 invalid FORGET
 OK

Remember that error codes -1 and -2 have a special meaning. Since block 29 is the last block containing error messages, -399 is the lowest error number for which an error message is available. For all error numbers outside the range from -399 to -1, the same error message as in the non-block version of ERROR is printed:

-1000 ERROR ? ERROR -1000
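
The phrase ABS CAST UNSIGNED C/L C/B */MOD 5 + BLOCK maps an error code to a block and a position within that block. The mapping can be checked with a few lines of Python. The function name error_message_location is hypothetical, and the sketch returns the line number instead of the character offset for readability.

```python
C_PER_BLOCK = 1024        # C/B
C_PER_LINE = 64           # C/L
FIRST_MESSAGE_BLOCK = 5   # block that holds the messages for codes 0 to -15


def error_message_location(code):
    """Return (block, line) of the message for code, or None."""
    if not -399 <= code < 0:          # DUP -399 -0 WITHIN
        return None
    # abs(code) * C/L is the character offset from the start of block 5;
    # */MOD splits it into a block offset and an offset within the block.
    quotient, remainder = divmod(abs(code) * C_PER_LINE, C_PER_BLOCK)
    return FIRST_MESSAGE_BLOCK + quotient, remainder // C_PER_LINE
```

For example, error code -3 (stack overflow) is found in block 5, line 3, error code -16 is the first line of block 6, and error code -399 is the last line of block 29.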

The phrase BLK @ IF BLK @ . >IN @ C/L / . THEN prints the block number and the line number within the block.

The ANS Forth word LOAD can be used to interpret the contents of a block. LOAD saves the input source specification as locals on the return stack, stores the block number in variable BLK, resets >IN and interprets. When done, it simply restores the previous input source specification. Here's the definition of StrongForth's version of LOAD:

: LOAD ( UNSIGNED -- )
  DUP ?BLOCK
  >IN @ BLK @ LOCALS| B I | BLK ! 0 >IN !
  INTERPRET B BLK ! I >IN ! ;

You may notice that LOAD checks the input parameter with ?BLOCK for being a valid block number. Since BLOCK already checks for valid block numbers, is this additional check really necessary? It is. If INTERPRET is executed with an invalid block number, BLOCK would throw an exception. As long as the Exception word set has not been loaded, ERROR handles the exception. ERROR tries displaying the source line that caused the exception, which it assumes to be in the block whose number is stored in BLK. The attempt to access this block will throw another exception that again ends up in ERROR and so on. Sooner or later, one of the stacks overflows and causes the system to crash. This can only be avoided by ensuring that BLK never contains an invalid block number.
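
The order of operations in LOAD, validate first, then save, switch, interpret and restore, can be modelled in Python. The sketch below is an illustration under assumptions: InputState and load are hypothetical names, interpret is a caller-supplied function that stands in for INTERPRET, and ValueError stands in for -35 THROW.

```python
from dataclasses import dataclass

NUM_BLOCKS = 1000  # value of #BLOCKS assumed for this example


@dataclass
class InputState:
    blk: int = 0     # BLK
    to_in: int = 0   # >IN


def load(state, n, interpret):
    """Model of LOAD: interpret block n, then restore BLK and >IN."""
    # DUP ?BLOCK : reject invalid numbers before BLK is touched, so
    # that BLK never holds a number ERROR could not display.
    if not 1 <= n <= NUM_BLOCKS:
        raise ValueError("invalid block number")
    saved_blk, saved_in = state.blk, state.to_in   # LOCALS| B I |
    state.blk, state.to_in = n, 0
    interpret(state)
    state.blk, state.to_in = saved_blk, saved_in
```

Because the check happens before BLK is written, a call such as load(state, 0, interpret) fails while BLK still contains its previous, valid value. Nested calls restore the outer block and position in the same way.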

THRU allows interpreting a sequence of blocks. Its definition contains nothing more than a check for valid input parameters and a loop around LOAD:

: THRU ( UNSIGNED 1ST -- )
  OVER OVER >
  IF DROP DROP -35 THROW
  ELSE 1+ SWAP DO I LOAD LOOP
  THEN ;
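
The behaviour of THRU can be summarized in a short Python sketch. The function name thru and the load callback, which stands in for LOAD, are assumptions made for illustration, and ValueError stands in for -35 THROW.

```python
def thru(first, last, load):
    """Model of THRU: load blocks first to last, both inclusive."""
    if first > last:                     # OVER OVER > IF ... -35 THROW
        raise ValueError("invalid block number")
    for n in range(first, last + 1):     # 1+ SWAP DO I LOAD LOOP
        load(n)
```

Calling thru(800, 803, load) invokes load for the blocks 800, 801, 802 and 803 in that order. If both parameters are equal, exactly one block is loaded.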

Dr. Stephan Becher - July 28th, 2007