Being a DOS application, StrongForth takes advantage of some services provided by the operating system. For example, the default block file forth.blk is accessed through DOS system functions like open, close, read and write. The standard input device and the standard output device are also provided by DOS. Even BYE uses a DOS system function to terminate StrongForth and return to the DOS command line.
StrongForth's implementation of the ANS Forth File-Access word set is also based on the interface to DOS. Its source code is provided in blocks 900 to 924. All words except ( are included. ( with the semantics as specified by the ANS Forth standard is not available in StrongForth, because the word with this name is instead used for stack diagrams. Comments that range over multiple lines can be inserted either by preceding each line with a \, or by enclosing the comment in FALSE [IF] ... [THEN].
Since those words that constitute the low-level interface to DOS are implemented as machine code definitions, it is required to load the StrongForth assembler before loading the File-Access word set:
100 129 THRU \ ASSEMBLER OK
900 924 THRU \ FILE-ACCESS WORD SET OK
ANS Forth specifies an extension of the semantics of S" to make this word available in interpretation state. The rationale of this specification is to be able to enter file names in interpretation state, because many words belonging to the File-Access word set expect strings containing file names on the stack. As a temporary storage for those strings, a buffer in the DATA memory space is required. StrongForth provides only one string buffer:
DATA-SPACE HERE CAST CDATA -> CHARACTER CONSTANT STR 80 ALLOT 0 VARIABLE #STR
The string buffer STR is 80 characters long. #STR is a variable containing the actual length of the string currently stored in the buffer. The StrongForth word ", which is the same as S" in ANS Forth, is replaced by a version that allows compilation and interpretation:
: " ( -- ) [CHAR] " PARSE STATE @ IF POSTPONE SLITERAL ELSE #STR ! STR #STR @ MOVE " STR #STR @" EVALUATE THEN ; IMMEDIATE
An interesting detail of this definition is that " uses EVALUATE instead of directly compiling STR #STR @. This is necessary because " is a state-smart word. Its stack diagram is empty, because its compilation semantics have no stack effect. Interpreting STR #STR @ with EVALUATE simply yields the desired stack effect in interpretation state.
The string buffer is used for a second purpose as well. The DOS functions that constitute file handling expect file names as null-terminated strings, like this:
'T' | 'E' | 'S' | 'T' | '.' | 'F' | 'T' | 'H' | 0 |
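The word >STR converts a string in address-and-count format into a null-terminated string by copying it into a buffer and appending a null character:
: >STR ( CDATA -> CHARACTER UNSIGNED CDATA -> CHARACTER -- 4 TH ) LOCALS| A N | A N MOVE NULL CHARACTER A N + ! A ;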
The last input parameter of >STR is the address of the buffer. It is returned unchanged as the output parameter. Here's an example that uses PAD as the buffer:
PAD 16 BLANK OK
PAD 16 DUMP 0A66: 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 OK
" TEST.FTH" PAD >STR 16 DUMP 0A66: 54 45 53 54 2E 46 54 48 00 20 20 20 20 20 20 20 OK
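The copy-and-terminate step can also be modelled outside of Forth. The following Python sketch is not StrongForth code; the function name `to_str` and the bounds check are assumptions added for illustration. A 16-byte buffer filled with blanks stands in for PAD:

```python
def to_str(source: bytes, buffer: bytearray) -> bytearray:
    """Copy source to the start of buffer and append a zero byte.

    Models the three-parameter >STR. The bounds check is an addition
    of this sketch; the Forth definition shown above has none.
    """
    n = len(source)
    if n + 1 > len(buffer):
        raise ValueError("buffer too small for string plus terminator")
    buffer[0:n] = source   # corresponds to A N MOVE
    buffer[n] = 0          # corresponds to NULL CHARACTER A N + !
    return buffer          # the buffer address is returned unchanged


pad = bytearray(b" " * 16)          # PAD 16 BLANK
to_str(b"TEST.FTH", pad)
print(pad.hex(" ").upper())
```

The printed bytes are the same as in the second dump of the PAD example: the eight characters of the name, a zero byte, and seven untouched blanks.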
StrongForth provides an overloaded version of >STR that uses the string buffer as the default. Two additional overloaded versions are provided for cases where the original string is stored in the CONST memory area instead of the DATA memory area:
: >STR ( CDATA -> CHARACTER UNSIGNED -- CDATA -> CHARACTER ) STR >STR ;
: >STR ( CCONST -> CHARACTER UNSIGNED CDATA -> CHARACTER -- 4 TH ) LOCALS| A N | A N MOVE NULL CHARACTER A N + ! A ;
: >STR ( CCONST -> CHARACTER UNSIGNED -- CDATA -> CHARACTER ) STR >STR ;
DOS grants applications access to its file system through a system interrupt with a number of subfunctions. Input and output parameters are passed in the registers of the processor. The input parameters are typically file names as null-terminated strings, file handles and character buffers. Each subfunction returns an error code if the associated operation fails.
StrongForth provides a low-level word for each subfunction of the system interrupt that is needed to implement the higher-level functions of the File-Access word set. Let's start with the subfunction that creates a new file:
(CREATE) ( CDATA -> CHARACTER LOGICAL -- FILE SIGNED )
(CREATE) simply executes DOS system interrupt 33 (21 hex) with subfunction 60 (3C hex). CDATA -> CHARACTER is the address of a null-terminated character string containing the desired file name, and LOGICAL is a bit field that specifies file attributes according to the following table:
Bits 15 to 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 | Attribute |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | Unlimited access |
0 | x | x | x | x | x | 1 | Read only file |
0 | x | x | x | x | 1 | x | Hidden file |
0 | x | x | x | 1 | x | x | System file |
0 | 0 | 0 | 1 | 0 | 0 | 0 | Volume ID |
0 | x | 1 | x | x | x | x | Subdirectory |
0 | 1 | x | x | x | x | x | File has not been archived |
x means don't care. Bits 15 to 6 are supposed to be zero. All other bits except for bit 3 can be arbitrarily combined. For example, if LOGICAL is (binary) 0000000000000101, (CREATE) creates a read-only system file. A value of (binary) 0000000000001000 indicates the volume ID, which cannot be given attributes like read-only, hidden, system and so on. If bit 4 is set, a new subdirectory is created instead of a new file.
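As an illustration of how the bit field is read, here is a small Python sketch that decodes an attribute value according to the table above. The names `ATTRIBUTE_BITS` and `describe_attributes` are hypothetical and do not exist in StrongForth:

```python
# Bit number -> meaning, taken from the attribute table above.
ATTRIBUTE_BITS = {
    0: "read only",
    1: "hidden",
    2: "system",
    3: "volume ID",
    4: "subdirectory",
    5: "not archived",
}


def describe_attributes(attr: int) -> list:
    """Return the names of all attribute bits set in attr."""
    if attr & ~0x3F:
        raise ValueError("bits 15 to 6 are supposed to be zero")
    names = [name for bit, name in ATTRIBUTE_BITS.items() if attr & (1 << bit)]
    return names or ["unlimited access"]


print(describe_attributes(0b0000000000000101))  # read-only system file
print(describe_attributes(0b0000000000001000))  # volume ID
print(describe_attributes(0))                   # no bit set
```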
The first output parameter of (CREATE) is the file handle of the newly created file or subdirectory. For file handles, a new data type has been created as a direct subtype of SINGLE:
DT SINGLE PROCREATES FILE
The file handle can be used by the other file functions to access the new file. The second output parameter of data type SIGNED is the DOS error code. If the new file was successfully created, SIGNED is zero. A non-zero value indicates that something went wrong, e.g., because an invalid directory path was specified, or because too many files are open. The specific error codes a DOS function can return are specified in the documentation of DOS.
The second low-level DOS function we'll investigate is (OPEN), which opens an already existing file:
(OPEN) ( CDATA -> CHARACTER FAM -- FILE SIGNED )
Just like with (CREATE), the file name is passed to (OPEN) as a null-terminated character string of data type CDATA -> CHARACTER. But instead of file attributes, (OPEN) expects a file access method FAM, which indicates whether read access, write access or both are allowed. StrongForth provides another new data type for the file access method, plus a small collection of constants:
DT LOGICAL PROCREATES FAM
0 CAST FAM CONSTANT R/O \ read only
1 CAST FAM CONSTANT W/O \ write only
2 CAST FAM CONSTANT R/W \ read/write
: BIN ( FAM -- 1ST ) 8 BIT OR ;
The three constants define the three possible file access methods, which either allow unrestricted access (R/W) or access in one direction only (R/O or W/O). For example, if a file is opened with the R/O file access method, any attempt to write data to the file will be prohibited. BIN is the StrongForth implementation of the ANS Forth word BIN. However, the binary attribute is ignored by the DOS file system.
The output parameters of (OPEN) are the same as the output parameters of (CREATE).
The other low-level DOS functions are just briefly described here. Details can be obtained from the glossary of the StrongForth File-Access word set.
(CLOSE) ( FILE -- SIGNED )
Closes the file with the file handle FILE.
(POSITION) ( FILE INTEGER-DOUBLE UNSIGNED -- UNSIGNED-DOUBLE SIGNED )
Repositions the file whose file handle is FILE. The new position is the sum of INTEGER-DOUBLE and either the beginning of the file, its current position, or its end, depending on the value of UNSIGNED:
UNSIGNED | New Position (= UNSIGNED-DOUBLE) |
---|---|
0 | INTEGER-DOUBLE |
1 | Current Position ± INTEGER-DOUBLE |
2 | Size of File ± INTEGER-DOUBLE |
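The three positioning methods can be tried out with Python's in-memory files, whose `seek` method happens to use the same values 0, 1 and 2 for its second argument. This is only a model of the table above, not of the DOS call itself; the wrapper name `position` is hypothetical:

```python
import io


def position(f, offset: int, method: int) -> int:
    """Move the file position and return the new absolute position.

    method 0: relative to the beginning of the file
    method 1: relative to the current position
    method 2: relative to the end of the file
    """
    return f.seek(offset, method)


f = io.BytesIO(b"0123456789")     # a ten-character "file"
print(position(f, 4, 0))          # absolute position 4
print(position(f, 3, 1))          # three characters further on
print(position(f, -2, 2))         # two characters before the end
print(position(f, 0, 1))          # query only, position unchanged
```

An offset of zero relative to the current position returns the position without changing it, which is a convenient way to query it.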
(FLUSH) ( FILE -- SIGNED )
Flushes buffered information to mass storage for the file whose file handle is FILE.
(DELETE) ( CDATA -> CHARACTER -- SIGNED )
Deletes the file whose name matches the null-terminated string CDATA -> CHARACTER.
(ATTRIBUTES) ( CDATA -> CHARACTER LOGICAL UNSIGNED -- 3RD SIGNED )
Queries or changes the attributes LOGICAL of the file whose name matches the null-terminated string CDATA -> CHARACTER. The value of UNSIGNED determines whether the operation is a query (0) or a change (1).
(RENAME) ( CDATA -> CHARACTER CDATA -> CHARACTER -- SIGNED )
Renames the file whose name matches the first null-terminated character string CDATA -> CHARACTER to the name specified by the second null-terminated character string CDATA -> CHARACTER.
(READ) ( CDATA -> CHARACTER UNSIGNED FILE -- 3RD SIGNED )
Reads up to UNSIGNED consecutive characters to the buffer at address CDATA -> CHARACTER from the current position of the file whose file handle is FILE. 3RD is the number of characters actually read.
(WRITE) ( CCONST -> CHARACTER UNSIGNED FILE -- SIGNED )
(WRITE) ( CDATA -> CHARACTER UNSIGNED FILE -- SIGNED )
Writes UNSIGNED consecutive characters stored at address CDATA -> CHARACTER or CCONST -> CHARACTER to the current position of the file whose file handle is FILE.
Most high-level file access words as specified by ANS Forth can be derived more or less directly from the low-level DOS file access words. The main differences are file names and error codes. ANS Forth (and StrongForth) file names are provided in the address-and-count format, while the low-level DOS file access words expect null-terminated strings as file names. The conversion is usually accomplished by executing >STR. DOS error codes are in the range 0 to 88. In order to allow ANS Forth I/O result codes to be thrown directly as exceptions, a transformation is necessary. >IOR maps a DOS error code into the range expected by THROW, that is, to a negative number that does not overlap with already defined error codes:
: >IOR ( SIGNED -- 1ST ) DUP IF NEGATE 300 - THEN ;
An I/O result code returned by >IOR is either 0 or between -388 and -301, provided >IOR is supplied with a DOS error code.
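The mapping is simple enough to check by hand. As a sketch, the same arithmetic in Python (the name `to_ior` is hypothetical and only mirrors the Forth definition of >IOR):

```python
def to_ior(dos_error: int) -> int:
    """Map a DOS error code to an I/O result code.

    Zero stays zero; any other code n becomes -n - 300,
    just like DUP IF NEGATE 300 - THEN.
    """
    if dos_error:
        return -dos_error - 300
    return 0


for code in (0, 1, 88):
    print(code, to_ior(code))
```

The smallest and largest non-zero DOS error codes, 1 and 88, map to -301 and -388 respectively.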
Now let's turn to the high-level file access words as specified by ANS Forth. Since StrongForth is able to distinguish words with the same name and different stack diagrams, prefixes "FILE-" and suffixes "-FILE" were stripped from the names of all words. CREATE-FILE simply becomes CREATE, OPEN-FILE becomes OPEN, FILE-POSITION becomes POSITION, and so on. Let's begin with CREATE:
: CREATE ( CDATA -> CHARACTER FAM -- FILE SIGNED ) LOCALS| F S | S NULL LOGICAL (CREATE) DUP 0= F R/W <> AND IF DROP DUP (CLOSE) DUP 0= IF DROP DROP S F (OPEN) THEN THEN >IOR ;
: CREATE ( CDATA -> CHARACTER UNSIGNED FAM -- FILE SIGNED ) >R >STR R> CREATE ;
: CREATE ( CCONST -> CHARACTER UNSIGNED FAM -- FILE SIGNED ) >R >STR R> CREATE ;
The first overloaded version expects the file name as a null-terminated string. It is used by the other two versions, which can be applied to file names in the usual address-and-count format in either the DATA or the CONST memory area. (CREATE) has to be provided with an item of data type LOGICAL specifying the file attributes. If all bits of LOGICAL are zero, (CREATE) creates a new file for unlimited access. The file is opened for read/write access by default. This means that the file has to be closed and reopened if a different file access method is desired. If one of the low-level DOS functions ((CREATE), (CLOSE) and (OPEN)) returns a non-zero DOS error code, the complete operation fails. At the end of the definition, the DOS error code is converted into an I/O result code.
The ANS Forth words OPEN-FILE and DELETE-FILE are called OPEN and DELETE in StrongForth. Their definitions are quite simple, because the semantics are almost identical to those of the corresponding low-level DOS functions. Since OPEN and DELETE expect a character string containing a file name, each of them has two overloaded versions handling strings in different memory areas. Interfacing to the low-level DOS functions just requires converting file names to null-terminated strings, and converting DOS error codes to I/O result codes:
: OPEN ( CDATA -> CHARACTER UNSIGNED FAM -- FILE SIGNED ) >R >STR R> (OPEN) >IOR ;
: OPEN ( CCONST -> CHARACTER UNSIGNED FAM -- FILE SIGNED ) >R >STR R> (OPEN) >IOR ;
: DELETE ( CDATA -> CHARACTER UNSIGNED -- SIGNED ) >STR (DELETE) >IOR ;
: DELETE ( CCONST -> CHARACTER UNSIGNED -- SIGNED ) >STR (DELETE) >IOR ;
The definitions of CLOSE, READ, WRITE and FLUSH, which correspond to the ANS Forth words CLOSE-FILE, READ-FILE, WRITE-FILE and FLUSH-FILE, are even simpler, because these words do not expect a file name. WRITE is overloaded in order to allow writing strings from both the DATA and the CONST memory area:
: CLOSE ( FILE -- SIGNED ) (CLOSE) >IOR ;
: READ ( CDATA -> CHARACTER UNSIGNED FILE -- 3RD SIGNED ) (READ) >IOR ;
: WRITE ( CDATA -> CHARACTER UNSIGNED FILE -- SIGNED ) (WRITE) >IOR ;
: WRITE ( CCONST -> CHARACTER UNSIGNED FILE -- SIGNED ) (WRITE) >IOR ;
: FLUSH ( FILE -- SIGNED ) (FLUSH) >IOR ;
The interpreter and the compiler can distinguish FLUSH from the already existing word with the same name from the Block word set, because it has an input parameter of data type FILE.
The semantics of the ANS Forth word FILE-STATUS are implemented in two overloaded versions of the StrongForth word STATUS. STATUS executes the low-level DOS function (ATTRIBUTES) with the parameters NULL LOGICAL 0 to query the file attributes:
: STATUS ( CDATA -> CHARACTER UNSIGNED -- LOGICAL SIGNED ) >STR NULL LOGICAL 0 (ATTRIBUTES) >IOR ;
: STATUS ( CCONST -> CHARACTER UNSIGNED -- LOGICAL SIGNED ) >STR NULL LOGICAL 0 (ATTRIBUTES) >IOR ;
The current position of a file can be queried using (POSITION) by repositioning the file 0 characters relative to the current file position. With these parameters, (POSITION) returns the current file position without changing it:
: POSITION ( FILE -- UNSIGNED-DOUBLE SIGNED ) 0. 1 (POSITION) >IOR ;
The same low-level DOS function is used to implement the word REPOSITION, which has the same semantics as the ANS Forth word REPOSITION-FILE. Since the desired position is absolute, subfunction 0 of (POSITION) has to be used:
: REPOSITION ( UNSIGNED-DOUBLE FILE -- SIGNED ) SWAP 0 (POSITION) NIP >IOR ;
Determining the size of a file with SIZE is not as trivial. (POSITION) returns the desired value when repositioning the file 0 characters relative to the end of the file, but this also means that the previous position is changed as a side effect. Therefore, it is necessary to save and restore the current file position before and after determining the size, respectively. What makes the definition of SIZE look complex is the fact that it contains two additional exit points that are taken if one of the low-level DOS functions returns a non-zero error code:
: SIZE ( FILE -- UNSIGNED-DOUBLE SIGNED ) DUP POSITION DUP IF ROT DROP EXIT THEN DROP OVER 0. 2 (POSITION) >IOR DUP IF ROT DROP ROT DROP EXIT THEN DROP SWAP ROT REPOSITION ;
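Leaving the error handling aside, the save, measure and restore sequence can be sketched in Python with an in-memory file. The function name `file_size` is an assumption of this sketch, and the error exits of the Forth definition are not modelled:

```python
import io


def file_size(f) -> int:
    """Return the size of f without disturbing its current position."""
    saved = f.seek(0, 1)   # query the current position (method 1, offset 0)
    end = f.seek(0, 2)     # move to the end; the new position is the size
    f.seek(saved, 0)       # restore the saved position (method 0)
    return end


f = io.BytesIO(b"hello world")
f.seek(3)
print(file_size(f), f.tell())
```

After the call, the file position is the same as before, although the file was temporarily positioned at its end.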
The implementation of RESIZE takes advantage of an undocumented feature of the DOS function encapsulated in (WRITE). Writing a zero-length string to a file truncates the file at the current file position. Without this feature, it would not be possible to reduce a file's size. Executing REPOSITION just changes the file position, increasing the file's size if required, but never decreasing it:
: RESIZE ( UNSIGNED-DOUBLE FILE -- SIGNED ) TUCK REPOSITION DUP IF NIP EXIT THEN DROP NULL CDATA -> CHARACTER 0 ROT WRITE ;
StrongForth provides four overloaded versions of RENAME in order to cover all combinations of the two file names being stored in the DATA and CONST memory areas. The fact that both file names have to be converted from the address-and-count format to null-terminated strings causes a minor complication, because the two null-terminated strings share the same string buffer. All four versions of RENAME first create a null-terminated string for the destination file name at the end of the string that currently occupies the string buffer, without overwriting it. Then, the source file name is converted to a null-terminated string located at the end of the first null-terminated string. This is how the string buffer might look immediately before executing (RENAME):
'C' | 'u' | 'r' | 'r' | 'e' | 'n' | 't' | 'D' | 'E' | 'S' | 'T' | '.' | 'E' | 'X' | 'T' | 0 | 'S' | 'R' | 'C' | '.' | 'E' | 'X' | 'T' | 0 |
Now, here are the definitions of RENAME:
: RENAME ( CDATA -> CHARACTER UNSIGNED CDATA -> CHARACTER UNSIGNED -- SIGNED ) TUCK STR #STR @ + >STR >R R@ SWAP + 1+ >STR R> (RENAME) >IOR ;
: RENAME ( CDATA -> CHARACTER UNSIGNED CCONST -> CHARACTER UNSIGNED -- SIGNED ) TUCK STR #STR @ + >STR >R R@ SWAP + 1+ >STR R> (RENAME) >IOR ;
: RENAME ( CCONST -> CHARACTER UNSIGNED CDATA -> CHARACTER UNSIGNED -- SIGNED ) TUCK STR #STR @ + >STR >R R@ SWAP + 1+ >STR R> (RENAME) >IOR ;
: RENAME ( CCONST -> CHARACTER UNSIGNED CCONST -> CHARACTER UNSIGNED -- SIGNED ) TUCK STR #STR @ + >STR >R R@ SWAP + 1+ >STR R> (RENAME) >IOR ;
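To make the address arithmetic of these definitions easier to follow, here is a Python model of how the two names are placed in the shared buffer. It is a sketch only: the function name `layout_rename`, the parameter `used` (standing in for the contents of #STR) and the bounds check are assumptions, not part of StrongForth:

```python
def layout_rename(buffer: bytearray, used: int, src: bytes, dst: bytes):
    """Place dst and then src as null-terminated strings after the
    first `used` bytes of buffer; return (src_offset, dst_offset)."""
    dst_at = used                      # STR #STR @ +
    src_at = dst_at + len(dst) + 1     # R@ SWAP + 1+  (skip the terminator)
    end = src_at + len(src) + 1
    if end > len(buffer):
        raise ValueError("string buffer too small")
    buffer[dst_at:dst_at + len(dst)] = dst
    buffer[dst_at + len(dst)] = 0
    buffer[src_at:src_at + len(src)] = src
    buffer[src_at + len(src)] = 0
    return src_at, dst_at


buf = bytearray(80)
buf[0:7] = b"Current"                  # string currently held in the buffer
src_at, dst_at = layout_rename(buf, 7, b"SRC.EXT", b"DEST.EXT")
print(src_at, dst_at, bytes(buf[:24]))
```

With a seven-character string already in the buffer, the destination name starts at offset 7 and the source name at offset 16, one position past the terminator of the destination name.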
DOS uses a sequence of two control characters to terminate a line of text: carriage return (<CR>) and line feed (<LF>). WRITE-EOL uses WRITE to write those two characters to a file. A 16-bit cell on the data stack serves as the character buffer:
: WRITE-EOL ( FILE -- SIGNED ) >R 0A0D SP@ CAST CDATA -> CHARACTER 2 R> WRITE NIP ;
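The reason why the single cell 0A0D yields the sequence carriage return, line feed is the little-endian byte order of the processor: the low byte 0D is stored at the lower address. This can be checked with a short Python sketch using the standard `struct` module:

```python
import struct

# Pack the 16-bit value 0A0D (hex) in little-endian byte order,
# as a 16-bit cell is laid out in memory on an x86 processor.
cell = struct.pack("<H", 0x0A0D)

print(cell == b"\r\n")        # low byte 0D (CR) first, then 0A (LF)
print(cell.hex(" ").upper())
```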
With WRITE-EOL, the ANS Forth word WRITE-LINE can easily be implemented. StrongForth provides two overloaded versions of WRITE-LINE in order to allow writing strings located in the DATA memory area as well as in the CONST memory area:
: WRITE-LINE ( CDATA -> CHARACTER UNSIGNED FILE -- SIGNED ) DUP LOCALS| F | WRITE DUP 0= IF DROP F WRITE-EOL THEN ;
: WRITE-LINE ( CCONST -> CHARACTER UNSIGNED FILE -- SIGNED ) DUP LOCALS| F | WRITE DUP 0= IF DROP F WRITE-EOL THEN ;
The implementation of READ-LINE is more complicated. READ-LINE starts by trying to read as many characters as fit into the buffer specified by CDATA -> CHARACTER UNSIGNED, plus two characters for the line terminator sequence. Then, it searches the buffer for the first occurrence of a carriage return character, which is the first character of the DOS line termination sequence. If a carriage return character is found, a complete line has been read. In this case, the file is repositioned back to the beginning of the next line, which starts at the position of the carriage return character plus 2 characters. If the buffer does not contain a carriage return character, the line read so far is not complete, and the file position remains unchanged. FLAG is TRUE if and only if at least one character has been successfully read from the file. This is the definition of READ-LINE:
: READ-LINE ( CDATA -> CHARACTER UNSIGNED FILE -- 3RD FLAG SIGNED ) LOCALS| F N A | A N 2 + F READ DUP IF FALSE SWAP EXIT THEN DROP DUP IF A OVER + A DO I @ 0D CAST CHARACTER = IF I A - DUP CAST UNSIGNED SWAP ROT - 2 + F SWAP CAST INTEGER-DOUBLE 1 (POSITION) NIP >IOR TRUE SWAP EXIT THEN LOOP TRUE ELSE FALSE THEN +0 ;
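The read-ahead and reposition strategy may be easier to follow in a higher-level notation. The following Python sketch models it with an in-memory file; the function name `read_line` is hypothetical, and the I/O result code is left out:

```python
import io

CR = b"\r"


def read_line(f, n: int):
    """Return (line, flag) following the strategy described above."""
    data = f.read(n + 2)              # room for the line plus CR and LF
    if not data:
        return b"", False             # nothing read: end of file
    i = data.find(CR)
    if i >= 0:
        # Move back from the end of the data just read to the character
        # following CR LF, i.e. to the beginning of the next line.
        f.seek(i - len(data) + 2, 1)
        return data[:i], True
    return data, True                 # no CR found: position unchanged


f = io.BytesIO(b"one\r\ntwo\r\nthree")
while True:
    line, flag = read_line(f, 80)
    if not flag:
        break
    print(line, f.tell())
```

The relative offset passed to `seek` corresponds to the expression that the Forth definition hands to (POSITION) with method 1: the index of the carriage return, minus the number of characters read, plus 2.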
Up to now, input sources have been either the user input device, strings, or blocks. Now we have to add text files as another source of input to the interpreter and the compiler. In order to interpret the contents of a text file, ANS Forth specifies the words INCLUDED and INCLUDE-FILE. In StrongForth, both words share the same name INCLUDE:
: INCLUDE ( FILE -- ) SOURCE-SPEC >IN @ BLK @ SOURCE-ID LOCALS| S B I X | TO SOURCE-ID 0 BLK ! 0 >IN ! BEGIN REFILL WHILE INTERPRET REPEAT SOURCE-ID CLOSE THROW S TO SOURCE-ID B BLK ! I >IN ! X TO SOURCE-SPEC ?REFILL ;
: INCLUDE ( CDATA -> CHARACTER UNSIGNED -- ) R/O OPEN THROW INCLUDE ;
The first version can be applied to a text file that has already been opened, while the second one additionally opens the file with a given name. INCLUDE saves the current input source specification and then switches to the text file as the new input source by storing the file handle to SOURCE-ID, and zero to BLK and >IN. Then it enters an interpreter loop, repeatedly refilling and interpreting the input buffer. At the end of the file, INCLUDE closes the file and restores the previous input source specification.
Obviously, the details are hidden in REFILL:
:NONAME ( -- FLAG ) BLK @ IF BLK @ #BLOCKS < DUP IF 1 BLK +! 0 >IN ! THEN ELSE SOURCE-ID STRING-ID <> DUP IF DROP SOURCE-ID IF SOURCE-ID DUP POSITION THROW CAST DOUBLE TO SOURCE-SPEC FIB 1022 ROT READ-LINE THROW SWAP #FIB ! ELSE TIB 80 ACCEPT #TIB ! TRUE THEN DUP IF 0 >IN ! THEN THEN THEN ; IS REFILL
DOS file handles are always greater than zero. If SOURCE-ID contains a file handle, REFILL saves the current file position in SOURCE-SPEC and reads the next line of text from the file into the file input buffer FIB. The file input buffer is 1024 characters long, which is enough for a line of text with up to 1022 characters plus two characters for the DOS line terminator. The length of the line is stored in a variable called #FIB:
HERE CAST CDATA -> CHARACTER CONSTANT FIB 1024 ALLOT 0 VARIABLE #FIB
SOURCE has to be extended as well, because INTERPRET parses the input stream with PARSE-WORD, and PARSE-WORD in turn uses SOURCE to parse the next word. If the input source is a file, SOURCE returns the file input buffer as a character string:
:NONAME ( -- CDATA -> CHARACTER UNSIGNED ) BLK @ IF BLK @ BLOCK C/B ELSE SOURCE-ID STRING-ID = IF SOURCE-SPEC SPLIT ( SINGLE SINGLE -- CDATA -> CHARACTER UNSIGNED )CAST ELSE SOURCE-ID IF FIB #FIB @ ELSE TIB #TIB @ THEN THEN THEN ; IS SOURCE
Including text files, we now have four different input sources:
BLK | SOURCE-ID | Input Source |
---|---|---|
0 | 0 | user input device |
0 | -1 | string |
n = 1 ... #BLOCKS | don't care | block n |
0 | f > 0 | text file with file handle f |
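The table can be read as a decision procedure: BLK is checked first, then SOURCE-ID. A Python sketch of this dispatch follows; the function name `input_source` is hypothetical, and the value -1 for the string input source is taken from the table:

```python
STRING_ID = -1   # SOURCE-ID value for strings, as listed in the table


def input_source(blk: int, source_id: int) -> str:
    """Classify the input source from the values of BLK and SOURCE-ID."""
    if blk:                         # a non-zero BLK always means a block
        return f"block {blk}"
    if source_id == STRING_ID:
        return "string"
    if source_id:                   # file handles are greater than zero
        return f"text file with file handle {source_id}"
    return "user input device"


for blk, sid in [(0, 0), (0, -1), (7, 0), (0, 5)]:
    print(blk, sid, input_source(blk, sid))
```

The order of the tests matters: because SOURCE-ID is "don't care" for blocks, BLK has to be examined before SOURCE-ID.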
But what about the mysterious word ?REFILL? ?REFILL is a deferred definition that is executed by EVALUATE, THROW, INCLUDE and RESTORE-INPUT. Now it's time to assign it specific semantics. The problem is that INCLUDE may be nested. If an included file itself contains another INCLUDE, interpreting the nested file overwrites the file input buffer. Once interpretation of the nested file is done, the line of text from the first file is no longer present in the file input buffer and needs to be re-read. That's why REFILL saves the position of a text file before reading the next line of text. After INCLUDE has restored the input source specification, ?REFILL restores the original line of text if the previous input source was a file:
:NONAME ( -- ) SOURCE-ID 0<> SOURCE-ID STRING-ID <> AND IF SOURCE-SPEC CAST UNSIGNED-DOUBLE SOURCE-ID REPOSITION THROW >IN @ REFILL INVERT IF -37 THROW THEN >IN ! THEN ; IS ?REFILL
?REFILL has to be included in EVALUATE, THROW and RESTORE-INPUT as well, because these words also restore the input source specification. Consider a case where an included file contains EVALUATE, and the evaluated string in turn contains another INCLUDE. The evaluated INCLUDE is not able to restore the line of text from the first text file, because EVALUATE overwrote its file handle and its file position by assigning different values to SOURCE-ID and SOURCE-SPEC. But when EVALUATE itself terminates, it restores the input specification of the first text file. Now it is possible to re-read the original line of text with ?REFILL. A similar situation can happen if a nested file throws an exception, because the control flow won't return to the second INCLUDE. Remember that the input source specification is contained in the exception frame.
Finally, SAVE-INPUT and RESTORE-INPUT have to be extended in order to handle files as input source. A fourth format of the tuple INPUT-SOURCE needs to be defined:
Input Source | Tuple Items |
---|---|
User input device | >IN |
Block | BLK, >IN |
String | SOURCE-SPEC, >IN |
Text file | SOURCE-ID, SOURCE-SPEC, >IN |
The fourth format includes the file handle from SOURCE-ID and the file position of the beginning of the current line of text. The restore fails if the current file is not the same as the one at the time the input source specification was saved. The updated definitions of SAVE-INPUT and RESTORE-INPUT can handle all four formats:
: SAVE-INPUT ( -- INPUT-SOURCE ) NEW-TUPLE BLK @ IF -> UNSIGNED BLK @ >T CAST TUPLE ELSE SOURCE-ID IF SOURCE-ID STRING-ID <> IF -> FILE SOURCE-ID >T CAST TUPLE THEN -> DOUBLE SOURCE-SPEC >T CAST TUPLE THEN THEN -> UNSIGNED >IN @ >T CAST INPUT-SOURCE ;
: RESTORE-INPUT ( INPUT-SOURCE -- FLAG ) CAST TUPLE -> UNSIGNED SIZE CASE 1 OF T> >IN ! DROP BLK @ 0<> SOURCE-ID 0<> OR ENDOF 2 OF T> >IN ! T> BLK ! DROP FALSE ENDOF 3 OF T> >IN ! CAST TUPLE -> DOUBLE T> SOURCE-SPEC <> BLK @ 0<> OR SOURCE-ID 0= OR >R DROP R> ENDOF 4 OF T> >IN ! CAST TUPLE -> DOUBLE T> TO SOURCE-SPEC CAST TUPLE -> FILE T> SOURCE-ID <> BLK @ 0<> OR >R DROP R> ?REFILL ENDOF >R DROP TRUE R> ENDCASE ;
At the end of this chapter, let's now have a look at an alternative implementation of the Memory-Allocation word set, which is based on low-level DOS system functions. The three required DOS functions can be called with the following words:
(ALLOCATE) ( UNSIGNED -- FAR-ADDRESS 1ST SIGNED )
(DEALLOCATE) ( FAR-ADDRESS -- SIGNED )
(REALLOCATE) ( FAR-ADDRESS UNSIGNED -- 2ND SIGNED )
DOS allocates memory in paragraphs of 16 bytes each, in a new segment. (ALLOCATE) requests UNSIGNED paragraphs of memory, and returns the starting address of the allocated memory block as a full address FAR-ADDRESS. This means that the allocated memory does not occupy any space in the DATA memory area. As usual, SIGNED is the DOS error code. It is zero if the allocation was successful. If a memory block of the desired size is not available, DOS allocates the largest available memory block, returning the number of allocated paragraphs as 1ST and error code 8 as SIGNED.
(DEALLOCATE) returns an allocated memory block identified by its starting address to DOS. (REALLOCATE) shrinks or extends the size of an already allocated memory block. If an extension is not possible, (REALLOCATE) does not try to deallocate the existing one and allocate a new one with a different starting address. In this case, it just returns the number of available paragraphs and error code 8.
The definitions of the high-level words FAR-ALLOCATE, FREE and RESIZE are based on the low-level words (ALLOCATE), (DEALLOCATE) and (REALLOCATE), respectively. Since the low-level words deal with paragraphs instead of address units, >PARAGRAPHS converts address units to paragraphs:
: >PARAGRAPHS ( UNSIGNED -- 1ST ) 16 /MOD SWAP IF 1+ THEN ;
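>PARAGRAPHS is a division by 16 that rounds up. The same computation as a Python sketch (the name `to_paragraphs` is hypothetical):

```python
def to_paragraphs(address_units: int) -> int:
    """Number of 16-byte paragraphs needed to hold address_units bytes."""
    quotient, remainder = divmod(address_units, 16)   # 16 /MOD
    if remainder:                                     # SWAP IF 1+ THEN
        quotient += 1
    return quotient


for n in (0, 16, 17, 200):
    print(n, to_paragraphs(n), to_paragraphs(n) * 16)
```

For example, a request for 200 address units is rounded up to 13 paragraphs, which occupy 208 address units.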
Together with >IOR and some stack juggling to adapt differing stack diagrams, the high-level memory allocation words can easily be defined. For convenience, additional versions are available that deal with addresses of data type CFAR-ADDRESS:
: FAR-ALLOCATE ( UNSIGNED -- FAR-ADDRESS SIGNED ) >PARAGRAPHS (ALLOCATE) NIP >IOR ;
LATEST ALIAS CFAR-ALLOCATE ( UNSIGNED -- CFAR-ADDRESS SIGNED )
: FREE ( FAR-ADDRESS -- SIGNED ) DUP 0= IF DROP +0 ELSE (DEALLOCATE) >IOR THEN ;
LATEST ALIAS FREE ( CFAR-ADDRESS -- SIGNED )
: RESIZE ( FAR-ADDRESS UNSIGNED -- 1ST SIGNED ) >PARAGRAPHS OVER 0= IF NIP (ALLOCATE) ELSE OVER SWAP (REALLOCATE) THEN NIP >IOR ;
LATEST ALIAS RESIZE ( CFAR-ADDRESS UNSIGNED -- 1ST SIGNED )
The names FAR-ALLOCATE and CFAR-ALLOCATE were chosen in order to distinguish DOS-based memory allocation from StrongForth's standard implementation of the Memory-Allocation word set. FAR-ALLOCATE and CFAR-ALLOCATE return the full segment-and-offset address of the allocated memory block, whereas ALLOCATE and CALLOCATE always return an address in the DATA memory area. FREE and RESIZE, on the other hand, can simply be overloaded, because the interpreter and the compiler can distinguish the different versions by the data types of their input parameters. Note that FREE and RESIZE accept null addresses. This feature simplifies some applications, because it generally saves an additional check for a null pointer.
Note also that RESIZE does not automatically allocate a new memory block, copy the contents of the old memory block, and then deallocate the old memory block, if the old memory block cannot be extended. This behaviour is permitted by the ANS Forth standard, but it is not required.
Just as for memory allocation in the DATA memory area, StrongForth provides overloaded versions of SIZE for determining the size in address units of memory that has been allocated or resized by DOS functions. DOS keeps the size of the allocated memory block in paragraphs as an unsigned number in the fourth and fifth byte (offsets 3 and 4) of the paragraph immediately preceding the memory block. The first byte of this paragraph contains the ASCII value of either character M or Z to indicate that the memory block has been allocated:
: SIZE ( FAR-ADDRESS -- UNSIGNED-DOUBLE ) SPLIT CAST UNSIGNED 1- MERGE DUP CAST CFAR-ADDRESS -> CHARACTER @ DUP [CHAR] M = SWAP [CHAR] Z = OR IF CAST FAR-ADDRESS 3 + -> UNSIGNED @ 16 M* ELSE DROP NULL UNSIGNED-DOUBLE THEN ; LATEST ALIAS SIZE ( CFAR-ADDRESS -- UNSIGNED-DOUBLE )
If an allocated memory block consists of 4096 paragraphs or more, its size in address units cannot be represented as a single-precision number. That's why SIZE returns a double-precision number. The value of this number is zero if the memory block is invalid. An alias definition of SIZE takes care of memory blocks that start at addresses of data type CFAR-ADDRESS.
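The check performed by SIZE can be modelled in Python on a 16-byte paragraph given as a byte string. This is a sketch under the assumption, taken from the definition above, that the paragraph count is a little-endian 16-bit number at offset 3; the function name `block_size` is hypothetical:

```python
import struct


def block_size(control_paragraph: bytes) -> int:
    """Size in address units described by the paragraph that precedes
    an allocated memory block, or 0 if the paragraph is not valid."""
    if control_paragraph[0:1] not in (b"M", b"Z"):
        return 0
    (paragraphs,) = struct.unpack_from("<H", control_paragraph, 3)
    return paragraphs * 16


def make_paragraph(marker: bytes, paragraphs: int) -> bytes:
    """Build a test paragraph: marker, two filler bytes, size, padding."""
    return marker + b"\x00\x00" + struct.pack("<H", paragraphs) + bytes(11)


print(block_size(make_paragraph(b"M", 13)))
print(block_size(make_paragraph(b"Z", 4096)))
print(block_size(make_paragraph(b"X", 13)))
```

A block of 4096 paragraphs yields 65536 address units, which no longer fits into 16 bits. This is the case that requires a double-precision result.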
These overloaded versions of SIZE do not interfere with those versions that expect parameters of data types DATA-TYPE, TUPLE, FILE, DATA and CDATA, because the data type system can cleanly distinguish them from each other. Note that the actual size of a memory block can be slightly larger than the allocated size, because the operating system allocates memory blocks in paragraphs of 16 bytes each:
200 FAR-ALLOCATE THROW OK
DUP SIZE . 208 OK
FREE THROW OK
So far you've seen a number of words that directly interface to low-level DOS functions. However, DOS provides some more functions that are not required to implement the File-Access and Memory-Allocation word sets. Some of them are included in the Facility word set. You can get access to even more DOS functions by loading the interface words included in blocks 923 to 933. Detailed descriptions of the interface words are included in the glossary. Here's a list of the additional interface words:
(CREATE-TEMPORARY) ( CDATA -> CHARACTER LOGICAL -- FILE SIGNED )
(CREATE-NEW) ( CDATA -> CHARACTER LOGICAL -- FILE SIGNED )
(ALLOCATION-STRATEGY) ( UNSIGNED UNSIGNED -- 1ST SIGNED )
(DUPLICATE) ( FILE -- 1ST SIGNED )
(FORCED-DUPLICATE) ( FILE FILE -- SIGNED )
(LOCK/UNLOCK) ( FILE UNSIGNED-DOUBLE UNSIGNED-DOUBLE UNSIGNED -- SIGNED )
(CONTROL-I/O) ( SINGLE UNSIGNED SINGLE UNSIGNED -- UNSIGNED LOGICAL SIGNED )
(LAST-WRITE) ( FILE DOUBLE UNSIGNED -- 2ND SIGNED )
(PSP) ( -- FAR-ADDRESS )
(SET-DRIVE) ( UNSIGNED -- 1ST )
(GET-DRIVE) ( -- UNSIGNED )
(CREATE-DIR) ( CDATA -> CHARACTER -- SIGNED )
(REMOVE-DIR) ( CDATA -> CHARACTER -- SIGNED )
(CHANGE-DIR) ( CDATA -> CHARACTER -- SIGNED )
(CURRENT-DIR) ( CDATA -> CHARACTER UNSIGNED -- SIGNED )
(SEARCH-DIR) ( CDATA -> CHARACTER LOGICAL -- SIGNED )
(SEARCH-DIR-NEXT) ( -- SIGNED )
(TSR) ( UNSIGNED UNSIGNED -- )
(LOAD/EXECUTE) ( CDATA -> CHARACTER FAR-ADDRESS UNSIGNED -- SIGNED )
(TERMINATE) ( UNSIGNED -- )
(RETURN-CODE) ( -- UNSIGNED UNSIGNED )
(GET-DRIVE-DATA) ( UNSIGNED -- FAR-ADDRESS UNSIGNED UNSIGNED UNSIGNED )
(GET-DISK-FREE-SPACE) ( UNSIGNED -- UNSIGNED UNSIGNED UNSIGNED UNSIGNED )
(SET-DTA) ( FAR-ADDRESS -- )
(GET-DTA) ( -- FAR-ADDRESS )
(SET-INTERRUPT) ( FAR-ADDRESS UNSIGNED -- )
(GET-INTERRUPT) ( UNSIGNED -- FAR-ADDRESS )
(SET-DATE) ( UNSIGNED UNSIGNED UNSIGNED -- FLAG )
(SET-TIME) ( UNSIGNED UNSIGNED UNSIGNED UNSIGNED -- FLAG )
(SET-VERIFY-FLAG) ( SINGLE -- )
(GET-VERIFY-FLAG) ( -- FLAG )
(CTRL-C-CHECK) ( SINGLE UNSIGNED -- FLAG FLAG )
(GET-DOS-VERSION) ( -- UNSIGNED-DOUBLE UNSIGNED UNSIGNED )
(COUNTRY-DEPENDENT-INFORMATION) ( CDATA UNSIGNED -- 2ND SIGNED )
Dr. Stephan Becher - February 6th, 2008