A string is a sequence of characters with a well-defined length between 0 and 255. The characters are stored in consecutive character-size memory locations. According to the ANS Forth specification, a character string is specified by a cell pair ( c-addr u ) representing its starting address and its length in characters. Since StrongForth can store strings in different memory areas, the representation of a string can be one of the following:
CDATA -> CHARACTER UNSIGNED
CCONST -> CHARACTER UNSIGNED
CFAR-ADDRESS -> CHARACTER UNSIGNED
The CODE memory area is usually not used for storing strings. Among the predefined storage locations for strings in the DATA memory area is the scratchpad PAD:
PAD .S DROP
CDATA -> CHARACTER OK
PAD is defined as follows:
DATA-SPACE HERE CAST CDATA -> CHARACTER CONSTANT PAD
84 CHARS ALLOT
ANS Forth also specifies a second kind of string storage. A so-called counted string is a sequence of characters in memory that is preceded by a length character. The length character is a character-size memory location that contains the length of the string as an unsigned number. Here's an example of the memory image of a counted string:
11 | S | t | r | o | n | g | F | o | r | t | h
A counted string in memory is identified by the address of its length character. ANS Forth even specifies a word that converts a counted string into the cell pair ( c-addr u ) representation of a character string:
: COUNT ( c-addr1 -- c-addr2 u ) DUP CHAR+ SWAP C@ ;
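The effect of COUNT can be modelled outside of Forth. The following Python sketch is not StrongForth code; it assumes 8-bit characters and represents memory as a byte string, with addresses as plain indices. The helper names make_counted and count are made up for this illustration.

```python
def make_counted(text: str) -> bytes:
    """Build the memory image of a counted string: length character, then the characters."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("an 8-bit length character cannot hold more than 255")
    return bytes([len(data)]) + data


def count(memory: bytes, c_addr: int) -> tuple[int, int]:
    """Model of COUNT ( c-addr1 -- c-addr2 u )."""
    u = memory[c_addr]        # fetch the length character (C@)
    return c_addr + 1, u      # address of the first character (CHAR+) and the length


memory = make_counted("StrongForth")
addr, u = count(memory, 0)
print(addr, u, memory[addr:addr + u].decode("ascii"))  # 1 11 StrongForth
```

The pair returned by count is exactly the ( c-addr u ) form that all other string words expect.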
Since ANS Forth explicitly discourages the use of counted strings, StrongForth abandons them. The ANS Forth words C", COUNT and WORD do not exist in StrongForth. FIND has been replaced by SEARCH-ALL, which (among other differences) expects a string in the CDATA -> CHARACTER UNSIGNED representation (see chapter 8). This is not considered a deficiency, because strings in the CDATA -> CHARACTER UNSIGNED representation can easily replace counted strings, and the advantage of having only one kind of representation is obvious.
A small group of string processing words was already presented in chapter 2:
FILL ( CDATA -> SINGLE UNSIGNED 2ND -- )
ERASE ( CDATA -> SINGLE UNSIGNED -- )
MOVE ( CDATA -> SINGLE CDATA -> 2ND UNSIGNED -- )
MOVE ( CCONST -> SINGLE CDATA -> 2ND UNSIGNED -- )
Since character strings are stored in memory blocks, these four words can be applied to character strings as well. FILL initializes a string with any character:
PAD 5 CHAR A FILL OK
PAD 5 TYPE AAAAA OK
ERASE is a specialized version of FILL that initializes a memory block with zeros. Strings, however, are far more commonly initialized with space characters, which is what BLANK does. Unlike FILL and ERASE, BLANK can only be applied to strings, not to memory blocks in general:
: BLANK ( CDATA -> CHARACTER UNSIGNED -- ) BL FILL ;
The two overloaded versions of MOVE replace the ANS Forth words CMOVE and CMOVE> for copying strings between different memory locations.
Now, let's continue with some more string processing words. /STRING adjusts a string by the number of characters given as the last input parameter. This input parameter is of data type INTEGER in order to allow signed positive, signed negative and unsigned numbers. But StrongForth provides an additional, overloaded version of /STRING without this parameter:
WORDS /STRING
/STRING ( CDATA -> CHARACTER UNSIGNED -- 1ST 3RD )
/STRING ( CDATA -> CHARACTER UNSIGNED INTEGER -- 1ST 3RD )
OK
This second version of /STRING (which appears first in the above list) adjusts a string by always removing the first character, i.e., it assumes a default adjustment value of 1. ANS Forth, on the other hand, specifies only the version with an adjustment value. As with LSHIFT and RSHIFT, StrongForth takes advantage of its overloading capability by providing a special version for the most common usage of a word. Here's a simple example:
PAD 16 BLANK OK
PAD 16 TYPE OK
PARSE-WORD StrongForth PAD SWAP MOVE OK
PAD 16 TYPE StrongForth OK
PAD 16 5 /STRING OVER OVER TYPE gForth OK
/STRING OVER OVER TYPE Forth OK
-TRAILING TYPE Forth OK
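The address and length arithmetic behind the two /STRING calls in this session can be retraced with a small model. The Python sketch below is not StrongForth code: a Python string stands in for the 16-character buffer at PAD, addresses are plain indices, and the function name slash_string is made up for this illustration. The default argument mimics the overloaded version without an adjustment value.

```python
def slash_string(c_addr: int, u: int, n: int = 1) -> tuple[int, int]:
    """Model of /STRING: move the start forward by n characters and shorten the length by n."""
    return c_addr + n, u - n


pad = "StrongForth".ljust(16)            # buffer contents after BLANK and MOVE

addr, u = slash_string(0, 16, 5)         # PAD 16 5 /STRING
print(repr(pad[addr:addr + u]))          # 'gForth     '

addr, u = slash_string(addr, u)          # /STRING with the default adjustment of 1
print(repr(pad[addr:addr + u]))          # 'Forth     '
```

Note that the trailing spaces are still part of the string after both adjustments, because /STRING only changes where the string begins.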
This example leads to the next string processing word:
: -TRAILING ( CDATA -> CHARACTER UNSIGNED -- 1ST 3RD )
  BEGIN DUP
  WHILE OVER OVER + 1- @ BL =
  WHILE 1-
  REPEAT THEN ;
The semantics are as specified by ANS Forth: -TRAILING removes trailing spaces from a string. The implementation contains a loop with two exit conditions, one for the string being empty and one for encountering a non-space character.
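The two exit conditions are easier to see when the loop is written out in a language with explicit control flow. The following Python sketch is only a model, not StrongForth code; memory is a Python string, addresses are indices, and the name minus_trailing is made up for this illustration.

```python
BL = " "  # the space character, as pushed by the Forth word BL


def minus_trailing(memory: str, c_addr: int, u: int) -> tuple[int, int]:
    """Model of -TRAILING: shorten the string until its last character is not a space."""
    while u != 0:                            # first exit: the string is empty
        if memory[c_addr + u - 1] != BL:     # second exit: last character is not a space
            break
        u -= 1                               # drop one trailing space
    return c_addr, u


print(minus_trailing("Forth     ", 0, 10))   # (0, 5)
print(minus_trailing("    ", 0, 4))          # (0, 0)
```

Testing the length first matters: without it, a string consisting only of spaces would make the loop read the character in front of the string.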
StrongForth provides three overloaded versions of the ANS Forth word COMPARE for different memory areas. At least one of the two strings to be compared has to be located in the DATA memory area:
COMPARE ( CDATA -> CHARACTER UNSIGNED CFAR-ADDRESS -> 2ND 3RD -- SIGNED )
COMPARE ( CDATA -> CHARACTER UNSIGNED CCONST -> 2ND 3RD -- SIGNED )
COMPARE ( CDATA -> CHARACTER UNSIGNED 1ST 3RD -- SIGNED )
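The signed result follows the rules that ANS Forth gives for COMPARE: 0 for identical strings, -1 if the first string sorts before the second, and 1 otherwise. The Python sketch below models these rules on ordinary Python strings. It is not StrongForth code, and it assumes that the StrongForth versions keep the ANS Forth semantics unchanged.

```python
def compare(s1: str, s2: str) -> int:
    """Model of the ANS Forth COMPARE result for two character strings."""
    # Compare character by character up to the length of the shorter string.
    for a, b in zip(s1, s2):
        if a != b:
            return -1 if ord(a) < ord(b) else 1
    # Identical up to the shorter length: the lengths decide.
    if len(s1) == len(s2):
        return 0
    return -1 if len(s1) < len(s2) else 1


print(compare("Forth", "Forth"))        # 0
print(compare("Forth", "Fortran"))      # -1
print(compare("StrongForth", "Strong")) # 1
```

Because the comparison uses the numeric values of the characters, it is case sensitive.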
SEARCH is an application of /STRING and COMPARE:
: SEARCH ( CDATA -> CHARACTER UNSIGNED 1ST 3RD -- 1ST 3RD FLAG )
  LOCALS| N2 ADDR2 N1 ADDR1 |
  ADDR1 N1
  BEGIN DUP N2 < INVERT
  WHILE OVER N2 ADDR2 N2 COMPARE
  WHILE /STRING
  REPEAT TRUE
  ELSE DROP DROP ADDR1 N1 FALSE
  THEN ;
Note that only strings located in the DATA memory area can be searched for substrings. The substring has to be located in the DATA memory area as well. However, it is very easy to define an overloaded version for substrings that are located in other memory areas, for example the CONST memory area:
: SEARCH ( CDATA -> CHARACTER UNSIGNED CCONST -> 2ND 3RD -- 1ST 3RD FLAG )
  LOCALS| N2 ADDR2 N1 ADDR1 |
  ADDR1 N1
  BEGIN DUP N2 < INVERT
  WHILE OVER N2 ADDR2 N2 COMPARE
  WHILE /STRING
  REPEAT TRUE
  ELSE DROP DROP ADDR1 N1 FALSE
  THEN ;
Except for the stack diagram, this definition is identical to the version for both strings in the DATA memory area. The compiler automatically chooses the right version of COMPARE, because it is aware of the data types.
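The control flow of SEARCH, with its two loop exits, can be retraced in a small model. The following Python sketch is not StrongForth code. Memory is a Python string, addresses are indices, the name search is made up for this illustration, and a slice comparison stands in for the COMPARE call returning zero.

```python
def search(memory: str, addr1: int, n1: int, sub: str) -> tuple[int, int, bool]:
    """Model of SEARCH ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 flag )."""
    n2 = len(sub)
    addr, n = addr1, n1
    while n >= n2:                           # DUP N2 < INVERT WHILE
        if memory[addr:addr + n2] == sub:    # COMPARE returned 0: match found
            return addr, n, True             # rest of the string, starting at the match
        addr, n = addr + 1, n - 1            # /STRING: drop the first character
    return addr1, n1, False                  # no match: return the original string


print(search("StrongForth", 0, 11, "Forth"))   # (6, 5, True)
print(search("StrongForth", 0, 11, "forth"))   # (0, 11, False)
```

As in the Forth definition, the loop stops as soon as the remaining string is shorter than the substring, so no comparison ever reads past the end of the searched string.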
Dr. Stephan Becher - January 4th, 2008