Templates

The PARSE, ARG, and PULL instructions accept a template that describes how assignments are to be made from an arbitrary string. The directives in the template are applied from left to right, each one setting start, the position in string at which the next assignment begins; start is initially 1. A formal definition is:
template
A set of directives that describe the assignments to be made from string
template := directive [ assignment ] [ template ]
directive
key | directive | Value of symbol
1 | " " (a blank) | (The space directive) All characters of string beginning with the next non-blank character from start and ending with the last non-blank character before the start defined by the next directive.
2 | n or =(varname) | (The position directive) Characters of string beginning with the nth position and ending with the character prior to the start defined by the next directive (the last non-blank character if the next directive is the space directive). Note: this directive may also be specified as =n.
3 | ±m or ±(varname) | (The adjust position directive) Characters of string beginning with the n±mth position, where n is defined by the previous directive, and ending with the character prior to the start defined by the next directive (the last non-blank character if the next directive is the space directive).
4 | quoted literal or (varname) | (The match pattern directive) All characters of string beginning with the first character after the matched quoted literal and ending with the character prior to the start defined by the next directive (the last non-blank character if the next directive is the space directive).
assignment
If assignment is specified, it is either a variable name or a period.
{variable name}
.

The assignment portion of a directive-assignment pair is optional, meaning that a valid template can include consecutive directives.
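
As a minimal sketch of consecutive directives (the variable names string, first and second are illustrative only), the template below places two position directives side by side, 11 and 1, with no assignment between them. The first merely ends the preceding assignment; the second resets start to the beginning of the string.

```rexx
/* Sketch: two position directives in a row (11 then 1), nothing assigned between them */
string = 'alpha beta gamma'

/* second = columns 7-10; then start is moved to 11 and straight back to 1; */
/* first  = columns 1-5 (the trailing 6 ends the assignment)                 */
PARSE VAR string 7 second 11 1 first 6

SAY 'first='first 'second='second       /* first=alpha second=beta */
```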

Examples

  1. The " " (space directive) just selects the blank-delimited words from string that occur before the next directive:
    PARSE VALUE "This is a sentence." WITH firstword secondword restofstring
    is equivalent to
    firstword = "This"; secondword = "is"; restofstring = "a sentence." and
    PARSE VAR restofstring thirdword fourthword restofstring
    is equivalent to
    thirdword = "a"; fourthword = "sentence."; restofstring = ""

    If there are more words in string than variables in the template, the last variable is assigned the "rest" of the string. If there are more symbols in the template than words in the string the excess symbols are assigned the null string value.

    NOTE: A construction such as PARSE VALUE ƒ1() ƒ2() … WITH word1 word2 … will not return the expected results if any ƒ() returns a result that includes a character that blank-delimits a "word." For example, if ƒ2() evaluates to a result that includes an embedded ASCII space, that result is parsed as two words, so word2 gets only the first of them, word3 gets the second, and word4 is assigned the first word resulting from ƒ3().

    PARSE VALUE ƒ1() c2x(ƒ2()) … WITH word1 word2 …
    word2 = x2c(word2)
    gives the desired result if ƒ2() might return a binary value that coincidentally is a word-delimiter.

    Be especially aware that "blank-delimited" words may be defined by delimiters other than <SPACE>. In particular, it is common for interpreters to include the <TAB> character. So word1<TAB><TAB> word3 would result in only two "words" being returned. To parse an ASCII line from a tab-delimited text file that might have "empty" column values, use

    tab = '09'x
    …
    PARSE VALUE line WITH field1 (tab) field2 (tab) …
    
    so that the match pattern directive is used instead of the space directive.

  2. A positive whole number (position directive) acts like the second argument of the SUBSTR(string,start,length) built-in function. Suppose record contains
    01-Sep-12 Colorado St                  Colorado                     Denver CO  
    ....'....1....'....2....'....3....'....4....'....5....'....6....'....7....'....8....'....9
    
    (The "ruler" line is not a part of the record, of course.)

    PARSE VAR record 1 filedate 10 . 11 visitor 40 home 69 location
    is the same as

    filedate = "01-Sep-12"; visitor = "Colorado St                  "; home = "Colorado                     "; location = "Denver CO             "
    start may be any positive value. In particular it can be left of the current position.
    PARSE VALUE 0 1 WITH zero one =1 x0 x1 =1 y0 y1
    is equivalent to the six assignments zero = 0; one = 1; x0 = 0; x1 = 1; y0 = 0; y1 = 1

  3. A whole number preceded by a + or − adjusts the position of start. With record as above:
    PARSE VAR record 1 filedate +9 . +1 visitor +29 home +29 location
    makes the same assignments as the template that explicitly specifies the columns that define the assignments.

    Note that start is incremented (or decremented) from its value prior to the assignment that precedes the directive. That is to say it is the start set by the immediately prior directive ±m characters (including spaces.)

  4. A literal (quoted string) defines a pattern that, when matched, sets start to the first character after the position of the pattern in string. Suppose that instead of fixed-length fields the record above uses the ASCII tab character ('09'x) as a field separator.
    PARSE VAR record dd '-' mmm '-' yy . '09'x visitor '09'x home '09'x location '09'x .
    defines the same variables as above except that instead of filedate the date is divided into its constituent parts based upon the "matching" dashes. Note that the "." following yy makes the space preceding the "." the space directive, so yy does not include the trailing blank between "-12" and the next tab character.
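
The difference between the space directive and the match pattern directive on tab-delimited data with an empty column can be sketched as follows. The variable names (line, w1 to w3, f1 to f3) are illustrative. No expected output is given for the space directive because it depends on whether the interpreter treats <TAB> as a blank.

```rexx
/* Sketch: space directive versus match pattern directive on tab-separated data */
tab  = '09'x
line = 'word1' || tab || tab || 'word3'      /* the middle column is empty */

/* Space directive: the result varies by interpreter. If <TAB> counts as a  */
/* blank, w2 receives "word3"; otherwise w1 receives the entire line.       */
PARSE VAR line w1 w2 w3
SAY 'w1=['w1'] w2=['w2'] w3=['w3']'

/* Match pattern directive: each (tab) ends exactly one field */
PARSE VAR line f1 (tab) f2 (tab) f3
SAY 'f1=['f1'] f2=['f2'] f3=['f3']'          /* f1=[word1] f2=[] f3=[word3] */
```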

Variable Template Examples

Directive types may be mixed within a template and any value set in a template can be used in subsequent assignments. For example, the sequence
 nextvalue = expression
 nexti = stem.0+1
 stem.nexti = nextvalue
 stem.0 = nexti
can be replaced with:
 PARSE VALUE stem.0+1 expression WITH nexti stem.nexti =1 stem.0 .

Directive types 2, 3 and 4 can themselves be specified as variables. For instance, if we have tab = X2C(09) the type 4 example could have been written as:

PARSE VAR record dd '-' mmm '-' yy . (tab) visitor (tab) home (tab) location (tab) .

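Likewise, a position and a length may be supplied as variables:
PARSE VALUE string WITH =(start) substring +(length)
makes the same assignment as
substring = SUBSTR(string,start,length)
provided the requested characters lie within string (SUBSTR pads a short result with blanks; PARSE does not). PARSE is also measurably faster except for trivially short strings.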
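
A quick check of this equivalence, as a sketch with an illustrative sample string, follows. It assumes an interpreter that accepts the =(varname) and +(varname) forms used above. The second half requests characters past the end of the string, the one case where the two results differ.

```rexx
/* Sketch: variable position directives compared with SUBSTR() */
string = 'The quick brown fox'               /* 19 characters */
start = 5; length = 5

PARSE VALUE string WITH =(start) substring +(length)
SAY substring == SUBSTR(string, start, length)               /* 1 (both are "quick") */

/* Ask for 5 characters starting at column 17; only 3 remain */
start = 17
PARSE VALUE string WITH =(start) substring +(length)
SAY LENGTH(substring) LENGTH(SUBSTR(string, start, length))  /* 3 5 */
```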


The variable name enclosed in parentheses may be defined in the template itself. For instance, if qstring is /delimited string/ then

PARSE VAR qstring =1 delim =2 qstring2 (delim)
sets qstring2 to qstring with the leading and trailing "/" characters removed.

Suppose record contains

"01-Sep-12",Colorado St,Colorado,"Denver, CO"
The following loop separates this "comma separated values" line into its component fields (saved as a stem variable) and removes any quotation marks used to preserve embedded commas.
 q = "'"; qq = '"'
 field. = ''; fix = 0
 /* append a comma so a closing quote on the last field is also followed by "," */
 IF record \= '' THEN record = record','
 DO WHILE record \= ''
     PARSE VAR record =1 tst +1 . /* tst=1st char of (next) field */
     IF POS(tst,q||qq) \= 0 THEN DO
         tstend = tst','
         PARSE VAR record =2 nxtfield (tstend) record
         /* nxtfield is what is between " and ", */
     END
     ELSE PARSE VAR record nxtfield ',' record
         /* nxtfield is what is between , and , */
     PARSE VALUE fix+1 nxtfield WITH fix field.fix
 END
 field.0 = fix
and when applied to record results in
 field.0 = 4
 field.1 = "01-Sep-12"
 field.2 = "Colorado St"
 field.3 = "Colorado"
 field.4 = "Denver, CO"

"Destructive" PARSE

Several of the examples above (especially the last one) demonstrate that the input string can be assigned a new value by the template. One of the most useful forms of the simplest template is a replacement for the WORD() built-in function. Accessing a stem variable word.i is so much faster than finding WORD(words, i) that it is worth the cost of using PARSE to save words as a stem.
 n = WORDS(words)
 DO ix = 1 FOR n
     PARSE VAR words word.ix words 
 END
 word.0 = n
At the end of the loop words is the null string, but word.i is what WORD(words, i) would have returned.
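
The same destructive pattern can be adapted to a delimiter other than blanks by using the match pattern directive. The following is only a sketch under assumed names (list, delim and item. are hypothetical), not the generalization referenced elsewhere in this document. Note that an empty item after a trailing delimiter is not counted.

```rexx
/* Sketch: destructive PARSE with an arbitrary delimiter held in a variable */
list  = 'red,green,,blue'
delim = ','
item. = ''; n = 0

DO WHILE list \== ''
    n = n + 1
    /* item.n receives the text before the next delimiter; list keeps the rest */
    PARSE VAR list item.n (delim) list
END
item.0 = n

SAY item.0 '['item.1'] ['item.2'] ['item.3'] ['item.4']'   /* 4 [red] [green] [] [blue] */
```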

See WORD() vs PARSE for a relative performance analysis and generalization of this example to use arbitrary delimiters.

For more examples see Parallel Assignments.