REXX performance: WORD() vs PARSE

WORD() vs PARSE

As we can see from this graph, the loop

 DO i = 1 FOR WORDS(lotsofwords)
     nextword = WORD(lotsofwords,i)
     …
 END

takes ten times as long as

 DO WHILE lotsofwords \= ''
     PARSE VAR lotsofwords nextword lotsofwords
     …
 END

The run time of both loops increases with the square of the number of words. The second one (destructive PARSE) is more than 10,000 times more expensive than

 DO i = 1 FOR wordstem.0
     nextword = wordstem.i
     …
 END

for which the run time increases linearly with the number of words. For 556019 words, destructive PARSE took a little over three million milliseconds (about 50 minutes), while accessing 556019 stem variables took 194 ms.
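A rough way to reproduce the comparison is to time the three loops with the elapsed-time clock. The sketch below is only illustrative: the word count `n`, the generated words, and the variable `work` are assumptions, not the data behind the figures above, and the absolute timings will depend on the interpreter and machine.

```rexx
/* Hypothetical timing harness: n and the generated words are illustrative */
n = 2000
lotsofwords = ''
do i = 1 to n
    lotsofwords = lotsofwords 'w' || i    /* list form   */
    wordstem.i = 'w' || i                 /* array form  */
end
wordstem.0 = n

call time 'R'                             /* start/reset the elapsed clock */
do i = 1 for words(lotsofwords)
    nextword = word(lotsofwords, i)
end
say 'WORD():           ' time('R') 's'    /* report, then reset again */

work = lotsofwords                        /* copy: PARSE consumes the list */
do while work \= ''
    parse var work nextword work
end
say 'destructive PARSE:' time('R') 's'

do i = 1 for wordstem.0
    nextword = wordstem.i
end
say 'stem access:      ' time('R') 's'
```

Doubling `n` a few times should show the first two timings growing roughly fourfold per doubling and the third roughly twofold, if the behaviour described above holds for your interpreter.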

So instead of a list, create an array using a stem variable and the numeric convention (the item count in tail 0, the items in tails 1 to n). When you have to accept the data as a list but need to process it more than once, use destructive PARSE to walk the list a single time and save the intermediate results as an array:

 DO ix = 1 WHILE lotsofwords \= ''
     PARSE VAR lotsofwords word.ix lotsofwords
     …
 END
 word.0 = ix - 1 /* adjust for the last increment of the WHILE loop */
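Once the words are in the array, any number of further passes is cheap. A minimal sketch of the two-pass case, assuming a small sample list and a made-up task (find the longest word, then print the words padded to that width):

```rexx
/* Illustrative only: the sample list and the two passes are assumptions */
lotsofwords = 'alpha beta gamma'

/* single destructive pass: convert the list to an array */
do ix = 1 while lotsofwords \= ''
    parse var lotsofwords word.ix lotsofwords
end
word.0 = ix - 1

/* pass 1 over the array: find the longest word */
maxlen = 0
do i = 1 for word.0
    maxlen = max(maxlen, length(word.i))
end

/* pass 2 over the array: print each word padded to that width */
do i = 1 for word.0
    say left(word.i, maxlen) '|' i
end
```

Note that the original list is empty after the conversion, so keep a copy first if the list form is still needed.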

I found myself doing that so often that I externalized the routine as 'indexlist'('stem.', list [,delim]) with the bonus that it works on lists with arbitrary delimiters, not just words. It only works as written under Regina, and there is one non-essential Windows dependency.

/* turn a delimited list into an indexed array *****************************

  CALL indexlist stemname,list[,delim]
    or
  symbol = indexlist(stemname,list[,delim])
 
 Examples:
  call indexlist 'word.',wordlist
         or
  numwords = indexlist('word.',wordlist)
 sets word.0 to words(wordlist), word.1 to 1st word, word.2 to 2nd word, etc.

  call indexlist 'item.',itemlist,'09'x
        or
  numitems = indexlist('item.',itemlist,'09'x)
 sets item.0 to the number of tab-delimited items in the list, item.1 to 1st item, item.2 to 2nd item, etc.

indexlist: procedure /* (make this the first statement to include as internal subroutine) */

**************************************************************************/

 parse source . cmd? myname 
 z = lastpos('\',myname)+1    /*** Windows specific ***/
 parse var myname =(z) myname '.'
 parse version z
 regina? = sign(pos('Regina',z))
 if \regina? then do
    say 'Feature' myname 'not available.' /* We need the poolid() bif and value() extension */
    return -1 /* raises syntax error if result is used as a word-related argument  */
   end
 mypool = poolid()
 if cmd? = 'COMMAND' & mypool = 1 then do
     say 'Incorrect call to' myname', cannot be invoked as COMMAND'
     return -1
   end
 stem = arg(1)
 list = arg(2)
 delim = arg(3) /* may be null */
 callerpool = mypool-1 /* this is the Regina convention. 
                         It applies to internal functions and CALLed subroutines 
                         whether or not there's a PROCEDURE statement */
 if delim \== '' then do ix=1 while list \= ''
     parse var list next (delim) list
     z = value(stem||ix,next,callerpool)
   end /* ix */
 else do ix=1 while list \= ''
     parse var list next list
     z = value(stem||ix,next,callerpool)
   end /* ix */
 ix = ix-1 /* undo the last increment in DO WHILE */
 z = value(stem'0',ix,callerpool)
 return ix
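
A short calling sketch, assuming Regina and assuming the routine above is saved where Regina can find it as an external function named indexlist (the sample list and the item. stem are made up for illustration):

```rexx
/* Assumes Regina, with indexlist reachable as an external function */
tab = '09'x
itemlist = 'red' || tab || 'green' || tab || 'blue'

numitems = indexlist('item.', itemlist, tab)
if numitems < 0 then exit 1          /* indexlist reported a problem */

do i = 1 for item.0                  /* item.0 was set in this caller's pool */
    say i':' item.i
end
```

The stem name is passed as a string, including the trailing period, because the routine builds each variable name by appending the index to it.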