- Introduction
This document has been prepared by J.L.Schonfelder as an informal description of a module written to implement the functionality defined by the standard ISO/IEC 1539-2 : 1999. This standard is an auxiliary standard to ISO/IEC 1539-1 : 1997,
which defines the latest revision of the Fortran language. The revised language
defined by ISO/IEC 1539-1 : 1997 is informally known as Fortran 95.
The part 2 standard defines the interface and semantics for a module that provides facilities
for the manipulation of character strings of arbitrary and dynamically variable length. This web document provides a description of a specific implementation, in Fortran 95, of a module that conforms to this part 2 standard. A link is included in the document to the source code of this module. The module is a fairly straightforward demonstration that a portable implementation of this varying length character string facility is possible. This module implementation has a semi-official status as it was produced as part of the process of producing the standard. The original version of the module source was made available via the web site of the relevant standard committee.
The version of the module made available via the link from this document is actively maintained by the author and will include corrections and improvements as and when these are produced. Users of the module are asked to report any errors found to the author. Also, if a user makes any change to the implementation, such as an improved algorithm or an improved realisation of the existing algorithm, it should be sent to the author, who will, if appropriate, incorporate it in subsequent releases of the module.
The source code of the MODULE ISO_VARYING_STRING is to be found here.
An example, PROGRAM vocabulary_word_count, of use of this module is given here. This program produces a vocabulary list with usage frequencies from an arbitrary text file.
The name of the module is
ISO_VARYING_STRING
and it can be accessed by the inclusion
of USE statements referencing this module, for example
USE ISO_VARYING_STRING
The module defines a type named
VARYING_STRING
Entities of this type shall represent values that are strings of characters of default kind. These
character strings may be of any non-negative length and this length may vary dynamically during
the execution of a program. There is no arbitrary upper length limit other than that imposed
by the size of the processor and the complexity of the programs it is able to process. The characters
representing the value of the string have positions 1,2,...,N, where N is the length of the string. The
internal structure of the type is PRIVATE to the module.
The meanings of the intrinsic operators for:
assignment =
comparisons ==, /=, <, <=, >=, >
concatenation //
are extended to accept any combination of operands of type VARYING_STRING and type
CHARACTER. All of these operators are extended elementally for conformant arrays as well as for scalars. Note that the equivalent comparison operator forms .EQ., .NE., .LT., .LE., .GE., and
.GT. also have their meanings extended in this manner.
An assignment of the form
var = expr
is defined for scalars and elementally for conformant arrays with the following type combinations:
VARYING_STRING = VARYING_STRING
VARYING_STRING = CHARACTER
CHARACTER = VARYING_STRING
Action: The characters that are the value of the expression expr become the value
of the variable var. There are two cases:
Case(i): Where the variable is of type VARYING_STRING, the length of the
variable becomes that of the expression.
Case(ii): Where the variable is of type CHARACTER, the rules of
intrinsic assignment to a Fortran character variable apply.
Namely, if the expression string is longer than the declared
length of the character variable, only the left-most
characters are assigned. If the character variable is longer
than that of the string expression, it is padded on the right
with blanks.
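The two assignment cases can be illustrated with a minimal sketch. It assumes that the module has already been compiled and that its module file is visible to the compiler; the program and variable names are illustrative only.

```fortran
PROGRAM assignment_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: v
  CHARACTER(LEN=4)     :: short
  CHARACTER(LEN=12)    :: long

  v = "varying"   ! Case (i): the length of v becomes 7
  short = v       ! Case (ii): only the left-most 4 characters are kept
  long  = v       ! Case (ii): padded on the right with 5 blanks

  PRINT *, LEN(v)
  PRINT *, "[", short, "]"
  PRINT *, "[", long, "]"
END PROGRAM assignment_demo
```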
Comparisons of the form
string_a .OP. string_b
where .OP. represents any of the operators ==, /=, <, <=,
>=, or > are defined
for scalar or elementally for conformant array operands with the following type combinations:
VARYING_STRING .OP. VARYING_STRING
VARYING_STRING .OP. CHARACTER
CHARACTER .OP. VARYING_STRING
The values of the operands are unchanged by the operation.
Note that the equivalent operator forms .EQ., .NE., .LT., .LE.,
.GE., and .GT. also
have their meanings extended in this manner.
Result Attributes: scalar or array of type default LOGICAL.
Result Value: The result value is true if string_a stands in the indicated relation to
string_b and is false otherwise. The collating sequence used for the inequality
comparisons is that defined by the processor for characters of default kind. If
string_a and string_b are of different lengths, the comparison is done as if the
shorter string were padded on the right with blanks.
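As a small sketch of the comparison rules (assuming the compiled module is available; the names are illustrative), note that two strings of different lengths can still compare equal because of the blank padding:

```fortran
PROGRAM comparison_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: a, b

  a = "ABC"
  b = "ABC  "

  PRINT *, LEN(a) == LEN(b)   ! the lengths differ (3 and 5)
  PRINT *, a == b             ! compared as if a were padded with blanks
  PRINT *, a < "ABD"          ! VARYING_STRING .OP. CHARACTER
  PRINT *, "ABD" >= b         ! CHARACTER .OP. VARYING_STRING
END PROGRAM comparison_demo
```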
The concatenation operation
string_a // string_b
is defined for scalars and elementally for conformant arrays with the following type combinations:
VARYING_STRING // VARYING_STRING
VARYING_STRING // CHARACTER
CHARACTER // VARYING_STRING
The values of the operands are unchanged by the operation.
Result Attributes: scalar or array of type VARYING_STRING.
Result Value: The result value is a new string whose characters are the same as those
produced by concatenating the operand character strings in the order given.
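A short sketch of mixed-type concatenation follows, assuming the compiled module is available; the names are illustrative. Every intermediate result is of type VARYING_STRING, so no lengths need to be declared.

```fortran
PROGRAM concat_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: greeting, full

  greeting = "Hello"
  ! VARYING_STRING // CHARACTER, then VARYING_STRING // VARYING_STRING
  full = greeting // ", " // VAR_STR("world")

  PRINT *, CHAR(full)
  PRINT *, LEN(full)     ! 5 + 2 + 5 characters
END PROGRAM concat_demo
```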
The generic intrinsic procedures ADJUSTL, ADJUSTR, CHAR, the comparison procedures (LLT, LLE, LGE, LGT), IACHAR, ICHAR, INDEX,
LEN, LEN_TRIM, REPEAT, SCAN, TRIM, and VERIFY
have their meanings extended to include
the appropriate argument type combinations involving VARYING_STRING and CHARACTER. All of these functions are extended elementally for conformant arrays as well as scalars except the CHAR function which remains restricted to scalar arguments.
The generic function reference of the form
ADJUSTL(string)
is added. This version of the ADJUSTL function is ELEMENTAL.
Description: adjusts to the left, removing any leading blanks and inserting trailing
blanks.
Argument: string is a scalar or an array of type VARYING_STRING.
The argument is unchanged
by the procedure.
Result Attributes: scalar or an array of type VARYING_STRING.
Result Value: The result value is the same as string except that any leading blanks
have been deleted and the same number of trailing blanks inserted.
The generic function reference of the form
ADJUSTR(string)
is added. This version of the ADJUSTR function is ELEMENTAL.
Description: adjusts to the right, removing any trailing blanks and inserting leading
blanks.
Argument: string is a scalar or an array of type VARYING_STRING.
The argument is unchanged
by the procedure.
Result Attributes: scalar or an array of type VARYING_STRING.
Result Value: The result value is the same as string except that any trailing blanks
have been deleted and the same number of leading blanks inserted.
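The following sketch shows that ADJUSTL and ADJUSTR move blanks from one end of the string to the other without changing its length. It assumes the compiled module is available; the names are illustrative.

```fortran
PROGRAM adjust_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: s, left, right

  s     = "   text"
  left  = ADJUSTL(s)      ! leading blanks moved to the end: "text   "
  right = ADJUSTR(left)   ! trailing blanks moved to the front: "   text"

  PRINT *, "[", CHAR(left), "]"
  PRINT *, "[", CHAR(right), "]"
  PRINT *, LEN(s), LEN(left), LEN(right)   ! all three lengths are 7
END PROGRAM adjust_demo
```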
The generic function references of the forms
CHAR(string)
CHAR(string,length)
are added. These versions of the CHAR function are PURE.
Description: converts a varying string value to default character.
Arguments:
string - is a scalar of type VARYING_STRING
length - is a scalar of type default INTEGER.
The arguments are unchanged by the procedure.
Result Attributes: scalar of type default CHARACTER. If length
is absent, the result
has the same length as string. If length is present, the result has the length
specified by the argument length.
Result Value:
Case(i): If length is absent, the result is a copy of the characters in the
argument string.
Case(ii): If length is present, the result is a copy of the characters
in the argument string that may have been truncated or
padded. If string is longer than length, the result is
truncated on the right. If string is shorter than length, the
result is padded on the right with blanks. If length is less
than one, the result is of zero length.
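A minimal sketch of both forms of CHAR, assuming the compiled module is available (the names are illustrative). CHAR is the usual way to hand a varying string to code that expects an intrinsic CHARACTER value, such as a PRINT statement.

```fortran
PROGRAM char_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: s

  s = "Fortran"
  PRINT *, "[", CHAR(s), "]"       ! Case (i): same length as s
  PRINT *, "[", CHAR(s, 3), "]"    ! Case (ii): truncated on the right
  PRINT *, "[", CHAR(s, 10), "]"   ! Case (ii): padded with three blanks
  PRINT *, LEN(CHAR(s, -1))        ! length less than one gives zero length
END PROGRAM char_demo
```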
The set of generic function references of the form
Lop(string_a,string_b)
is added, where op stands for one of:
LT - less than
LE - less than or equal to
GE - greater than or equal to
GT - greater than
These versions of the Lop functions are ELEMENTAL.
Description: compares the lexical ordering of two strings based on the ISO 646 : 1991
(ASCII) collating sequence.
Arguments: string_a and string_b are scalars or arrays of one of the
type combinations:
VARYING_STRING and VARYING_STRING
VARYING_STRING and CHARACTER
CHARACTER and VARYING_STRING
The arguments are unchanged by the procedure.
Result Attributes: scalar or array of type default LOGICAL.
Result Value: The result value is true if string_a stands in the indicated relationship
to string_b, and is false otherwise. The collating sequence used to establish the
ordering of characters for these procedures is that of the International Standard ISO
646 : 1991. If string_a and string_b are of different lengths, the comparison is
done as if the shorter string were padded on the right with blanks. If either argument
contains a character c not defined by the standard, the result value is processor
dependent and based on the collating value for IACHAR(c). Zero length strings are
considered to be lexically equal.
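The lexical comparison functions can be sketched as follows, assuming the compiled module is available; the names are illustrative. Unlike the operator forms, these always use the ISO 646 ordering, whatever the processor's own collating sequence.

```fortran
PROGRAM lexical_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: a, b

  a = "apple"
  b = "banana"

  PRINT *, LLT(a, b)            ! "a" precedes "b" in ISO 646
  PRINT *, LGE(a, "apple")      ! VARYING_STRING and CHARACTER
  PRINT *, LLE("apple   ", a)   ! equal once a is padded with blanks
END PROGRAM lexical_demo
```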
The generic function reference of the form
IACHAR(c)
is added. This version of the IACHAR function is ELEMENTAL.
Description: returns the position of a character in the collating sequence defined by
the International Standard ISO 646 : 1991.
Argument: c is a scalar or an array of type VARYING_STRING and of length
exactly one. The
argument is unchanged by the procedure.
Result Attributes: scalar or an array of type default INTEGER.
Result Value: The result value is the position of the character c in the collating
sequence defined by the International Standard ISO 646 : 1991 for default characters.
If the character c is not defined in the standard set, the result is processor
dependent but is always equal to IACHAR(CHAR(c)).
The generic function reference of the form
ICHAR(c)
is added. This version of the ICHAR function is ELEMENTAL.
Description: returns the position of a character in the processor defined collating
sequence.
Argument: c is a scalar or an array of type VARYING_STRING and of length
exactly one. The
argument is unchanged by the procedure.
Result Attributes: scalar or an array of type default INTEGER.
Result Value: The result value is the position of the character c in the processor
defined collating sequence for default characters. That is, the result value is
ICHAR(CHAR(c)).
The generic function reference of the form
INDEX(string,substring,back)
is added. These versions of the INDEX function are ELEMENTAL.
Description: returns an integer that is the starting position of a substring within a
string.
Arguments: string and substring are scalars or arrays of one of the
type combinations:
VARYING_STRING and VARYING_STRING
VARYING_STRING and CHARACTER
CHARACTER and VARYING_STRING
back - is a scalar or array of type default LOGICAL and is OPTIONAL.
The arguments are unchanged by the procedure.
Result Attributes: scalar or array of type default INTEGER.
Result value:
Case(i): If back is absent or is present with the value false, the result is the
minimum positive value of I such that,
EXTRACT(string,I,I+LEN(substring)-1)==substring,
or zero if there is no such value.
Zero is returned if LEN(string)<LEN(substring), and one
is returned if LEN(substring)==0.
Case(ii): If back is present with the value true, the result is the
maximum value of I less than or equal to
LEN(string)-LEN(substring)+1 such that
EXTRACT(string,I,I+LEN(substring)-1)==substring,
or zero if there is no such value.
Zero is returned if LEN(string)<LEN(substring), and
LEN(string)+1 is returned if LEN(substring)==0.
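The cases above, including the zero-length edge cases, can be checked with a small sketch. It assumes the compiled module is available; the names are illustrative, and the values in the comments follow from the rules stated above.

```fortran
PROGRAM index_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: s

  s = "banana"
  PRINT *, INDEX(s, "an")           ! forward search: 2
  PRINT *, INDEX(s, "an", .TRUE.)   ! backward search: 4
  PRINT *, INDEX(s, "xyz")          ! no match: 0
  PRINT *, INDEX(s, "")             ! zero-length substring, forward: 1
  PRINT *, INDEX(s, "", .TRUE.)     ! zero-length substring, backward: LEN(s)+1
END PROGRAM index_demo
```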
The generic function reference of the form
LEN(string)
is added. This version of the LEN function is ELEMENTAL.
Description: returns the length of a character string.
Argument: string is a scalar or an array of type VARYING_STRING. The argument is unchanged
by the procedure.
Result Attributes: scalar or array of type default INTEGER.
Result Value: The result value is the number of characters in string.
The generic function reference of the form
LEN_TRIM(string)
is added. This version of the LEN_TRIM function is ELEMENTAL.
Description: returns the length of a string not counting any trailing blanks.
Argument: string is a scalar or an array of type VARYING_STRING.
The argument is unchanged
by the procedure.
Result Attributes: scalar or array of type default INTEGER.
Result Value: The result value is the position of the last non-blank character in
string. If the argument string contains only blank characters or is of zero length,
the result is zero.
The generic function reference of the form
REPEAT(string,ncopies)
is added. This version of the REPEAT function is ELEMENTAL.
Description: concatenates several copies of a string.
Arguments:
string - is a scalar or array of type VARYING_STRING,
ncopies - is a scalar or array of type default INTEGER.
The arguments are unchanged by the procedure.
Result Attributes: scalar or array of type VARYING_STRING.
Result Value: The result value is the string produced by repeated concatenation of the
argument string, producing a string containing ncopies copies of string. If
ncopies is less than or equal to zero, the result is of zero length.
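A brief sketch of REPEAT, assuming the compiled module is available; the names are illustrative.

```fortran
PROGRAM repeat_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: rule, empty

  rule  = REPEAT(VAR_STR("=-"), 4)   ! four copies: "=-=-=-=-"
  empty = REPEAT(VAR_STR("x"), 0)    ! ncopies <= 0 gives a zero-length string

  PRINT *, CHAR(rule)
  PRINT *, LEN(rule), LEN(empty)
END PROGRAM repeat_demo
```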
The generic function reference of the form
SCAN(string,set,back)
is added. These versions of the SCAN function are ELEMENTAL.
Description: scans a string for any one of the characters in a set of characters.
Arguments: string and set are scalars or arrays of one of the
type combinations:
VARYING_STRING and VARYING_STRING
VARYING_STRING and CHARACTER
CHARACTER and VARYING_STRING
back - is a scalar or array of type default LOGICAL and is OPTIONAL.
The arguments are unchanged by the procedure.
Result Attributes: scalar or array of type default INTEGER.
Result Value:
Case(i): If back is absent or is present with the value false and if string
contains at least one character that is in set, the value of the result
is the position of the left-most character of string that is in set.
Case(ii): If back is present with the value true and if string contains
at least one character that is in set, the value of the result
is the position of the right-most character of string that
is in set.
Case(iii): The value of the result is zero if no character of string is
in set or if the length of either string or set is zero.
The generic function reference of the form
TRIM(string)
is added. This version of the TRIM function is ELEMENTAL.
Description: removes trailing blanks from a string.
Argument: string is a scalar or an array of type VARYING_STRING. The argument is unchanged
by the procedure.
Result Attributes: scalar or array of type VARYING_STRING.
Result Value: The result value is the same as string except that any trailing blanks
have been deleted. If the argument string contains only blank characters or is of
zero length, the result is a zero-length string.
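The relationship between LEN, LEN_TRIM, and TRIM can be sketched as follows, assuming the compiled module is available; the names are illustrative. Because a varying string keeps its trailing blanks as part of its value, TRIM actually shortens the string.

```fortran
PROGRAM trim_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: s, blanks

  s      = "data   "    ! four letters followed by three blanks
  blanks = "   "

  PRINT *, LEN(s)             ! 7: trailing blanks are counted
  PRINT *, LEN_TRIM(s)        ! 4: position of the last non-blank character
  PRINT *, LEN(TRIM(s))       ! 4: the trimmed string is shorter
  PRINT *, LEN(TRIM(blanks))  ! 0: an all-blank string trims to zero length
END PROGRAM trim_demo
```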
The generic function reference of the form
VERIFY(string,set,back)
is added. These versions of the VERIFY function are ELEMENTAL.
Description: verifies that a string contains only characters from a given set by
scanning for any character not in the set.
Arguments: string and set are scalars or arrays of one of the
type combinations:
VARYING_STRING and VARYING_STRING
VARYING_STRING and CHARACTER
CHARACTER and VARYING_STRING
back - is a scalar or array of type default LOGICAL and is OPTIONAL.
The arguments are unchanged by the procedure.
Result Attributes: scalar or array of type default INTEGER.
Result Value:
Case(i): If back is absent or is present with the value false and if string
contains at least one character that is not in set, the value of the
result is the position of the left-most character of string that is
not in set.
Case(ii): If back is present with the value true and if string contains
at least one character that is not in set, the value of the
result is the position of the right-most character of string
that is not in set.
Case(iii): The value of the result is zero if each character of string is
in set or if the length of string is zero.
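SCAN and VERIFY are complementary: one looks for the first character that is in the set, the other for the first that is not. A minimal sketch, assuming the compiled module is available (the names, including the constant digits, are illustrative):

```fortran
PROGRAM scan_verify_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  CHARACTER(LEN=*), PARAMETER :: digits = "0123456789"
  TYPE(VARYING_STRING) :: s

  s = "x = 42"
  PRINT *, SCAN(s, digits)               ! left-most digit: position 5
  PRINT *, SCAN(s, digits, .TRUE.)       ! right-most digit: position 6
  PRINT *, SCAN(s, "")                   ! zero-length set: 0
  PRINT *, VERIFY(s, digits)             ! left-most non-digit: position 1
  PRINT *, VERIFY(VAR_STR("42"), digits) ! every character is a digit: 0
END PROGRAM scan_verify_demo
```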
An additional elemental generic procedure is added to convert intrinsic fixed-length character
values into varying-length string values.
The generic function reference of the form
VAR_STR(char)
is provided. This function is ELEMENTAL.
Description: converts an intrinsic fixed-length character value into the equivalent
varying-length string value.
Argument: char is a scalar or array of type default CHARACTER and may
be of any length. The
argument is unchanged by the procedure.
Result Attributes: scalar or array of type VARYING_STRING.
Result Value: The result value is the same string of characters as the argument.
The following additional generic procedures are provided to support input and output of
varying-length string values with formatted sequential files.
GET - input part or all of a record into a string
PUT - append a string to an output record
PUT_LINE - append a string to an output record and end the record
The generic subroutine references of the forms
CALL GET(string,maxlen,iostat)
CALL GET(unit,string,maxlen,iostat)
CALL GET(string,set,separator,maxlen,iostat)
CALL GET(unit,string,set,separator,maxlen,iostat)
are provided.
Description: reads characters from an external file into a string.
Arguments:
string - is of type VARYING_STRING,
maxlen - is of type default INTEGER and is OPTIONAL,
unit - is of type default INTEGER,
set - is either of type VARYING_STRING or of type CHARACTER,
separator - is of type VARYING_STRING and is OPTIONAL,
iostat - is of type default INTEGER and is OPTIONAL.
All arguments are scalar. The argument unit specifies the input unit to be used. It
must be connected to a formatted file for sequential read access. If the argument
unit is omitted, the default input unit is used. The arguments maxlen, unit,
and set are unchanged by the procedure.
Action: The GET procedure causes characters from the connected file, starting with
the next character in the current record if there is a current record or the first
character of the next record if not, to be read and stored in the variable string. The
end of record always terminates the input but input may be terminated before this.
If maxlen is present, its value indicates the maximum number of characters that will
be read. If maxlen is less than or equal to zero, no characters will be read and
string will be set to zero length. If maxlen is absent, a maximum of HUGE(1) is used.
If the argument set is provided, this specifies a set of characters the occurrence of
any of which will terminate the input. This terminal character, although read from
the input file, will not be included in the result string. Once the data transfer is
complete, the file is positioned after the last character that was read. If the argument
separator is present, the actual character found which terminates the transfer is
returned in separator. If the transfer is terminated other than by the occurrence
of a character in set, a zero length string is returned in separator. If the transfer
is terminated by the end of record being reached, the file is positioned after the
record just read. If present, the argument iostat is used to return the status
resulting from the data transfer. A zero value is returned if a valid read operation
occurs and the end-of-record is not reached, a positive value if an error occurs, and
a negative value if an end-of-file or end-of-record condition occurs. Note, the
negative value returned for an end-of-file condition must be different from that
returned for an end-of-record condition. If iostat is absent and an error or
end-of-file condition occurs, the program execution is terminated.
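The interaction of set, separator, and iostat can be sketched by writing one record to a scratch file and reading it back field by field. This is only an illustration under stated assumptions: the compiled module is available, unit number 10 is free, and the program name, variable names, and sample text are hypothetical. The values in the comments are those expected from the description above.

```fortran
PROGRAM get_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: field, sep
  INTEGER :: ios, i

  ! Unit 10 is an arbitrary choice for this sketch.
  OPEN(UNIT=10, STATUS="SCRATCH", FORM="FORMATTED", ACTION="READWRITE")
  CALL PUT_LINE(10, "alpha,beta;gamma")
  REWIND(10)

  DO i = 1, 3
    ! Stop at "," or ";"; the terminating character is returned in sep.
    CALL GET(10, field, ",;", sep, IOSTAT=ios)
    PRINT *, CHAR(field), " sep=[", CHAR(sep), "] iostat=", ios
  END DO
  ! Expected: "alpha" with sep ",", then "beta" with sep ";", both with a
  ! zero iostat. The third call reads "gamma", reaches the end of the
  ! record, and so returns a zero-length sep and a negative iostat.

  CLOSE(10)
END PROGRAM get_demo
```

Because the end-of-record and end-of-file values are both negative and processor dependent, a program that reads a whole file has to compare iostat against the values its own processor uses.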
The generic subroutine references of the forms
CALL PUT(string,iostat)
CALL PUT(unit,string,iostat)
are provided.
Description: writes a string to an external file.
Arguments:
string - is either of type VARYING_STRING or type CHARACTER,
unit - is of type default INTEGER,
iostat - is of type default INTEGER and is OPTIONAL.
All arguments are scalar. The argument unit specifies the output unit to be used. It
must be connected to a formatted file for sequential write access. If the argument
unit is omitted, the default output unit is used. The arguments unit and string are
unchanged by the procedure.
Action: The PUT procedure causes the characters of string to be appended to the
current record, if there is a current record, or to the start of the next record if
there is no current record. The last character transferred becomes the last character
of the current record, which is the last record of the file. If present, the argument
iostat is used to return the status resulting from the data transfer. A zero value
is returned if a valid write operation occurs, and a positive value if an error occurs.
If iostat is absent and anything other than a valid write operation occurs, the
program execution is terminated.
The generic subroutine references of the forms
CALL PUT_LINE(string,iostat)
CALL PUT_LINE(unit,string,iostat)
are provided.
Description: writes a string to an external file and ends the record.
Arguments:
string - is either of type VARYING_STRING or type CHARACTER,
unit - is of type default INTEGER,
iostat - is of type default INTEGER and is OPTIONAL.
All arguments are scalar. The argument unit specifies the output unit to be used. It
must be connected to a formatted file for sequential write access. If the argument
unit is omitted, the default output unit is used. The arguments unit and string are
unchanged by the procedure.
Action: The PUT_LINE procedure causes the characters of string to be appended to
the current record, if there is a current record, or to the start of the next record
if there is no current record. Following completion of the data transfer, the file is
positioned after the record just written, which becomes the previous and last record
of the file. If present, the argument iostat is used to return the status resulting
from the data transfer. A zero value is returned if a valid write operation occurs, and
a positive value if an error occurs. If iostat is absent and anything other than a
valid write operation occurs, the program execution is terminated.
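The difference between PUT and PUT_LINE can be sketched as follows, assuming the compiled module is available; the names are illustrative. The three calls build a single output record on the default output unit, and only the last call ends it.

```fortran
PROGRAM put_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: name
  INTEGER :: ios

  name = "Ada"
  CALL PUT("Hello, ")              ! CHARACTER argument, record left open
  CALL PUT(name)                   ! VARYING_STRING appended to the same record
  CALL PUT_LINE("!", IOSTAT=ios)   ! appended, then the record is ended

  ! With iostat present, an error is reported rather than being fatal.
  IF (ios > 0) PRINT *, "write failed with status ", ios
END PROGRAM put_demo
```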
The following additional elemental generic procedures are provided to support the manipulation of
substrings of varying-length strings.
INSERT - insert a substring into a string
REPLACE - replace a substring in a string
REMOVE - remove a section of a string
EXTRACT - extract a section from a string
SPLIT - split a string into two at the occurrence of a separator
The generic function reference of the form
INSERT(string,start,substring)
is provided. This version of the INSERT function is ELEMENTAL.
Description: inserts a substring into a string at a specified position.
Arguments:
string - is either type VARYING_STRING or type default CHARACTER,
start - is type default INTEGER,
substring - is either type VARYING_STRING or type default CHARACTER.
All arguments are scalars. The arguments are unchanged by the procedure.
Result Attributes: scalar of type VARYING_STRING.
Result Value: The result value is a copy of the characters of the argument string
with the characters of substring inserted into the copy of string before the
character at the character position start. If start is greater than LEN(string), the
value LEN(string)+1 is used for start and substring is appended to the copy of
string. If start is less than one, the value one is used for start and substring is
inserted before the first character of the copy of string.
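A minimal sketch of INSERT, including the two out-of-range cases, assuming the compiled module is available; the names are illustrative. The original string is never modified, since INSERT is a function returning a new value.

```fortran
PROGRAM insert_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: s

  s = "abcdef"
  PRINT *, CHAR(INSERT(s, 3, "XY"))   ! before position 3: "abXYcdef"
  PRINT *, CHAR(INSERT(s, 0, ">"))    ! start < 1 is treated as 1: ">abcdef"
  PRINT *, CHAR(INSERT(s, 99, "<"))   ! start > LEN(s) appends: "abcdef<"
  PRINT *, CHAR(s)                    ! s itself is unchanged
END PROGRAM insert_demo
```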
The generic function references of the forms
REPLACE(string,start,substring)
REPLACE(string,start,finish,substring)
REPLACE(string,target,substring,every,back)
are provided. These versions of the REPLACE function are ELEMENTAL.
Description: replaces a subset of the characters in a string by a given substring. The
subset may be specified either by position or by content.
Arguments:
string - is either of type VARYING_STRING or type default CHARACTER,
start - is of type default INTEGER,
finish - is of type default INTEGER,
substring - is either of type VARYING_STRING or type default CHARACTER,
target - is either of type VARYING_STRING or type default CHARACTER,
every - is of type default LOGICAL, and is OPTIONAL,
back - is of type default LOGICAL, and is OPTIONAL.
All arguments are scalar. The argument target must not be of zero length. In all
cases the arguments are unchanged by the procedure.
Result Attributes: scalar of type VARYING_STRING.
Result Value: The result value is a copy of the characters in string modified as per
one of the cases below.
Case(i): For a reference of the form
REPLACE(string,start,substring)
the characters of the argument substring are inserted into
the copy of string beginning with the character at the
character position start. The characters in positions from
start to MIN(start+LEN(substring)-1,LEN(string))
are deleted. If start is greater than LEN(string), the value
LEN(string)+1 is used for start and substring is
appended to the copy of string. If start is less than one,
the value one is used for start.
Case(ii): For a reference of the form
REPLACE(string,start,finish,substring)
the characters in the copy of string between positions
start and finish, including those at start and finish, are
deleted and replaced by the characters of substring. If
start is less than one, the value one is used for start. If
finish is greater than LEN(string), the value LEN(string)
is used for finish. If finish is less than start, the
characters of substring are inserted before the character
at start and no characters are deleted.
Case(iii): For a reference of the form
REPLACE(string,target,substring,every,back)
the copy of string is searched for occurrences of target.
The search is done in the backward direction if the argument
back is present with the value true, and in the forward
direction otherwise. If target is found, it is replaced by
substring. If every is present with the value true, the
search and replace is continued from the character following
target in the search direction specified until all occurrences
of target in the copy string are replaced; otherwise only the
first occurrence of target is replaced.
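The three forms of REPLACE can be compared side by side in a small sketch. It assumes the compiled module is available; the names and sample strings are illustrative, and the values in the comments are those expected from the three cases above.

```fortran
PROGRAM replace_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: s, t

  s = "abcdef"
  t = "a.b.c"

  ! Case (i): overwrite starting at a position.
  PRINT *, CHAR(REPLACE(s, 2, "XY"))     ! positions 2-3 replaced: "aXYdef"
  PRINT *, CHAR(REPLACE(s, 5, "WXYZ"))   ! runs past the end: "abcdWXYZ"

  ! Case (ii): replace an inclusive range of positions.
  PRINT *, CHAR(REPLACE(s, 2, 4, "-"))   ! "bcd" replaced: "a-ef"

  ! Case (iii): replace by content.
  PRINT *, CHAR(REPLACE(t, ".", "::"))                   ! first only: "a::b.c"
  PRINT *, CHAR(REPLACE(t, ".", "::", .TRUE.))           ! every: "a::b::c"
  PRINT *, CHAR(REPLACE(t, ".", "::", .FALSE., .TRUE.))  ! backward: "a.b::c"
END PROGRAM replace_demo
```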
The generic function reference of the form
REMOVE(string,start,finish)
is provided. This version of the REMOVE function is ELEMENTAL.
Description: removes a specified substring from a string.
Arguments:
string - is either of type VARYING_STRING or type default CHARACTER,
start - is of type default INTEGER, and is OPTIONAL,
finish - is of type default INTEGER, and is OPTIONAL.
All arguments are scalars. The arguments are unchanged by the procedure.
Result Attributes: scalar of type VARYING_STRING.
Result Value: The result value is a copy of the characters of string with the
characters between positions start and finish, inclusive, removed. If start is
absent or less than one, the value one is used for start. If finish is absent or
greater than LEN(string), the value LEN(string) is used for finish.
If finish is
less than start, the characters of string are delivered unchanged as the result.
The generic function reference of the form
EXTRACT(string,start,finish)
is provided. This version of the EXTRACT function is ELEMENTAL.
Description: extracts a specified substring from a string.
Arguments:
string - is either of type VARYING_STRING or type default CHARACTER,
start - is of type default INTEGER, and is OPTIONAL,
finish - is of type default INTEGER, and is OPTIONAL.
All arguments are scalars. The arguments are unchanged by the procedure.
Result Attributes: scalar of type VARYING_STRING.
Result Value: The result value is a copy of the characters of the argument string
between positions start and finish, inclusive.
If start is absent or less than one, the value one is used for start.
If finish is absent or greater than LEN(string),
the value LEN(string) is used for finish.
If finish is less than start, the result
is a zero-length string.
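REMOVE and EXTRACT take the same arguments and are complementary, as the following sketch shows. It assumes the compiled module is available; the names are illustrative.

```fortran
PROGRAM extract_remove_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: s

  s = "abcdef"
  PRINT *, CHAR(EXTRACT(s, 2, 4))   ! positions 2 to 4: "bcd"
  PRINT *, CHAR(REMOVE(s, 2, 4))    ! everything else: "aef"
  PRINT *, CHAR(EXTRACT(s, 4))      ! finish absent, so up to LEN(s): "def"
  PRINT *, LEN(EXTRACT(s, 5, 2))    ! finish < start: zero-length result
  PRINT *, CHAR(REMOVE(s, 5, 2))    ! finish < start: string unchanged
END PROGRAM extract_remove_demo
```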
The generic subroutine reference of the form
CALL SPLIT(string,word,set,separator,back)
is provided. This version of the SPLIT subroutine is ELEMENTAL.
Description: splits a string into two substrings with the substrings separated by the
occurrence of a character from a specified separator set.
Arguments:
string - is of type VARYING_STRING,
word - is of type VARYING_STRING,
set - is either of type VARYING_STRING or type default CHARACTER,
separator - is of type VARYING_STRING, and is OPTIONAL,
back - is of type default LOGICAL, and is OPTIONAL.
All arguments are scalar. The arguments set and back are unchanged by the
procedure.
Action: The effect of the procedure is to divide the string at the first occurrence of
a character that is in set. The string is searched in the forward direction unless
back is present with the value true, in which case the search is in the backward
direction. The characters passed over in the search are returned in the argument
word and the remainder of the string, not including the separator character, is
returned in the argument string. If no character from set is found
or set is of zero
length, the whole string is returned in word and string is returned as zero length.
If the argument separator is present, the actual character found which separates
the word from the remainder of the string is returned in separator. The effect of
the procedure is such that, on return, either
word//separator//string
is the same as the initial string for a forward search, or
string//separator//word
is the same as the initial string for a backward search.
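Because SPLIT shortens string on each call, it can be used in a loop to break a line into words. A minimal sketch, assuming the compiled module is available; the program name, variable names, and sample text are illustrative.

```fortran
PROGRAM split_demo
  USE ISO_VARYING_STRING
  IMPLICIT NONE
  TYPE(VARYING_STRING) :: line, word
  INTEGER :: nwords

  line   = "one two three"
  nwords = 0

  ! Each call moves the text before the first blank into word and leaves
  ! the remainder in line. On the last call no blank is found, so the
  ! whole of line goes to word and line becomes zero length.
  DO WHILE (LEN(line) > 0)
    CALL SPLIT(line, word, " ")
    nwords = nwords + 1
    PRINT *, nwords, CHAR(word)
  END DO
END PROGRAM split_demo
```

With this input the loop runs three times. Consecutive separators in the input would produce zero-length words, so a real tokeniser would need to skip those.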