KnowledgeBase 00088: Sort Configuration
This article was originally published as a Tip of the Week.
The QM sort system underlies all query processor operations that include a sort clause, construction of alternate key indices, the QMBasic SSELECT, SSELECTN and SSELECTV statements, and the !SORT() subroutine.
There are three configuration parameters that control the sorting system. This article explains what they do but, as described below, determination of the best values is somewhat dependent on the underlying hardware.
How Sorting Works
The sort process begins by building a binary tree that reflects the sort order of the data items. This mechanism supports multiple keys and the different rules applied to left and right justified sorts. There is also an optimisation to detect that the incoming data is already sorted (or perhaps just largely so), as this would otherwise lead to inefficient construction of a very one-sided tree.
When the sort tree reaches a certain size, it is flushed out to disk and a new sort tree is built for further items. This process is repeated as many times as necessary, resulting in a set of disk-based sort trees. A small sort that never reaches the size at which it would be flushed to disk is handled entirely in memory.
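The flush-and-restart behaviour described above can be illustrated with a minimal Python sketch. This is not QM's implementation: a sorted list stands in for the sort tree, the limit is counted in items rather than kilobytes, and the function name, file names and pickle format are all assumptions made for illustration.

```python
import os
import pickle


def build_runs(items, max_items_in_memory, work_dir):
    """Split items into sorted runs, flushing each full run to work_dir.

    Returns (run_paths, in_memory_run). An input that never fills the
    buffer produces no files and is returned entirely in memory.
    """
    run_paths = []
    buffer = []

    def flush(run):
        # Hypothetical work file naming, chosen only for this sketch.
        path = os.path.join(work_dir, f"run{len(run_paths)}.tmp")
        with open(path, "wb") as f:
            pickle.dump(sorted(run), f)  # sorted list stands in for the tree
        run_paths.append(path)

    for item in items:
        buffer.append(item)
        if len(buffer) >= max_items_in_memory:
            flush(buffer)
            buffer = []

    if run_paths and buffer:
        # Earlier runs are already on disk, so the remainder joins them.
        flush(buffer)
        buffer = []

    return run_paths, sorted(buffer)
```

With a limit of two items, a five item input produces three disk runs, whereas the same input with a limit of ten never touches the disk.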
The Merge Phase
Once all the data has been written to disk as a set of separate sort trees, they must be merged to yield a single tree. This process merges a fixed number of trees at one time to form a single larger tree. Having processed all of the original sort trees, the larger composite trees are merged again in exactly the same way. This process continues until there is just one sort tree that contains the entire set of data.
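As an illustration of the repeated merge, the following Python sketch merges a fixed number of sorted runs at a time until one remains and counts the passes needed. It works on in-memory lists rather than disk files, and the function name and the use of heapq.merge are choices made for this sketch, not a description of QM's internals.

```python
import heapq


def merge_in_passes(runs, merge_width):
    """Merge sorted lists merge_width at a time until only one remains.

    Returns (merged_list, number_of_passes).
    """
    if merge_width < 2:
        raise ValueError("merge_width must be at least 2")
    passes = 0
    while len(runs) > 1:
        # One pass: each group of merge_width runs becomes one larger run.
        runs = [
            list(heapq.merge(*runs[i:i + merge_width]))
            for i in range(0, len(runs), merge_width)
        ]
        passes += 1
    return (runs[0] if runs else []), passes
```

Five runs merged four at a time need two passes (5 runs, then 2, then 1), whereas merging two at a time needs three passes (5, 3, 2, 1).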
The SORTWORK Configuration Parameter
The SORTWORK configuration parameter determines the pathname of the directory where the disk sort trees are stored. For best performance, this should be on a different disk from the data file from which the tree is being built.
The sort work files are normally deleted automatically as they are merged or when the entire sort operation has completed. However, it is recommended that the sort work directory also be cleared on system boot so that any files left behind after a forced logout are deleted. If this is not done, such files are deleted only when a later sort operation needs the same work file name.
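One possible way to clear the directory at boot is a small script run before QM starts. The Python sketch below assumes that the SORTWORK directory is dedicated to sort work files and that no QM process is using it at the time; because the work file naming scheme is not described here, it simply removes every regular file directly inside the directory. The function name is hypothetical.

```python
import os


def clear_sort_work(sort_work_dir):
    """Delete regular files directly inside sort_work_dir; return the count.

    Subdirectories and symbolic links are left untouched.
    """
    with os.scandir(sort_work_dir) as entries:
        paths = [e.path for e in entries if e.is_file(follow_symlinks=False)]
    for path in paths:
        os.remove(path)
    return len(paths)
```

The script would be called with the same pathname as the SORTWORK configuration parameter, from whatever boot mechanism the operating system provides.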
The SORTMEM Configuration Parameter
The SORTMEM configuration parameter specifies the approximate size in KB at which a sort tree is written to disk. The default value of 1024 (1 MB) works well for most systems, but it can be increased to allow larger memory-based trees. Choosing an appropriate value is complex because it is a balance between large trees that take longer to traverse in memory and small trees that require more merge phases. The relative speeds of the disk system and processor are likely to play a part in choosing optimum values.
The SORTMRG Configuration Parameter
The SORTMRG configuration parameter determines how many disk-based sort trees are merged in each pass. It must be in the range 2 to 10 and the default value is 4. Again, the relative speed of the processor and disk may be relevant in choosing an appropriate value. A large value requires more comparisons for each data item being sorted but results in fewer merge phases.
There is no easy way to select the optimum values for SORTMEM and SORTMRG. It is often best to experiment with alternative values on a realistically loaded system. Note that these are private configuration parameters that can be changed on a per-process basis with the CONFIG command.
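As a starting point for such experiments, the interaction between the two parameters can be estimated with simple arithmetic. The Python sketch below assumes that the sort data divides evenly into trees of exactly SORTMEM kilobytes, which is only an approximation since the flush size is itself approximate and per-item overheads are ignored. The function name is hypothetical, and the defaults are the values quoted above.

```python
import math


def estimate_merge_passes(data_kb, sortmem_kb=1024, sortmrg=4):
    """Estimate (initial_trees, merge_passes) for a sort of data_kb kilobytes."""
    if not 2 <= sortmrg <= 10:
        raise ValueError("SORTMRG must be in the range 2 to 10")
    initial_trees = max(1, math.ceil(data_kb / sortmem_kb))
    trees = initial_trees
    passes = 0
    while trees > 1:
        # Each pass merges groups of sortmrg trees into one tree per group.
        trees = math.ceil(trees / sortmrg)
        passes += 1
    return initial_trees, passes
```

For about 100 MB of sort data with the defaults, this gives 100 initial trees and four merge passes (100, 25, 7, 2, 1). Raising SORTMRG to 10 cuts that to two passes, and raising SORTMEM to 4096 with SORTMRG left at 4 gives 25 trees and three passes. The estimate says nothing about the extra comparison cost of wider merges or larger trees, which is why measurement on a realistically loaded system is still needed.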