Using mental ray

Rendering images with mental ray is straightforward, but a deeper knowledge of usage issues can lead to better use of available computing resources and more sophisticated control of image quality.

Controlling Rendering Speed and Memory Usage

mental ray offers two independent rendering algorithms, ray classification and binary space partitioning (BSP), each with its own advantages. The algorithm is user-selectable using command-line options or a statement in the scene file.

Ray Classification

This algorithm is enabled with -acceleration ray_classification on the command line or with acceleration ray classification in the view statement in the .mi scene file.

The algorithm employs novel rendering acceleration techniques which offer increased performance while using extremely little memory for acceleration data structures. When rendering large scenes, memory consumption is unavoidably dominated by the scene stored in memory. The amount of memory used for acceleration structures is constant and defaults to approximately 6 megabytes per CPU. Even very large scenes with over a million triangles work at maximum speed with no more than 12 megabytes per CPU for acceleration data structures.

Primitives (triangles) are first gathered into bounding boxes which are used in all further acceleration techniques. mental ray uses heuristics to balance the number of boxes and the number of triangles in each box; this is done by predicting and minimizing the cost of evaluating a box during rendering. Boxes are constructed by recursive partitioning of the input groups.

The space of rays is subdivided. It is this ray classification which is responsible for most of the acceleration. The optimal subdivision is estimated by mental ray, but may be adjusted with the -subdivision command line option, or with the subdivision statement in the view. Two correction constants can be specified, for the space of all rays, and for eye and shadow ray classification. Positive numbers subdivide more finely, while negative numbers generate fewer subdivisions. With finer subdivisions, the number of boxes that must be examined when a ray is cast is reduced, but the overhead for generating the acceleration data structures and intersection testing increases. Normally, small adjustments in the range (-2, 2) suffice for optimal speed.
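As a sketch, a view statement that selects ray classification and subdivides slightly more finely might look like this (the parameter values are illustrative only; start from the defaults and adjust in small steps):

     view
         acceleration ray classification
         subdivision 1 0
     end view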

Binary Space Partitioning (BSP)

This algorithm is enabled with -acceleration spatial_subdivision on the command line or with acceleration spatial subdivision in the view statement in the .mi scene file.

This acceleration algorithm operates by building a hierarchical spatial data structure by recursively subdividing a bounding volume surrounding the entire scene. The resulting binary tree consists of branch nodes that correspond to a subdivision of a bounding volume (voxel) in two subvoxels and leaf nodes that contain the geometric primitives (triangles).

The form of the tree is characterized by two parameters:

  • -max_size on the command line or max size in the view statement in the .mi scene file controls the maximum number of primitives in a leaf. The argument is an integer. The default is 4.

  • -max_depth on the command line or max depth in the view statement in the .mi scene file controls the maximum number of levels of the tree, thus limiting how finely a branch of the tree can be subdivided. The argument is an integer. The default is 24.
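For illustration, selecting BSP with the default parameters in the .mi scene file might look as follows:

     view
         acceleration spatial subdivision
         max size 4
         max depth 24
     end view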

There is a trade-off between speed and memory consumption. Memory usage of the BSP algorithm implicitly depends on the parameters. Unlike ray classification, an a priori estimate for the amount of memory needed for building the acceleration data structures is not available. The subdivision memory limit available with ray classification has no effect. Large leaf sizes and small tree depths reduce memory usage but increase rendering time because larger leaves need to be searched. Increasing the tree depth also leads to a slight increase of the preprocessing time required for building the acceleration data structure.

Rendering speed can often be fine-tuned by experimenting with maximum leaf sizes of 1 or 10, and maximum tree depths of 20 or 27. When rendering with the -verbose on option, mental ray reports the size of the largest leaf node (before rendering) and the number of candidate triangles per ray (after rendering). If these numbers are much larger than 10, one should try to build a deeper BSP tree by choosing a larger maximum depth such as 30 or higher. This can have a dramatic effect on rendering speed. When there are more triangles than can be stored in the BSP tree of the specified depth and leaf size, mental ray will ignore the maximum leaf size, so large leaves can result even if max_size is set to a small number.
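On the command line, such an experiment might look as follows (assuming the renderer is invoked as ray and the scene file is called scene.mi; both names are illustrative):

     ray -acceleration spatial_subdivision -max_depth 30 -verbose on scene.mi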

In general, the BSP algorithm often leads to faster rendering times, in particular in complex scenes with many reflections and refractions and for scenes with motion blur. However, for very large scenes, or for motion-blurred scenes where large portions of the scene move very rapidly, ray classification may be preferable because the amount of memory required by this algorithm can be controlled in a convenient way.

Independently of the rendering algorithm used, the total amount of available memory may be specified in megabytes on the command line with the -memory option, or with the memory statement in the input file. This information is used to limit the maximum resident set size of the process. This does not limit the total amount of virtual memory consumed, but in an environment where multiple processes are competing for limited main memory, it restricts the amount of main memory which mental ray may occupy. If the machine is not overly loaded and more memory is available, mental ray will use it. Reducing the resident set may lead to increased paging to disk.
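For example, to restrict the resident set to roughly 64 megabytes (the value, the renderer name ray, and the file name scene.mi are illustrative; choose the limit according to the memory actually available on the machine):

     ray -memory 64 scene.mi

or, equivalently, place a memory 64 statement in the input file.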

Controlling Image Complexity and Quality

Rendering is always an expensive process, and a complex image with hundreds of thousands of polygons, reflections, refractions and shadows can require a great deal of processing time. There are a number of different ways of reducing image complexity and hence increasing rendering speed. The simplest of these involve command line options; other approaches require changing the input data.

The most obvious way to obtain an image more quickly is to decrease the resolution. A possible alternative to this for test images is to render only a portion of the image using the -window option or the window statement in the view statement of the input file. (see view)

If supersampling is being done, the number of samples per pixel may be reduced. A somewhat more subtle adjustment is to increase the contrast tolerance for adaptive supersampling. This requires a careful examination of the resulting images to determine an acceptable tolerance, but can make a significant difference in rendering time, especially if the number of samples is high.

The rendering time is closely related to the total number of rays cast. Physically, a ray may be reflected or refracted an infinite number of times, but in practice it is often acceptable to place a small upper limit on the number of reflections or refractions. mental ray allows these limits to be set separately and allows a limit to be placed on the sum of the number of reflections and refractions with either the -trace_depth option on the command line or the trace depth statement in the view statement in the input file. (see view)
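As an illustration, limiting rays to two reflections, two refractions, and four combined trace levels might be written as follows (the argument order shown here is an assumption; consult the view statement reference for the exact syntax):

     view
         trace depth 2 2 4
     end view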

On occasion it may be desirable to disable ray tracing altogether. This may be done by setting -trace_off on the command line or placing the statement trace off in the view statement of the input file. Also, it is possible to disable the shadow computations by setting -shadow off on the command line or placing the statement shadow off in the view statement of the input file.

A final command line option which can be used to reduce image complexity is the -clip option, which can also be present as the clip statement in the view statement. (see view) By default, the near clipping plane is very close to the camera, and the far clipping plane is far away. However, if an image contains objects which are complex but very far away, it may be desirable to set the far (yon) clipping plane so that such objects are not rendered.

Important information for performance tuning of mental ray can be obtained from the diagnostic messages. These messages and their use in performance tuning are explained in the section ``Understanding Diagnostic Messages'' below.

Controlling Scene Complexity

In addition to changing global rendering parameters, rendering speed can be improved by modifying the scene without sacrificing image quality. This can be achieved by setting certain flags on individual objects.

The surface appearance of a polygon is determined by its material, which uses any one of a number of different built-in shaders, or a user-defined shader. (see material shader) The built-in shaders, which correspond to SOFTIMAGE constant, Lambert, Phong, and Blinn shaders, and Wavefront illumination models zero through nine, vary considerably in their complexity and cost. Wavefront's illumination model zero, for example, simply sets the color of the polygon to a given value and is obviously quite cheap. Illumination model seven, on the other hand, is quite complex, with a Fresnel specular computation, shadowing, Fresnel reflection, and refraction.

If the input contains free-form surfaces, these are first approximated by triangles. The accuracy of the approximation and hence the number of rendering primitives generated can clearly make a great difference in the rendering time. Refer to the section Free-Form Surfaces for more details on the available approximation techniques.

All objects in the input may include up to three flags. An object may or may not be visible to primary rays, may or may not be visible to secondary (reflection or refraction) rays and may or may not cast a shadow. This can be used to great advantage, for example, in the following way. Suppose one has an object which is generally spherical in shape but contains a great deal of surface detail. One can make this object visible to primary and secondary rays, but set the shadow casting flag to off. A second, much simpler object can be placed in the same position with flags set so that this object casts a shadow but is otherwise invisible. The end result is an image which appears correct, but for which the shadow computations are much less expensive. A similar technique can be employed so that reflected and refracted rays see a much simpler object.
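A sketch of this technique in the .mi scene file might look as follows (the exact flag syntax is an assumption; the three flags correspond to primary visibility, secondary visibility, and shadow casting as described above):

     object "detailed"          # complex geometry, casts no shadow
         visible on
         trace on
         shadow off
         ...
     end object

     object "shadow_proxy"      # simple stand-in at the same position
         visible off
         trace off
         shadow on
         ...
     end object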

Image Processing and Output Shaders

A number of operations performed in the course of rendering an image can be considered image processing operations. These include color clipping, dithering, gamma correction, anti-aliasing, filtering, and user-defined procedural operations on the output of the rendering process. (see output shader)

Internally, all color computations are done in single precision floating point, up to the point where a color is stored as a final pixel value. Pixels are stored with either eight or sixteen bits each for the red, green, blue and matte channels. Ray tracing defaults to eight bits, but will switch to sixteen bits automatically if at least one 16-bit output format or output shader is specified in the view. Since values in the range 0.0 to 1.0 are converted to values in the range 0 to 255 or 0 to 65535, something must be done with values greater than 1.0. This is known as color clipping. In the default method, values greater than 1.0 are simply truncated. Optionally, by giving either the -desaturate on option on the command line or desaturate on in the view, the color may be desaturated towards the nearest intensity of white.
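The two clipping strategies can be sketched in Python as follows. This is only an illustration: the intensity measure and the blend rule used for desaturation below are assumptions, not mental ray's exact algorithm.

```python
def clip_truncate(rgb):
    """Default color clipping: channel values above 1.0 are truncated."""
    return tuple(min(c, 1.0) for c in rgb)

def clip_desaturate(rgb):
    """Desaturate an out-of-range color toward white of the same
    intensity (a sketch; the channel-average intensity and the linear
    blend are assumptions)."""
    m = max(rgb)
    if m <= 1.0:
        return tuple(rgb)
    i = sum(rgb) / 3.0            # assumed intensity: channel average
    if i >= 1.0:
        return (1.0, 1.0, 1.0)    # even the gray axis is out of range
    t = (m - 1.0) / (m - i)       # smallest blend that brings max to 1.0
    return tuple(c + t * (i - c) for c in rgb)
```

Note how truncation discards the hue balance of a bright color, while desaturation raises the low channels toward gray as it pulls the high channel down to 1.0.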

When the floating point value is converted to an integer value, it is randomly rounded either up or down with a probability equal to the distance to the rounded value. This introduces an unavoidable amount of noise into the image, but reduces banding that can result from simply chopping the floating point value to the next lowest eight bit representation. This dithering is on by default, but may be turned off with either the -dither option on the command line or the dither statement in the view statement in the input file. (see view)
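The random rounding can be sketched in a few lines of Python (a minimal illustration, not mental ray's actual code):

```python
import random

_rng = random.Random(0)  # fixed seed, for reproducibility only

def dither(value, rng=_rng):
    """Round a non-negative float up with probability equal to its
    fractional part, down otherwise; the expected value is unbiased."""
    lower = int(value)
    frac = value - lower
    return lower + (1 if rng.random() < frac else 0)
```

Averaged over many pixels this reproduces the intended intensity exactly, while breaking up the banding that plain truncation would produce.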

Another problem arising from quantization is that it is not necessarily best to use the 256 or 65536 values in linear increments of intensity. A so-called gamma correction is often applied which, if the gamma value is greater than 1.0, causes lower intensities to be resolved more finely than higher intensities. A gamma value of 1.0 is the default. A gamma value of 2.2 is recommended for video work. In general, images should be rendered with a gamma equal to that of the final display device. If, however, the image will be displayed in a variety of ways, such as both video and film, leaving the gamma at 1.0 is probably the simplest solution.
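The effect of gamma correction on quantization can be illustrated with a short sketch (gamma 2.2 as in the recommendation above; the helper names are illustrative and not part of mental ray):

```python
def gamma_encode(v, gamma=2.2):
    """Apply gamma correction to a linear intensity in [0, 1]."""
    return v ** (1.0 / gamma)

def quantize(v, bits=8):
    """Map a value in [0, 1] to an integer code, rounding to nearest."""
    levels = (1 << bits) - 1
    return min(int(v * levels + 0.5), levels)

# With gamma 2.2, low intensities get a larger share of the 256 codes:
# linear 0.1 maps to code 26 without correction, but to a much higher
# code (roughly 90) with it, leaving finer steps in the dark range.
```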

Aliasing is a general problem when any signal is sampled at less than twice the highest frequency in the signal. For images, this means that one ray per pixel results in jagged polygon edges and other artifacts. The solution is to increase the sampling rate, but this can obviously be quite expensive if it is done everywhere. Adaptive supersampling samples at a higher rate where a sufficiently large contrast between adjacent pixels is detected. Note that although terms from ray tracing were used here, the same ideas apply to scanline rendering.

The -min_samples and -max_samples command line options, or the min samples and max samples statements in the view statement (see view) can be used to control the recursive supersampling algorithm, which subdivides pixels and samples the corners of these subpixels or superpixels whenever a contrast that exceeds the threshold is detected. mental ray also traces contrast thresholds back into pixels already sampled. This allows choosing an extremely low initial sampling rate of significantly less than one sample per pixel (called infrasampling; see infrasampling), because if a feature such as a thin line is detected in any pixel, the entire line will be reconstructed and resolved correctly. Without infrasampling, gaps would appear in the line.

The -contrast command line option and the contrast statement in the view statement (see view) of the input file set the maximum contrast threshold between pixels, below which no recursive supersampling will be done.
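The decision itself is simple and can be sketched like this (the per-channel comparison is an assumption about how the threshold is applied):

```python
def exceeds_contrast(sample_a, sample_b, threshold=(0.1, 0.1, 0.1)):
    """Return True when two RGB samples differ by more than the
    contrast threshold in any channel, i.e. when recursive
    supersampling should subdivide further."""
    return any(abs(a - b) > t
               for a, b, t in zip(sample_a, sample_b, threshold))
```

Raising the threshold makes this test fail more often, so fewer subdivisions occur and rendering gets faster at the cost of quality.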

The -filter command line option and the filter statement in the view statement (see view) of the input file specify how pixel subsamples are filtered. By default, a box filter is used that uses the average of all neighboring subsamples.

The -jitter command line option and the jitter statement in the view statement (see view) of the input file introduce systematic variations in the sampling locations to further reduce aliasing.

When rendering is finished, the generated image can be processed before being written out to image files. In the view statement, a list of output shaders (see output shader) may be specified that are executed in the order they appear. Output shader statements may be interspersed with file output statements, so both raw and postprocessed images can be written to files. Output shaders are user-supplied C functions that have read and write access to all frame buffers generated during rendering. It is possible to specify the type of data required for an output shader, such as 8-bit or 16-bit color information or depth information. mental ray will then compute this information during the rendering stage, and provide it to the output shader.

Understanding Diagnostic Messages

This description explains how to read the diagnostic messages printed, and in particular how to adjust render parameters accordingly to achieve optimal rendering speed and quality. The diagnostic messages must be turned on explicitly with the -verbose on command-line option, or with a verbose on command at the beginning of the .mi file.

First, information that is common to all frames of an animation is given. For each frame, the verbose output has three sections that correspond to the tasks that are performed in sequence:

  • scene translation and tessellation

  • building the acceleration data structures for rendering

  • rendering

After the last frame, resource usage information is given.

As an example, the verbose output of a two frame animation has the following structure:

     information common to all frames 
     frame 1 
     scene translation and tessellation 
     building the acceleration data structures for rendering 
     rendering 
     end frame 1 
     frame 2 
     scene translation and tessellation 
     building the acceleration data structures for rendering 
     rendering 
     end frame 2 
     resource usage information 
 

The three tasks within a frame are separated by timing information of the form:

     current time: 0 real, 0 user, 0 sys seconds, current memory: 229204 bytes

Information Common to All Frames

The information common to all frames begins with the names of the included files. In particular, the inclusion of the softimage.mi file and of declarations for dynamically linked shaders is echoed. If user-defined shaders are compiled at runtime, the compiler output also appears here.

Per-Frame Information

The information per frame begins with information about the scene and about tessellation results. This part of the verbose output echoes the description of the scene. If the input comes from a .mi file, there is a close correspondence between the verbose output and the input file. First, a list of textures, lights and materials is given. Then, the output files to be created, the camera and all mental ray options for that particular frame appear. This information is taken from the view parameters between the view and end view statements. View parameters not explicitly specified are also given here with their defaults. A list of textures, lights and materials may follow. Then a list of all included scene objects is printed, and information about the progress of the tessellation of surfaces is given.

The messages printed when building the acceleration data structures for rendering depend on the acceleration method (binary space partitioning (BSP) or ray classification):

Binary Space Partitioning (BSP)

This acceleration algorithm operates by building a hierarchical spatial data structure by recursively subdividing a bounding volume surrounding the entire scene. The resulting binary tree consists of branch nodes that correspond to a subdivision of a bounding volume (voxel) into two subvoxels and leaf nodes that contain the geometric primitives (triangles). The form of the tree is characterized by two parameters:

max size
in the view statement in the .mi scene file controls the maximum number of primitives in a leaf. The default is 4.

max depth
in the view statement in the .mi scene file controls the maximum number of levels of the tree, thus limiting how finely a branch of the tree can be subdivided. The default is 24.

There is a trade-off between speed and memory consumption. Memory usage of the BSP algorithm implicitly depends on these two parameters. In contrast to ray classification, no a priori estimate of the memory usage is available. Large leaf sizes and small tree depths reduce memory usage but increase rendering time because larger leaves need to be searched. Increasing the tree depth also leads to a slight increase of the preprocessing time required for building the acceleration data structure.

The output of the BSP method for a typical scene looks as follows:

     number of triangles: 107761
     scene extent: [-25.0646, 21.4746] [-43.2608, 24.0328] [-91.2292, -50.6684]
     space subdivision............
     number of bin nodes:          141022
     number of leaf objects:       842154
     maximally reached tree depth: 24
     size of largest leaf node:    198

     current time: 50 real, 40 user, 1 sys seconds, current memory: 16763595 bytes

Here, the scene extent denotes the minimal and maximal x, y and z extent of the geometry in camera space. In particular, only triangles in the negative z range are visible to first generation rays, since the camera looks down the negative z axis. The number of bin nodes corresponds to the number of subdivisions of voxels (bounding volumes) into two subvoxels. The number of leaf objects gives the number of triangles that are stored in the leaf nodes of the BSP tree. Typically, the number of leaf objects is higher than the number of triangles in the scene (in this example by a factor of about eight) since a triangle may be contained in more than one leaf node. The maximum tree depth reached is also given. In most cases, it is equal to the max depth parameter in the view statement in the .mi scene file, but for small scenes the BSP tree may not reach this depth.

A very important piece of information is the size of the largest leaf node in the BSP tree. If this number is much higher than 10, optimizing the rendering speed by choosing a deeper BSP tree (e.g. max depth 30 or higher) is highly recommended and might lead to considerable improvements in performance. When there are more triangles than can be stored in the BSP tree of the specified depth and leaf size, mental ray will ignore the maximum leaf size criterion. Therefore large leaves can result even if the max size parameter is set to a small number.

Ray Classification

With this algorithm, primitives (triangles) are first gathered into bounding boxes which are used in all further acceleration techniques. mental ray uses heuristics to balance the number of boxes and the number of triangles in each box; this is done by predicting and minimizing the cost of evaluating a box during rendering. Boxes are constructed by recursive partitioning of the input groups.

The space of rays is subdivided. The optimal subdivision is estimated by mental ray, but may be adjusted with the subdivision statement in the view. Two correction constants can be specified, for the space of all rays, and for eye and shadow ray classification.
(These two parameters are called visible and shadow in the SOFTIMAGE ray classification setup box.) Positive numbers subdivide more finely, while negative numbers generate fewer subdivisions. With finer subdivisions, the number of boxes that must be examined when a ray is cast is reduced, but the overhead for generating the acceleration data structures and intersection testing increases. Normally, small adjustments in the range (-2, 2) suffice for optimal speed.

The amount of memory used for ray classification is constant and defaults to approximately 6 megabytes per CPU. Even very large scenes with over a million triangles work at maximum speed with no more than 12 megabytes per CPU for acceleration data structures.

The output of ray classification for a typical scene looks as follows:

     computing object space subdivision

     scene extent: [-25.0555, 21.4746] [-43.2517, 24.0328] [-91.2201, -50.6684]
     number of triangles        107761
     number of implicit patches     0
     number of boxes             4379
     scene cost                    48.44

     current time: 31 real, 23 user, 1 sys seconds, current memory: 14059719 bytes

The scene extent is explained above. The number of boxes gives the number of bounding volumes which contain the triangles to be rendered. The scene cost is the result of the heuristic that balances the number of boxes and the number of triangles in each box. All other information is of no relevance to the user.

Rendering Information

The output during the rendering phase looks as follows:


     host/thread  area  percent

               0     864   0.41%
               0     837   0.80%
               (...)
               0     837  99.59%
               0     864 100.00%

    100.00% done locally

In this example, the rendering was performed on one host. If rendering in parallel is used, the contributions from different hosts/threads are displayed and summarized after the rendering has finished. As a rule of thumb: if a remote host contributed much less than the other hosts to the final picture, it is better to remove it from the list of rendering hosts. The work done by this host is negligible compared to the increased network traffic it causes. Note, however, that this rule does not apply to the local host.

The subsequent rendering statistics are important for performance tuning. Again, the output differs for the BSP method and for ray classification.

For the BSP algorithm, the rendering statistics for a typical scene look as follows:

     time in seconds:                                 168
     number of rays:                               918976
     milliseconds per ray:                          0.184
     rays per pixel:                                 4.33
     nodes per ray:                                 35.02
     leaves per ray:                                 6.89
     candidate triangles per ray:                   21.10
     tested triangles per ray:                      21.10
     box changes per ray:                            2.12

Time in seconds gives the user CPU time for the rendering. The number of rays shot is also given, as well as the milliseconds per ray. Nodes per ray and leaves per ray are the average numbers of branch and leaf nodes that are traversed in the BSP tree during rendering. The rays per pixel figure is very important for the quality of the picture and for the rendering performance. A detailed discussion is given in the section ``Rendering Quality'' below. Another key figure for performance tuning of mental ray is the number of candidate triangles per ray. If it is much larger than 10, using a deeper BSP tree by choosing max depth 30 or higher might result in dramatic improvements of rendering speed.

The rendering statistics for ray classification look as follows:

     time in seconds:                                1013
     number of rays:                               921648
     milliseconds per ray:                          1.100
     rays per pixel:                                 4.35
     candidate triangles per ray:                   37.67
     tested triangles per ray:                      50.86

Again, the information that is most important for performance tuning of mental ray is the number of candidate triangles per ray. If it is much larger than 10, variations of the subdivision parameters in the range (-2, 2) around the default setting (subdivision 0 0) are recommended to optimize the rendering speed.
(These two parameters are called visible and shadow in the SOFTIMAGE ray classification setup box.)

Rendering Quality

The influence of the oversampling parameters on the rendering quality and speed is crucial. The recursive oversampling algorithm can be controlled by the min samples and max samples parameters. Recommended settings are (-2, 0) for previews, (-1, 1) for medium quality (comparable to SOFTIMAGE adaptive 2) and (0, 2) for high quality rendering. Note that the difference between max samples and min samples should not be larger than three to ensure efficient rendering. Typical results for the number of rays per pixel are one ray per pixel for previews, four rays per pixel for medium quality and ten rays per pixel for high quality renderings. Of course these figures are mainly quoted as a guideline and may vary considerably from scene to scene. However, if the obtained value for rays per pixel is considerably above the typical numbers given here, one should carefully review the settings of mental ray. To force a faster preview, using contrast parameters above 0.1 (e.g. 0.2) is also helpful. The resulting images will generally have a lower quality, but there may be considerable speed improvements.
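As a sketch, a medium-quality setup along these lines might look as follows (the sample values follow the recommendations above; the contrast statement is assumed to take one threshold per color channel):

     view
         min samples -1
         max samples 1
         contrast 0.1 0.1 0.1
     end view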

Resource Usage Information

Typical resource usage information given after the last frame of an animation looks like this:

     resource usage

     user time:                         211 s
     system time:                         2 s
     maximum resident set:            24556 kb
     (...)

The most important information for the user is the user CPU time and the maximum resident set of the process. Specifically, if a large scene is rendered and the maximum resident set of the process becomes comparable to the available memory on the machine used, performance can be degraded by swapping. To investigate this, one can use the UNIX top command while rendering is in progress. If the process size (size) and the maximum resident set (rss) differ by more than a factor of two and the %cpu drops far below 50%, the machine is swapping. This will degrade performance dramatically. (Type man swap at a shell prompt for more information.) If this problem occurs, it is recommended to use ray classification as the acceleration technique. Since with this method the amount of memory used for acceleration structures is constant, this will avoid swapping, unless the scene itself does not fit into memory.


