To adapt and optimise the operation of the GPU clients across the variety of hardware available, the developers have provided environment variables for custom settings. These settings can influence the performance of the core, its stability, or both. This article explains how to set these environment variables and describes the parameters available for ATI and nVidia cards.
Be aware that some variables are likely to disappear, and others may be added, as development and optimisation of the core progress. This article will be updated accordingly.
Configuring the variables
This part was written using Windows XP as a base, but the approach on Vista and Windows 7 is the same; only the titles of the menus may vary slightly from one version to another.
Several methods exist to open the Control Panel item that interests us:
- Start > Control Panel > System (if you are viewing by category, switch to Classic view).
- Right click on My Computer > Properties.
- The simplest and most universal: the key combination WIN + Break on the keyboard (the WIN key is the one with the Windows logo on it).
Once the System Properties window is open, go to the Advanced tab and click Environment Variables.
The window that appears has two parts. The upper part, called User variables, only applies to the current user. The variables set here take effect immediately for the current user, and a simple restart of the Folding@Home client is enough to take them into account. If you are the only user of your machine, this is the simplest way to configure your variables. If you run the client in service mode and/or you have multiple users on the machine, this is not the right place to set them.
The lower part of the window is called System variables. The variables set here apply to all users of the system, but a reboot is necessary for the changes to be taken into account.
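Regardless of where you set your variables, type the names exactly as they are written in this article, in upper case. Windows itself treats variable names as case-insensitive (VARIABLE_ENV and variable_env refer to the same variable), but keeping to the documented spelling avoids any doubt.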
To check that your variables are set, open a Command Prompt (Start > Programs > Accessories > Command Prompt) and use the command "echo %variable_name%". When the variable is properly configured, its value is displayed; otherwise the text %variable_name% is returned unchanged.
C:\> echo %FLUSH_INTERVAL%
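The same check can be scripted. The following is a minimal Python sketch, not part of the Folding@Home client, that reports which of the variables described in this article are visible to the current process. The helper name report_variables is made up for illustration.

```python
import os

# Variable names taken from this article.
FAH_VARIABLES = [
    "FLUSH_INTERVAL",
    "CAL_NO_FLUSH",
    "CAL_PRE_FLUSH",
    "BROOK_YIELD",
    "NV_FAH_CPU_AFFINITY",
    "FAH_GPU_IDLE",
]


def report_variables(environ=None):
    """Return a dict mapping each variable name to its value, or None if unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in FAH_VARIABLES}


if __name__ == "__main__":
    for name, value in report_variables().items():
        shown = value if value is not None else "(not set)"
        print(f"{name} = {shown}")
```

Run it from a Command Prompt opened after the variables were changed, since a process only sees the environment it was started with.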
Now that you know how to configure the variables, here is the list of possible configurations and their meanings.
For ATI cards
These variables require Core 11 v1.24 or later and Catalyst 9.3 or later to work.
FLUSH_INTERVAL is the variable that affects graphics responsiveness (the 2D lag phenomenon, for example). It is the number of functions sent to the GPU in one batch. The GPU does nothing else, including refreshing the screen, until it has finished processing the batch. A low value reduces the time F@H monopolises the GPU, so the interface becomes more responsive. However, the lower the value, the higher the CPU load from the OS and the driver, so there is a trade-off between the performance of F@H and the fluidity of the interface. If the batch is too large, it can cause a VPU Recover: the driver thinks the GPU has hung when it is just taking too long to respond.
CAL_NO_FLUSH and CAL_PRE_FLUSH change the method of submitting batches of functions to the hardware. CAL_NO_FLUSH changes how the batches of functions are built. CAL_PRE_FLUSH allows caching the batch, in order to prepare the next batch in advance while the GPU handles the current batch.
BROOK_YIELD has three modes: 0, 1 and 2. Mode 0 monopolises the CPU to obtain the lowest latency on requests from the GPU. Mode 1 releases the CPU, while waiting for a response from the GPU, to processes of the same or lower priority than the GPU core. Mode 2 releases the CPU to every process, regardless of its priority. However, for very small values of FLUSH_INTERVAL and small proteins, the GPU is likely to have almost finished by the time the CPU is released. The GPU must then wait to regain access to the CPU, which may take up to a millisecond. A high-end GPU completes most such batches in less than 100 microseconds, so the wait for the CPU can have a big impact on performance. With a high value of FLUSH_INTERVAL, a batch easily amounts to several milliseconds of work, which makes the wait much less of an issue.
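To put numbers on this, the sketch below computes the worst-case share of time the GPU spends working when every batch is followed by a wait for the CPU. It uses the figures quoted above (100 microseconds per batch, up to 1 millisecond of waiting). The 5 millisecond batch stands in for "several milliseconds" and is an assumption, as is the simple work / (work + wait) model.

```python
def gpu_busy_fraction(batch_us, wait_us):
    """Fraction of time the GPU is busy if each batch is followed by a wait.

    Simplified model (an assumption): the GPU works for batch_us, then
    sits idle for wait_us until the CPU is available to submit more work.
    """
    return batch_us / (batch_us + wait_us)


if __name__ == "__main__":
    worst_case_wait_us = 1000  # "up to a millisecond", from the text

    small = gpu_busy_fraction(100, worst_case_wait_us)   # small batch
    large = gpu_busy_fraction(5000, worst_case_wait_us)  # assumed 5 ms batch

    print(f"100 us batches: GPU busy {small:.0%} of the time")
    print(f"5 ms batches:   GPU busy {large:.0%} of the time")
```

In this worst case the small batches keep the GPU busy only about 9% of the time, against about 83% for the larger ones. Real waits are usually shorter than the worst case, so treat these figures as an illustration of the trend and not as a measurement.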
Here is a sample configuration:
FLUSH_INTERVAL = 128-256 for a 48x0, 64-96 for a 38x0 (the optimum setting, which keeps the GPU at 100% regardless of the WU without causing too much lag, should be within these ranges).
BROOK_YIELD = 2 (to stop utilising 100% of the CPU and therefore allow an additional CPU client to be started)
CAL_PRE_FLUSH = 1
CAL_NO_FLUSH = 1 (but should be reverted to 0 if it causes too many VPU Recovers).
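As an illustration, the sample configuration can be checked and turned into an environment for a child process. This is only a sketch: the function name build_environment is made up, the allowed values are the ones mentioned in this article, and the rule that FLUSH_INTERVAL must be positive is an assumption.

```python
import os

# Values mentioned in the article: BROOK_YIELD has modes 0, 1 and 2;
# the two CAL_* switches are shown with the values 0 and 1.
ALLOWED = {
    "BROOK_YIELD": {0, 1, 2},
    "CAL_PRE_FLUSH": {0, 1},
    "CAL_NO_FLUSH": {0, 1},
}


def build_environment(settings, base=None):
    """Return a copy of the environment with the given settings applied.

    Raises ValueError for a name or value the article does not describe.
    """
    env = dict(os.environ if base is None else base)
    for name, value in settings.items():
        if name == "FLUSH_INTERVAL":
            # Assumption: a batch size only makes sense if it is positive.
            if int(value) <= 0:
                raise ValueError("FLUSH_INTERVAL must be a positive number")
        elif name in ALLOWED:
            if int(value) not in ALLOWED[name]:
                raise ValueError(f"{name}={value} is not a documented value")
        else:
            raise ValueError(f"unknown variable {name}")
        env[name] = str(value)  # environment values are always strings
    return env


if __name__ == "__main__":
    # Sample configuration for a 48x0 card, low end of the suggested range.
    sample = {
        "FLUSH_INTERVAL": 128,
        "BROOK_YIELD": 2,
        "CAL_PRE_FLUSH": 1,
        "CAL_NO_FLUSH": 1,
    }
    env = build_environment(sample)
    for name in sample:
        print(f"{name}={env[name]}")
```

The resulting dictionary can be passed as the env argument of Python's subprocess.Popen to start a program with these settings without changing them for the whole system.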
For nVidia cards
NV_FAH_CPU_AFFINITY is used to set the CPU affinity of the core. It is not normally set, unless you have trouble with program conflicts (performance degraded by sharing a CPU core with another program). Acceptable values are the numbers 0 to 15. The value is actually a binary mask in which each bit represents a CPU, in the order CPU3, CPU2, CPU1, CPU0. A bit set to 1 allows access to the CPU in question, and a bit set to 0 prohibits access. The numeric value 0 is the default: it allows access to all cores. If you use this variable in a custom configuration, it is recommended to enable the client option "do not lock cores to specific CPU" in the configuration.
Here are some example values, including the most significant ones:
1 -> 0001: CPU0 only
2 -> 0010: CPU1 only
3 -> 0011: CPU1 and CPU0 ... recommended for a dual core.
7 -> 0111: CPU2, CPU1 and CPU0 ... value adapted to a tri core.
9 -> 1001: CPU3 and CPU0
11 -> 1011: CPU3, CPU1 and CPU0
15 -> 1111: CPU3, CPU2, CPU1 and CPU0 ... value adapted to a quad core.
As you can see, many of the values are exotic combinations (1, 2, 9, 11, ...), but they can be useful if you have a particular need. The most common values are 0 (default), 3 (dual core), 7 (tri core) and 15 (quad core).
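For values not listed here, the mask can be worked out mechanically. The following Python sketch converts between a mask value and the list of CPUs it enables; the function names are made up for illustration, and the treatment of 0 as "all cores" follows the description above.

```python
def cpus_from_mask(mask, cpu_count=4):
    """Return the list of CPU indices enabled by an affinity mask.

    Bit 0 is CPU0, bit 1 is CPU1, and so on. As described in the
    article, a mask of 0 is the default and means "all cores".
    """
    if not 0 <= mask < 2 ** cpu_count:
        raise ValueError(f"mask must be between 0 and {2 ** cpu_count - 1}")
    if mask == 0:
        return list(range(cpu_count))
    return [cpu for cpu in range(cpu_count) if mask & (1 << cpu)]


def mask_from_cpus(cpus):
    """Return the affinity mask that enables exactly the given CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    for value in (1, 2, 3, 7, 9, 11, 15):
        print(f"{value:>2} -> {value:04b}: CPUs {cpus_from_mask(value)}")
```

For example, mask_from_cpus([0, 3]) gives 9, which matches the 1001 entry in the list.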
FAH_GPU_IDLE is a variable that was introduced with the 1.26 core, at the request of some members of the community, to help reduce the overheating of some cards with some WUs. Its value is the percentage of waiting time you wish to introduce into the core. For example, a value of 10 corresponds to 10% of waiting time. The decline in performance is not necessarily proportional to the waiting time: depending on the calculations, it may not be possible to maintain exactly the value requested. The value is a target, and in practice the actual waiting time will vary around it. If your card has problems with overheating, you can experiment with this variable to find the value that keeps reasonable performance while limiting the overheating of your card.
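As a first approximation only, the effect of a given percentage can be estimated with a simple duty-cycle calculation. The model below is an assumption (the percentage is taken as the share of total time spent waiting); how the core actually schedules its waits is not described here, and the function names are made up for illustration.

```python
def idle_time_per_step(work_ms, idle_percent):
    """Idle time to add after work_ms of computing to reach idle_percent.

    Assumed model: idle_percent is the share of total time
    (work + idle) that is spent waiting.
    """
    if not 0 <= idle_percent < 100:
        raise ValueError("idle_percent must be in the range 0-99")
    p = idle_percent / 100
    return work_ms * p / (1 - p)


def relative_throughput(idle_percent):
    """Throughput relative to no idling, under the same assumed model."""
    if not 0 <= idle_percent < 100:
        raise ValueError("idle_percent must be in the range 0-99")
    return 1 - idle_percent / 100


if __name__ == "__main__":
    for pct in (0, 10, 25):
        idle = idle_time_per_step(9.0, pct)
        speed = relative_throughput(pct)
        print(f"FAH_GPU_IDLE={pct}: {idle:.2f} ms idle per 9 ms of work, "
              f"about {speed:.0%} of full speed")
```

Under this model a value of 10 means roughly 1 ms of waiting for every 9 ms of computing. Since the requested value is only a target, the real figures will differ.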
Happy folding on the GPU.