The SMP2 a3 core gets a speed boost on Linux

Posted in Folding@Home Project
The SMP2 core is already well-liked amongst most members of the folding community for its ease of use compared to the older a1 and a2 cores. However, it has not been showing its full potential. A new revision, v2.22, has been released for the Linux SMP2 client, which results in speed boosts of around 15%, and contains science fixes and improvements. In short, it works faster and better than its predecessors.

For the curious, here is a technical explanation of where the performance gains come from. Historically, in the a1 and a2 cores, a unit's processing was split between two different types of node, typically with one thread or process per node (seen in the log file as "Making 1D domain decomposition 2 x 1 x 1"). PP (particle-particle) nodes undertake the calculations governing the interactions between the different atoms being modeled, whilst PME (Particle Mesh Ewald) nodes calculate the electrostatic field the atoms are suspended in, which is affected by every atom in the simulation.

The difficulty lies in working out the correct division between PP and PME nodes. PP work makes up the vast majority of the calculations, but the precise number of PP and PME nodes each unit needs depends on the number of atoms in the simulation and, of course, on the number of cores available for processing. An incorrect or suboptimal balance between PP and PME nodes results in lower performance, because some nodes (cores of your CPU) will not be loaded heavily enough to make full use of the available hardware.
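To make the balance problem concrete, here is a toy model in Python. It is only a sketch and not the logic used by the core itself. It assumes that each simulation step contains a fixed amount of PP work and PME work (the figures of 80 and 20 are invented for illustration), that work divides evenly within each group of nodes, and that a step finishes only when the slower group finishes.

```python
def step_time(pp_work, pme_work, n_nodes, n_pme):
    """Time for one step with a fixed split: the slower group sets the pace."""
    n_pp = n_nodes - n_pme
    if n_pp < 1 or n_pme < 1:
        raise ValueError("need at least one PP node and one PME node")
    return max(pp_work / n_pp, pme_work / n_pme)


def best_split(pp_work, pme_work, n_nodes):
    """Try every fixed split and return (n_pme, step_time) for the fastest one."""
    candidates = [
        (k, step_time(pp_work, pme_work, n_nodes, k)) for k in range(1, n_nodes)
    ]
    return min(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    # Hypothetical workload: 80 units of PP work, 20 units of PME work, 8 cores.
    for k in range(1, 8):
        print(k, "PME node(s):", round(step_time(80, 20, 8, k), 2))
    print("best split:", best_split(80, 20, 8))
```

In this model a single PME node leaves the seven PP nodes waiting, while too many PME nodes starve the PP side. Even the best fixed split leaves one group partly idle whenever the work does not divide cleanly.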

Version 2.22 of the a3 core solves this conundrum by making every node capable of performing either PP or PME calculations. The core adapts automatically as the unit is processed, so the number of PP and PME nodes is correct at any given moment in the simulation. On a dedicated machine this would only offer a small performance increase. On non-dedicated machines, however, the newfound flexibility should result in a large performance gain, because there is much less danger of any one core/node getting bogged down by having to share CPU time with another task: the composition of the nodes is updated dynamically so that another core can process that part of the simulation.
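The difference between fixed and flexible nodes can be illustrated with another toy model in Python. Again, this is a sketch under stated assumptions and not the algorithm used by the core: the workload figures and node speeds are invented, fixed-role nodes share their group's work equally, and flexible nodes are assumed to share all work in proportion to their available speed.

```python
def fixed_split_time(pp_work, pme_work, speeds, n_pme):
    """Fixed roles: the last n_pme nodes do PME, the rest do PP.

    Work is shared equally within each group, so the slowest node in a
    group holds up that whole group.
    """
    n_pp = len(speeds) - n_pme
    if n_pp < 1 or n_pme < 1:
        raise ValueError("need at least one PP node and one PME node")
    pp_speeds, pme_speeds = speeds[:n_pp], speeds[n_pp:]
    pp_time = (pp_work / n_pp) / min(pp_speeds)
    pme_time = (pme_work / n_pme) / min(pme_speeds)
    return max(pp_time, pme_time)


def flexible_time(pp_work, pme_work, speeds):
    """Flexible roles: all work is shared in proportion to each node's speed."""
    return (pp_work + pme_work) / sum(speeds)


if __name__ == "__main__":
    dedicated = [1.0] * 8          # every core fully available
    shared = [0.5] + [1.0] * 7     # one core loses half its time to another task
    for name, speeds in (("dedicated", dedicated), ("shared", shared)):
        fixed = fixed_split_time(80, 20, speeds, n_pme=2)
        flexible = flexible_time(80, 20, speeds)
        print(name, "fixed:", round(fixed, 2), "flexible:", round(flexible, 2))
```

With these invented numbers the flexible scheme is only slightly ahead when every core is fully available, but roughly twice as fast once a single core has to share its time. The figures illustrate the reasoning only and are not measurements of the real core.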

To install this new version, delete the current core and the client will download the new version automatically. As always, it is not recommended to perform this update mid-unit, as it is probable that the new core will reject the checkpoint created by the previous core and restart the unit. If you are absolutely certain that your machine will not be switched off before the unit ends, it may be possible to delete the old core without stopping the client. The old core will be held in RAM processing the unit, and the client will then update to the new core version when it starts the next unit.
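For those who prefer to script the deletion step, a minimal Python sketch follows. The helper name `remove_core` is hypothetical, and the directory and file name in the usage comment are assumptions: substitute the folder your client runs from and the core file name that actually appears in it.

```python
from pathlib import Path


def remove_core(client_dir, core_filename):
    """Delete the core file from the client directory if it is present.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    core_path = Path(client_dir) / core_filename
    if core_path.is_file():
        core_path.unlink()
        return True
    return False


# Example call (both arguments are assumptions, adjust them to your setup):
# remove_core(Path.home() / "folding", "FahCore_a3.exe")
```

The function only deletes the file. Fetching the replacement is left to the client, as described above.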

We welcome reports of actual figures for any performance gains.

Source: Folding Forum