Last articles - FAH-Addict : Project FAQs
http://en.fah-addict.net
(C) 2005-2020 PHPBoost

Why fast WU return >>> volume of WU returns
http://en.fah-addict.net/articles/4-project-faqs/5-why-fast-wu-return-volume-of-wu-returns/
It has been a long-standing debate in the folding community as to which is better: fewer units folded at greater speed, or more units folded but taking more time. The argument for the latter is normally based on PPD (points per day) as the "best" indicator of progress, but <strong>7im</strong> on the Folding Forum has developed a couple of analogies to explain why this is not the case, which are copied here verbatim. The analogy is based on several project series totalling 1 million units.<br />
<br />
<span style="font-size: 10px;">These 1 million work units are not all in one gigantic gumball machine, where the machine spits out a random gumball each time we ask for one. There are about 20-30 gumball machines, of which 2-5 machines may be giving out gumballs to your type of client. And in each of those gumball machines for your client type, we have 1 or many colors of gumballs, which I'll call WU Project numbers. Also, each gumball is numbered, and can only be sent out in a certain sequential order.<br />
<br />
So, of the 20 or so gumball machines, your SMP client may only be able to get gumballs from a few of that total. And in each of those SMP gumball machines, Pande Group may be sending out a mix of colors (projects) from a mix of those few SMP machines. However, at other times, they might only be sending out 1 color gumball from one specific machine (work server).<br />
<br />
The colors (projects) have different priorities, and those priorities change over time. We may get a mix of colors today, but next week we may only get RED. If one project needs more WUs completed, so the data can be compiled and analyzed prior to a grant renewal hearing, we might get GREEN for a whole month, and then we go back to a mix of colors for a while. Then Pande Group might notice a pattern developing in the BLUEs. A pattern, that once completed, might leapfrog the science forward. And in that case, all the remaining GREENs get discarded because they will never fit the new pattern.<br />
<br />
Also note the gumball machines are never full to the top with 1 million balls just hanging around. A few thousand at most, because WU #1235 isn't even created until #1234 is returned. And the results from #1234 can affect how #1235 is created. Minor course corrections or changes can be made along the way.<br />
<br />
Processing WUs is never a straight line, it's more of a matrix (see also Markov matrix). And because we have to go in a specific order, when you hold BLUE WU #1234 for an extra 0.8 days, then BLUE WU #1235 is delayed by almost a day, and so on. If you keep repeating the process, each work unit adds an extra day of delay. After 30 WUs, the end has been delayed by as much as a month. This is because you can't guarantee the "2nd" WU that you process in that same 1.8 days is also a BLUE work unit. It is more than likely a GREEN because all the BLUEs are tied up an extra day.<br />
<br />
For a better understanding of the complexity of the problem for queuing up work units that have to process sequentially, read about <a href="http://en.wikipedia.org/wiki/Queuing_theory">Queueing theory</a> or about <a href="http://en.wikipedia.org/wiki/Bin_packing_problem">the bin packing problem</a>. Also read about "opportunity cost" which is a problem of holding a WU for an extra day.<br />
<br />
Now some may say the example is oversimplified, and I would have to agree. And they might bring up the fact that there are actually thousands of available GREEN or RED WUs at any one time, also true. There is a large amount of parallelism, with the many Runs and Clones in each Generation, as one would expect from a project this size. But we also need to remember how the statistical analysis of the data is done. While the data points are modeled in 3 dimensions, I like to think of it in 2D. Like the scatter pattern in a shotgun shot at a paper target. We can see how the pellets tend to concentrate in one area of the target, or another. Concentrated data points indicate the correctly folded protein configuration, with regard to temps, solution, etc. All variables in the Runs and Clones.<br />
<br />
Now some will claim that more WUs are better, even if turned in slower. Like a wide river moving slowly towards the ocean. Kind of like that shotgun blast, only in slow motion. All the data arrives at one time, at a much later time, but the answer still gets there. We see the concentration of pellets, or data points.<br />
<br />
However, if we fold fewer work units much faster, we get data points (pellets) hitting the target much sooner. And it is often possible to determine where the data points are concentrating before all of the work units arrive (before they all hit the paper target). We can see results sooner. And we can adjust the configuration of future work units to more accurately zero in on that concentration. Or we add certain Runs and Clones mid project to better define the edges of the concentration. Or we can eliminate GREENs that seem to be missing the target completely, end that project # earlier, and send out helpful BLUEs instead. We didn't need to wait for the whole slow fleet of boats to arrive. We scuttled the less helpful boat halfway to the ocean. <img src="http://en.fah-addict.net/images/smileys/wink.png" alt=";)" class="smiley" /><br />
<br />
We don't have to fold the total sum of the WUs to get results. But to map the results, we need many multiple generations of work units building on each other, i.e. many pellets hitting the target. And we can get answers without waiting for 1 million gumballs to be dispensed.</span>
Mon, 24 Aug 2009 12:20:01 +0200

The BigAdv units
http://en.fah-addict.net/articles/4-project-faqs/4-the-bigadv-units/
A new category of work units was recently released, dubbed the "BigAdv" (short for Big Advanced Methods) units. These units for the SMP A2 core are tested under public beta, just like their smaller brethren, but due to their massive size and short deadlines (25,403 points with a 4-day deadline at the time of writing), they cannot be run on quad-core machines. It is recommended to use a machine with 8 or more cores and at least 0.5 GB (preferably 1 GB) of RAM per CPU core to fold these units.<br />
<br />
To reward these large hardware demands (and to discourage the use of multiple clients on a machine) a new points system is being trialled with the BigAdv units. This new system is designed to reward the fast return of units by awarding a bonus which increases in size the quicker the unit is returned. The formula for the points is as follows:<br />
<br />
Points awarded = square root (K * Deadline Time / Time taken to process) * WU point value<br />
where K is an arbitrary multiplier, initially set to 2. <br />
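As a worked illustration of the formula above, here is a minimal Python sketch. The function name `bigadv_points` and its parameter names are hypothetical, chosen for this example. It assumes the formula applies exactly as written, with no rounding or capping, which the release notes do not specify.<br />

```python
import math


def bigadv_points(base_points: float, deadline_days: float,
                  days_taken: float, k: float = 2.0) -> float:
    """Points awarded = sqrt(K * deadline / time taken) * WU point value.

    Hypothetical helper for illustration only: K defaults to 2, the
    initial value quoted in the text, and no rounding is applied.
    """
    if deadline_days <= 0 or days_taken <= 0:
        raise ValueError("deadline_days and days_taken must be positive")
    return math.sqrt(k * deadline_days / days_taken) * base_points


# A 25,403-point unit with a 4-day deadline, returned at different speeds.
for days in (1.0, 2.0, 4.0):
    print(f"returned in {days} days: {bigadv_points(25403, 4, days):.0f} points")
```

With K = 2, a 25,403-point unit returned in 2 days against the 4-day deadline would score sqrt(2 * 4 / 2) * 25,403 = 2 * 25,403 = 50,806 points under this reading, and returning it faster raises the score further.<br />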
<br />
To be awarded the bonus, at least 10 units using the A2 core must have been returned with a passkey, and at least 80% of these must have been returned within the deadline. This count continues whilst running BigAdv units, and both normal and big A2 units count towards the 80% statistic. If fewer than 10 passkeyed A2 units have been returned, or the reliability factor is less than 80%, only the standard point value of the BigAdv unit will be awarded.<br />
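The eligibility rules above can be sketched as a simple gate in front of the bonus formula. This is a minimal Python sketch, not the actual stats server logic: the function names `is_bonus_eligible` and `awarded_points` and the two counters are hypothetical, and it assumes the 80% figure is computed over passkeyed A2 units only.<br />

```python
import math


def is_bonus_eligible(passkeyed_a2_returned: int, returned_on_time: int) -> bool:
    """True if at least 10 passkeyed A2 units were returned and at least
    80% of them met their deadline (assumed reading of the rules)."""
    if passkeyed_a2_returned < 10:
        return False
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    return returned_on_time * 100 >= passkeyed_a2_returned * 80


def awarded_points(base_points: float, deadline_days: float, days_taken: float,
                   passkeyed_a2_returned: int, returned_on_time: int,
                   k: float = 2.0) -> float:
    """Bonus points if eligible, otherwise the standard point value."""
    if not is_bonus_eligible(passkeyed_a2_returned, returned_on_time):
        return base_points
    return math.sqrt(k * deadline_days / days_taken) * base_points


# Nine passkeyed units are not enough, so only the base value is awarded.
print(awarded_points(25403, 4, 2, passkeyed_a2_returned=9, returned_on_time=9))
# Ten units with eight on time (exactly 80%) qualify for the bonus.
print(awarded_points(25403, 4, 2, passkeyed_a2_returned=10, returned_on_time=8))
```

In this sketch a donor below either threshold simply falls back to the base value, matching the statement that only the standard point value is awarded in that case.<br />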
<br />
To fold BigAdv units (using an 8-core machine as an example) the 6.29 (or later) Linux or Mac OS X client is required, and should be started like so:<br />
<br />
./fah6 -smp 8 -bigadv -verbosity 9<br />
<br />
Adapted from <a href="http://foldingforum.org/viewtopic.php?f=24&t=10697">the BigAdv release notes</a>.
Wed, 19 Aug 2009 00:00:01 +0200