====== Benchmarks ======

==== 5 Is the Number ====
  * 5 tests can be easily run by hand and averaged by hand.
    * Automation scripts would create many additional files to be maintained to ensure completeness.
  * For many examples, >5 tests will not change the results significantly.((In our own tests this appears to hold true.))((If your own tests for a particular example vary wildly, your source/tests may need to be adjusted to hone in on what you are trying to test.))
  * Only display the average.
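Assuming the workflow above, the hand-averaged 5 runs can be sketched with Python's standard ''timeit'' library; the statement being timed here is a stand-in, not one of the guide's actual benchmarks:

```python
import timeit

def average_of_trials(stmt, trials=5):
    """Time `stmt` once per trial and return the mean in seconds.

    `stmt` is a placeholder for whatever the benchmark exercises;
    5 trials matches the convention above.
    """
    times = [timeit.timeit(stmt, number=1) for _ in range(trials)]
    return sum(times) / len(times)

# Only the average is displayed, to 3 places after the decimal.
print(f"{average_of_trials('sorted(range(10000))'):.3f}")
```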
  
  
==== Present Results in Seconds ====
  * Consistency means little chance of misreading.
  * Stick to the convention of displaying 3 places after the decimal.
  * Avoid results that would be //time limit exceeded// wherever possible.
    * 1s is a good cutoff for many purposes, 2s for some others.
    * For a single column a ''-'' will suffice.
    * Entire rows can be omitted when warranted.
    * Ignore this rule if it will mislead the reader.
      * E.g., if the last row is 10<sup>5</sup> at .030s, one would likely expect 10<sup>6</sup> to be .300s if the trend otherwise appears linear. If 10<sup>6</sup> is //actually// 2 seconds (for reasons we cannot control), this is important information for the reader.
    * Ignore this rule if insufficient data would otherwise be presented.
      * A table with a single row is not typically useful. A table with a row at 0s, a row at //almost// 0s, and no other rows is also likely to be misleading.
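The display conventions above could be captured in a small helper like this sketch; the cutoff and the result values shown are illustrative assumptions, not measured data:

```python
TIME_LIMIT = 1.0  # illustrative cutoff: 1s for many purposes, 2s for some others

def format_cell(seconds):
    """Render one result cell: 3 places after the decimal, '-' past the limit."""
    if seconds is None or seconds > TIME_LIMIT:
        return "-"
    return f"{seconds:.3f}"

# hypothetical results for inputs of size 10^4..10^6 (None = time limit exceeded)
for exponent, seconds in [(4, 0.004), (5, 0.030), (6, None)]:
    print(f"10^{exponent}\t{format_cell(seconds)}")
```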
  
==== Make a Change -> Rerun All Tests ====
  * If you make //any// changes to a test file, please rerun //all// associated tests.
    * To make this possible, put up a new ''files:'' page for any new benchmarks created.
    * Ensure that all associated files are included on this ''files:'' page, and //are not// links to other areas.
      * This duplication violates terseness, but it is important to guarantee that tests rely on only one page.
      * Otherwise (when many benchmarks refer to a single file location) it is impossible to know what to update while keeping all results consistent.
  * If you do not believe that your results have changed, //prove it// rather than assuming that a change would not happen.
  * For this guide to be usable, benchmarks must be implicitly worthy of trust.
  
==== Make All Efforts to Only Test Your Target ====
  * Keep all test cases as simple as possible.
  * Consider your tests carefully, and make efforts to //only// test the desired language feature.
    * For our [[python3:input_tests|standard input benchmarks]] we timed the entire test using the bash ''[[competitive_programming:linux|time]]'' builtin.
      * In all cases we stored the result of a read, but we did not maintain it between iterations.
      * E.g., we did not want to consider the cost of ''list'' growth outside of ''stdin.readlines()'', where storing an entire file is unavoidable.
    * For our [[python3:output_tests|standard output benchmarks]] we decided to read the same data as our [[python3:input_tests|standard input benchmarks]] to maintain consistency, but did not want to account for the cost of reads.
      * Instead of the ''[[competitive_programming:linux|time]]'' builtin we opted for the standard Python3 [[https://docs.python.org/3/library/timeit.html|timeit]] library to isolate only the writes.
  * Some judgment on the author's part is necessary to ensure that these efforts are made correctly.
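The write-isolation approach described above might look roughly like this sketch; the line count and data are stand-ins for the real shared input files:

```python
import contextlib
import io
import sys
import timeit

# Prepare the input up front so the cost of reads is excluded from the
# measurement; only the writes are timed. (Fabricated data, not the
# guide's actual input files.)
lines = [f"line {i}\n" for i in range(1000)]

def write_all():
    # The statement under test: a single bulk write to stdout.
    sys.stdout.write("".join(lines))

# number=1 times one full pass over the data; repeat and average
# per the 5-trial convention above.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    elapsed = timeit.timeit(write_all, number=1)
print(f"{elapsed:.3f}")
```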
  
  
appendix/guidebook_authoring/benchmarks.1534367713.txt.gz · Last modified: 2018/08/15 16:15 by kericson