====== Benchmarks ======

==== 5 Is the Number ====
  * 5 tests can be easily run by hand and averaged by hand.
    * Automation scripts would create many additional files to be maintained to ensure completeness.
  * For many examples, >5 tests will not change the results significantly.((In our own tests this appears to hold true.))((If your own tests for a particular example vary wildly, your source/tests may need to be adjusted to hone in on what you are trying to test.))
  * Only display the average.
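Assuming the workflow above, the hand-averaged 5 runs can be sketched with Python's standard ''timeit'' library; the statement being timed here is a stand-in, not one of the guide's actual benchmarks:

```python
import timeit

def average_of_trials(stmt, trials=5):
    """Time `stmt` once per trial and return the mean in seconds.

    `stmt` is a placeholder for whatever the benchmark exercises;
    5 trials matches the convention above.
    """
    times = [timeit.timeit(stmt, number=1) for _ in range(trials)]
    return sum(times) / len(times)

# Only the average is displayed, to 3 places after the decimal.
print(f"{average_of_trials('sorted(range(10000))'):.3f}")
```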
  
  
==== Present Results in Seconds ====
  * Consistency means little chance of misreading.
  * Stick to the convention of displaying 3 places after the decimal.
  * Avoid results that would be //time limit exceeded// wherever possible.
    * 1s is a good cutoff for many purposes, 2s for some others.
    * For a single column a ''-'' will suffice.
    * Entire rows can be omitted when warranted.
    * Ignore this rule if it will mislead the reader.
      * E.g., if the last row is 10<sup>5</sup> at .030s, one would likely expect 10<sup>6</sup> to be .300s if the trend otherwise appears linear. If 10<sup>6</sup> is //actually// 2 seconds (for reasons we cannot control), this is important information for the reader.
    * Ignore this rule if insufficient data would otherwise be presented.
      * A table with a single row is not typically useful. A table with a row at 0s, a row at //almost// 0s, and no other rows is also likely to be misleading.
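The display conventions above could be captured in a small helper like this sketch; the cutoff and the result values shown are illustrative assumptions, not measured data:

```python
TIME_LIMIT = 1.0  # illustrative cutoff: 1s for many purposes, 2s for some others

def format_cell(seconds):
    """Render one result cell: 3 places after the decimal, '-' past the limit."""
    if seconds is None or seconds > TIME_LIMIT:
        return "-"
    return f"{seconds:.3f}"

# hypothetical results for inputs of size 10^4..10^6 (None = time limit exceeded)
for exponent, seconds in [(4, 0.004), (5, 0.030), (6, None)]:
    print(f"10^{exponent}\t{format_cell(seconds)}")
```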
  
==== Make a Change -> Rerun All Tests ====
  * If you make //any// changes to a test file, please rerun //all// associated tests.
    * To make this possible, put up a new ''files:'' page for any new benchmarks created.
    * Ensure that all associated files are included on this ''files:'' page, and //are not// links to other areas.
      * This duplication violates terseness, but it is important to guarantee that tests rely on only one page.
      * Otherwise (when many benchmarks refer to a single file location) it is impossible to know what to update while keeping all results consistent.
  * If you do not believe that your results have changed, //prove it// rather than assuming that a change would not happen.
  * For this guide to be usable, benchmarks must be implicitly worthy of trust.
  
==== Make All Efforts to Only Test Your Target ====
  * Keep all test cases as simple as possible.
  * Consider your tests carefully, and make efforts to //only// test the desired language feature.
    * For our [[python3:input_tests|standard input benchmarks]] we timed the entire test using the bash ''[[competitive_programming:linux|time]]'' builtin.
      * In all cases we stored the result of a read, but we did not maintain it between iterations.
      * E.g., we did not want to consider the cost of ''list'' growth outside of ''stdin.readlines()'', where storing an entire file is unavoidable.
    * For our [[python3:output_tests|standard output benchmarks]] we decided to read the same data as our [[python3:input_tests|standard input benchmarks]] to maintain consistency, but did not want to account for the cost of reads.
      * Instead of the ''[[competitive_programming:linux|time]]'' builtin we opted for the standard Python3 [[https://docs.python.org/3/library/timeit.html|timeit]] library to isolate only the writes.
  * Some judgment on the author's part is necessary to ensure that these efforts are made correctly.
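The write-isolation approach described above might look roughly like this sketch; the line count and data are stand-ins for the real shared input files:

```python
import contextlib
import io
import sys
import timeit

# Prepare the input up front so the cost of reads is excluded from the
# measurement; only the writes are timed. (Fabricated data, not the
# guide's actual input files.)
lines = [f"line {i}\n" for i in range(1000)]

def write_all():
    # The statement under test: a single bulk write to stdout.
    sys.stdout.write("".join(lines))

# number=1 times one full pass over the data; repeat and average
# per the 5-trial convention above.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    elapsed = timeit.timeit(write_all, number=1)
print(f"{elapsed:.3f}")
```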
  
  
appendix/guidebook_authoring/benchmarks.1534367713.txt.gz · Last modified: 2018/08/15 16:15 by kericson