I looked for a long time to find a small self-contained example of code that requires a persistent structure across R invocations. This is primarily to allow me to deal with data sets that far exceed the memory on my computer, but that do allow me to read and process observations sequentially. (Examples are moments and regressions.)
Eventually, I pieced together a good example that implements AS75 (Applied Statistics algorithm 75, WLS regressions, by W. M. Gentleman). It works, although it may have bugs.
If interested, look at the two files in AS75-in-C. To use them, make sure you have gcc installed and then run
$ R CMD SHLIB Ras75.c   ## creates Ras75.so
$ R --vanilla
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
...
> source("as75.R")
The code is reasonably short and reasonably documented, so it should not be difficult to figure out what it does.
However, if you want to use it for its regression aspect, you should realize that Thomas Lumley's biglm on CRAN does this, too, and probably a lot better.
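To illustrate what "a persistent structure updated one observation at a time" can look like, here is a minimal sketch in C for the simpler case of moments. It is not the AS75 code from Ras75.c; it uses Welford's running mean/variance update instead, and the names `running_moments`, `rm_init`, `rm_update`, and `rm_variance` are hypothetical. How the struct is kept alive between R calls is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical running-moments accumulator (Welford's update).
   This is an illustration only, not the AS75 regression code. */
typedef struct {
    size_t n;     /* number of observations seen so far */
    double mean;  /* running mean */
    double m2;    /* running sum of squared deviations from the mean */
} running_moments;

/* Reset the accumulator to the empty state. */
void rm_init(running_moments *s) {
    s->n = 0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Fold in one observation; memory use stays constant no matter
   how many observations are processed. */
void rm_update(running_moments *s, double x) {
    double delta = x - s->mean;
    s->n += 1;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

/* Sample variance; requires at least two observations. */
double rm_variance(const running_moments *s) {
    assert(s->n > 1);
    return s->m2 / (double)(s->n - 1);
}
```

A caller would run `rm_init` once, call `rm_update` for each observation as it is read from disk, and ask for `s.mean` or `rm_variance(&s)` at any point. The data set itself never has to fit in memory; only the three fields of the struct persist.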
Common CPU benchmarks at sites such as anandtech or cnet are often frustrating for R users. R users care less about frame rates in FPS or single-precision floating point speed. They care more about double precision speed that is accessible to the standard R implementation. I was interested in what is fast and what is not fast for R purposes.
This site provides a (linux) perl script that runs Simon Urbanek's R-benchmark 25, records some basic information about the computer and installation on which the benchmark was run, and sends it off to my (public) website.
None of the transmitted information is sensitive. The information is publicly accessible by anyone over the web. (For now, I just use it to display the table below nicely.) I hope this will be a collaborative site that will contain many different timings. There should be almost no effort involved in contributing:
For a pure hardware table, i.e., without atlas results, click here. The number next to the atlas version is the speedup relative to the ordinary unoptimized blas library.
|2013-06||Asus i7-4770k OC Haswell||4C,8T; Asus-OC||32GB||3.0.1||MKL (4.9)||5.1||0.29||Jiasun li: MKL from source||1372880218.104.22.168.207|
|2013-06||Asus i7-4770k OC Haswell||4C,8T; Asus-OC||32GB||3.0.1||Atlas3 (3.5)||7.2||0.45||Ivo Welch: ubuntu binaries||1372461810.164.67.176.151|
|2013-06||Asus i7-4770k Haswell||4C,8T; 3.5GHz||16GB||3.0.1||Atlas3 (3.2)||7.9||0.49||Ivo Welch: ubuntu binaries||1372438822.214.171.124.151|
|2013-06||Asus i7-4770k Haswell||4C,8T; 3.5GHz||16GB||3.0.1||(1.0)||25.3||0.87||Ivo Welch: ubuntu binaries||13724386126.96.36.199.151|
|2012-10||AMD FX-8350 Vishera||8C; 4.0GHz||16GB||3.0.1||ACML*||7.9||0.45||Jeroen: Natively Recompiled. AMD Core Math Lib 5.3.1, libacml_mp*|
|2012-10||AMD FX-8350 Vishera||8C; 4.0GHz||16GB||3.0.1||Openblas 0.2.6||9.2||0.49|| Jeroen: Natively Recompiled |
|2012-10||AMD FX-8350 Vishera||8C; 4.0GHz||16GB||3.0.1||Intel(!)* MKL||7.9||0.45|| Jeroen: Natively Recompiled |
|2012-03||Xeon E3-1230 Sandy||4C,8T; 3.3GHz||32GB||3.0.1||MKL||7.0||0.40||Rob Richmond: Intel MKL||13724416188.8.131.52.205|
|2012-03||Xeon E3-1230 Sandy||4C,8T; 3.3GHz||32GB||3.0.1||Atlas||7.1||0.44||Rob Richmond||13724408184.108.40.206.205|
|2012-03||Xeon E5-2630 Sandy||12C,24T; 2.3GHz||32GB||3.0.1||OpenBlas 0.2.5||10.9||0.55||Stefan Evert||1372628509.131.188.185.51|
|2011-10||Core i7-2670QM||4C,8T; 2.20GHz||8GB||3.0.1||40.2||1.40||unknown||13724419220.127.116.11.146|
|2011-10||i5-2430M||2C, 4T; 2.4GHz (35W)||8GB||3.0.1||atlas3 (2.7)||50.4||3.1||Ivo Welch, Sony Vaio VPX-SE13q|
|2011-10||i5-2430M||2C, 4T; 2.4GHz (35W)||8GB||3.0.1||(1.0)||138||5.1||Ivo Welch, Sony Vaio VPX-SE13q|
|2011-06||Mac Air i5-2557M||2C,4T; 1.7GHz||4GB||3.0.1||OSX Atlas||48.7||1.78||Ivo Welch: OSX binaries||NA, unexpectedly slow|
|2011-01||Shuttle i7-2660k Ivy||4C,8T; 3.4GHz||32GB||3.0.1||Atlas3 (3.2)||9.6||0.61||Ivo Welch: ubuntu binaries||NA|
|2011-01||Shuttle i7-2660k Ivy||4C,8T; 3.4GHz||32GB||3.0.1||(1.0)||30.4||1.05||Ivo Welch: ubuntu binaries||NA|
|2011-01||i5-2500k Sandy||4C,4T; 3.3GHz||16GB||3.0.1||31.8||1.12||Ivo Welch: ubuntu binaries||NA: server|
|2010-09||i5 M460||4C; 2.53GHz||8GB||3.0.1||50.0||1.90||NA||1372879037.164.67.176.198 (1372881718.104.22.168.198 slower)|
|2010-05||Atom N475||1C, 2T, 1.83Ghz||1GB||3.0.1||Atlas3 (2.0)||117||6.32||Millo Giovanni||137330822.214.171.124.31|
|2010-05||Atom N475||1C, 2T, 1.83Ghz||1GB||3.0.1||(1.0)||238||8.87||Millo Giovanni||1373307126.96.36.199.31|
|2010-05||Celeron U3400 ubuntu 12||2C, 1.07GHz (18W)||2GB||2.14.1||(1.0)||111||4.0||Ivo Welch||NA|
|2010-05||Celeron U3400 mint 15||2C, 1.07GHz (18W)||2GB||2.15.2||(1.0)||114||4.4||Ivo Welch||NA|
|2010-05||Celeron U3400 mint 15||2C, 1.07GHz (18W)||2GB||2.15.2||atlas3 (3.0)||37.4||2.4||Ivo Welch||NA|
|2010-05||Celeron U3400 mint 15||2C, 1.07GHz (18W)||2GB||3.0.1||(1.0)||113||4.4||Ivo Welch||NA|
|2010-05||Celeron U3400 mint 15||2C, 1.07GHz (18W)||2GB||3.0.1||atlas3 (3.0)||38.0||2.4||Ivo Welch||NA|
|2009-09||i7 720QM Clarksfield||4C, 8T; 1.6GHz||4GB||2.15.2||Atlas3 (3.7)||19.1||1.29||Millo Giovanni||1374440188.8.131.52.226|
|2009-09||i7 720QM Clarksfield||4C, 8T; 1.6GHz||4GB||2.15.2||(1.0)||71.0||2.79||Millo Giovanni||13744357184.108.40.206.226|
|2009-06||i7-950 Bloomfield||4C,8T; 3.07GHz||16GB||3.0.1||atlas-3gf (2.4)||15.4||0.94||Ivo Welch||13726968220.127.116.11.189|
|2009-06||i7-950 Bloomfield||4C,8T; 3.07GHz||16GB||3.0.1||(1.0)||36.2||1.36||Ivo Welch||1372696608.164.67.176.189|
|2008-06||Atom N270||2C, 1.60GHz (2.5W)||2GB||2.15.2||atlas3 (2.1)||133||7.2||Ivo Welch||1375836918.104.22.168.162|
|2008-06||Atom N270||2C, 1.60GHz (2.5W)||2GB||2.15.2||(1.0)||274||10.2||Ivo Welch||1375835922.214.171.124.162|
|2006-01||Core Duo T2300||2C, 2T, 1.66Ghz||0.5GB||2.14||Atlas 3gf (2.4)||48.6||3.01||Millo Giovanni||13734047126.96.36.199.152|
|2006-01||Core Duo T2300||2C, 2T, 1.66Ghz||0.5GB||2.14||(1.0)||119||4.74||Millo Giovanni||13734034188.8.131.52.152|
|2001-07||Pentium III-M||1C, 1.1Ghz||0.5GB||2.14.1||373||10.51||Millo Giovanni||1373307184.108.40.206.31|
|1999-02||Pentium III Coppermine||1C, 0.5Ghz||0.25GB||3.0.0||doesn't install||940||26.41||Millo Giovanni||p-iii email|
|2011-10||Sitara AM3359, ARMv7r2||1C, 1GHz||0.5GB||3.0.1||atlas 3.10.1||1052||33.7||Jeroen|
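The speedup figures in parentheses appear to be the ratio of the two "total" times for the same machine. This is an inference from the rows of the table, not something stated explicitly. A minimal sketch of that arithmetic, with the hypothetical function name `blas_speedup`:

```c
#include <assert.h>

/* Speedup of an optimized-BLAS run relative to the unoptimized
   reference run on the same machine. Both arguments are "total"
   benchmark times in the same unit. (Assumed definition, inferred
   from the table.) */
double blas_speedup(double reference_total, double optimized_total) {
    assert(reference_total > 0.0 && optimized_total > 0.0);
    return reference_total / optimized_total;
}
```

For the i7-4770k at 3.5GHz, `blas_speedup(25.3, 7.9)` gives roughly 3.2, which matches the "Atlas3 (3.2)" entry. For the i5-2430M, `blas_speedup(138, 50.4)` gives roughly 2.7, matching "atlas3 (2.7)".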
* Note: Jeroen mentioned that the AMD math library (ACML) may have NaN issues of the kind mentioned here: "R relies on ISO/IEC 60559 compliance of an external BLAS. This can be broken if for example the code assumes that terms with a zero factor are always zero and do not need to be computed, whereas x*0 can be NaN." OpenBLAS is one BLAS library and ACML is just another; ACML does not propagate NaN according to R's expectations, whereas OpenBLAS does. Note that MKL is Intel's counterpart to Atlas, tuned for its own processors. Even more interesting, the Intel MKL is better than the ACML.
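The quoted concern can be made concrete with a small sketch in C. The second function below is a hypothetical illustration of the "skip terms with a zero factor" shortcut; it is not taken from ACML or any other real BLAS, and both function names are made up.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Plain dot product: every term is computed, so NaN (and Inf * 0)
   propagates into the result as IEEE arithmetic requires. */
double dot_ieee(const double *a, const double *b, size_t n) {
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical "optimized" variant: assumes a term with a zero
   factor is always zero and skips it. This silently drops NaN * 0
   and Inf * 0, both of which should be NaN. */
double dot_skip_zero(const double *a, const double *b, size_t n) {
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] == 0.0 || b[i] == 0.0)
            continue;  /* the shortcut that breaks NaN propagation */
        sum += a[i] * b[i];
    }
    return sum;
}
```

With `a = {NAN, 1.0}` and `b = {0.0, 2.0}`, `dot_ieee` returns NaN, while `dot_skip_zero` returns 2.0. A missing value would then quietly disappear from a matrix product instead of showing up in the result.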
The "total" and "avg" columns are the total time and the average time on the benchmarks. The average is not arithmetic, but trimmed. For more detail, refer to Simon's benchmark... or just run the benchmark yourself on your own computer. Of course, Simon's benchmarks are not representative of a lot of other tasks; they are good representations only of typical R calculations. For other R benchmarks, see Revolution R Benchmarks. Of course, Revolution R does not support linux, nor do they benchmark different processors or setups.
Some obvious observations from the table:
All submitted results go into the results directory. If you run the script for a novel setup, please also email me a note to alert me. Make sure to tell me which BLAS/LAPACK library you are using; it may be in the submitted info, but it is hard to tease out. If you do, I will try to include your results in this table. Also, if your results are very different from what this table suggests they should be, please email me, speculate why, and explain a little more about your setup. If you do not want your name mentioned with your results, please say so in your email. (I need your IP address and time of submission.) If there is something unusual in your installation that I should note, again please let me know.
I am particularly interested in some of the more exotic CPU examples that I don't have here: things like the Intel Phi, ARMs, snow clusters (if it makes a difference), overclocked CPUs, older CPUs, Kaveri CPUs, etc.; or Intel MKL compiled versions; or different BLAS/LAPACK libraries; or values that are just very different from those already in the table. (The point of this table is not to showcase changes in coding, but to see how plain R programs perform with no changes on different platforms. Please stick to the unmodified Urbanek R-benchmarks.) If you can increase performance in this test by a significant factor through odd hardware or simple recompiles (e.g., through GPU recoding), please email me with some more explanation, too.
Does anyone have an Intel Phi processor board to try this out?
Does anyone have an AMD Kaveri?
The client script collects the following pieces of information and sends them back to the server (for now, R.ivo-welch.info):
Again, none of this should be security-sensitive information. The information is stored in the results/ directory, where everyone can read it and process it. If I get a couple of hundred submissions, I will write a nicer display screen that collects the information.
The server collector with which the client communicates is a simple perl script, fbhole, which you can think of as a "file blackhole server": you can send text to it, but it does nothing else. It has simple configuration and logging, which minimizes the probability of a security breach. It's a nice, simple, unsophisticated, and easy complement to any web server.