R Resources

Using C in R

I looked for a long time for a small, self-contained example of C code that keeps a persistent structure across R invocations. I wanted this primarily to deal with data sets that far exceed the memory of my computer, but that can still be read and processed sequentially, one observation at a time. (Examples are moments and regressions.)
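To make the idea of persistent state concrete, here is a minimal sketch of sequentially computed moments. It is written in Python for brevity and is not the code in the files mentioned on this page. The class and function names are made up for illustration, and the update rule is Welford's standard one-pass method.

```python
class RunningMoments:
    """Accumulates count, mean, and variance one observation at a time."""

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one observation into the persistent state (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator); needs at least two observations."""
        if self.n < 2:
            raise ValueError("variance needs at least two observations")
        return self.m2 / (self.n - 1)


def moments_of_stream(observations):
    """Consume any iterable (e.g. lines of a huge file) without storing it."""
    state = RunningMoments()
    for x in observations:
        state.update(float(x))
    return state
```

Only three numbers survive between observations, so memory use does not grow with the size of the data set.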

Eventually, I pieced together a good example that implements AS75 (Applied Statistics algorithm 75, WLS regressions, by W. M. Gentleman). It works, although it may have bugs.
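For intuition only, a sequential weighted regression can be sketched with a handful of accumulated sums. This is not AS75: AS75 is usually described as updating a triangular factorization with planar rotations, which is numerically more careful. The sketch below uses plain normal-equation sums for a single regressor, and the class name and methods are hypothetical.

```python
class StreamingWLS:
    """Weighted least squares for y = a + b*x, fed one observation at a time."""

    def __init__(self):
        # Weighted sums that fully determine the fit; this is the persistent state.
        self.sw = self.swx = self.swy = self.swxx = self.swxy = 0.0

    def include(self, x, y, w=1.0):
        """Add one observation (x, y) with weight w to the accumulated sums."""
        self.sw += w
        self.swx += w * x
        self.swy += w * y
        self.swxx += w * x * x
        self.swxy += w * x * y

    def coefficients(self):
        """Return (intercept, slope) from the accumulated sums."""
        denom = self.sw * self.swxx - self.swx ** 2
        if denom == 0.0:
            raise ValueError("x has no weighted variation; slope is undefined")
        slope = (self.sw * self.swxy - self.swx * self.swy) / denom
        intercept = (self.swy - slope * self.swx) / self.sw
        return intercept, slope
```

Calling include() once per observation and coefficients() at any point gives the fit for the data seen so far, without ever holding the data in memory.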

If interested, look at the two files in AS75-in-C. To use them, make sure you have gcc installed and then run

  $ R CMD SHLIB Ras75.c  ## creates Ras75.so
  $ R --vanilla
   R version 3.0.1 (2013-05-16) -- "Good Sport"
   Copyright (C) 2013 The R Foundation for Statistical Computing
   ...
  > source("as75.R")

The code is reasonably short and reasonably documented, so it should not be difficult to figure out what it does.

However, if you want to use it for its regression aspect, you should realize that Thomas Lumley's biglm on CRAN does this, too, and probably a lot better.

June 2013

R Benchmarks

Common CPU benchmarks at sites such as AnandTech or CNET are often frustrating for R users. R users care less about frame rates or single-precision floating-point speed. They care more about double-precision speed that is accessible to the standard R implementation. I was interested in what is fast and what is not fast for R purposes.

This site provides a (Linux) Perl script that runs Simon Urbanek's R-benchmark 25, records some basic information about the computer and installation on which the benchmark was run, and sends it off to my (public) website.

None of the transmitted information is sensitive. The information is publicly accessible by anyone over the web. (For now, I just use it to display the table below nicely.) I hope this will be a collaborative site that will contain many different timings. There should be almost no effort involved in contributing:

  1. Download R-benchmark-client.pl (rightclick and save).
  2. Open a terminal and run "perl R-benchmark-client.pl". It will prompt you before sending off the results.

You may want to be paranoid and examine my script before you run it. It is short and sweet. Count on the script to take about 1-5 minutes on 2012 Intel hardware. Please be forgiving if there are bugs in my code, as this is an early version of the script. If the script bombs, please email me with some basic information about where and how.

Current Results

A pure hardware table, i.e., without atlas results, is available separately. The number in parentheses next to the BLAS library is the speedup relative to the ordinary unoptimized BLAS library, which is shown as (1.0).
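The speedup figures in the table are consistent with a simple ratio of total times on the same machine. A minimal sketch, assuming that definition (the function name is made up for illustration):

```python
def blas_speedup(baseline_total, optimized_total):
    """Ratio of total benchmark times: unoptimized BLAS over optimized BLAS."""
    if optimized_total <= 0:
        raise ValueError("total time must be positive")
    return baseline_total / optimized_total


# i7-4770k at 3.5GHz: 25.3 total with the unoptimized BLAS, 7.9 with Atlas3
print(round(blas_speedup(25.3, 7.9), 1))   # 3.2
```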

Release | CPU | Cores | RAM | R | BLAS/LAPack | Total | Avg | Notes | Run
2013-06 | Asus i7-4770k OC Haswell | 4C,8T; Asus-OC | 32GB | 3.0.1 | MKL (4.9) | 5.1 | 0.29 | Jiasun li: MKL from source | 1372880285.164.67.176.207
2013-06 | Asus i7-4770k OC Haswell | 4C,8T; Asus-OC | 32GB | 3.0.1 | Atlas3 (3.5) | 7.2 | 0.45 | Ivo Welch: ubuntu binaries | 1372461810.164.67.176.151
2013-06 | Asus i7-4770k Haswell | 4C,8T; 3.5GHz | 16GB | 3.0.1 | Atlas3 (3.2) | 7.9 | 0.49 | Ivo Welch: ubuntu binaries | 1372438885.164.67.176.151
2013-06 | Asus i7-4770k Haswell | 4C,8T; 3.5GHz | 16GB | 3.0.1 | (1.0) | 25.3 | 0.87 | Ivo Welch: ubuntu binaries | 1372438688.164.67.176.151
2012-10 | AMD FX-8350 Vishera | 8C; 4.0GHz | 16GB | 3.0.1 | ACML* | 7.9 | 0.45 | Jeroen: Natively Recompiled. AMD Core Math Lib 5.3.1, libacml_mp* | 1372511029.84.26.41.82
2012-10 | AMD FX-8350 Vishera | 8C; 4.0GHz | 16GB | 3.0.1 | Openblas 0.2.6 | 9.2 | 0.49 | Jeroen: Natively Recompiled Openblas. | 1372517595.84.26.41.82
2012-10 | AMD FX-8350 Vishera | 8C; 4.0GHz | 16GB | 3.0.1 | Intel(!)* MKL | 7.9 | 0.45 | Jeroen: Natively Recompiled Intel!. | 1375688094.82.217.138.7
2012-03 | Xeon E3-1230 Sandy | 4C,8T; 3.3GHz | 32GB | 3.0.1 | MKL | 7.0 | 0.40 | Rob Richmond: Intel MKL | 1372441649.164.67.176.205
2012-03 | Xeon E3-1230 Sandy | 4C,8T; 3.3GHz | 32GB | 3.0.1 | Atlas | 7.1 | 0.44 | Rob Richmond | 1372440866.164.67.176.205
2012-03 | Xeon E5-2630 Sandy | 12C,24T; 2.3GHz | 32GB | 3.0.1 | OpenBlas 0.2.5 | 10.9 | 0.55 | Stefan Evert | 1372628509.131.188.185.51
2011-10 | Core i7-2670QM | 4C,8T; 2.20GHz | 8GB | 3.0.1 | | 40.2 | 1.40 | unknown | 1372441927.35.16.87.146
2011-10 | i5-2430M | 2C, 4T; 2.4GHz (35W) | 8GB | 3.0.1 | atlas3 (2.7) | 50.4 | 3.1 | Ivo Welch, Sony Vaio VPX-SE13q |
2011-10 | i5-2430M | 2C, 4T; 2.4GHz (35W) | 8GB | 3.0.1 | (1.0) | 138 | 5.1 | Ivo Welch, Sony Vaio VPX-SE13q |
2011-06 | Mac Air i5-2557M | 2C,4T; 1.7GHz | 4GB | 3.0.1 | OSX Atlas | 48.7 | 1.78 | Ivo Welch: OSX binaries | NA, unexpectedly slow
2011-01 | Shuttle i7-2660k Ivy | 4C,8T; 3.4GHz | 32GB | 3.0.1 | Atlas3 (3.2) | 9.6 | 0.61 | Ivo Welch: ubuntu binaries | NA
2011-01 | Shuttle i7-2660k Ivy | 4C,8T; 3.4GHz | 32GB | 3.0.1 | (1.0) | 30.4 | 1.05 | Ivo Welch: ubuntu binaries | NA
2011-01 | i5-2500k Sandy | 4C,4T; 3.3GHz | 16GB | 3.0.1 | | 31.8 | 1.12 | Ivo Welch: ubuntu binaries | NA: server
2010-09 | i5 M460 | 4C; 2.53GHz | 8GB | 3.0.1 | | 50.0 | 1.90 | NA | 1372879037.164.67.176.198 (1372881716.164.67.176.198 slower)
2010-05 | Atom N475 | 1C, 2T, 1.83GHz | 1GB | 3.0.1 | Atlas3 (2.0) | 117 | 6.32 | Millo Giovanni | 1373308188.93.37.145.31
2010-05 | Atom N475 | 1C, 2T, 1.83GHz | 1GB | 3.0.1 | (1.0) | 238 | 8.87 | Millo Giovanni | 1373307179.93.37.145.31
2010-05 | Celeron U3400 ubuntu 12 | 2C, 1.07GHz (18W) | 2GB | 2.14.1 | (1.0) | 111 | 4.0 | Ivo Welch | NA
2010-05 | Celeron U3400 mint 15 | 2C, 1.07GHz (18W) | 2GB | 2.15.2 | (1.0) | 114 | 4.4 | Ivo Welch | NA
2010-05 | Celeron U3400 mint 15 | 2C, 1.07GHz (18W) | 2GB | 2.15.2 | atlas3 (3.0) | 37.4 | 2.4 | Ivo Welch | NA
2010-05 | Celeron U3400 mint 15 | 2C, 1.07GHz (18W) | 2GB | 3.0.1 | (1.0) | 113 | 4.4 | Ivo Welch | NA
2010-05 | Celeron U3400 mint 15 | 2C, 1.07GHz (18W) | 2GB | 3.0.1 | atlas3 (3.0) | 38.0 | 2.4 | Ivo Welch | NA
2009-09 | i7 720QM Clarksfield | 4C, 8T; 1.6GHz | 4GB | 2.15.2 | Atlas3 (3.7) | 19.1 | 1.29 | Millo Giovanni | 1374440130.93.37.148.226
2009-09 | i7 720QM Clarksfield | 4C, 8T; 1.6GHz | 4GB | 2.15.2 | (1.0) | 71.0 | 2.79 | Millo Giovanni | 1374435766.93.37.148.226
2009-06 | i7-950 Bloomfield | 4C,8T; 3.07GHz | 16GB | 3.0.1 | atlas-3gf (2.4) | 15.4 | 0.94 | Ivo Welch | 1372696841.164.67.176.189
2009-06 | i7-950 Bloomfield | 4C,8T; 3.07GHz | 16GB | 3.0.1 | (1.0) | 36.2 | 1.36 | Ivo Welch | 1372696608.164.67.176.189
2008-06 | Atom N270 | 2C, 1.60GHz (2.5W) | 2GB | 2.15.2 | atlas3 (2.1) | 133 | 7.2 | Ivo Welch | 1375836918.164.67.176.162
2008-06 | Atom N270 | 2C, 1.60GHz (2.5W) | 2GB | 2.15.2 | (1.0) | 274 | 10.2 | Ivo Welch | 1375835934.164.67.176.162
2006-01 | Core Duo T2300 | 2C, 2T, 1.66GHz | 0.5GB | 2.14 | Atlas 3gf (2.4) | 48.6 | 3.01 | Millo Giovanni | 1373404720.93.37.135.152
2006-01 | Core Duo T2300 | 2C, 2T, 1.66GHz | 0.5GB | 2.14 | (1.0) | 119 | 4.74 | Millo Giovanni | 1373403460.93.37.135.152
2001-07 | Pentium III-M | 1C, 1.1GHz | 0.5GB | 2.14.1 | | 373 | 10.51 | Millo Giovanni | 1373307179.93.37.145.31
1999-02 | Pentium III Coppermine | 1C, 0.5GHz | 0.25GB | 3.0.0 | doesn't install | 940 | 26.41 | Millo Giovanni | p-iii email
2011-10 | Sitara AM3359, ARMv7r2, Beaglebone Black | 1C, 1GHz | 0.5GB | 3.0.1 | | 1663 | 41.92 | Jeroen |
2011-10 | Sitara AM3359, ARMv7r2, Beaglebone Black | 1C, 1GHz | 0.5GB | 3.0.1 | atlas 3.10.1 | 1052 | 33.7 | Jeroen |

* Note: Jeroen mentioned that the AMD math library (ACML) implementation may have NaN issues, as described here: "R relies on ISO/IEC 60559 compliance of an external BLAS. This can be broken if for example the code assumes that terms with a zero factor are always zero and do not need to be computed, whereas x*0 can be NaN." ACML does not propagate NaN according to R's expectations; OpenBLAS does. (OpenBLAS is one BLAS library, and ACML is just another BLAS library.) Note that MKL is Intel's equivalent of Atlas, a BLAS implementation tuned for its own processors. Even more interesting, the Intel MKL is better than the ACML.
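The kind of shortcut that the quoted passage warns about can be illustrated with a toy dot product. This is a hypothetical sketch of the failure mode, not a description of how ACML or any other library is actually implemented, and both function names are made up.

```python
import math


def dot_ieee(xs, ys):
    """Plain dot product: every term is computed, so NaN propagates."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_skip_zeros(xs, ys):
    """Hypothetical 'optimized' dot product that skips terms with a zero factor."""
    total = 0.0
    for x, y in zip(xs, ys):
        if x == 0.0 or y == 0.0:
            continue          # assumes 0 * anything == 0, which is false for NaN
        total += x * y
    return total


a = [0.0, 2.0]
b = [float("nan"), 3.0]
print(math.isnan(dot_ieee(a, b)))   # True: 0.0 * nan is nan
print(dot_skip_zeros(a, b))         # 6.0: the nan was silently dropped
```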

The "total" and "avg" columns are the total time and the average time on the benchmarks (in seconds). The average is not arithmetic, but trimmed. For more detail, refer to Simon's benchmark...or just run the benchmark for yourself on your own computer. Of course, Simon's benchmarks are not representative of a lot of other tasks. They are good representations only of typical R calculations. For other R benchmarks, see Revolution R Benchmarks. Of course, Revolution R does not support Linux, nor do they benchmark different processors or setups.
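A minimal sketch of a trimmed average, assuming a geometric mean with the fastest and slowest timings dropped. The benchmark script itself is the authority on the exact rule, and the function name and sample timings below are made up for illustration.

```python
import math


def trimmed_geometric_mean(timings, trim=1):
    """Geometric mean after dropping the `trim` smallest and largest values.

    All timings must be positive, because the mean is computed via logarithms.
    """
    if len(timings) <= 2 * trim:
        raise ValueError("not enough timings left after trimming")
    kept = sorted(timings)[trim:len(timings) - trim]
    return math.exp(sum(math.log(t) for t in kept) / len(kept))


timings = [0.5, 1.0, 2.0, 4.0, 40.0]     # made-up seconds for five tests
print(trimmed_geometric_mean(timings))   # the 0.5 and 40.0 extremes are ignored
```

Trimming keeps a single unusually slow (or fast) test from dominating the average.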

[Figure: freqperf-chart]

Some obvious observations from the table: MKL libraries can give a boost, but for the most part, higher frequencies and more cores translate almost directly into performance. The year-to-year architecture changes have been very modest.

All submitted results go into the results directory. If you run the script for a novel setup, then please also email me a note to alert me. Make sure to tell me what BLAS/LAPack library you are using. It may be in the submitted info, but it is hard to tease out. If you do, I will try to include your results in this table. Also, if your results are very different from what this table suggests they should be, please send me an email, explain a little more about your setup, and speculate why this might be. If you do not want your name to be mentioned with your results, please say so in your email. (I need your IP address and time of submission.) If there is something unusual in your installation that I should note, again please let me know.

I am particularly interested in some of the more exotic CPU examples that I don't have here: things like the Intel Phi, ARMs, snow clusters (if it makes a difference), overclocked CPUs, older CPUs, Kaveri CPUs, etc.; or Intel MKL compiled versions; or different BLAS/LAPack libraries; or values that are just very different from those that are already in the table. (The point of this table is not to showcase changes in coding, but to see how plain R programs perform with no changes on different platforms. Please stick to the unmodified Urbanek R-benchmarks.) If you can increase performance in this test by a significant factor through odd hardware or simple recompiles (e.g., through GPU recoding), please email me with some more explanation, too.

Does anyone have an Intel Phi processor board to try this out?

Does anyone have an AMD Kaveri?


More Details

The client script collects the following pieces of information and sends them back to the server (for now, R.ivo-welch.info):

Again, none of this should be security-sensitive information. The information is stored in the results/ directory, where everyone can read it and process it. If I get a couple of hundred submissions, I will write a nicer display screen that collects the information.

The server collector with which the client communicates is a simple Perl script, fbhole, which you can think of as a "file blackhole server": you can send text to it, but it does not do anything else. It has simple configuration and logging, which minimizes the probability of a security breach. It's a nice, simple, unsophisticated, and easy complement to any web server.
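The write-only idea can be sketched in a few lines. This is not fbhole itself, which is a Perl script. The function name is hypothetical, and the file naming (Unix timestamp followed by the sender's IP address) is an assumption based on how the Run identifiers in the table look.

```python
import ipaddress
import os
import tempfile
import time


def store_submission(text, client_ip, results_dir, now=None):
    """Write one submission to results_dir and return its path; nothing is read back."""
    # Parsing the address rejects anything that is not an IP, e.g. "../evil".
    ip = str(ipaddress.ip_address(client_ip))
    stamp = int(time.time() if now is None else now)
    path = os.path.join(results_dir, f"{stamp}.{ip}")
    # Mode "x" refuses to overwrite an existing file, so a sender cannot
    # clobber an earlier submission.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as results:
        saved = store_submission("Total: 25.3\n", "192.0.2.1", results, now=1372438688)
        print(os.path.basename(saved))   # 1372438688.192.0.2.1
```

Because the only operation is "create a new file", there is very little for a malicious sender to exploit.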