Monday, 2 December 2013

Dell PowerEdge R820 performance

System Information
  Operating System      Linux 2.6.32-220.13.1.el6.x86_64 x86_64
  Model                 Dell Inc. PowerEdge R820
  Motherboard           Dell Inc. 0YWR73
  Processor             Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz @ 2.20 GHz
                        4 Processors, 32 Cores, 64 Threads
  Processor ID          GenuineIntel Family 6 Model 45 Stepping 7
  L1 Instruction Cache  32.0 KB x 8
  L1 Data Cache         32.0 KB x 8
  L2 Cache              256 KB x 8
  L3 Cache              16.0 MB
  Memory                505 GB
  BIOS                  Dell Inc. 1.6.6
  Compiler              Clang 3.3 (tags/RELEASE_33/final)

Integer Performance
  AES
    single-core          1008
    multi-core          15940 |||||||
  Twofish
    single-core          2010
    multi-core          74612 ||||||||||||||||||||||||||||||||||||
  SHA1
    single-core          2126 |
    multi-core          65765 |||||||||||||||||||||||||||||||
  SHA2
    single-core          2308 |
    multi-core          50802 ||||||||||||||||||||||||
  BZip2 Compress
    single-core          2069
    multi-core          55728 ||||||||||||||||||||||||||
  BZip2 Decompress
    single-core          1951
    multi-core          45533 |||||||||||||||||||||
  JPEG Compress
    single-core          2261 |
    multi-core          71539 ||||||||||||||||||||||||||||||||||
  JPEG Decompress
    single-core          3260 |
    multi-core          63507 ||||||||||||||||||||||||||||||
  PNG Compress
    single-core          2192 |
    multi-core          77031 |||||||||||||||||||||||||||||||||||||
  PNG Decompress
    single-core          2293 |
    multi-core          41592 ||||||||||||||||||||
  Sobel
    single-core          3043 |
    multi-core          75156 ||||||||||||||||||||||||||||||||||||
  Lua
    single-core          2346 |
    multi-core          62456 ||||||||||||||||||||||||||||||
  Dijkstra
    single-core          2031
    multi-core          26930 ||||||||||||

Floating Point Performance
  BlackScholes
    single-core           246
    multi-core           9815 ||||
  Mandelbrot
    single-core          2029
    multi-core          82867 ||||||||||||||||||||||||||||||||||||||||
  Sharpen Filter
    single-core          2009
    multi-core          50860 ||||||||||||||||||||||||
  Blur Filter
    single-core          1635
    multi-core          47182 ||||||||||||||||||||||
  SGEMM
    single-core          3076 |
    multi-core          52120 |||||||||||||||||||||||||
  DGEMM
    single-core          3016 |
    multi-core          60899 |||||||||||||||||||||||||||||
  SFFT
    single-core          2031
    multi-core          45838 ||||||||||||||||||||||
  DFFT
    single-core          2043
    multi-core          63208 ||||||||||||||||||||||||||||||
  N-Body
    single-core          2349 |
    multi-core          69287 |||||||||||||||||||||||||||||||||
  Ray Trace
    single-core          2809 |
    multi-core          59411 ||||||||||||||||||||||||||||

Memory Performance
  Stream Copy
    single-core          1365
    multi-core           6013 ||
  Stream Scale
    single-core          2533 |
    multi-core           8406 ||||
  Stream Add
    single-core          2356 |
    multi-core           8290 ||||
  Stream Triad
    single-core          2431 |
    multi-core           5831 ||

Benchmark Summary
  Integer Score              2154  51636
  Floating Point Score       1827  48624
  Memory Score               2109   7030

  Geekbench Score            2014  41510
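
The summary lists a single-core and a multi-core score for each section, so the multi-core speedup can be read off by dividing one by the other. A minimal sketch, using only the four summary rows above; the `scaling` helper and the `scores` dictionary are names made up for this illustration and are not part of Geekbench:

```python
# Scores copied from the Benchmark Summary above: (single-core, multi-core).
scores = {
    "Integer": (2154, 51636),
    "Floating Point": (1827, 48624),
    "Memory": (2109, 7030),
    "Geekbench": (2014, 41510),
}


def scaling(single, multi):
    """Multi-core score divided by single-core score (hypothetical helper)."""
    return multi / single


for name, (single, multi) in scores.items():
    print(f"{name:15s} {scaling(single, multi):5.1f}x")
```

On this run the integer and floating point sections scale by roughly 24x and 27x across the 32 cores, while the memory section scales by only about 3.3x.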



[hostname]$ ./mhz
1297 MHz, 0.7710 nanosec clock
[hostname]$ ./lat_mem_rd 512
"stride=64
0.00049 1.542
0.00098 1.543
0.00195 1.544
0.00293 1.543
0.00391 1.542
0.00586 1.544
0.00781 1.544
0.01172 1.543
0.01562 1.542
0.02344 1.545
0.03125 1.547
0.04688 4.627
0.06250 4.627
0.09375 4.671
0.12500 4.646
0.18750 4.717
0.25000 5.610
0.37500 6.594
0.50000 6.666
0.75000 6.692
1.00000 6.680
1.50000 6.668
2.00000 6.660
3.00000 6.645
4.00000 6.632
6.00000 6.628
8.00000 6.628
12.00000 6.915
16.00000 8.743
24.00000 11.071
32.00000 11.150
48.00000 11.126
64.00000 11.054
96.00000 11.090
128.00000 11.190
192.00000 11.151
256.00000 11.039
384.00000 10.953
512.00000 11.196
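
The latency curve above rises in a few distinct steps. A minimal sketch for locating them, assuming a "jump" is any sample more than 15% slower than the one before it; the threshold and the helper names `parse` and `find_jumps` are choices made for this illustration and are not part of lmbench:

```python
# Raw output of `./lat_mem_rd 512` copied from the run above:
# column 1 is the working-set size in MB, column 2 the load latency in ns.
RAW = """\
"stride=64
0.00049 1.542
0.00098 1.543
0.00195 1.544
0.00293 1.543
0.00391 1.542
0.00586 1.544
0.00781 1.544
0.01172 1.543
0.01562 1.542
0.02344 1.545
0.03125 1.547
0.04688 4.627
0.06250 4.627
0.09375 4.671
0.12500 4.646
0.18750 4.717
0.25000 5.610
0.37500 6.594
0.50000 6.666
0.75000 6.692
1.00000 6.680
1.50000 6.668
2.00000 6.660
3.00000 6.645
4.00000 6.632
6.00000 6.628
8.00000 6.628
12.00000 6.915
16.00000 8.743
24.00000 11.071
32.00000 11.150
48.00000 11.126
64.00000 11.054
96.00000 11.090
128.00000 11.190
192.00000 11.151
256.00000 11.039
384.00000 10.953
512.00000 11.196
"""


def parse(raw):
    """Return [(size_mb, latency_ns), ...], skipping the stride header line."""
    samples = []
    for line in raw.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        samples.append((float(fields[0]), float(fields[1])))
    return samples


def find_jumps(samples, threshold=1.15):
    """Return (previous_size, size) pairs where latency grows by more than threshold."""
    jumps = []
    for (prev_size, prev_lat), (size, lat) in zip(samples, samples[1:]):
        if lat > prev_lat * threshold:
            jumps.append((prev_size, size))
    return jumps


for prev_size, size in find_jumps(parse(RAW)):
    print(f"latency step between {prev_size} MB and {size} MB")
```

With this threshold the steps fall just past 0.03125 MB (32 KB), around 0.25 MB (256 KB) and around 16 MB, which lines up with the L1 data, L2 and L3 cache sizes listed under System Information.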


[hostname]$ ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74177 microseconds.
   (= 74177 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            4308.9     0.041765     0.037132     0.077464
Scale:           4667.0     0.038481     0.034283     0.070960
Add:             5509.4     0.046743     0.043562     0.071310
Triad:           6521.2     0.036973     0.036803     0.037278
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
[hostname]$
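
The "Best Rate" column can be reproduced from the "Min time" column. STREAM counts two arrays of traffic per element for Copy and Scale and three for Add and Triad, then divides the bytes moved by the best time. A minimal sketch that rechecks the table above; the names `KERNELS` and `best_rate_mb_s` are made up for this illustration, and small differences are expected because the printed times are rounded to six decimals:

```python
ELEMENTS = 10_000_000   # "Array size" from the STREAM banner
BYTES_PER_ELEMENT = 8   # "This system uses 8 bytes per array element."

# kernel: (arrays touched per element, reported min time in s, reported best rate in MB/s)
KERNELS = {
    "Copy": (2, 0.037132, 4308.9),
    "Scale": (2, 0.034283, 4667.0),
    "Add": (3, 0.043562, 5509.4),
    "Triad": (3, 0.036803, 6521.2),
}


def best_rate_mb_s(arrays, min_time):
    """Bytes moved per kernel run divided by the best time, in MB/s (1 MB = 10**6 bytes)."""
    return 1e-6 * arrays * BYTES_PER_ELEMENT * ELEMENTS / min_time


# Memory per array, in MiB (2**20 bytes), as printed in the banner.
array_mib = ELEMENTS * BYTES_PER_ELEMENT / 2**20
print(f"Memory per array: {array_mib:.1f} MiB")

for name, (arrays, min_time, reported) in KERNELS.items():
    computed = best_rate_mb_s(arrays, min_time)
    print(f"{name:6s} computed {computed:7.1f} MB/s, reported {reported:7.1f} MB/s")
```

The computed values agree with the reported ones to within rounding, which confirms that the rates are in units of 10^6 bytes per second while the array sizes in the banner are in MiB.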
