Why are my new servers so much slower than the old ones?

I had a support call today where I was asked to have a look at some servers to find out why they seemed so much slower than the existing ones. With not much detail to go on, I first looked at some basic metrics:

Basic Metrics

Metric           | Old Server                                                 | New Server
Hardware Model   | HP ProLiant DL380 G5                                       | Dell PowerEdge R715
Operating System | Microsoft(R) Windows(R) Server 2003 Standard x64 Edition   | Microsoft Windows Server 2008 R2 Enterprise
Memory           | 32,766 MB                                                  | 131,046 MB
Processors       | 2 installed                                                | 2 installed
Processor [01]   | EM64T Family 6 Model 23 Stepping 10 GenuineIntel ~3000 MHz | AMD64 Family 21 Model 1 Stepping 2 AuthenticAMD ~3000 MHz
Processor [02]   | EM64T Family 6 Model 23 Stepping 10 GenuineIntel ~3000 MHz | AMD64 Family 21 Model 1 Stepping 2 AuthenticAMD ~3000 MHz

The first thing that stands out is that the new server comes from a different hardware vendor, but it is a higher-spec, later-generation system, so what could be wrong?

I decided to have a look in the BIOS first to see if there were any obvious misconfigurations, and noticed that the power management settings were not set to “Maximum Performance” and that the C1E state was enabled.

Before changing anything, I downloaded and ran Super Pi to get a simple baseline of single-threaded calculation performance on the new, higher-spec server.
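If you want a repeatable, scriptable baseline that you can run before and after a change, a minimal Python sketch like the one below is one option. It times a single-threaded pi calculation using the Gauss-Legendre iteration (the algorithm Super Pi is generally described as using); the function names and the 10,000-digit default are my own assumptions, and its absolute timings are not comparable with Super Pi's.

```python
import time
from decimal import Decimal, localcontext


def gauss_legendre_pi(digits: int) -> str:
    """Return pi truncated to `digits` decimal places (Gauss-Legendre iteration)."""
    with localcontext() as ctx:
        ctx.prec = digits + 10  # a few guard digits absorb rounding error
        a = Decimal(1)
        b = Decimal(1) / Decimal(2).sqrt()
        t = Decimal("0.25")
        p = Decimal(1)
        # Correct digits roughly double per iteration, so bit_length() rounds is plenty.
        for _ in range(digits.bit_length()):
            a_next = (a + b) / 2
            b = (a * b).sqrt()
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        pi = (a + b) ** 2 / (4 * t)
    return str(pi)[: digits + 2]  # "3." followed by `digits` decimals


def benchmark(digits: int = 10_000, runs: int = 3) -> float:
    """Best-of-`runs` wall-clock time in seconds for one single-threaded pi calculation."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        gauss_legendre_pi(digits)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"{benchmark():.3f} s (best of 3)")
```

Run it once before the BIOS change and once after, on an otherwise idle server, and compare the best-of-3 times; taking the best run reduces noise from background activity.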

I then changed three BIOS settings and re-ran the Super Pi calculations:

  • Enabled “Processor HPC mode”
  • Disabled “C1E”
  • Set Power Management to “Maximum Performance”

The results:

[Super Pi screenshots: before on the left, after on the right]

WOW, what a difference! By simply changing the power management settings in the BIOS, a calculation that previously took 1 minute 8 seconds now only takes 11 seconds!
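To put a number on it, the speedup is simply the ratio of the two Super Pi run times reported above. A quick Python check (the helper name is my own):

```python
def speedup(before_s: float, after_s: float):
    """Return (speedup factor, percentage reduction in run time)."""
    factor = before_s / after_s
    reduction = (before_s - after_s) / before_s * 100
    return factor, reduction


before = 1 * 60 + 8  # 1 minute 8 seconds, before the BIOS changes
after = 11           # 11 seconds, after the BIOS changes
factor, reduction = speedup(before, after)
print(f"{factor:.1f}x faster, {reduction:.0f}% less time")
```

This prints "6.2x faster, 84% less time".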



Comments

  1. Nice post Jon, kinda reminds me of all the problems with those R815s and ESXi 5.0 a while back.

  2. Hi Jon

    Yep, that’s it: the NUMA rebalance. We bought 4 more recently and I couldn’t remember what the solution was, so I Googled it and found my own posts and your helpful solutions 🙂 So thank you again!

    Thanks for the nice feedback on the site btw. VMnews was something I set up last year, partly for my own amusement and partly because I started looking for a new job/contract. My employers were looking to shed people, but it turned out they needed me 🙂 So it’s sat dormant since then. I’ll get round to finishing it off eventually 🙂

    Your site is pretty neat, love the theme and like the remote support functionality. Do you just work for the banking sector or do you undertake private work too?

    • Hey Pete,

      I think the concept and layout are awesome: you can scan through a heap of content very quickly and pick out the relevant articles on demand. I think there is definitely some long-term value there (perhaps the next Summly?).

      Thanks for the site feedback, perhaps one day it will make it onto the vmnews feeds 🙂

      Yes, I do take on private work and will often collaborate with others depending on the skills required. The remote support is very useful, especially for friends and family that always need some help.

      I’m going to do a short post about VMNEWS – would you object to me using an iframe to embed the content into the post?

      Cheers,
      Jon

  3. It’s worth noting that the “Maximum Performance” power option on Dell’s AMD servers is actually not the highest performance state you can set it to, depending on your scenario. The maximum performance setting will make the machine sit at its first boost P-state regardless of server load, occasionally flirting with its lower base P-state when load gets high and the power cap demands that it throttle down. In this configuration you will never see the processor’s second (higher) boost P-state.

    This configuration can help to keep some low-load tasks fast since the cores don’t spend any time in a sleep state nor in a lower-than-base-frequency P-state, but you will never hit the maximum boost P-state that the processor offers either. As an example, an Opteron 6376 is a 2.3 GHz base frequency chip with two boost P-states: 2.6 and 3.2 GHz. Set to maximum performance, an R715 (or any similar model that can use that CPU) will sit at 2.6 GHz and sometimes flirt with its base P-state of 2.3 GHz when it is highly loaded. That CPU’s maximum boost state is 3.2 GHz, however, so you could be missing out on potential performance there.

    The solution for Dell servers seems to be to change that BIOS setting from “Maximum Performance” to “OS Control”. Doing so lets the CPU idle in its power-saving states but also reach its maximum boost state when a few threads are each heavily loaded. As always, your mileage may vary: whether this helps depends on your real-world load, but if you’ve ever wondered why you’re not seeing your maximum boost state, this is why. On a side note, I wouldn’t recommend using Super Pi as an analogue for performance in a real-world environment.

    As a note, the way boost works on modern AMD Opteron chips is that there are a number of “regular” P-states that the CPU may always choose from, regardless of how loaded the processor is. These states are used for traditional power saving. The highest of them is the “base” frequency of the processor; using the Opteron 6376 as an example again, this is 2.3 GHz.

    There are also always two boost P-states above this base frequency. The first is the “all cores” boost state. The CPU will try to hit this frequency even if all cores are fully utilized, so long as thermals and its own estimate of power consumption are within limits, and it will throttle cores down to the base state as needed to stay within that intended power envelope. Depending on the actual load, this may never happen or may happen frequently; it all depends on which instructions the program is issuing, since some result in a hotter die than others.

    The second boost state is the “half cores” boost state. The CPU will try to hit this frequency with any core that is 100% utilized, so long as overall processor load doesn’t exceed half the number of cores on the chip. So for the 16-core Opteron 6376 I keep using as an example, it can happily run 8 cores at 3.2 GHz provided the rest of the cores are essentially idle.
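The boost rules described in this comment can be written down as a small decision function. This is a simplified model of the behaviour as described here, not AMD's actual Turbo Core algorithm or Dell's firmware logic; the class, function and policy names are hypothetical, and only fully loaded cores are modelled (idle cores and the lower power-saving P-states are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoostProfile:
    """Simplified P-state figures for one CPU model, in GHz."""
    cores: int
    base_ghz: float
    all_core_boost_ghz: float
    max_boost_ghz: float


# Opteron 6376 figures as quoted in the comment above.
OPTERON_6376 = BoostProfile(cores=16, base_ghz=2.3, all_core_boost_ghz=2.6, max_boost_ghz=3.2)


def busy_core_ghz(cpu: BoostProfile, busy_cores: int, bios_policy: str,
                  over_power_budget: bool = False) -> float:
    """Clock of a fully loaded core under the simplified rules described above.

    bios_policy is a hypothetical label: "max_performance" or "os_control".
    """
    if not 1 <= busy_cores <= cpu.cores:
        raise ValueError("busy_cores must be between 1 and the core count")
    if bios_policy not in ("max_performance", "os_control"):
        raise ValueError(f"unknown policy: {bios_policy!r}")
    if over_power_budget:
        # Power/thermal cap reached: throttle back to the base P-state.
        return cpu.base_ghz
    if bios_policy == "os_control" and busy_cores <= cpu.cores // 2:
        # "Half cores" boost: at most half of the cores are busy.
        return cpu.max_boost_ghz
    # "All cores" boost; under Maximum Performance this is the ceiling.
    return cpu.all_core_boost_ghz


if __name__ == "__main__":
    for policy in ("max_performance", "os_control"):
        print(policy, busy_core_ghz(OPTERON_6376, busy_cores=4, bios_policy=policy))
```

With four busy cores this prints 2.6 for Maximum Performance and 3.2 for OS Control, a clock gap of 3.2 / 2.6 ≈ 1.23, or about 23%, on lightly threaded work.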
