Monday, November 16, 2009

Application Awareness Part 1: CPU

In my previous post on Application Awareness, I talked about resource utilization and the need for a workload-manager-style management layer that provides the ability to manage applications from the resource perspective. I would like to break that down by resource (the core four: CPU, memory, I/O and network) for discussion.

From the CPU perspective, the good news is that the number of cores keeps increasing. We started with a single core, added hyper-threading, then moved to dual-core, quad-core and now six-core processors. This trend aids the move to virtualization and 64-bit OSs/applications, as 32-bit OSs cannot utilize more than four cores, and most only two. For the virtual world, this means there are more cores available for an ever-increasing number of virtual machines per host. That's the good news. The bad news is there are only so many cores to share.

A brief note on how I explain cores to business people: a core is a resource on which only one instruction can execute at a time, so tasks like sorting a report or fetching a record from disk have to take their turn on it.

There's more good news: core speed is increasing as well, which means applications can run faster, but more importantly, the core can finish an instruction and be free for another instruction/VM to use that resource. So let's break this down a bit further and look at how application awareness comes into play here. As applications grow and scale, they make more and more use of both the increased speed and the increased number of cores. What this means for the application awareness discussion is the need to separate out which applications need faster CPUs (CPU-intensive processing) and which need more cores (multi-threaded applications processing in parallel).
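To make the speed-versus-cores distinction concrete, here is a rough Python sketch (my own illustration, not anything from a vendor): a CPU-bound job run on one core benefits only from a faster clock, while splitting the same work across a pool of worker processes benefits from having more cores available.

    import time
    from multiprocessing import Pool

    def cpu_bound_task(n):
        # Burn CPU cycles: sum of squares up to n.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        work = [5_000_000] * 8  # eight equal chunks of CPU-intensive work

        start = time.perf_counter()
        serial = [cpu_bound_task(n) for n in work]       # one core: bound by clock speed
        print(f"serial:   {time.perf_counter() - start:.2f}s")

        start = time.perf_counter()
        with Pool(processes=4) as pool:                  # four cores: bound by core count
            parallel = pool.map(cpu_bound_task, work)
        print(f"parallel: {time.perf_counter() - start:.2f}s")

The serial run only gets faster with a quicker core; the pooled run gets faster with more cores. Knowing which of the two a given application is, is exactly the kind of awareness a placement decision needs.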

In the physical world, this is not a problem. An application is assigned to a physical node and gets all of the resources. In the virtual world, this is a three-step process (sketched in code after the list):
  1. The creation of the VM requires the number of processors (cores) to be defined.
  2. The VM needs to be placed on a host which has enough processors (cores) to start the VM.
  3. The application needs to perform within limits for Service Level Agreements (SLAs) to be defined and met.
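Here is a minimal Python sketch of those three steps; the host names, core counts and SLA numbers are made up for illustration, and this is not VMware's API:

    from dataclasses import dataclass

    @dataclass
    class Host:
        name: str
        total_cores: int
        used_cores: int

        @property
        def free_cores(self) -> int:
            return self.total_cores - self.used_cores

    def place_vm(vm_cores, hosts):
        # Step 2: find a host with enough free cores to start the VM;
        # prefer the host with the most headroom left.
        candidates = [h for h in hosts if h.free_cores >= vm_cores]
        return max(candidates, key=lambda h: h.free_cores, default=None)

    def meets_sla(observed_ms, sla_ms):
        # Step 3: check the observed response time against the agreed SLA.
        return observed_ms <= sla_ms

    if __name__ == "__main__":
        hosts = [Host("esx01", 24, 20), Host("esx02", 24, 12)]
        vm_cores = 4                                  # step 1: cores defined at VM creation
        target = place_vm(vm_cores, hosts)
        print("placed on:", target.name if target else "no capacity")
        print("SLA met:", meets_sla(observed_ms=180.0, sla_ms=250.0))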

While VMware's Distributed Resource Scheduler (DRS) provides the ability to set priorities (once the VM is running) at the VM layer to balance resources and ensure VMs have what they need, this is a manual process at best. In the cloud, where the operations staff is far removed from the business/application owner, taking the step toward application awareness will become vital for the performance and availability of applications.

The cloud providers will need to factor the creation, placement and performance of the VMs into their portals to maintain customer satisfaction. Moving toward Application Awareness is the first step.

Saturday, November 14, 2009

Application Awareness in the Cloud? Is it possible?

As we move closer to the cloud (private or public), the vendors are starting to circle the wagons around capacity planning, chargeback and availability. In order to do this, the vendors (hardware and software) need to enable their components to be application aware. What exactly does this mean, and what have we learned from our past?

A bit of history. Back in the mainframe days, there was one computer being shared by batch jobs, online applications (think web today) and databases. There were more, but let's start with this list. With only one computer to run all these applications on, there was a need to prioritize the different applications based on available resources and the priorities of each application (i.e., business unit). IBM and others created something called Workload Manager. The goal was to provide metrics at the resource level (CPU, memory, I/O and network) and the application level (which application was using the resources, and which had priority over the others).

Moving back into today's world, in the physical (not virtual) world this is easy. Each application is installed onto its own server and can use all the resources it wants, as it is the only application on that particular server. With average CPU utilization rates of around 10%, this model is a huge waste of resources and power. Virtualization has, in many ways, taken us back to the mainframe days (without the single computer).

So how do we enable the cloud to be application aware? A good start is to look to our past and what Workload Manager did for us in the mainframe days. VMware has done a good job of prioritizing VMs with their resource management functions. This helps ensure resources are available on a given ESX host at start time, and it also manages memory and CPU resources during run time. This assumes the VM is a single application, which in most cases it is. With the cloud, many applications from potentially many different customers will be running in the same environment. How do we prioritize these workloads based on service level agreements, price points (think phone companies charging more for daytime calls than for nighttime calls) and resource availability?
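One way to think about that prioritization is share-based allocation, in the spirit of what VM shares do today but extended to application-level weights. The sketch below is my own simplification, not VMware's actual algorithm; the VM names, shares and demands are invented:

    def allocate_cpu(capacity_mhz, vms):
        # Share-based (water-filling) allocation: hand out capacity in proportion
        # to each VM's share weight, cap each VM at its demand, and redistribute
        # whatever a capped VM could not use. vms maps name -> (shares, demand_mhz).
        grant = {name: 0.0 for name in vms}
        active = set(vms)
        remaining = capacity_mhz
        while remaining > 1e-6 and active:
            total_shares = sum(vms[n][0] for n in active)
            returned = 0.0
            for name in list(active):
                slice_mhz = remaining * vms[name][0] / total_shares
                headroom = vms[name][1] - grant[name]
                if slice_mhz >= headroom:        # VM fully satisfied; give back the excess
                    grant[name] += headroom
                    returned += slice_mhz - headroom
                    active.remove(name)
                else:
                    grant[name] += slice_mhz
            remaining = returned
        return grant

    if __name__ == "__main__":
        vms = {
            "billing-db":   (4000, 6000.0),   # high shares, heavy demand
            "intranet-web": (1000, 3000.0),   # low shares, moderate demand
            "batch-report": (1000, 8000.0),   # low shares, hungry batch job
        }
        print(allocate_cpu(capacity_mhz=12000.0, vms=vms))

The high-share workload gets satisfied first and the leftover capacity is redistributed to the lower-priority workloads, which is the kind of behavior a cloud provider would want to tie back to SLAs and price points.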

I propose the need for deeper awareness of applications and resources by borrowing another piece of mainframe history: the concept of Systems Management Facility (SMF) and Resource Management Facility (RMF) type records. These records provide the ability to collect resource usage for performance and tuning, application usage for chargeback, and workload data for prioritization of applications across the cloud.

VMware has started the charge by opening up their API, and some third-party vendors have started bringing products to market for reporting and alerting on storage, CPU/memory and network usage. The next step is to tag the resources, allowing application-aware reporting and prioritization in the cloud.
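To show what that tagging could look like once usage data is flowing, here is a small sketch of SMF/RMF-inspired usage records rolled up per application. The field names and numbers are illustrative assumptions on my part, not an actual SMF layout or a VMware data structure:

    from dataclasses import dataclass
    from collections import defaultdict

    @dataclass
    class UsageRecord:
        # One interval's sample for one VM, tagged with the application and
        # business unit it belongs to (the tag is what makes it application-aware).
        timestamp: str
        vm: str
        application: str
        business_unit: str
        cpu_seconds: float
        memory_gb_hours: float
        disk_io_gb: float
        network_gb: float

    def rollup_by_application(records):
        # Aggregate the core-four usage per application for reporting/chargeback.
        totals = defaultdict(lambda: {"cpu_seconds": 0.0, "memory_gb_hours": 0.0,
                                      "disk_io_gb": 0.0, "network_gb": 0.0})
        for r in records:
            t = totals[r.application]
            t["cpu_seconds"] += r.cpu_seconds
            t["memory_gb_hours"] += r.memory_gb_hours
            t["disk_io_gb"] += r.disk_io_gb
            t["network_gb"] += r.network_gb
        return dict(totals)

    if __name__ == "__main__":
        records = [
            UsageRecord("2009-11-14T01:00", "vm-web-01", "orders", "sales", 1200, 4.0, 2.5, 1.1),
            UsageRecord("2009-11-14T01:00", "vm-db-01", "orders", "sales", 3400, 16.0, 40.0, 0.8),
            UsageRecord("2009-11-14T01:00", "vm-rpt-01", "finance", "finance", 900, 8.0, 12.0, 0.2),
        ]
        for app, totals in rollup_by_application(records).items():
            print(app, totals)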

Tuesday, March 31, 2009

Virtualization of SharePoint

I read an interesting article this evening in the April edition of Windows IT Pro magazine. The article "The Essential Guide to Deploying MOSS" by Michael Noel outlined the architecture of Microsoft Office SharePoint Server (MOSS). What I found interesting was the discussion of scalability options (horizontal or vertical) for the various components (Web, Query, Index, Database, and Application roles) for both the physical and virtual options.

The article was based on HP hardware (and sponsored by HP) and took the reader through the configurations of a small farm supporting fewer than 400 users, all the way up to a "Highly Available Farm" supporting 1,000+ users.

The article finishes up with an entire section on "Virtualization of MOSS," where the recommendations for virtualization start with the Web role and move on to the Query role. The issue is which roles fit best in a virtualized environment to make the best use of the hardware and to provide scalability as the site grows. As with most three-tiered applications, the queuing should begin at the outside of the application and gradually narrow down to the database layer. Having more Web server roles queuing up requests for the Index or Query roles makes sense: these roles require fewer resources (CPU, memory, I/O and network) than the database servers on the back end do.

My perspective on this article, and on the need for virtualization performance and capacity planning, is this: as your site grows, the ability to go back and fix architectural details like the clustering of your database, the number of servers and their utilization becomes more and more difficult.

I found it interesting that the article stated, "SQL Servers that are heavily utilized may not be the best candidates for virtualization, because their heavy I/O load can cause some contention and they may require a large amount of the resources from the host, which reduces the efficacy of the setup."

The contention of resources in a virtualized world should be the number one point of monitoring and testing for new applications. IOPS, memory and CPU are the most contended of the core four resources, with networking added to that list in the case of iSCSI and NAS. Proper segmentation of the network load can prevent network contention (VLANs on separate physical NICs, for example, or on HBAs). That leaves memory and CPU as points of possible contention. Understanding the OS requirements for cores and addressable memory, as well as the application load, can help properly size these resources per virtual machine and per host. Monitoring the key applications (remember the 80/20 rule) provides awareness of potential problems as usage grows.
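As a rough illustration of that sizing exercise, here is a small Python sketch that turns per-VM core and memory requirements into a host count, holding back headroom for contention. The VM sizes, host specs and headroom percentages are assumptions for the example, not recommendations:

    import math

    def hosts_needed(vms, host_cores, host_memory_gb,
                     cpu_headroom=0.8, mem_headroom=0.9):
        # Rough sizing: how many identical hosts are needed to carry a set of VMs,
        # keeping some headroom for contention. vms is a list of (vcpus, memory_gb);
        # the headroom factors are illustrative planning assumptions.
        total_vcpus = sum(v[0] for v in vms)
        total_mem = sum(v[1] for v in vms)
        by_cpu = total_vcpus / (host_cores * cpu_headroom)
        by_mem = total_mem / (host_memory_gb * mem_headroom)
        return math.ceil(max(by_cpu, by_mem))

    if __name__ == "__main__":
        vms = [(2, 4), (2, 8), (4, 16), (1, 2), (8, 32)]   # (vCPUs, GB) per VM
        print(hosts_needed(vms, host_cores=16, host_memory_gb=64))

A real plan would also factor in CPU overcommit ratios and measured (not nominal) demand, but the point is the same: size from the application load, then keep monitoring as it grows.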

In the case of web applications, a close tie to the business side of the house is also important. If a new web page or process is placed on a web server, or usage grows for any number of processes without proper sizing, the whole site can come down.

Happy reading and remember to plan for and monitor your vital applications.

Source: http://www.hp.com/solutions/activeanswers/sharepoint and http://www.hp.com/go/sharepoint

Saturday, March 28, 2009

Cisco Announcement UCS - Chargeback

This past week (Mar 17), Cisco announced the release of their Unified Computing System (UCS), which marks their entry into the server market. From what I have read, these are some very beefy boxes, including half- or full-width blades (dual or quad socket) with up to 384 GB of memory per blade. The networking components supplied with this product include iSCSI and FC networking to connect up to 320 servers.

I see this entry as a great solution for a hosting company focused on offering virtual solutions, where the density, networking and security (segmentation) provide the best possible cost points (CapEx and OpEx).

This brings me to the point of this entry: how are customers dealing with chargeback for their virtual infrastructures? This is primarily an ESX-world question today; however, I have a hunch that the Hyper-V world will be coming on strong with its latest release. In the grand scheme of charging customers for infrastructure (intra-enterprise, hosting model or the cloud), there are micro and macro ways of doing things. In the physical world, the customer is charged for the whole box, the setup and administrative time, and for storage. They pay for the whole box and can use as much or as little of the computing resources as their application needs. In the virtual world, this changes dramatically, because VMs vary in size and, more importantly, vary in the amount of resources they use. One cannot and should not place all virtual servers of the same size on a given box or data center and assume all VMs of the same size will perform together nicely!

From the VMware perspective, their memory management algorithms allow VMs to be paged depending on usage, so a VM may use the resources defined for it, or less, depending on the needs of the application within the virtual machine itself.

So my question is: how important is chargeback becoming in the new virtualized world?

Let's flash back about 25 years to the mainframe days, when the mainframe was divided up into partitions (sound familiar? :-)) and multiple applications (CICS, IMS, DB2, batch workloads, etc.) were using the compute resources all at the same time. Each of these applications (i.e., business units) was measured to determine how much of the mainframe's resources it consumed. This is where capacity planning, chargeback, and performance and tuning all come together to monitor, from the business application perspective, what resources are used and how many resources are needed going forward to maintain SLAs and performance expectations.

Now, let's branch over to an analogy from the phone company billing systems. Remember when phone call rates were lower at night and on weekends? The phone companies did this to entice users (via costing models) to move resource requirements off prime time to other times. Now let's take another analogy from the mainframe days with something called Workload Manager. This was a process of assigning a priority to a workload (batch jobs were less important than online transactions, and financial reporting was more important than inventory at the gym), and controlling which workload received more of the available resources at any given time.

Now, let's bring it all back together. Measuring resource consumption gives business units and infrastructure managers the ability to know what resources are required and how they are being used. Beginning with measurement, the collected data allows companies to charge for the resource consumption that is actually occurring, versus a macro view where the customer is charged a flat rate whether they actually use the resources or not.
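Here is a small sketch of what a micro, usage-based charge could look like next to the macro flat rate, borrowing the phone company's peak/off-peak idea. The rates and usage numbers are made up for illustration:

    def usage_charge(samples, peak_rate=0.12, offpeak_rate=0.04):
        # Micro chargeback: bill for what was actually measured, with a cheaper
        # off-peak rate (the phone-company model). samples is a list of
        # (hour_of_day, cpu_core_hours) tuples; the per-core-hour rates are invented.
        charge = 0.0
        for hour, core_hours in samples:
            rate = peak_rate if 8 <= hour < 20 else offpeak_rate
            charge += core_hours * rate
        return round(charge, 2)

    if __name__ == "__main__":
        # One day of usage for a 4-vCPU VM: busy during business hours, light at night.
        day = [(h, 3.5 if 8 <= h < 20 else 0.5) for h in range(24)]
        print("usage-based charge:", usage_charge(day))
        print("flat-rate charge:  ", round(24 * 4 * 0.12, 2))  # macro view: billed for the full 4 vCPUs all day

The gap between the two numbers is exactly the conversation the business unit will want to have once the measurement data exists.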

There are some products out there that are breaking into this market space, including VAlign, VKernel and others.

More on the importance of CMDB

I was reviewing some other blogs this morning and came upon the announcement from Microsoft that Windows Server 2003 SP0 is no longer supported. Now, I may have heard a chuckle from some of you, as in "who in the world would still be running Win2003 SP0?" Well, let me share with you a story of an assessment I worked on prior to a virtualization project. The goal of the assessment was to prepare the organization for virtualization. The first step is to determine how many physical servers they have, measure their performance and, through the wonderful world of capacity planning, determine how many ESX hosts they would need (loaded to a pre-determined level).

Well the assessment went off without a hitch and the results were presented to the project manager. He went through the roof at the results! The problem was not the report, but what the report told him. You see, they had just finished up a rather nasty project of upgrading all their servers from Win2000 to Win2003. The project was wrapped up and they had reported the completion of the project to the CIO. The assessment report showed the existence of 12 more Win2000 servers that had not been converted.

The point being, without a complete inventory of computing resources (CPU, memory, network and storage), both physical and virtual, the fires just keep getting lit, like trick birthday candles: put one out and it simply starts back up. You quickly run out of breath or remove the candle, right? To turn your organization around from a tactical mode to a strategic mode, consider completing a thorough inventory of your organization.

Now, let's move on to how to conduct the inventory. There are usually three different means of conducting a computing resource inventory (a rough sketch of the third approach follows the list).
1. Have your agents (Tivoli, HP OpenView, BMC Patrol and many others on the market) tell you what you have. The problem is you most likely do not have agents on every machine in your environment (think test/dev/QA), so this cannot possibly be complete.
2. Send out an email to all business managers asking them to tell you what they have. Um, think about that. What would you report, and how would you collect the information?
3. Utilize an agentless tool that can scan LanMan directories, perform an IP scan of your subnets AND interview business units for possible hidden assets behind firewalls or on standalone networks (you know they exist out there).
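As an illustration of the agentless approach, here is a bare-bones Python sweep that tries a few common management ports across a subnet and reports anything that answers. The subnet and ports are placeholders; a real discovery tool would also pull LanMan/directory information and hardware details:

    import ipaddress
    import socket

    def discover_hosts(subnet="192.168.1.0/24", ports=(135, 445, 22), timeout=0.5):
        # Agentless sweep: attempt a TCP connection to a few common ports on every
        # address in the subnet and record anything that responds. The subnet and
        # port list are illustrative assumptions, not values from the original post.
        found = []
        for ip in ipaddress.ip_network(subnet).hosts():
            for port in ports:
                try:
                    with socket.create_connection((str(ip), port), timeout=timeout):
                        found.append((str(ip), port))
                        break  # one open port is enough to count the host
                except OSError:
                    continue
        return found

    if __name__ == "__main__":
        for host, port in discover_hosts():
            print(f"responded: {host} (port {port})")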

The inventory should then be validated with procurement and any provisioning/software distribution processes to ensure a complete listing. Only then can an organization start down the strategic road of knowing what they have and where they are going.

Keep in mind, an inventory can, and most likely should, include software, security patches and which services are installed (or not).

Thursday, February 12, 2009

Configuration Management Database

This post is focused on the current configurations of customer environments and the difference between what they actively manage (have agents on) and the total amount of computing resources in the Enterprise.

I worked for VMware for 2.5 years and have many stories about how many servers customers thought they had, and about projects scoped to a set of servers only to find out the customer had more servers than they thought.

My point being, having a complete inventory of computing resources is a vital step in aligning the business needs with the IT support for any organization. As I reflect on the computing world as it progressed from the mainframe days through the explosive growth of distributed computing to the proliferation of the Windows server world, the disconnect (in far too many cases, but not all) between the procurement process, the needs of the business (computing requirements) and the IT departments has amazed me. I say departments because of the difference between the Windows and UNIX sides of most companies.

I digress; my point is this: how many companies have a hardware and software (more on why both later) inventory of their IT resources? Note I did not say IT department, but IT resources across the enterprise. With the proliferation of servers, there are department-level servers running vital business processes still sitting out there under desks and in closets that have been sealed up with drywall. You have all heard the stories.

So where do we start? There are three ways to collect inventory information.
1. Send an email to everyone, having them reply with the servers they know about or use.
2. Use an agent-based technology to report what the agents see (remember, you have to pay for every agent, so as likely as not there is not an agent on every box; think development, test and the back corner of the warehouse).
3. Use an agentless collection process, where LanMan, IP discovery or manual importing can safely scan the environment for "every" computing resource in the enterprise.

Back to the need for a hardware and software inventory. The difference lies between a tactical (put out the fires) IT organization and a strategic (what do we have and how do we manage it better) organization. I participated in a call with a customer who had delayed virtualizing their applications until their upgrade to Win2003 was completed. We ran an assessment on their servers and found 20 servers still running Win2000. We thought we were progressing with the virtualization project, only to find out they still had 20 servers to upgrade and no budget for that extra workload. I use that story because the same holds true for organizations that are patching some of their servers without knowing how many are truly out there; the patch project keeps going and going like the Energizer Bunny. Another issue is that a growing number of software versions starts to slow down the IT service organization in patch management, support, and the development and support of new projects on old servers.

Bottom line: having a current (and quarterly updated) hardware and software inventory for both physical and virtual computing resources is vital for any size organization, to help focus on strategic IT initiatives aligned with proper cost and support.

The next discussion will focus on the growth of processor speeds on servers and how to match the technology to the business applications for right-sizing.

Tuesday, February 10, 2009

Virtualization and Chargeback

The concept of chargeback in the computer industry has been around for years. The mainframe provided workload-level SMF records, which allowed reporting tools like CA's Job Accounting and Reporting (JARS) to divide up the records by which services were being used by which customers. The reporting allowed for summarization by resources used.

Fast forward to the current world, where virtualization has become a commodity and everyone from managed services organizations to the "Cloud" is running some type of hypervisor. So I pose the question to the industry: what is the costing model for sharing a resource, and how much is a customer expected to pay?

Also keep in mind that the number one rule for a chargeback program is that the process must be repeatable, meaning if I run the same database query every night, the cost should be the same.

So with these initial concepts in place, it will be interesting to see the various chargeback models as they come to market, and even more interesting will be the hypervisor vendors and the performance and usage records they produce to allow chargeback systems to be created and brought to market.

The next posting will be a list of chargeback vendors and how they are getting the data.