Category Archives: vSphere

The VVols Rabbit Hole

Recently, I was surprised to learn that you cannot create a Veeam replication job to VMware Virtual Volumes (VVols). This highlights that we always need to be thorough when performing one of the most difficult tasks in the design process: confirming the compatibility of every component in a virtualized infrastructure.

In these times of “get it done quickly” IT, people tend to follow a ready – shoot – aim methodology. Following a proper methodology, including assessment and design, will keep you out of trouble in these situations.

Because of the surprise with Veeam, I did a little digging to see what is and is not compatible with VVols. This was not an easy task. Read more »

VSAN 6.0 – What’s New

Among other things, VMware released Virtual SAN 6.0 earlier this month, in conjunction with vSphere 6.0. Realistically, this is a “2.0” release, but I am guessing they are calling it “6.0” to conform with the vSphere releases. I think that is a bad idea because it can put undue pressure on the developers to keep up with the cadence of vSphere, vRealize and everything else that is taking on the “6.0” versioning. But the marketing people feel it makes compatibility questions easier to manage.
Let’s face it: whether it is called 2.0 or 6.0, it is still a “.0” release with plenty of new things that can go awry.

Read more »

VMware PEX 2015: New Stuff With vSphere 6

With version 6.0 of VMware’s flagship product comes plenty of enhancements. According to VMware’s press release, there are more than 650 improvements, but I have not seen a master list yet. The maximums of vSphere are leapfrogging the maximums of Hyper-V. Unless you are planning on running SAP HANA in a virtualized environment, you probably could not give a crap about some of the scalability enhancements. They may be nice to have, but how often will you use them? Here are some of the improvements in vSphere 6.0:

Read more »

VMware PEX 2015: The Big Announcements

As expected, several announcements were made at VMware Partner Exchange 2015. The most anticipated announcements involved vSphere 6, VSAN 6 and EVO:RAIL. As an old dog, I’ve become fairly jaded in reaction to many of these announcements. However, there are some significant features in vSphere 6 and VSAN 6. There are also some interesting things surrounding EVO:RAIL.
The message from VMware to its 4000 attending partners was “One Cloud, Any Application, Any Device.” Oh, and no more PEX in the future. The plan is to have all technical sessions available at VMworld, which is a horrible idea. EMC does this at EMC World, and all of the technical people end up shuffling back and forth between SE-focused sessions and customer-focused sessions.

Read more »

VMware PEX 2014 – Notes Part 2 – The Good

The most significant part of VMware PEX for me this year was the Solutions Exchange floor and the rather small number of vendors. My focus was on convergence of compute and storage resources, which appears to be a popular path. There were a few things along the way, other than cheap swag, that caught my eye. One interesting conversation involved FusionIO. They validated that many customers concentrate more on storage capacity than on performance, and that this is not good. Some more progressive enterprises are very focused on performance. For instance, eBay actually measures costs based on URLs per kilowatt. Read more »

VMware PEX 2014 – Notes Part 1 – The Bad

I have been attending various VMware Partner Exchange (PEX) and VMworld events since around 1996. Typically, I prefer to attend PEX over VMworld. The number of attendees is significantly smaller and access to the VMware brain trust is easier. There is usually a good mix of NDA roadmaps and decent technical information. The Solutions Exchange floor is less crowded and the vendors are able to spend more time with you. The hands-on labs are the same top quality as at VMworld, but typically with no lines.

I did not attend VMworld 2013, but I did attend PEX 2014 last week. Sadly, I was a little underwhelmed again this year, just as I was last year. There was no real feeling of innovation. No buzz. Just ho-hum. There seemed to be fewer exhibitors on the Solutions Exchange floor compared to previous years. I did have some very educational conversations with some vendors, which I will detail later.

Despite all of the drama around Veeam and Nutanix being missing in action, there was also no mention of VMTurbo. The people at the Cisco booth had nothing to say about Whiptail or Insieme.

The push is bundled suites of software, which offer a single management point and a common interface. But many parts of these suites will likely end up as shelf-ware for many customers. Let’s face it: many enterprises will need to change significantly in order to fully utilize the vCloud Suite. If it is not a top-down directive, either the networking silo or the storage silo is going to protest over lack of control. Try explaining to network engineers that you need a pool of VLANs and associated IP addresses that will be out of their control and will probably be difficult to integrate with the many manual processes they use. Try telling a storage administrator that we can now do self-service provisioning of a storage array, including all of the parts in the middle, like zoning and masking. As with VDI, they see that the ROI is heavy on “soft costs.” Try explaining how you save on labor costs without reducing the workforce in an economy where IT shops are already understaffed. Many people don’t get it when it comes to cloud, which suddenly became Software Defined Data Center (SDDC) and now Software Defined Enterprise (SDE). But that can be the subject of a future post.

There was a little video before the keynote on Tuesday morning that hammered it home. Think about VMware in its first ten years. No one “got” virtualization. But VMware persisted and stuck to its guns. Now it has become standard issue in a data center. VMware is taking the same approach to the Software Defined Data Center. They are even starting to call it Software Defined Enterprise instead. Their stance is that, with persistence, the SDDC message will be heard and will become the norm. There is a constant struggle between the people that deliver IT and the people that consume IT. The fundamental ideas of SDDC help calm that struggle.

Right now, Microsoft appears to be hot on VMware’s tail, and I see many people seriously reconsidering their method of delivery. I think the biggest things that VMware has going for it right now are vCenter Operations Manager (vCOps) and Site Recovery Manager (SRM). I think that SRM is possibly the only thing keeping some enterprises on VMware for business-critical applications. VMware is trying to display an air of unconcern; possibly, they are ignoring the “Evil Empire.” When many enterprises are already paying for Windows Datacenter Edition and System Center, there needs to be justification to keep vSphere. I see Hyper-V gaining a foothold, especially in SMBs, branch offices and places where basic server virtualization is good enough.

It is interesting to see some of the visions play out over the course of time. I remember back in 2007 or 2008, when PEX was still Technical Solutions Exchange (TSX), Carl Eschenbach (http://www.vmware.com/company/leadership/carl-eschenbach.html) announcing the VCDX program. I remember he said that there would be about 200 of us by the end of 2008. Back then, customers needed someone to design their greenfield environment and assist with migration from physical to virtual. I don’t find myself needing to prove that vMotion works any more. Although there are likely many greenfield opportunities out there, most of my design time is now spent helping customers achieve higher consolidation ratios and deliver a more optimized datacenter that may not always have vSphere at the top of the list. I have seen VMware go from a standalone hypervisor, to a centrally managed solution, to the de facto standard, and then to what I describe as “meh.” There is no pop anymore. No excitement. Maybe I am getting too pessimistic in my old age.

Stay tooned! I have more to come on the interesting finds on the Solutions Exchange floor.

Big vCloud Director Security Gotchas That I Have Found

This post includes an important security “gotcha” that I recently uncovered with vCloud Director 1.5 running on vSphere 5. If you are using vCloud Director, you should check your settings.

The BIG Security Issue

Read more »

PAVMUG Session – Virtualizing Business Critical Apps

For the September 2011 PAVMUG all-day meeting, I participated in four sessions. To me, the session with the most audience participation was the one about virtualizing business-critical applications. My session dug deep into Microsoft Exchange but also covered some basics around SQL and Oracle. I wanted to expand on some of the ideas that were discussed during the session and post the presentation slides.

Read more »

Maybe VMware Needs a Quality Oversight Department…

I was doing some research for a session I am presenting at an upcoming PAVMUG meeting about vSphere remote management when I came across an apology from one of the PowerGUI guys. Essentially, he was apologizing for something VMware changed in the functionality of PowerCLI that affects how the PowerGUI Virtualization Powerpack interacts with it.

Read more »

ESX is Going Away – How to Migrate to ESXi

If you didn’t know it yet, VMware announced a while back that future releases will not include the “traditional” ESX Server. From their site: “VMware vSphere 4.1 and its subsequent update and patch releases are the last releases to include both ESX and ESXi hypervisor architectures. Future major releases of VMware vSphere will include only the ESXi architecture.”

If you are in a “24/7/365” shop, then the applications running in your private cloud should currently be in virtual data centers (vDCs) that are contained in DRS/HA clusters, and the migration can be completed with no downtime to the applications. However, there are still other systems, such as development and test systems or some minor infrastructure services, that may not benefit from vSphere’s availability features. I know many people have scheduled outages, shutdowns, etc. during the upcoming holidays. It may be the best time to migrate to ESXi…

A Little Bit About ESXi 4.1

I have actually been pushing ESXi since version 3.5. In every plan and design engagement where I have been involved, I have always started with ESXi as the version to be used unless there was a compelling reason NOT to use it. I think the only thing right now that requires ESX is HP’s Matrix, and I have yet to find anyone that uses that behemoth. There have been many improvements to the features of ESXi since version 4.0. VMware has a nice “Why ESXi” web page to explain these new features. The biggest thing is that ESXi has a smaller footprint and needs fewer security updates than the “traditional” ESX Server. For instance, the latest round of patches has 11 for ESX and only two for ESXi. The small footprint allows ESXi to be installed on a USB stick, an SDHC card or simply on less disk space. There is also a nice VMware KB article explaining all of the differences between the current versions of ESX and ESXi.

Here are some other things that I like about ESXi:

Treat the ESXi Server like it is an appliance

The ESXi installation should be treated as firmware; it is even called firmware in Update Manager. This means that there is no actual upgrade: the firmware is installed fresh every time. This also opens the door to stateless servers at some point. More on that later.

Kickstart Scripting

Really, there are three methods for a mass deployment. One is to use Host Profiles, but this requires manual steps and the extra costs associated with Enterprise Plus licenses. A second is to perform a base install of ESXi using the method I outlined in a previous post and then use PowerCLI or the vSphere CLI to customize the server. The new preferred method is good ol’ Kickstart.
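To give you a flavor of what a scripted build looks like, here is a minimal sketch in Python that stamps out a per-host ks.cfg from a template. The directive names (accepteula, autopart, install url, network, rootpw, %firstboot) are the ones I recall from the ESXi 4.1 scripted-install documentation, and the hostnames, IP addresses, URL and password are made-up placeholders, so verify everything against the install guide for your build:

```python
# Sketch: generate one ESXi 4.1 kickstart file per host from a simple template.
# Directive names are assumed from the ESXi 4.1 scripted-install docs; the
# hostnames, IPs, URL and password are placeholders for your environment.
KS_TEMPLATE = """\
accepteula
autopart --firstdisk --overwrite
install url http://deploy.example.local/esxi41/
rootpw ChangeMe123
network --bootproto=static --device=vmnic0 --ip={ip} --netmask=255.255.255.0 --gateway=10.0.0.1 --nameserver=10.0.0.10 --hostname={hostname}
reboot

%firstboot --unsupported --interpreter=busybox
# post-install tweaks (NTP, syslog, licensing) go here
"""

HOSTS = {
    "esxi01.example.local": "10.0.0.21",
    "esxi02.example.local": "10.0.0.22",
}

for hostname, ip in HOSTS.items():
    filename = "ks-{0}.cfg".format(hostname.split(".")[0])
    with open(filename, "w") as ks:
        ks.write(KS_TEMPLATE.format(ip=ip, hostname=hostname))
    print("wrote", filename)
```

Serve the file over HTTP (or drop it on the install media) and point the installer at it with the ks= boot option.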

Several Boot Options

Now you can boot from local disk, SAN disk, a USB key or an SDHC card. One of the arguments some have against booting from USB or SDHC is the dreaded single point of failure (SPOF). My answer has always been that HA will cover this. And, if you think about it, an internal array controller is a SPOF as well. But now you can boot from SAN if you wish. Just remember to create a space for logs and core dumps if you go the route of USB or SDHC.

Active Directory Integration

ESXi servers can now become members of a Micro$oft Active Directory Domain so that administrators can authenticate to the directory. Setting this up is done through the vSphere Client under the configuration tab. It appears that you can call the administrators group anything you want in AD as long as it is “ESX Admins”. Maybe that will change in future versions.

Hardware Monitoring

You can do away with SNMP monitoring and use the Common Information Model (CIM) providers instead. The stock version comes with some basic CIM monitoring, but the major hardware companies provide custom-baked versions of ESXi with OEM-specific CIM providers for more granular monitoring. Rather than setting up your monitoring software for SNMP, you just point it at the ESXi server and set it up for CIM or WBEM. If you are not using a monitoring server, you can use the vCenter Server to handle alerts and alarms.
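If you want to poke at the CIM providers yourself before pointing a monitoring tool at the host, a few lines of Python with the pywbem library will do it. Treat this as a sketch: the host name and credentials are placeholders, and depending on your pywbem version the SSL handling may differ:

```python
# Sketch: pull sensor readings from an ESXi host's CIM providers with pywbem.
# Host, credentials and the no_verification flag are placeholders/assumptions.
import pywbem

conn = pywbem.WBEMConnection(
    "https://esxi01.example.local:5989",   # default ESXi CIM-XML (SFCB) port
    ("root", "SuperSecret"),               # placeholder credentials
    default_namespace="root/cimv2",
    no_verification=True,                  # lab use only; validate certs in production
)

# CIM_NumericSensor covers the temperature, fan and voltage sensors that the
# stock and OEM CIM providers expose.
for sensor in conn.EnumerateInstances("CIM_NumericSensor"):
    print(sensor["Name"], sensor["CurrentReading"])
```

The OEM builds mentioned above expose extra vendor-specific classes as well, so exactly what you can query depends on the hardware.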

Enhanced Tech Support Mode and vCLI Operations

Using Tech Support Mode, formerly known as “Unsupported Mode,” is now supported. You can log in as an administrative user, and all commands run in TSM are logged. The biggest reason many people went into TSM was to kill a stuck VM; there is now a vCLI operation that will allow this.

The Migration Path to ESXi

Here is the “Super Dave Method” for migrating to ESXi. As I mentioned before, if you have at least ESX clusters with vMotion enabled, this will be a relatively painless process with no downtime. Obviously, if you are also upgrading from a previous version, reboots will be required for VMware Tools and possibly to upgrade the VM hardware to version 7.

TEST TEST TEST

Yeah, you heard me. Take some time to familiarize yourself with the differences between ESX and ESXi by setting up at least one test system, so you can hammer it before changing your production systems. The nice thing is that you don’t need hardware to do this: you can run ESXi as a VM.

vMA, vCLI and PowerCLI – Oh My

If you are already familiar with doing things from the ESX console, the vCLI will be very familiar to you. If you are a Winders guy, you should already know PowersHell, so the PowerCLI will be familiar. Also, get to know the vSphere Management Assistant (vMA). The vMA is a RHEL5-based virtual appliance that is pre-configured with the Perl SDK and vCLI. It also includes vi-fastpass, which allows you to pass authentication through to the hosts and to vCenter, and vilogger, which allows you to create a (very) poor man’s syslog server. I think you are better served using Splunk or phplogcon; either will allow you to parse the syslog entries more easily if you need to do any troubleshooting or forensics. If you want a nice guide to the vMA, head over to the vGhetto for some nice tips and tricks. If you are already a Linux shop and RHEL is not your cup of tea, then you can bake your own vMA with the Perl SDK and vCLI, but you will lose the vi-fastpass and vilogger capabilities. vi-fastpass is not a huge loss, and you can set up your home-baked vMA to include syslogd and phplogcon.

Create a Kickstart Script

The most efficient way of performing an installation on more than one server is to script it, and VMware now supports using Kickstart. Test your Kickstart script on the test ESXi server; whether it is physical or virtual, you should be able to test most of the functionality. Check out the nifty “DryRun” setting too. I am not going to rehash what has already been done or said regarding Kickstart, so here are some decent links:

Take a look at this VMware Labs Fling if you want to create a nice deployment server that automates the ENTIRE process. This brings up the idea of having stateless ESXi servers. As each one is inserted into the vDC, ESXi is automatically set up for you. All the “important stuff” is stored on shared disk.

If you decide to go the route of using PowerCLI, take a look at this nice post.

Back up EVERYTHING!

Back up EVERYTHING! ‘Nuff said? There is no better feeling when the shit hits the fan than having a good backup. Make sure you back up the vCenter database and capture the configuration settings of all of the ESX servers. If you have Enterprise Plus, capture a host profile for each server. Take screenshots of the configuration settings in the vSphere Client. Include networking and all of the tabs in the vSwitches. Include storage and all of the tabs of any iSCSI initiators. Don’t forget the IQNs! Check the advanced settings for any tweaks. Document the whole thing.
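If you would rather not rely on screenshots alone, the same settings can be pulled off each host through the vSphere API. Here is a rough sketch using the Python SDK (pyVmomi); the vCenter name and credentials are placeholders, and it only captures the networking basics, so extend it to whatever else you want documented:

```python
# Sketch: dump each host's vSwitch, port group and VMkernel NIC settings to a
# text file before the migration. vCenter name and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab use only
si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="SuperSecret", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)

with open("host-network-config.txt", "w") as out:
    for host in view.view:
        net = host.config.network
        out.write("== {0}\n".format(host.name))
        for vsw in net.vswitch:
            out.write("vSwitch {0}: {1} ports, uplinks {2}\n".format(
                vsw.name, vsw.numPorts, vsw.pnic))
        for pg in net.portgroup:
            out.write("PortGroup {0}: VLAN {1} on {2}\n".format(
                pg.spec.name, pg.spec.vlanId, pg.spec.vswitchName))
        for vnic in net.vnic:
            out.write("VMkernel {0}: {1}/{2}\n".format(
                vnic.device, vnic.spec.ip.ipAddress, vnic.spec.ip.subnetMask))

Disconnect(si)
```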

Pick the First Victim and Move Your VMs

If you are using DRS and HA, disable HA. Then place the first victim host in Maintenance Mode. This should automagically move the VMs to other hosts in the cluster if DRS is working. Or you can manually (*GASP*) vMotion your VMs. If they are on local storage, set up an iSCSI server – OpenFiler is free. You can then use Storage vMotion to move the VMs to the iSCSI storage.

Remove from inventory

Yep! If the server is a part of a cluster, remove it from the cluster. Before removing it from inventory, unassign the license. Then, remove it from inventory. Once it has been disassociated from the vCenter Server, shut it down.

Now is a good time to update the BIOS, device firmware, etc. Blow out the dust. Perform some hardware TLC. If you do not trust yourself, disconnect it from the SAN. If you are using software iSCSI initiators or NFS, no worries because you will need to reconfigure this stuff after the install.

Install ESXi

Now is the time to actually perform your first ESXi installation. Go ahead, we’ll wait.

Post Configuration

Once ESXi is installed, you can add it back to the vCenter Server, add the license and perform any post-configuration steps, like applying the host profile or running your PowersHell script for post configuration. Once the post configuration is completed, confirm that all of the settings are correct and match what you documented previously. If all is good, add it back to the original cluster. If you want to set up Enhanced vMotion Compatibility (EVC) and it’s not currently enabled in the original cluster, you can create a new cluster for the ESXi servers with EVC enabled.

Repeat

Repeat the process for each server. If you decided to create a new cluster, start moving VMs to the new cluster.

More Tips

If you are installing ESXi to a USB Stick or an SDHC card:

  • Make sure the stick or card is supported by the hardware manufacturer.
  • Set up a syslog server. This is very important.
  • If you don’t set up a syslog server, make sure you have a persistent scratch/log location on the centralized storage. If you do not do this, the logs are deleted on reboot. Then set up a syslog server anyway.

Make sure that during the test phase you understand how to set up authentication, syslog settings and any required custom settings. Also make sure you know your way around the DCUI and the Tech Support Mode console.

A Few Gotchas With vSphere 4.1! Updated

Since everyone else in the world is heralding the release of vSphere 4.1, I figured I would post some bad news: the stuff you may want to know BEFORE you jump into upgrading to vSphere 4.1. Before I start, I want to make it clear that vSphere 4.1 is a great product overall. And I have already been leaning toward ESXi, so the announcement that this will be the last release with the “traditional” ESX was expected. I will talk about ESXi and its improvements in a later post. I just want you to be aware of these rather significant gotchas.

Gotcha #1 – Read-Only Role allows members to add VMkernel NICs

From the release notes (You actually READ these, right?):

  • Newly added users with read-only role can add VMkernel NICs to ESX/ESXi hosts
    Newly added users with a read-only role cannot make changes to the ESX/ESXi host setup with the exception of adding VMkernel NICs, which is currently possible.

    Workaround: None. Do not rely on this behavior because read-only users will not be able to add VMkernel NICs in the future.

This is a fairly big security issue. I just LOVE the workaround notes. To be fair, I have found only one installation in my experience that uses the Read-Only Role. In my opinion, if they don’t have access to the physical data center, they don’t need any access to vCenter. But this is just something that should have been corrected before release.

Gotcha #2 – ESX/ESXi installations on HP systems require the HP NMI driver

  • ESX installations on HP systems require the HP NMI driver
    ESX 4.1 instances on HP systems require the HP NMI driver to ensure proper handling of non-maskable interrupts (NMIs). The NMI driver ensures that NMIs are properly detected and logged. Without this driver, NMIs, which signal hardware faults, are ignored on HP systems with ESX.

    CAUTION: Failure to install this driver might result in silent data corruption.

    Workaround: Download and install the NMI driver. The driver is available as an offline bundle from the HP Web site. Also, see KB 1021609.

It seems that every time HP releases a new set of SIM agents for ESX, something breaks. Is this VMware’s way of putting it on HP? Or was this an “OOPS”? If you search for “HP VMware NMI Driver,” you come up with nothing. No download. It was nowhere to be found on Monday, but I did find it today on the HP support site.

Gotcha #3 – VMware View Composer 2.0.x is not supported in a vSphere vCenter Server 4.1 managed environment

The basic issue here is that vCenter 4.1 only works on a 64-bit system. View Composer only works on a 32-bit system. From the KB Article:

“VMware View Composer 2.0.x is not supported in a vSphere vCenter Server 4.1 managed environment as vSphere vCenter Server 4.1 requires a 64 bit operating system and VMware View Composer does not support 64 bit operating systems.
“VMware View 4.0.x customers who use View Composer should not upgrade to vSphere vCenter Server 4.1 at this time. Our upcoming VMware View 4.5 will be supported on VMware vSphere 4.1.”

Don’t these guys talk to each other? Didn’t they learn their lesson with the PCoIP issues? And why can’t they just admit it in the release notes instead of putting in a link to the KB article? I completely missed this on Monday morning.

Gotcha #4 – vCenter Installer SILENTLY Changes SQL Server Settings to Allow Named Pipes

  • vCenter Server installation or upgrade silently changes Microsoft SQL Server settings to enable named pipes
    When you install vCenter Server 4.1 or upgrade vCenter Server 4.0.x to vCenter Server 4.1 on a host that uses Microsoft SQL Server with a setting of “Using TCP/IP only,” the installer changes that setting to “Using TCP/IP and named pipes” and does not present a notification of the change.

    Workaround: The change in setting to “Using TCP/IP and named pipes” does not interfere with the correct operation of vCenter Server. However, you can use the following steps to restore the setting to the default of “Using TCP/IP only.”
  1. Select Start > Programs > Microsoft SQL Server 2005 > Configuration Tools > SQL Server Surface Area Configuration.
  2. Select Surface Area Configuration for Services and Connections.
  3. Under the SQL Server instance you are using for vCenter Server, select Remote Connections.
  4. Change the option under Local and Remote Connections and click Apply.

Can you hear the DBAs pissing and moaning?

Gotcha #4a – SQL Database is changed to Bulk Recovery Model (updated 10/27)

This one is funny. I just found out about it on 10/27/2010. When it comes to SQL for the vCenter database, VMware recommends using the simple recovery model. So, with their attention to detail, the upgrade process changes the database to the bulk-logged recovery model. In this model, the logs keep growing until a backup purges them. No good.

Transaction log for vCenter Server database grows large after upgrading to vCenter Server 4.1 – http://kb.vmware.com/kb/1026430

Conclusion

Again, vSphere 4.1 brings some great improvements and some welcome changes. As the product matures and more vendors work with the APIs, we will see some nice features that will help you in your journey to the private cloud. The gotchas listed above might not exist if quality assurance were tightened. I think I would rather hear that a release is delayed because of pending bug fixes. How long will we need to wait for these to be fixed? In any case, if the Read-Only role or View Composer gotchas don’t apply to you, then jump right in and install or upgrade to vSphere 4.1. Just make sure you install the NMI driver and fix the SQL settings.

Update 2010-07-16

I got a tweet from William Lam last night. It looks like versions are hard-coded in CapacityIQ, making it incompatible with vSphere 4.1. William also explains two ways to make it work.

My VCAP-DCA Exam Experience

In case you have been living under a rock and haven’t heard, VMware is getting ready to release a new set of advanced certification exams that will take you along the path to becoming a VMware Certified Design Expert on vSphere 4 (VCDX4). Just like VCDX3, it starts with the requirement of being a VMware Certified Professional on vSphere 4 (VCP). You will then need to pass two exams before being able to submit and defend your design. VMware has decided to award new certification statuses for passing these exams. The exam to become a VMware Certified Advanced Professional on vSphere 4 – Datacenter Administration (VCAP-DCA) is currently finishing up its beta run. The exam to become a VMware Certified Advanced Professional on vSphere 4 – Datacenter Design (VCAP-DCD) is not yet in beta. The path to achieve VCDX4 status is laid out on VMware’s site and is illustrated below:

Just like Jason Boche, William Lam and Duncan Epping, I had the privilege of taking the beta version of the exam. As you can see from the upgrade path, I am not required to take the exam to obtain the VCDX4, but I guess I am a glutton for punishment. Also, not having it as a requirement took some of the pre-test jitters off of me. At first, scheduling conflicts prevented me from being able to sit for the exam within VMware’s original deadline. However, I got a call on June 17th that I could take it on July 2nd. Wow… two weeks’ notice, and on my only scheduled day off since April. But I eagerly accepted the invite. Because of the limited notice, and the fact that I was juggling a few projects at the same time, I debated even studying for the exam. An unscientific survey on Twitter showed that 4 out of 4 followers recommended that I study. I don’t want to come across as arrogant or as a “know-it-all”; my argument is that I am already a VCDX, so I should know this stuff. My schedule and my severe procrastination tendencies made me decide to do a little bit of review the night before.

Before I begin with my thoughts on the exam content, I want to express that I had only two “issues” with the exam experience itself. First, a little background: the exam consists of 41 “questions,” which are actually multifaceted problems that you need to solve with the tools that are presented to you. You have 4.5 hours to complete the exam. The problems are presented in a familiar Vue test engine. You click a button to switch to a desktop session with a few of the typical tools used to administer a vSphere environment. The first issue was with the screen refresh for the GUI-based tools. When I clicked on an item, sometimes all of the tabs were not presented properly or the content was not complete. This was pretty annoying and sometimes a hindrance. When I participated in the beta exam for the VI3 Advanced Administration Exam, I did not experience this. Hopefully, this will be cleared up before the exam becomes GA. I would think that a leader in desktop virtualization would have a method to avoid this type of thing. The second issue is the provision for breaks. You can take “unscheduled breaks,” but I think the clock keeps ticking. It would be nice to actually have a scheduled break without a time penalty. As you get older, you NEED the breaks…

Now, on to the content. Forget about me actually telling you the actual content of the exam. The NDA prevents this and I want to participate in future beta exams. I got my VCDX3 via beta exams and I hope to get my VCDX4 this way!

I’ll admit it: working primarily in the SMB market limits your skills a bit. I am not as exposed to some of the more advanced features of vSphere 4 as I used to be when I worked in an “enterprise” market. I skipped a couple of problems because of this. I intended to return to them, but the clock ran out before I could. The problems were a very good compendium of the advanced skills required of a more senior VMware administrator. It was the toughest exam that I have ever taken; the second toughest was the VI3 Advanced Administration Exam. I thought the questions were very fair, and there was nothing in the content that caused me any objections.

I was pretty relaxed when I started the exam, but started to PANIC during the last 30 minutes.

The one (personal) issue I have with this type of exam is that it measures, at a point in time, how much you have memorized. Since I don’t want to use an example of a problem that may be on a VMware exam, I will use one of my cars as an example. Say, for instance, that I am sitting for the 1972 Ford Gran Torino Advanced Administration Exam…

Let’s say a question on the exam asks me to set the ignition points gap. This is something I have done a few times on several cars. I know where to find the ignition points. I know how to set the gap. I have the proper tools to do it. But I don’t know what that setting should be. In the REAL world, I would look it up in a manual or on Google, and I have looked up the setting every time I did it. Would I fail the test because I know HOW to do it but don’t know the proper setting? Probably. My teeny brain can’t hold all of this information – especially with all of the Monty Python references in there, not to mention the words to almost every song by Rush and Iron Maiden…

My Advice

Back on track… Echoing Duncan, Jason and William, I have a few tips to offer for this exam:

  1. Read the Exam Blueprint. Perform each task listed in the blueprint a few times, so you know HOW to do it. You DO have access to “–help” and man pages during the exam if you are stumped. However, refer to item #3.
  2. Build a LAB! You will need it for item #1. You don’t have to go out and buy servers and storage. All you need is a reasonably fast 64-bit PC or laptop with a decent amount of RAM. Some things may be slow, but you will get through it. You can run ESX as a VM. Use VMware Player or VMware Workstation to host your lab VMs. Every VMware product in the blueprint is either free or has an evaluation period. Didn’t you get a free VMware Workstation license with your VCP?
  3. Manage your time! I ran out of it. You have the opportunity to go back, so skip questions if you don’t know how to do them or think they will take a while. The other thing I noticed was that, since the exam uses a live lab environment, the tasks happen in real time. During my panic state, I started to multitask and work on more than one problem at a time. Instead of clicking “Next” and waiting for the task to complete, click “Next” and start on the next problem. Juggle two or three problems. Use your dry-erase board to keep track of skipped problems and multitasking. I am not very fast with my typing, and I am constantly mixing up letters in words. I call it “typing dyslexia,” and it doesn’t help me in these situations!

I don’t know if I passed this one. I am a little bit pessimistic at this time. I will find out in “4-6 weeks”, but that is VMware Time… Good luck to all that have or are planning to take this exam.

vShield Zones – Some Serious Gotchas

OK, I’ll admit it: I am spoiled by the capabilities of vSphere. What other platform lets you schedule system updates that will occur unattended and without outages of the applications being used? I don’t mean the winders patches; they require a monthly reboot. I am talking about the hypervisor updates. VMware Update Manager coordinates all of this for you. Then along comes vShield Zones to break it all.

First, let me explain what I am trying to do. To simplify things, vShield Zones is a firewall for vSphere virtual machines. Rather than regurgitate how it works, take a look at Rodney’s excellent post. A customer has decided to use vShield Zones to help with PCI compliance. The desire is that only certain VMs will be allowed to communicate with certain other VMs, using specific network ports, and that the traffic will be audited. ’Nuff said.

vShield Zones seems to be the perfect solution for this. It works almost seamlessly with vCenter and the underlying ESXi hosts. It uses hardened Linux virtual appliances (vShield Agents) to provide the firewalling. It provides a fairly nice management interface to create the firewall rules and distribute them to the vShield Agents. Best of all, IT’S FREE! At least for vSphere Advanced and above. Keep in mind that this is still considered a 1.x release and some things need to be worked out.

Now, on to the gotchas.

Gotcha #1 – Networking

When it comes to networking, the vShield Agent is designed to sit between a vSwitch that is externally connected via physical NICs (pNICs) and a vSwitch that is isolated from the outside world. The vShield Agent installation wizard will prompt you to select a vSwitch to protect. This is illustrated below. The red line indicates network traffic flow.

[Diagram: the vShield Agent sitting between the externally connected vSwitch and the protected vSwitch; the red line shows the traffic flow]

This works like a champ in this configuration: a vSwitch for management, which is naturally on an isolated network to begin with, a vSwitch for VMs to connect to the vShield Agent, and a vSwitch to connect everything to the outside world. This can also be deployed with limited downtime. If you are lucky enough to have the Enterprise Plus version, you may want to use a vNetwork Distributed Switch or even a Cisco Nexus 1000V. You will need to make some manual configurations to make this work, as outlined in the admin guide.

The gotcha is with blade servers or “pizza box” servers that have limited I/O slots. If all of the VM traffic must flow through the same physical NICs and you use a vSwitch, then you need the vShield Agent to protect a port group rather than an entire vSwitch. You will need to create a vSwitch with a protected port group and connect it to the pNICs. Then you can install the vShield Agent. Once the vShield Agent is installed, you will need to go back to the vSwitch attached to the pNICs and add an unprotected port group. This is illustrated below. The red line is the protected traffic and the blue line is the unprotected traffic.

[Diagram: protected and unprotected port groups sharing the vSwitch attached to the pNICs; red is the protected traffic, blue is the unprotected traffic]

As you can see, there is an unprotected Port Group (ORIGINAL Network). This needs to be added to the vSwitch AFTER the vShield Agent is installed. If the ORIGINAL Network is already a part of the vSwitch, it will need to be removed BEFORE installing the vShield Agent. In order to avoid an outage, you will need to disable DRS and manually vMotion all VMs off of the ESX/ESXi host before installing the vShield Agent and modifying the port groups.

Gotcha #2 – DRS/HA Settings

The vShield Agents attach to isolated vSwitches with no pNIC connection. As you should already know, using DRS and vMotion with an isolated vSwitch can cause connectivity between VMs to fail. By default, you cannot vMotion a VM that is attached to an isolated vSwitch; you will need to enable this by editing the vpxd.cfg file. You will also need to disable HA and DRS for the vShield Agents so they stay on the hosts where they are installed. Both are well documented. Obviously, you will need to install a vShield Agent on every ESX/ESXi host in the cluster.

The Gotcha here is that, with HA disabled for the vShield Agent, there is no facility for automatic startup. There is an automatic startup setting in the startup/shutdown section of the configuration settings. First, this is an all-or-nothing setting. Second, according to the Availability Guide:

“NOTE The Virtual Machine Startup and Shutdown (automatic startup) feature is disabled for all virtual machines residing on hosts that are in (or moved into) a VMware HA cluster. VMware recommends that you do not manually re-enable this setting for any of the virtual machines. Doing so could interfere with the actions of cluster features such as VMware HA or Fault Tolerance.”

So, if a host fails, HA will restart all protected VMs on different hosts. If the host comes back online, you risk having DRS migrate protected VMs back to that host. This will cause those VMs to become disconnected because the vShield Agent will not automatically start. If a host fails, hope that it fails hard enough that it won’t restart.

Gotcha #3 – Maintenance Mode

At the beginning of this post, I mentioned how VMware Update Manager has spoiled me. VUM can be scheduled to patch VMs and hosts. When host patching is scheduled, VUM will place one host in Maintenance Mode, which will evacuate all VMs. Then, it will apply whatever patches are scheduled to be applied, reboot and then exit Maintenance Mode. It will repeat this for each host in a cluster. This works great unless there are running VMs that have DRS disabled, like the vShield Agent.

In the test environment, when a host was manually set to enter Maintenance Mode, it would stall at 2% without moving the test VMs. I am not sure of the order in which VMs are migrated off, but none were migrated in the test environment; this could vary in different installations. Here’s the gotcha: you cannot power the vShield Agent off, because the protected VMs would become disconnected. You cannot migrate it to a different host, because that would cause a serious conflict and cause protected VMs to become disconnected. The only thing you can do is place the host in Maintenance Mode, then MANUALLY (*GASP*) migrate all of the protected VMs, and then power the vShield Agent off. So much for automated patch management. We’re back to the aughts.
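If you end up doing this dance at every patch cycle, it is at least scriptable. Here is a rough sketch of the same workaround using the Python SDK (pyVmomi); the vCenter and host names, credentials and the assumption that the agents can be spotted by “vshield” in their names are all mine, and I evacuate the protected VMs before entering Maintenance Mode rather than after, which amounts to the same thing:

```python
# Sketch: automate the manual Maintenance Mode workaround for a host running a
# vShield Agent. Names, credentials and the "vshield" naming convention are
# assumptions -- adapt to your environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab use only
si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="SuperSecret", sslContext=ctx)
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == "esxi01.example.local")

# Pick another host in the same cluster as the vMotion target.
target = next(h for h in host.parent.host if h != host)

# First, manually (well, programmatically) move everything except the agent.
for vm in list(host.vm):
    if "vshield" in vm.name.lower():
        continue                                # leave the agent for last
    if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOn:
        WaitForTask(vm.MigrateVM_Task(
            host=target,
            priority=vim.VirtualMachine.MovePriority.defaultPriority))

# Then power off the vShield Agent and put the host into Maintenance Mode.
for vm in list(host.vm):
    if "vshield" in vm.name.lower():
        WaitForTask(vm.PowerOffVM_Task())

WaitForTask(host.EnterMaintenanceMode_Task(timeout=0))
Disconnect(si)
```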

Conclusion

I said already that vShield Zones is a 1.x product. It’s a great firewall, but it has a few gotchas that you need to consider, and the benefits may outweigh the negatives. But vSphere is a 4.0 product; some of this should be addressable by tweaking vCenter or host settings.

vShield Zones should be smart enough to allow us to select specific port groups to protect rather than an entire vSwitch. I guess whatever scripting is being done in the background will need to be changed for this. Maybe we need a Ghetto vShield?

One of the REALLY smart people at VMware should be able to tell us the “order of migration” when a host is placed in Maintenance Mode. Once that is determined, there is probably a configuration file somewhere that we could tweak to change it.

There should be a way to set up automatic startup and shutdown of individual VMs. The Startup/Shutdown settings became somewhat deprecated once DRS was introduced; the only time they are useful is with a stand-alone server or in a non-DRS cluster. I guess the only thing that could be done is to add a script somewhere in rc.d or rc.local to start up these VMs, but how can that be done in a “supported” fashion with ESXi, and is it supported in either ESX or ESXi?

I brought these issues up with some VMware engineers and they assure me that they are working on this. Hopefully they will figure it out soon. I hate doing things manually. It seems like it is anti-cloud.

Is Your Blade Ready for Virtualization? A Math Lesson.

I attended the second day of the HP Converged Infrastructure Roadshow in NYC last week. Most of the day was spent watching PowerPoints and demos for the HP Matrix stuff and Virtual Connect. Then came lunch. I finished my appetizer and realized that the buffet being set up was for someone else; my appetizer was actually lunch! Thank God there was cheesecake on the way…

There was a session on unified storage, which mostly covered the LeftHand line. At one point, I asked if the data de-dupe was source-based or destination-based. The “engineer” looked like a deer in the headlights and promptly answered, “It’s hash based.” ’Nuff said… The session covering the G6 servers was OK, but “been there, done that.”

Other than the cheesecake, the best part of the day was the final presentation. The last session covered the differences in the various blade servers from several manufacturers. Even though I work for a company that sells HP, EMC and Cisco gear, I believe that x64 servers, from a hardware perspective, are really generic for the most part. Many will argue why their choice is the best, but most people choose a brand based on relationships with their supplier, the manufacturer or the dreaded “preferred vendor” status. Obviously, this was an HP-biased presentation, but some of the math the BladeSystem engineer (I forgot to get his name) presented really makes you think.

Let’s start with a typical configuration for VMs. He mentioned that this was a “Gartner-recommended” configuration for VMs, but I could not find anything about this anywhere online. Even so, it’s a pretty fair portrayal of a typical VM.

Typical Virtual Machine Configuration:

  • 3-4 GB Memory
  • 300 Mbps I/O
    • 100 Mbps Ethernet (0.1Gb)
    • 200 Mbps Storage (0.2Gb)

Processor count was not discussed, but you will see that it may not be a big deal, since most processors are overpowered for today’s applications (I said MOST). IOPS are not a factor in these comparisons either; that would be a function of the storage system.

So, let’s take a look at the typical server configuration. In this article, we are comparing blade servers. But this is even typical for a “2U” rack server. He called this an “eightieth percentile” server, meaning it will meet 80% of the requirements for a server.

Typical Server Configuration:

  • 2 Sockets
    • 4-6 cores per socket
  • 12 DIMM slots
  • 2 Hot-plug Drives
  • 2 Lan on Motherboard (LOM)
  • 2 Mezzanine Slots (Or PCI-e slots)

Now, say we take this typical server and load it with 4GB or 8GB DIMMs. This is not a real stretch of the imagination. With 4GB DIMMs, that gives us 48GB of RAM. Now it’s time for some math:

Calculations for a server with 4GB DIMMs:

  • 48GB Total RAM ÷ 3GB Memory per VM = 16 VMs
  • 16 VMs ÷ 8 cores = 2 VMs per core
  • 16 VMs * 0.3Gb per VM = 4.8 Gb I/O needed (x2 for redundancy)
  • 16 VMs * 0.1Gb per VM = 1.6Gb Ethernet needed (x2 for redundancy)
  • 16 VMs * 0.2Gb per VM = 3.2Gb Storage needed (x2 for redundancy)

Calculations for a server with 8GB DIMMs:

  • 96GB Total RAM ÷ 3GB Memory per VM = 32 VMs
  • 32 VMs ÷ 8 cores = 4 VMs per core
  • 32 VMs * 0.3Gb per VM = 9.6Gb I/O needed (x2 for redundancy)
  • 32 VMs * 0.1Gb per VM = 3.2Gb Ethernet needed (x2 for redundancy)
  • 32 VMs * 0.2Gb per VM = 6.4Gb Storage needed (x2 for redundancy)

Are you with me so far? I see nothing wrong with any of these yet.

Now, we need to look at the different attributes of the blades:

[Image: comparison table of blade attributes (sockets, DIMM slots, drives, LOMs, mezzanine slots) by model]

* The IBM LS42 and HP BL490c each have 2 internal non-hot-plug drive slots

The “dings” against each:

  • Cisco B200M1 has no LOM and only 1 mezzanine slot
  • Cisco B250M1 has no LOM
  • Cisco chassis only has one pair of I/O modules
  • Cisco chassis only has four power supplies – may cause issues using 3-phase power
  • Dell M710 and M905 have only 1GbE LOMs (Allegedly, the chassis midplane connecting the LOMs cannot support 10GbE because it lacks a “back drill.”)
  • IBM LS42 has only 1GbE LOMs
  • IBM chassis only has four power supplies – may cause issues using 3-phase power

Now, from here, the engineer made comparisons based on loading each blade with 4GB or 8GB DIMMs. Basically, some of the blades would not support a full complement of VMs based on a full load of DIMMs. What does this mean? Don’t rush out and buy blades loaded with DIMMs, or your memory utilization could be lower than expected. What it really means is that you need to ASSESS your needs and DESIGN an infrastructure based on those needs. What I will do is give you a maximum number of VMs per blade and per chassis. It seems to me that it would make more sense to consider this in the design stage, so that you can come up with some TCO numbers based on vendors.

So, we will take a look at the maximum number of VMs for each blade based on total RAM capability and total I/O capability. The lower number becomes the total possible VMs per blade based on overall configuration. What I did here to simplify things was take the total possible RAM, subtract 6GB for the hypervisor and overhead, then divide by 3 to come up with the number of 3GB VMs I could host. I also took the size specs for each chassis, calculated the maximum possible chassis per rack and then calculated the number of VMs per rack. The number of chassis per rack does not account for top-of-rack switches; if these are needed, you may lose one chassis per rack, but most of the systems will allow for an end-of-row or core switching configuration.

Blade Calculations

One thing to remember is that this is a quick calculation. It estimates the amount of RAM required for overhead and the hypervisor to be 6GB. It is by no means based on any calculations coming from a real assessment. The reason the Cisco B250M1 blade is capped at 66 VMs is the amount of I/O it is capable of supporting: 20Gb redundant I/O ÷ 0.3Gb I/O per VM = 66 VMs.
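If you want to sanity-check the table or plug in your own VM profile, the whole calculation fits in a few lines of Python. This just reproduces the arithmetic above: the RAM-limited VM count versus the I/O-limited VM count, taking the lower of the two; the overhead allowance is a parameter so you can match either the raw table rows or the 6GB-adjusted figures.

```python
# Reproduce the per-blade math: VM count is the lower of the RAM-limited figure
# and the I/O-limited figure from the "typical VM" profile above.
RAM_PER_VM_GB = 3        # typical VM memory
IO_PER_VM_GB = 0.3       # 0.1Gb Ethernet + 0.2Gb storage per VM

def max_vms(total_ram_gb, redundant_io_gb, overhead_gb=0):
    ram_limited = (total_ram_gb - overhead_gb) // RAM_PER_VM_GB
    io_limited = redundant_io_gb // IO_PER_VM_GB
    return int(min(ram_limited, io_limited))

# Spot-check a few rows of the 8GB-DIMM figures in the table below:
for blade, ram_gb, io_gb in [("Cisco B250M1", 384, 20),   # I/O-bound: 66 VMs
                             ("HP BL490c", 144, 30),      # RAM-bound: 48 VMs
                             ("HP BL685c", 256, 60)]:     # RAM-bound: 85 VMs
    print(blade, max_vms(ram_gb, io_gb))
```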

I set out on this journey with the purpose of taking the ideas from an HP engineer and attempting, as best I could, to be fair in my version of this presentation. I did not even know what the outcome would be, but I am pleased to find that HP blades offer the highest VM-per-rack numbers.

The final part of the HP presentation dealt with cooling and power comparisons. One thing that I was surprised to hear, but have not confirmed, is that the Cisco blades want to draw more air (in CFM) than one perforated tile will allow. I will not even get into the “CFM per VM” or “watt per VM” numbers, but they also favored HP blades.

Please, by all means challenge my numbers. But back them up with numbers yourself.

                                 Cisco     Cisco     Dell     Dell     IBM      HP       HP       HP
                                 B200M1    B250M1    M710     M905     LS42     BL460c   BL490c   BL685c
Max RAM, 4GB DIMMs (GB)          48        192       72       96       64       48       72       128
Total VMs possible               16        64        24       32       21       16       24       42
Max RAM, 8GB DIMMs (GB)          96        384       144      192      128      96       144      256
Total VMs possible               32        128       48       64       42       32       48       85
Max total redundant I/O (Gb)     10        20        22       22       22       30       30       60
Total VMs possible               33        66        72       73       73       100      100      200
Max VMs per blade (4GB DIMMs)    16        64        24       32       21       16       24       42
Max VMs per chassis (4GB DIMMs)  128       256       192      256      147      256      384      336
Max VMs per blade (8GB DIMMs)    32        66        48       64       42       32       48       85
Max VMs per chassis (8GB DIMMs)  256       264       384      512      294      512      768      680

vSphere 4.0 Quick Start Guide Released

The vSphere 4.0 Quick Start Guide: Shortcuts down the path of Virtualization has finally arrived!

I received a pre-release edition of the book at VMworld 2009. This guide has a great selection of shortcuts, tips and best practices for setting up and maintaining vSphere 4. It would be an excellent addition to any VMware administrator’s bookshelf. The book’s size also makes it a great reference for consultants; it will easily fit into your backpack.

It was authored by the following geniuses from the community:

Show these guys some love and pick up a copy to support their efforts.

Changes to the ESX Service Console and ESX vs. ESXi…again

A whitepaper was posted in the VMTN communities on Thursday outlining the differences between the ESX 3.x and ESX 4.x service console. It further offers resources for transitioning COS-based apps and scripts to ESXi via the vSphere Management Assistant and the vSphere CLI. Also mentioned briefly was the vSphere PowerCLI. If you are a developer or write scripts for VMware environments, also check out the Communities Developer section.

I hear it time and time again: the full ESX console is going away. ESXi is the way to go. I know there are valid arguments for keeping ESX around, but they are few. Failing USB keys may be a valid argument, but I have not heard of this happening. If that is a concern, use boot from SAN; you need SAN anyway. As for hung VM processes, there are a few ways to address this in ESXi.

If the techie wonks at VMware are publishing articles about how to transition to ESXi, then resistance is futile…you WILL be assimilated…

Setting up a Splunk Server to Monitor a VMware Environment

In a previous article, I compared syslog servers and decided to use Splunk. Splunk is easy to set up as a generic syslog server, but it can be a pain in the ass getting the winders machines to send to it. There is a home-brewed Java-based app in the Splunk repository of user-submitted solutions, but I have heard complaints about its stability, so I set out to find a different way to do it.

During my search, I discovered some decent (free!) agents on SourceForge. One will send event logs to a syslog server (SNARE) and one will send text-based files to a syslog server (Epilog). The SNARE agent appears to be more stable than the Java app and does a pretty good job. So I basically came up with a free way to set up a great syslog server using Ubuntu Server, Splunk, SNARE and Epilog.

I created a “Proven Practice Guide” for VI:OPS and posted it there, but it seems that it is stuck in the approval process. I usually post the doc on VI:OPS and then link to it in my blog post, and follow up later with a copy in our downloads area. To hurry things along, I also posted it in both places:

http://www.dailyhypervisor.com/?file_id=17

http://viops.vmware.com/home/docs/DOC-1563

VMTN: I/O Performance in vSphere, Block Sizes and Disk Alignment

Yes folks, it rears its ugly head again…Disk Alignment… If you have not read it yet, check out the whitepaper on disk alignment from VMware.

First, Chethan from VMware posted a great thread on VMTN about I/O performance in vSphere. The start of the thread talks about I/O, then leads into a nice discussion about block size. A couple of weeks ago, Duncan Epping posted a very informative article about block sizes. It convinced me to use 8MB blocks in VMFS designs.

Finally, the thread kicked into a discussion about disk alignment. As you know, VMFS partitions created using the VI Client are automatically aligned. This is why I advocate NOT putting VMFS partitioning into a kickstart script. The whitepaper demonstrates how to create aligned partitions on winders and Linux guests as well. The process is highly recommended for any intensive app. But I have always questioned the need to do this for system drives (C:) on guests. Doing it requires a multi-step process or the use of a tool, like mbrscan and mbralign, and I have wondered if it was worth the effort. Well, Jason Boche gave me a reason why it should be done across the board. And it makes sense: “This is an example of where the value of the savings is greater than the sum of all of its parts.”
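To see why this matters, the arithmetic is simple: the classic MBR layout starts a partition at sector 63, an offset of 32,256 bytes, which does not line up with the 64KB boundaries many arrays use, so a single guest I/O can straddle two chunks; the VI Client starts VMFS partitions at sector 128 (64KB), which does line up. Here is a quick sketch of that check (the 64KB chunk size is just the example from the whitepaper; your array may differ):

```python
# Check whether a partition's starting offset lands on a 64KB stripe/chunk
# boundary. The 64KB value is an example; use your array's actual chunk size.
SECTOR_BYTES = 512
CHUNK_BYTES = 64 * 1024

def aligned(start_sector):
    offset = start_sector * SECTOR_BYTES
    return offset % CHUNK_BYTES == 0

print(aligned(63))    # False -- classic MBR default (offset 32,256 bytes)
print(aligned(128))   # True  -- VMFS created through the VI Client (offset 65,536)
print(aligned(2048))  # True  -- the 1MB boundary newer OS installers use
```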

Jas also outlined a very nice process for aligning Linux VMs and fixing a common Grub issue. Thanks for the tip Jas!

I should also thank everyone else involved: Chethan, Duncan and Gabe!

vSphere Install and Upgrade Best Practices KB Articles and Links

So, I use NewsGator to aggregate a BAZILLION feeds from several sources: blogs like this one, actual news feeds and a bunch of VMware feeds. The VMware feeds are from the VI:OPS and VMTN forums. The VMTN forums allow you to create a custom feed by selecting the RSS link at the bottom right of each page, or you can get a feed from a specific section of the forum by clicking the link at the bottom left of a list. One of the custom feed options is to get a feed of the new KB articles.

VMware has released quite a lot of new KB articles surrounding vSphere. They just released nice best-practice guidelines for installing or upgrading to ESX 4 and vCenter 4. They are short and to the point. There is also a nice article covering best practices for upgrading an ESX 3.x virtual machine to ESX 4.0. One thing I noticed, but never thought about, is this:

“Note: If you are using dynamic DNS, some Windows versions require ipconfig/reregister to be run.”

Eric Siebert over at vSphere-land posted a nice set of “missing links” for everything vSphere. This is a nice, comprehensive set of links to everything you need for vSphere upgrades or installs. So, go check that out as well.

FLASH: ESX 4 Console OS is REALLY a VM this time!

While I was setting up ESX in text mode for my next blog post, I discovered that the installation sequence first creates a VMFS file system and then creates a VMDK file for the console OS. I confirmed it in the VI Client. Here is a screenshot:

[Screenshot: the console OS VMDK visible in the VI Client]

I also noticed that the logs are now in a separate directory:

[Screenshot: the logs in their own directory]