Category Archives: Security

Big vCloud Director Security Gotchas That I Have Found

This post includes an important security “gotcha” that I recently uncovered with vCloud Director 1.5 running on vSphere 5. If you are using vCloud Director, you should check your settings.

The BIG Security Issue


vShield Zones – Some Serious Gotchas

OK..I’ll admit it: I am spoiled by the capabilities of vSphere. What other platform lets you schedule system updates that will occur unattended and without outages of the applications being used? I don’t mean the winders patches, they require a monthly reboot. I am talking about the hypervisor updates. VMware Update Manager coordinates all of this for you. Then along comes vShield Zones to break it all.

First, let me explain what I am trying to do. To simplify things, vShield Zones is a firewall for vSphere Virtual Machines. Rather than regurgitate how it works, take a look at Rodney’s excellent post. A customer has decided to use vShield Zones to help with PCI Compliance. The desire is that only certain VMs will be allowed to communicate with certain other VMs using specific network ports, and to audit that traffic. ’nuff said.

vShield Zones seems to be the perfect solution for this. It works almost seamlessly with vCenter and the underlying ESXi hosts. It provides hardened Linux Virtual Appliances (vShield Agents) to do the firewalling. It provides a fairly nice management interface to create the firewall rules and distribute them to the vShield Agents. Best of all, IT’S FREE! At least for vSphere Advanced versions and above. Keep in mind that this is still considered a 1.x release and some things need to be worked out.

Now, on to the gotchas.

Gotcha #1 – Networking

When it comes to networking, the vShield Agent is designed to sit between a vSwitch that is externally connected via physical NICs (pNICs) and a vSwitch that is isolated from the outside world. The vShield Agent installation wizard will prompt you to select a vSwitch to protect. This is illustrated below. The red line indicates network traffic flow.

[Image: the vShield Agent sitting between the externally connected vSwitch and the isolated vSwitch]

This works like a champ in this configuration: one vSwitch for management, which is naturally on an isolated network to begin with, one vSwitch for VMs to connect to the vShield Agent, and one vSwitch to connect everything to the outside world. This can also be deployed with limited downtime. If you are lucky enough to have the Enterprise Plus version, you may want to use a vNetwork Distributed Switch or even a Cisco 1000v. You will need to make some manual configurations to make this work, as outlined in the admin guide.

The gotcha is with blade servers or “pizza box” servers that have limited I/O slots. If all of the VM traffic must flow through the same physical NICs and you use a vSwitch, then you need the vShield Agent to protect a port group rather than an entire vSwitch. You will need to create a vSwitch with a protected port group and connect it to the pNICs. Then you can install the vShield Agent. Once the vShield Agent is installed, you will need to go back to the vSwitch attached to the pNICs and add an unprotected port group. This is illustrated below. The red line is the protected traffic and the blue line is the unprotected traffic.

[Image: a protected port group and an unprotected port group (ORIGINAL Network) sharing the vSwitch attached to the pNICs]

As you can see, there is an unprotected Port Group (ORIGINAL Network). This needs to be added to the vSwitch AFTER the vShield Agent is installed. If the ORIGINAL Network is already a part of the vSwitch, it will need to be removed BEFORE installing the vShield Agent. In order to avoid an outage, you will need to disable DRS and manually vMotion all VMs off of the ESX/ESXi host before installing the vShield Agent and modifying the port groups.

Gotcha #2 – DRS/HA Settings

The vShield Agents attach to isolated vSwitches with no pNIC connection. As you should already know, using DRS and vMotion on an isolated vSwitch could cause inter-connectivity between VMs to fail. By default, you cannot vMotion a VM that is attached to an isolated vSwitch. You will need to enable this by editing the vpxd.cfg file. You will also need to disable HA and DRS for the vShield Agents so they stay on the hosts where they are installed. Both are well documented. Obviously, you will need to install a vShield Agent on every ESX/ESXi host in the cluster.

The Gotcha here is that, with HA disabled for the vShield Agent, there is no facility for automatic startup. There is an automatic startup setting in the startup/shutdown section of the configuration settings. First, this is an all-or-nothing setting. Second, according to the Availability Guide:

“NOTE The Virtual Machine Startup and Shutdown (automatic startup) feature is disabled for all virtual machines residing on hosts that are in (or moved into) a VMware HA cluster. VMware recommends that you do not manually re-enable this setting for any of the virtual machines. Doing so could interfere with the actions of cluster features such as VMware HA or Fault Tolerance.”

So, if a host fails, HA will restart all protected VMs on different hosts. If the host comes back online, you risk having DRS migrate protected VMs back to that host. This will cause those VMs to become disconnected because the vShield Agent will not automatically start. If a host fails, hope that it fails badly enough that it won’t restart.
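To make that failure mode concrete, here is a minimal Python sketch of the check you would otherwise do by hand after a host comes back. Everything in it is hypothetical: the function name, the host and VM names, and the two dictionaries, which stand in for whatever inventory you pull from vCenter. It does not call any VMware API.

```python
def find_disconnected_vms(agent_running, vm_placement):
    """Return protected VMs sitting on hosts whose vShield Agent is not running.

    agent_running: dict mapping host name -> True if the agent VM is powered on
    vm_placement:  dict mapping protected VM name -> host name it runs on
    """
    return sorted(
        vm for vm, host in vm_placement.items()
        if not agent_running.get(host, False)
    )


# Hypothetical cluster state: esx02 rebooted and its agent did not start,
# then DRS moved web01 back onto it.
agent_running = {"esx01": True, "esx02": False}
vm_placement = {"web01": "esx02", "db01": "esx01"}

print(find_disconnected_vms(agent_running, vm_placement))
```

A host missing from `agent_running` is treated as having no running agent, which errs on the side of flagging VMs rather than hiding them.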

Gotcha #3 – Maintenance Mode

At the beginning of this post, I mentioned how VMware Update Manager has spoiled me. VUM can be scheduled to patch VMs and hosts. When host patching is scheduled, VUM will place one host in Maintenance Mode, which will evacuate all VMs. Then, it will apply whatever patches are scheduled to be applied, reboot and then exit Maintenance Mode. It will repeat this for each host in a cluster. This works great unless there are running VMs that have DRS disabled, like the vShield Agent.

In the test environment, when a host was manually set to enter Maintenance Mode, it would stall at 2% without moving the test VMs. I am not sure of the order in which VMs are migrated off, but none were migrated in the test environment. This could vary in different installations. Here’s the gotcha: you cannot power the vShield Agent off because the protected VMs would become disconnected. You cannot migrate it to a different host because it would cause a serious conflict and cause protected VMs to become disconnected. The only thing you can do is place the host in Maintenance Mode, then MANUALLY (*GASP*) migrate all of the protected VMs and then power the vShield Agent off. So much for automated patch management. We’re back to the “aughts.”


I said already that vShield Zones is a 1.x product. It’s a great firewall, but it has a few gotchas that you need to consider. The benefits may outweigh the negatives. But vSphere is a 4.0 product. Some of this should be addressable by tweaking vCenter or host settings.

vShield Zones should be smart enough to allow us to select specific port groups to protect rather than an entire vSwitch. I guess whatever scripting is being done in the background will need to be changed for this. Maybe we need a Ghetto vShield?

One of the REALLY smart people at VMware should be able to tell us the “order of migration” when a host is placed in Maintenance Mode. Once that is determined, there is probably a configuration file somewhere that we could tweak to change it.

There should be a way to set up automatic startup and shutdown of individual VMs. The Startup/Shutdown settings were sort of deprecated once DRS was introduced. The only time they are useful is with a stand-alone server or in a NON-DRS cluster. I guess the only thing that could be done is to add a script somewhere in rc.d or rc.local to start up these VMs, but how can that be done in a “supported” fashion with ESXi, and is it supported in either ESX or ESXi?

I brought these issues up with some VMware engineers and they assure me that they are working on this. Hopefully they will figure it out soon. I hate doing things manually. It seems like it is anti-cloud.

Setting up a Splunk Server to Monitor a VMware Environment

In a previous article, I compared syslog servers and decided to use Splunk. Splunk is easy to set up as a generic syslog server, but it can be a pain in the ass getting the winders machines to send to it. There is a home-brewed Java-based app in the Splunk repository of user-submitted solutions, but I have heard complaints about its stability, so I set out to find a different way to do it.

During my search, I discovered some decent (free!) agents on SourceForge. One will send event logs to a syslog server (SNARE) and one will send text-based files to a syslog server (Epilog). The SNARE agents appear to be more stable than the Java app and do a pretty good job. So I basically came up with a free way to set up a great syslog server using Ubuntu Server, Splunk, SNARE and Epilog.

I created a “Proven Practice Guide” for VI:OPS and posted it there, but it seems that it is stuck in the approval process. I usually post the doc on VI:OPS and then link to it in my blog post, and follow up later with a copy on our downloads area. To hurry things along, I also posted it in both places:

Dear VMware: Pick a Common (SUPPORTED) Virtual Appliance OS…Please….

One of my pet peeves is that each virtual appliance released by VMware is based on a different OS. Some of these do not even have documented methods for updating the OS. We all know that no matter what OS is running on a system, there will be updates for stability and security. Almost every time I begin an engagement with a customer and it involves using a virtual appliance, their security wonks get all pissy with me and I need to show that I have the latest security patches installed before I even connect the appliance to their network.

This all started with the HealthCheck Appliance, which is a tool available to partners. It’s running Ubuntu 7.10 Server JEOS. Great! It is an unsupported, deprecated OS. If you know anything about Ubuntu, you know that the “Long Term Support” (LTS) versions are released every other year. So, the latest LTS version is 8.04 and the previous is 6.06. No big deal, right?

Now to further complicate things, the VMware Data Recovery appliance and the vSphere Management Assistant run completely different OS versions. The VDR runs CentOS with kernel version 2.6.18-92.el5. The vMA runs Red Hat Enterprise Linux with kernel version 2.6.18-128.1.1.el5.

Updating the vHA

The documentation that comes with the vHA explains how to update the OS using apt-get, and it explains it in such a way that anyone can do it. BUT…Ubuntu 7.10 has been deprecated and the repositories were recently removed. Running apt-get update results in a bunch of HTTP 404 errors because the repositories are no longer where the OS thinks they belong. Now what?

I did a quick search on Google and found a blog post on NewAdventuresInSoftware about a workaround, so thanks go out to Dan Dyer for providing this solution. It’s pretty straightforward, and I added how to install VMware Tools to it. The 7.10 repositories haven’t been completely removed; they were just moved to a different URL. The apt-get utility uses the file “/etc/apt/sources.list” to determine where to go for patches and software packages. In order to upgrade the OS to 8.04, you need to install the update-manager-core package first and then upgrade the OS. So, you need to point apt-get to the new URL to install the update-manager-core package and all the dependencies. But, before upgrading the OS, you need to point back to the original repositories, because that is where the 8.04 packages reside. Here is a step-by-step list of how to get this done:

1. Make a backup copy of the /etc/apt/sources.list file:

2. Edit the original file:

3. Comment out any reference to a CDROM using #

4. Use the global search and replace command in vi to change the references:

5. Save and quit:

6. Install the update-manager-core package and all dependencies:

7. Copy the original sources.list file back because it gets changed during the upgrade:

8. Edit the original sources.list:

9. Comment out any reference to a CDROM using #

10. Save and quit:

11. Run the OS upgrade routine:

12. Install the packages required for VMware Tools to be compiled:

13. Install VMware Tools as listed in the instructions (Replace the ?.?.?-?????? with the proper tools version)

You should be prompted to automatically run

14. Once the installation is completed, install the vmxnet drivers:

sudo modprobe vmxnet
sudo /etc/init.d/networking start

15. Verify your IP address:
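Going back to the sources.list edits in the steps above (comment out the CDROM entries, then do a global search and replace on the repository references), here is a minimal Python sketch of the same text transformation. It is not the procedure from the vHA documentation or from Dan Dyer’s post. The function name is made up, and the host names in the sample are placeholders, so substitute the real old and new repository hosts yourself. It works on a string only. Backing up and writing the real file is left to you.

```python
def rewrite_sources(text, old_host, new_host):
    """Comment out CDROM entries and swap the repository host in sources.list text."""
    out = []
    for line in text.splitlines():
        if line.lstrip().startswith("deb") and "cdrom:" in line:
            # Comment out any active entry that refers to a CDROM.
            line = "# " + line
        else:
            # Same effect as a global search and replace on the host name.
            line = line.replace(old_host, new_host)
        out.append(line)
    return "\n".join(out) + "\n"


# Placeholder host names, not the real repository locations.
sample = (
    "deb cdrom:[Ubuntu-Server 7.10]/ gutsy main\n"
    "deb http://old.example.com/ubuntu gutsy main restricted\n"
)
rewritten = rewrite_sources(sample, "old.example.com", "new.example.com")
print(rewritten)
```

Calling the function a second time with the two hosts swapped gives you the “point back to the original repositories” step without touching the CDROM lines again.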

Updating the vMA

The vMA has a well documented process for updates, using sudo vima-update scan and sudo vima-update update for updating the OS. I am assuming that eventually, patches will become available from VMware, but there is nothing right now. The vima-update utility can also be configured to look at a different repository for patches. That is documented in the Admin Guide and I won’t get into it here. There is nothing about updating VMware Tools, but the simple VMware Tools process for RPM-based distros will work. Just copy the VMwareTools…rpm from the tools CD image and run rpm -i VMwareTools…rpm. Substitute the proper file version.

Updating VDR

HA! Nothing is documented for VDR about updates. Nothing. Not even a mention. It is running an older, unpatched kernel and an old version of VMware Tools. I found a post on the communities about how to update the OS using yum update and how to update VMware Tools. Basically, it is hard-coded to older versions of libssl and libcrypto, so symlinks need to be added to install VMware Tools properly:


Since some of these methods are NOT documented by VMware, they may not be supported. Sometimes, you have to weigh security concerns against ultimate support options.

SPLUNK! Goes the Syslog Server…

The use of a “syslog” server is important in today’s data center. Most network and SAN switches, along with Unix and Linux servers, are capable of sending logging information to a syslog server. The obvious reason for a syslog server is to centralize all of your logs. This enables you to troubleshoot issues more efficiently. Most syslog servers allow you to do a timeline-based analysis of log data so that you have an enterprise-wide view of all activity. This allows you to see how different devices interact.

A less obvious reason for a syslog server is security. The theory is that an attacker will attempt to elevate to root privileges and then try to delete or alter logs to hide evidence of the attack. If all log information is relayed to a syslog server, the hope is that this data is secured for forensic study, if needed.

I have tried a few different “free” and non-free syslog servers. I didn’t do extensive research into all available syslog servers, but I have to say that I like Splunk the best. It starts with a free server with a limited amount of data. This may be fine for smaller shops. There is also a paid version that allows for more data collection. The fully “free” syslog server that came close was the combination of syslogd and phplogcon on a Linux server. I also tried Kiwi syslog, which also has a “free” version and a paid version. But it only installs on winders. Most of the syslog servers are great. There were a few capabilities I felt made Splunk a nice syslog server:

  • Acts as a standard syslog server.
  • Can “scrape” directories.
  • Monitors Windows logs.
  • Allows for upload of log data.
  • Provides timeline analysis.

Acting as a standard syslog server is really a no-brainer. All of the packages that I tested worked fine in this respect. You set up pointers to the syslog server in the *nix /etc/syslog.conf file and all logs are automatically sent.
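Once the pointers are in place, a quick smoke test helps confirm that messages actually reach the server. The Python sketch below sends a single syslog message over UDP with the standard library’s `logging.handlers.SysLogHandler`. The function name and logger name are made up for illustration, and the loopback listener at the bottom only stands in for a real syslog server so the sketch runs anywhere. Point it at your own server’s address and port instead.

```python
import logging
import logging.handlers
import socket


def send_test_message(host, port, text):
    """Send one syslog message over UDP to host:port."""
    handler = logging.handlers.SysLogHandler(address=(host, port))
    logger = logging.getLogger("syslog-smoke-test")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        logger.info(text)
    finally:
        # Detach and close so repeated calls do not stack up handlers.
        logger.removeHandler(handler)
        handler.close()


# Loopback demonstration: a throwaway UDP listener plays the syslog server.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))
listener.settimeout(2)
send_test_message("127.0.0.1", listener.getsockname()[1], "hello from the test client")
received = listener.recv(4096).decode()
listener.close()
print(received)
```

UDP gives no delivery guarantee, so a message that never shows up on the server usually means a firewall or a wrong address rather than an error on the sending side.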

When dealing with collecting logs on an ESX server, the standard syslog.conf settings may not cut it. The HA logs reside in a different location and should be “scraped”. In this context, “scraping” is the process of reading all of the text files in a specified directory and compiling them into the syslog database.
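As a rough illustration of what “scraping” means here, the Python sketch below reads every file in a directory and collects the non-empty lines along with the file they came from. This is not how Splunk implements it. The function name and the sample file names are invented, and a real collector would also track what it has already read.

```python
import os
import tempfile


def scrape_directory(path):
    """Read every regular file in `path` and return (filename, line) pairs."""
    records = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue
        # errors="replace" keeps one bad byte from aborting the whole scrape.
        with open(full, "r", errors="replace") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line:
                    records.append((name, line))
    return records


# Demonstration against a temporary directory with two made-up log files.
with tempfile.TemporaryDirectory() as logdir:
    with open(os.path.join(logdir, "a.log"), "w") as f:
        f.write("first entry\nsecond entry\n")
    with open(os.path.join(logdir, "b.log"), "w") as f:
        f.write("third entry\n")
    scraped = scrape_directory(logdir)

print(scraped)
```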

Monitoring Windows logs is also a key ingredient in the datacenter stew. If you are going to do centralized collection of logs, collect everything. Splunk uses WMI to gather this information.

The ability to upload log data manually is also a nice option. I was recently troubleshooting an issue with VMware Consolidated Backup and I was able to manually upload all of the related VCB logs right into a Splunk server VM. I exported the Windows system and application logs to .csv files and copied them to a directory on the Splunk server. I also copied the VCB logs and ESX logs to the same directory. After a few minutes, the data was assimilated into the database and ready for analysis. I was able to look at a specific point in time and look at errors across the entire environment. I could see errors in the VCB logs and relate them to errors in the Windows system and application logs. I was also able to track all of the ESX and VM logs for the time period.

The Splunk server offers WAY more than the logging functions described here. It is also a great tool for compliance, change control, security, server management, etc. It has install packages for winders, Linux, Solaris (x86, x64 AND Sparc), Mac OSX, FreeBSD and AIX.

As you can see, the Splunk server is very useful for capturing all kinds of logs for security and troubleshooting purposes. In part two, I will dig deeper into setting up a Splunk server and configuring *nix, ESX, ESXi and winders machines to send their logs. As with the VCB Proven Practice Guide, there will be a companion doc on the VI:OPS site.

Ken's Networking Tips…and Cracking the ESX Root Password

First…Ken Cline started a blog about a month ago. It has some nice tips for networking, so check it out. My eyes were opened after reading his post about accepting default settings. I know the post is almost a month old, but I have that reading narcolepsy thing. It is still a very important thing to read. My philosophy is similar to Ken’s:

Just because you CAN do something does not mean you SHOULD do it.

I guess I picked up the habit of creating a slightly complicated way of connecting two pNICs to a vSwitch for redundant connections for the Management Network and VMotion Network. I think this came from the old school (ESX 2.x) way of doing things. Ken’s methods are far easier to set up and manage.

Now…on to cracking the root password on an ESX server….

VMware just released a KB article about how to change a “forgotten” root password. WOW. Pretty simple.. Anyone who knows a smidgen about Linux could have helped here. It was even a part of the RedHat test back in the day when I took it.

First off, there should be no reason to need this if proper security and change control practices are followed. Passwords should be changed regularly and they should be kept somewhere. That somewhere should be secure. ‘Nuff said.

Second…And this is the biggie…. If you follow security best practices, this method will be rendered USELESS. The boot loader (Grub in this case) should be secured properly to protect from this sort of attack. Yes, I used the “A” word here. Unauthorized entry into single user mode is an attack. If Grub is secured properly, you will need to know a password to enter the append or edit modes. This password is just as important as the root password and proper security and change control practices should also be followed here.
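If you want to audit a pile of hosts for this, a crude check is easy to script. The Python sketch below assumes GRUB legacy syntax, where a `password` line placed in the global section, before the first `title` entry, is what gates the edit and append modes. The function name and the sample file contents are made up, and the hash is a placeholder. It only looks for the directive and does not judge the strength of the password.

```python
def grub_has_global_password(conf_text):
    """Return True if a `password` directive appears before the first `title` line."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword = line.split()[0]
        if keyword == "title":
            # Reached the first boot entry without seeing a global password.
            return False
        if keyword == "password":
            return True
    return False


# Made-up grub.conf contents; the hash is a placeholder, not a real one.
secured = "default=0\ntimeout=5\npassword --md5 PLACEHOLDER_HASH\ntitle ESX\n"
unsecured = "default=0\ntimeout=5\ntitle ESX\n"

print(grub_has_global_password(secured), grub_has_global_password(unsecured))
```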

OK…Now your server boot loader is “secure”. The root password is fracked and you don’t know the grub password… Now what? Enter the rescue CD. A live Linux CD, like DSL or Knoppix, will allow you to boot into a Linux session, mount your boot partition and edit the grub.conf file. Now, you can boot the server and append to the boot loader line to enter single user mode. Or, you can mount the root partition, chroot and change the root password that way. It is less than simple, but we are talking about “attacking” a server and changing the root password. How do I prevent the “Live CD Attack”? Use a BIOS password and set the server so it does not boot from external media. This password should also follow the security and change control practices.

Next..”cracking” the BIOS password. In a few words: DIP switch 5. Most servers have a bank of DIP switches. Flip one of them and the BIOS password is disabled. How do you avoid this? Lock the server. I have an HP key that opens any HP cabinet and a few Dell keys that do the same thing….

I am going to stop now. I am not a computer security expert by any means. I just use common sense. Three thousand years ago, I worked for an alarm company. I could get into any “secured” alarm cabinet and shut them down. My philosophy back then was:

Locks are for honest people…

Back to the original intent of the password thing. Use common sense. The method VMware posts for resetting a lost root password should not be possible. It all falls down to common sense and the use of good security and change control best practices. Many attacks come from within. Locks and security measures will only slow someone down and hopefully trigger that alarm system to notify someone of the attack.

Virtualization and Security

Security is huge when it comes to virtualization. The extra moving parts require special care and feeding. The Defense Information Systems Agency is basically the IT department for the US Defense Department. They have an arm called the Information Assurance Support Environment. The IASE has some serious information about securing any system. They post Security Technical Implementation Guides (STIGs) and Security Checklists that are very comprehensive. They even have STIGs and Checklists for all the different versions of winders. Some of the information is specific to the DoD, but those things, like certificates, etc. still have a place in any IT shop. I subscribe to their newsletter, so they just came to mind again because they posted a Draft XenApp STIG. I glanced at the docs, but they look pretty deep and I have reading narcolepsy…

So, why do I bring this up? They also posted a STIG for ESX Server a while ago and recently posted an updated Security Checklist for ESX. I know that Sid used these as a guide for his kickstart / post installation script. When coupled with the Unix STIG and Checklist, you will get a very secure system. So go check them out. They are free and that is my favorite price. So go get some.