
Considerations for Capacity Management with vROps

Navigating your way around capacity management is not an easy task, especially at a large company where it seems almost impossible to wrap your arms around it. Ha! I picture a large tree and trying to hug it, not quite able to lock your fingers on the other side. It's really kind of like that: you've got most of it, but you are always reaching, and at times you need to step back and re-evaluate your angle or approach. Over the last year or so I've been working with the capacity management team to choose exactly the right metrics to determine the best way to evaluate capacity. Last week one cluster, according to vROps, was in desperate need of capacity; we were running into our buffers. However, when we looked closely in our review meeting, we noticed that the reason we were out of capacity was CPU demand. This spun off a number of weekly meetings to reconsider our approach to see if we could get our fingers locked. In all honesty, this wasn't an oversight; we have a pretty smart group of people and we meet regularly to review. Everyone on our team has the same goal, and these types of discussions make sure we are staying on target. However, we did realize that we needed a deeper understanding of the different types of capacity models and how to apply them as policies across the virtual infrastructure. So let's start with a quick level set and go from there. All right, here we go!

Allocation Model
This model bases capacity on the configured amount of resources assigned to a VM or VMs in a cluster. The consensus is that this model should be used for production environments where you have important workloads, you want to keep resources in reserve for failover, and you want to make sure you don't overcommit by too much. You decide your overcommitment ratio and set it in the policy. This is the most conservative capacity model.
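As a back-of-the-envelope illustration of the allocation math, here is a quick sketch. The cluster size, overcommit ratio, and HA reserve below are made-up example numbers, not values from vROps or a recommendation:

```shell
#!/bin/sh
# Hypothetical cluster: 4 hosts, each with 16 physical cores and 256 GB RAM.
hosts=4
cores_per_host=16
ram_gb_per_host=256

# Illustrative policy settings: 4:1 vCPU overcommit, no memory overcommit,
# and one host's worth of capacity reserved for HA failover (N+1).
cpu_overcommit=4
ha_reserved_hosts=1

usable_hosts=$((hosts - ha_reserved_hosts))
vcpu_capacity=$((usable_hosts * cores_per_host * cpu_overcommit))
vram_capacity_gb=$((usable_hosts * ram_gb_per_host))

echo "Allocatable vCPUs: $vcpu_capacity"         # 3 hosts x 16 cores x 4 = 192
echo "Allocatable vRAM:  ${vram_capacity_gb} GB" # 3 hosts x 256 GB = 768 GB
```

Under the allocation model, once the sum of configured vCPUs or vRAM across the cluster's VMs reaches those numbers, the policy reports you as out of capacity, regardless of how idle the guests actually are.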

Demand Model
The Demand model is often used in test/development environments where you don't necessarily care about overallocation and you really want to fit as many guests as possible into the environment. If you are using this model you probably don't care if the hosts are running hot. You will likely be way overallocated, but again, you don't care because you want to run at the highest possible VM density.

Memory Consumed Model
This model shows the memory resources used just like you would see in the vSphere client: active memory, plus shared memory pages, plus recently touched memory, plus memory overhead.

So which one do we choose? That's an excellent question. In all likelihood, we are going to look at all of these models and how they affect capacity. We have, and I'm guessing you do too, clusters with mixed workloads, or clusters where licensing considerations force you to mix test/dev hosts with production hosts. So it's not so easy to just pick one model and go with it, especially when you have to scale up the environment to meet the needs of the company. Our team decided to implement different policies specific to each cluster and the workloads in those clusters. The policies will include different allocation overcommit ratios for CPU, memory, and disk. Some policies will account for all three models; others will use just one or a combination. What's really great is that vRealize Operations is so flexible it's really easy to dial in capacity just the way you want it.

One other decision we made that you might want to consider: we will rely only on the data in vROps for capacity management. We won't look at what vCenter is showing for cluster resources used to determine if we can "fit" more VMs in. Capacity management is not easy; it takes time to collect metric data, analyze it, and then tweak it so you are sure you can make the best decisions. Sometimes those decisions can save (or cost) your company a significant amount of money. The good news is there is no magic going on here. If you put in the work and use a great tool like vRealize Operations Manager, you will get to a point where real value is realized. Now that our team has decided to use a combination of models, we can begin to adjust policies and review the data that's already been collected to make sure we are using metrics that meet our needs. I'd love to hear how others are using vROps to determine capacity and some of the challenges and successes you have encountered. If you read this and want to share, add a comment.

I'd like to thank Hicham Mourad for his help with some questions and his guidance along the way. He is a really smart guy, and I'm thankful I can reach out to him when I need to. 🙂


Manually Increasing vSphere Web Client Heap Size

The other day when I was building a vSphere 6.0 environment in my lab for testing, I ran into an issue where performance was extremely slow in the web client, and I was continually receiving an error that the VMware-dataservice-sca and vsphere-client status would change from green to yellow. When I deployed the VCSA/PSC appliance I chose "Tiny" as the size option. Even though my implementation is going to be under 10 hosts and 100 VMs, this build was just not enough, and performance in the web client was really lacking. Searching the VMware KB, I came across KB 2144950 and found out this is a known issue affecting vCenter Server 6.0. Here are the steps that I used to work around the error and get performance back in the web client.

First I added additional RAM to the appliance. Pretty straightforward, no magic there. Then I used SSH to connect to the appliance and ran the following command:

cloudvm-ram-size -C XXX vsphere-client

Replace the XXX with the heap size you want, in MB.

If you are running a Windows vCenter Server, find C:\Program Files\VMware\vCenter Server\visl-integration\usr\sbin\cloudvm-ram-size.bat and run this command:

cloudvm-ram-size.bat -C XXX vspherewebclientsvc

Again, swap out the XXX with the heap size you want, in MB. Don't forget to restart the vSphere Web Client service.
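To put the appliance-side steps together, here is a dry-run sketch: it only builds and echoes the commands so you can review them before running them for real over SSH. The 1024 MB value is just an example I picked, not a recommendation from the KB, and `service-control` is the VCSA's service manager:

```shell
#!/bin/sh
# Dry run: build the commands as strings and print them instead of executing.
# 1024 is an example heap size in MB -- size it for the RAM you added.
HEAP_MB=1024

resize_cmd="cloudvm-ram-size -C $HEAP_MB vsphere-client"
stop_cmd="service-control --stop vsphere-client"
start_cmd="service-control --start vsphere-client"

echo "$resize_cmd"
echo "$stop_cmd"
echo "$start_cmd"
```

Remove the variable indirection and run the three commands directly on the appliance once you are happy with the values.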

Removing a PSC or vCenter Server in vSphere 6.x

The other day I was bringing up another vSphere 6.0 environment for our VDI team in our engineering test lab, and for some reason I was having all sorts of issues. I was installing a VCSA with an embedded PSC and connecting it to an existing SSO domain. I had no idea what was going on; it was going horribly. One time the install would fail, then the next it would complete, but enhanced linked mode was just acting weird… Well, unbeknownst to me, the QIP team had decided to cut DNS over to new appliances, and that was wreaking havoc across the environment. So now that I've killed (I kid) the guy who was doing this, I'm left with a mess to clean up. Finally DNS is working properly, so I'm going to re-deploy the PSC/VCSA again, but before I do that, I have to clean up the one I don't want anymore. Lucky for us, it's a pretty easy job.

The first step was to make sure that my appliance was powered down. I knew that there was no other VCSA pointing to this PSC. If you are unsure whether any other vCenter is connected to the PSC you are removing, you can check by logging into the vSphere Web Client, going to the advanced vCenter Server settings, and looking for a property called config.vpxd.sso.admin.url; the value of this setting is the PSC the vCenter Server is using. If you find any other vCenters, VMware has KB 2113917 to help you repoint your vCenter to a different PSC.

Once that is all sorted out, we need to connect to another PSC in the same SSO domain via SSH and run the following command:

cmsso-util unregister --node-pnid Platform_Services_Controller_FQDN --username administrator@your_domain_name --passwd vCenter_Single_Sign_On_password

After that completes, delete the appliance from your inventory and check Administration -> System Configuration -> Nodes to make sure it's no longer listed there.

Removing a VCSA is just about the same as above; you just have to make one change to the command:

cmsso-util unregister --node-pnid vCenterServer_System_Name --username administrator@your_domain_name --passwd vCenter_Single_Sign_On_password

If you need additional info on these steps, check out KB 2106736.
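If you end up doing this more than once, the unregister call can be wrapped in a tiny dry-run script. The FQDN, username, and password below are placeholders I made up; substitute your own values, and drop the echo indirection to run the command for real:

```shell
#!/bin/sh
# Placeholder values -- replace with your own environment's details.
NODE_PNID="psc01.example.local"          # FQDN of the PSC or vCenter to remove
SSO_USER="administrator@vsphere.local"   # SSO administrator account
SSO_PASS="changeme"                      # SSO administrator password

# Build the command first so it can be reviewed before anything is unregistered.
unreg_cmd="cmsso-util unregister --node-pnid $NODE_PNID --username $SSO_USER --passwd $SSO_PASS"
echo "$unreg_cmd"
```

Reviewing the assembled command before running it is a cheap way to catch a typo in the node name, which matters here because you are removing a node from the SSO domain.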

Removing a Solution from vRealize Operations Manager 6.x

The great thing about having a lab environment is that I get to test out a number of solutions for vROps. One that I have been evaluating is the Cisco UCS Management Pack for vROps. We started with a beta version for vROps 5.x, then updated the pack for vROps 6.1, and now I wanted to install the newest version of the pack for 6.2. One problem: the old solutions are just kinda stuck in there. They don't update, and when you think you are removing the solution, you are really just deleting the adapter settings. In this article I'll go through the steps to remove the old management packs and get everything clean and ready for the new version.

***Caution!*** – we are going to be editing some sensitive files, so you really should open a service request with VMware support if you are doing this in production. I'm working in my lab environment, so if things went FUBAR, it's not a big deal. I will eventually have to do this in production (well, not me, the operations team), and so far I haven't had any issues going through this, but ya never know. Open an SR before touching your production vROps. That way, in the event of an issue or a mistake, VMware support can help guide you through fixing it. Okay, enough of that.

The first step is to log into the vROps node that has the incompatible solution. In my lab I only have one node, so that's pretty straightforward.

Navigate to /storage/db/pakRepoLocal/ to determine which solution you want to uninstall. I have a couple of different UCS solutions installed.

Run this command to determine the actual adapter name, and take note of the "Name" field:
cat /storage/db/pakRepoLocal/Adapter_Folder/manifest.txt
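If you'd rather not eyeball the whole file, you can pull the name out with sed. The sample manifest below is fabricated purely for illustration; the real file's layout may differ, so cat it first and adjust the pattern to match what you see:

```shell
#!/bin/sh
# Fabricated stand-in for /storage/db/pakRepoLocal/<Adapter_Folder>/manifest.txt.
# The keys and values here are invented examples, not the real manifest format.
cat > /tmp/manifest.txt <<'EOF'
"pak_id" : "CiscoUcs-613"
"name" : "CiscoUcsAdapter"
"version" : "6.1.3"
EOF

# Extract the value of the "name" key.
pak_name=$(sed -n 's/.*"name"[^"]*"\([^"]*\)".*/\1/p' /tmp/manifest.txt)
echo "$pak_name"
```

Whatever string comes out is the exact name you feed to the uninstall command in the next step.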

The next step to uninstall the solution pack is to change to /usr/lib/vmware-vcops/tools/opscli/ and run opscli.sh with the uninstall option:
./opscli.sh solution uninstall "Name_of_the_pak"

Once the process has completed, you will see a message stating that the uninstall was successful, like in the example below.

After the above step is complete, run this command for some additional cleanup (replace Name_of_Pak with the name from above):
$VMWARE_PYTHON_BIN $ALIVE_BASE/../vmware-vcopssuite/utilities/pakManager/bin/vcopsPakManager.py --action cleanup --remove_pak --pak Name_of_Pak

Next you will have to remove the solution's .pak file from the .pak files directory.
Go to $STORAGE/db/casa/pak/dist_pak_files/VA_LINUX and rm the .pak file.

Now open /storage/db/pakRepoLocal/vcopsPakManagerCommonHistory.json in a text editor and delete every entry related to the removed solution, from its opening { to its closing }. Don't forget to save the file!
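Before hand-editing that JSON file, it's worth taking a backup and locating the entries first. This sketch uses a fabricated stand-in file under /tmp so it is safe to run anywhere; on the appliance you would point HIST at the real path instead, and the entry layout shown is my invention, not the file's actual schema:

```shell
#!/bin/sh
# Fabricated stand-in for /storage/db/pakRepoLocal/vcopsPakManagerCommonHistory.json.
HIST=/tmp/vcopsPakManagerCommonHistory.json
cat > "$HIST" <<'EOF'
[
{ "pak_name" : "CiscoUcsAdapter", "action" : "install" },
{ "pak_name" : "SomeOtherPak", "action" : "install" }
]
EOF

# 1. Keep a backup in case the hand edit goes wrong.
cp "$HIST" "$HIST.bak"

# 2. Find the line numbers of the entries you need to cut (from { to }).
grep -n "CiscoUcsAdapter" "$HIST"
```

With the backup in place, a mangled edit is a one-command recovery (cp the .bak file back) instead of a support call.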

Lastly, go back to the /storage/db/pakRepoLocal/ directory and remove the subdirectories, files, and parent directory for the solution you removed, using the rm and rmdir commands. You may also have to delete any dashboards that were installed with the solution pack from the dashboards section of the vRealize Operations Manager UI. Also note that for the changes to take effect, you will need to log out of the UI and back in.

Take your time running through the steps and you will see it's not all that difficult. I've also used this process when a solution doesn't install successfully, before I try to reinstall it. And remember: take caution when doing this in a production environment.