Brokenhaze

The thoughts of a SysAdmin

Archive for the ‘DevOps’ Category

Change Control using Git(lab)

without comments

Pre-amble and thoughts on Change Control

If you just want the cool details you can skip to the good stuff

I am a strong believer in change control. It allows for many good things to be done with a well run IT organization. The top three things that come to mind are accountability, reliability and review-ability (I think i’m making that last word up).

There are all good things to have. Many people have come before me to praise change control and that is not what I want this post to be about. I want to talk about the change control process that we are starting to use at Stack Exchange. A process that I believe addresses some of the most common complaints and push back on implementing a change control system I hear.

A good place to start would be to lay out exactly what those common complaints are.

But, Change Control is too complicated!

I hear this a lot, and this particular complaint tends to break down into two different categories. The first is that there is so much process that you can’t get any actual WORK done. The second major complaint in the category is that it slows everything down, making it harder to get awesome things done.

I’m not going to argue with either of the characterizations. In fact the express purpose of a change control system is to put a speed bump in the way. To make you slow down a bit and think through the implications of what you are doing, to have someone else double check your work. We are not infallible, we make mistakes and that is ok. The goal here is to minimize the number of mistakes that make it into production and to minimize the times that they get there at all.

It’s just bureaucracy, busy work, an annoyance

I’ve met a good number of people that have this attitude. They want to be able to just log in as the administrative user and change whatever they want, whenever they want. Personally, I have given up on trying to convince these people that change control is a good thing.

Design a process that is a speed bump

I spent a good deal of time trying to come up with a better change control system. The system should be easy to use, low impact, and accessible to anyone on the team. One of my over-arching design goals is to simply create a speed bump not a road block.

I spent a good deal of time thinking about what tools we use day to day. Looking at the ones that people complained about, the ones people liked, and the ones nobody said anything about. After thinking for a while I realized that the one tool that everyone used, and there were few complaints about was our DVCS (recently moved to git). It’s just about a perfect fit for a light weight change control system.

The workflow

You don’t need anything special to get up and running with version control as the back to your change control system. The glue that brings everything together is a simple python script that does most of the heavy lifting for you. The script – called robocopy.py – takes the name of the system, and the risk of change and from that creates a new merge request from a template. The merge request is pretty light weight. We want all the detail to be in the actual commits and commit messages.

The basic workflow we use is to create a branch, then create a merge request based on that branch, have the change reviewed and then finally have the review merge the branch into master once the change is complete.

That’s it. One long sentence to define out entire change control workflow.

This is a very simple workflow that accomplishes all of the goals I have for a change control system. To make life even easier for people I wrote a small python script that creates the branch, copies some templates into place for you and then creates the merge request. All you have to do at this point is fill out the details and get someone to review it.

The future

This is just v1 of the our change control system. In the future we will be adding web hooks to automatically send out notification, add calendar entries. Basically the goal is to automate all of the boring manual stuff as much as we can.

You can get the code we use on GitHub

Written by George Beech

February 10th, 2016 at 5:50 pm

Updating VMWare Clusters without DRS

without comments

One of the pieces of the vSphere Enterprise license is DRS. Especially the ability to use DRS to one-click upgrade/update a cluster. If you don’t know what DRS is, the short version is that it is a product you get with the Enterprise license which allows you to have vSphere move VMs around intelligently. One of the added bonuses you get is the ability to evacuate a VM. When you combine that with vSphere Update Manager you get a one-click and an hour later you’re done upgrade of your cluster.

Unfortunately, when that is the one feature you would actually use in the Enterprise edition it doesn’t make financial sense to pay that premium. The question now is “What do you do to make your life easier than manually moving thing?”

The answer is you go and grab Power CLI and write a script! I’ve got one started — Put github link here when done — and I’ll go through some of the details of it here.

First, what are the things that it can do?

  • Migrate running VMs to the other hosts in the cluster
  • Enter and Exit Maintenance mode
  • Move the VMs from where they went, back to the host that was drained

Next, what still needs to be added?

  • Intelligent migrations (it just blasts the vms around blindly now)
  • automatically roll through the whole cluster
  • Everything I haven’t thought of …

The most interesting function here is the evac-host function. This is actually the meat and bones of this script.

function evac-host()
{
    Param(
        [Parameter(Mandatory=$true)]
        [string] $ClusterName,
        [Parameter(Mandatory=$true)]
        [string] $vHost
    )

    $AliveHosts = Get-VMHost -Location $ClusterName

    $toHosts = @()

    foreach ($h in $AliveHosts)
    {
        if($h.Name -ne $vHost)
        {

            $toHosts += $h.Name
        }
    }

    $svr = 0
    $vms = get-vm -Location $vHost | where {$_.PowerState -ne "PoweredOff"}
    $m_loc = @{}

    if ($vms.Count -gt 0)
    {

        foreach ($v in $vms)
        {
            Move-VM -vm $v.Name -Destination $toHosts[$svr]
            $m_loc.Add($v.Name, $toHosts[$svr])
    q
            if($svr -ne $toHosts.Length -1 )
            {
                $svr++
            }
            else
            {
                $svr = 0
            }

        }

        $m_loc.GetEnumerator() | Sort-Object Name | export-csv C:\Users\gbeech.STACKEXCHANGE\locations.csv
        $m_loc

    }
    else
    {
        write-host "No Powered on VMs on the Host"
    }
}

The 10,000 foot view is that this function takes a Cluster Name (wild cards are acceptable), and a ESX host name as arguments. Then it goes through and moves every vm off the host. As it does this, it writes the names and locations they got sent to to a Hashtable, then writes those results out to a csv just in case. It also returns the hashtable so we can work with it later, avoiding having to read the csv back in.

Written by George Beech

June 16th, 2014 at 7:41 pm

Fun With PowerShell, WS-MAN, and Dell Servers

with 8 comments

Recently I’ve been playing with using the WS-MAN protocol to gather information (and eventually run updates) on our Dell servers. It has actually been a fairly insteresting project after I got through the pretty high learning curve to get started using WS-MAN.

First, what is WS-MAN? It’s a management standard developed by the DTMF. What it really boils down to is giving us the ability to access and manipulate CIM providers via HTTP calls.

One of the interesting things Dell did with their systems in the past two generations (Gen 11 and 12) is to add something they call the Life Cycle controller. They did not really make much information known on what you can do with it, or even how to really use it.

Recently I have been exploring what you can do with the Life Cycle Controller. And, quite honestly, you can do a ton of good stuff with it. Everything from getting system information to setting boot options, all the way up to updating all of the firmware on your box. This is all done through the WS-MAN Protocol.

First I would suggest doing some reading so you can get the basic concepts of WS-MAN.

Phew, got through all that?

Lets start off with a nice code snippet that I have been working on, and then step through what it is doing.

$DELL_IDS = @{
    "20137" = "DRAC";
    "18980" = "LCC";
    "25227" = "DRAC";
    "28897" = "LCC";
    "159" = "BIOS"
    }

$pass = ConvertTo-SecureString "ThisIsMyPassword" -AsPlainText -Force
$creds = new-object System.Management.Automation.PSCredential ("root", $pass)
$wsSession = New-WSManSessionOption -SkipCACheck -SkipCNCheck

$svc_details = @{}

$base_subnet = "192.168.99."
$addrs = @(1..254)
foreach ($ip in $addrs)
{
    $base_subnet + $ip
    $s = [System.Net.Dns]::GetHostByAddress($base_subnet+$ip).HostName
<code>$fw_info = Get-WSManInstance 'cimv2/root/dcim/DCIM_SoftwareIdentity' -Enumerate -ConnectionURI https://$s/wsman -SessionOption $wsSession -Authentication basic -Credential $creds
$svr_info = Get-WSManInstance 'http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_SystemView' -Enumerate -ConnectionURI https://$s/wsman -SessionOption $wsSession -Authentication basic -Credential $creds

$svc_details.Add($s, @{})
if($svr_info -eq $null)
{
    $svc_details[$s].Add(&quot;Generation&quot;, &quot;unknown probably 11G&quot;)
}
else
{
    $svc_details[$s].Add(&quot;Generation&quot;, $svr_info.SystemGeneration.Split(&quot; &quot;)[0])
}
foreach ($com in $fw_info)
{

    $DELL_IDS.ContainsKey($com.ComponentID)
    if($DELL_IDS.ContainsKey($com.ComponentID))
    {
        #need to see if I can update this to account for the different
        #way drac6 and 7's format this string
        $inst_state = $com.InstanceID.Split(&quot;#&quot;)[0].Split(&quot;:&quot;)[1]
        if (($inst_state -ne &quot;PREVIOUS&quot;) -AND ($inst_state -ne &quot;AVAILABLE&quot;))
        {
            $svc_details[$s].Add($DELL_IDS[$com.ComponentID], $com.VersionString)
        }
    }
}

}

The first part of this code is simply a hash table of dell component IDs and an associated easy-to-remember name matching them with the component. How did I get those? Well I queried the cimv2/root/dcim/DCIM_SoftwareIdentity namespace and parsed the output by hand to grab those IDs. They match up to BIOS, LCC v1, LCC v2, iDRAC 6 and iDRAC 7.

$pass = ConvertTo-SecureString "ThisIsMyPassword" -AsPlainText -Force
$creds = new-object System.Management.Automation.PSCredential ("root", $pass)
$wsSession = New-WSManSessionOption -SkipCACheck -SkipCNCheck

This next section of code sets up our enviroment for Get-WSManInstance. First we need to convert our plaintext password into a secure string, then create a PSCredential object to use later so we don’t have to enter our username and password over and over. Finally, we setup a new WS-MAN session options object so that it doesn’t error out on the self signed certificates we are using. If you are using fully trusted certificates on your dracs you can skip this step and not specify the -SessionOption $wsSession flag later.

$fw_info = Get-WSManInstance 'cimv2/root/dcim/DCIM_SoftwareIdentity' -Enumerate -ConnectionURI https://$s/wsman -SessionOption $wsSession -Authentication basic -Credential $creds
$svr_info = Get-WSManInstance 'http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/root/dcim/DCIM_SystemView' -Enumerate -ConnectionURI https://$s/wsman -SessionOption $wsSession -Authentication basic -Credential $creds

Note You can specify either the DCIM Path or the schema, I’m showing both ways here. For the $svr_info variable 'cimv2/root/dcim/DCIM_SystemView' would also work.

Now, we move on to the meat of what we are doing. These two lines grab the system information that we want to parse. The $fw_info contains an XML object that returns all of the install components as exposed by the DCIM_SoftwareIdentity endpoint, and the $svr_info variable contains an XML object that has some interesting system information – such as Server Generation, Express Service Code, Service Tag, and so on. I use these two pieces of information to parse out the Generation, DRAC, BIOS, and LCC firmware versions.

#need to see if I can update this to account for the different</h1>
#way drac6 and 7's format this string</h1>

$inst_state = $com.InstanceID.Split("#")[0].Split(":")[1]
if (($inst_state -ne "PREVIOUS") -AND ($inst_state -ne "AVAILABLE"))
    {
        svc_details[$s].Add($DELL_IDS[$com.ComponentID], $com.VersionString)
    }

One last tricky bit. When you get back the versions that are installed, you will actually have two different versions. Once that is the active version and one that is the rollback version. Unfortunately you need to parse string to figure that out. And different DRACs use different string formats.

  • Drac6: DCIM:INSTALLED:PCI:14E4:1639:0236:1028:5.0.13
  • Drac7: DCIM:INSTALLED#802__DriverPack.Embedded.1:LC.Embedded.1

Once I have this information in my two-dimensional array I can create reports and manipulate the information to tell me exactly what version each of my servers is at.

Sweet! Step one to automating the update of our firmware complete! Next up figure out how to automate the deployment and installation of new firmware.

Written by George Beech

February 26th, 2014 at 10:17 pm