The thoughts of a SysAdmin

Change Control using Git(lab)


Preamble and thoughts on Change Control

If you just want the cool details you can skip to the good stuff

I am a strong believer in change control. It enables many good things in a well-run IT organization. The top three that come to mind are accountability, reliability, and review-ability (I think I’m making that last word up).

Those are all good things to have. Many people have come before me to praise change control, and that is not what I want this post to be about. I want to talk about the change control process that we are starting to use at Stack Exchange: a process that I believe addresses some of the most common complaints and pushback I hear about implementing a change control system.

A good place to start would be to lay out exactly what those common complaints are.

But, Change Control is too complicated!

I hear this a lot, and this particular complaint tends to break down into two different categories. The first is that there is so much process that you can’t get any actual WORK done. The second is that it slows everything down, making it harder to get awesome things done.

I’m not going to argue with either of those characterizations. In fact, the express purpose of a change control system is to put a speed bump in the way: to make you slow down a bit, think through the implications of what you are doing, and have someone else double-check your work. We are not infallible; we make mistakes, and that is OK. The goal here is to minimize the number of mistakes that make it into production, and ideally to keep them from getting there at all.

It’s just bureaucracy, busy work, an annoyance

I’ve met a good number of people that have this attitude. They want to be able to just log in as the administrative user and change whatever they want, whenever they want. Personally, I have given up on trying to convince these people that change control is a good thing.

Design a process that is a speed bump

I spent a good deal of time trying to come up with a better change control system. The system should be easy to use, low impact, and accessible to anyone on the team. One of my overarching design goals was to create a speed bump, not a roadblock.

I spent a good deal of time thinking about the tools we use day to day: the ones people complained about, the ones people liked, and the ones nobody said anything about. After thinking for a while I realized that the one tool everyone used, and that drew few complaints, was our DVCS (we recently moved to git). It’s just about a perfect fit for a lightweight change control system.

The workflow

You don’t need anything special to get up and running with version control as the backbone of your change control system. The glue that brings everything together is a simple Python script that does most of the heavy lifting for you. The script takes the name of the system and the risk of the change, and from that creates a new merge request from a template. The merge request itself is pretty lightweight; we want all the detail to be in the actual commits and commit messages.

The basic workflow we use is to create a branch, then create a merge request based on that branch, have the change reviewed, and then finally have the reviewer merge the branch into master once the change is complete.

That’s it. One long sentence to define our entire change control workflow.

This is a very simple workflow that accomplishes all of the goals I have for a change control system. To make life even easier for people I wrote a small Python script that creates the branch, copies some templates into place for you, and then creates the merge request. All you have to do at that point is fill in the details and get someone to review it.
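
The actual helper is the Python script described above; purely as a sketch of the steps it automates, the manual equivalent looks something like this (the project ID, token, branch name, and GitLab URL are all hypothetical, and the merge request call uses a current GitLab REST API endpoint):

# Hypothetical values throughout; the real ones live in our templates and GitLab setup.
$branch  = "web-tier-firmware-update"
$project = 42                      # GitLab project id of the change control repo
$token   = $env:GITLAB_TOKEN

git checkout -b $branch            # every change gets its own branch
# ... copy the templates into place and fill in the details ...
git add .
git commit -m "web tier: update RAID controller firmware (risk: low)"
git push origin $branch

# Open the merge request that a teammate reviews and then merges into master.
Invoke-RestMethod -Method Post `
    -Uri "https://gitlab.example.com/api/v4/projects/$project/merge_requests" `
    -Headers @{ "PRIVATE-TOKEN" = $token } `
    -Body @{ source_branch = $branch; target_branch = "master"; title = "Change: web tier RAID firmware" }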

The future

This is just v1 of our change control system. In the future we will be adding webhooks to automatically send out notifications and add calendar entries. Basically, the goal is to automate as much of the boring manual stuff as we can.

You can get the code we use on GitHub

Written by George Beech

February 10th, 2016 at 5:50 pm

Updating VMWare Clusters without DRS


One of the pieces of the vSphere Enterprise license is DRS, and especially the ability to use DRS to update or upgrade a cluster in one click. If you don’t know what DRS is, the short version is that it’s a feature you get with the Enterprise license that lets vSphere move VMs around intelligently. One of the added bonuses you get is the ability to evacuate a host’s VMs automatically. When you combine that with vSphere Update Manager you get a “one click, and an hour later you’re done” upgrade of your cluster.

Unfortunately, when that is the one feature of the Enterprise edition you would actually use, it doesn’t make financial sense to pay the premium. The question then is “What do you do to make your life easier than manually moving things around?”

The answer is that you go grab PowerCLI and write a script! I’ve got one started — Put github link here when done — and I’ll go through some of the details of it here.

First, what are the things that it can do?

  • Migrate running VMs to the other hosts in the cluster
  • Enter and Exit Maintenance mode
  • Move the VMs from where they went, back to the host that was drained

Next, what still needs to be added?

  • Intelligent migrations (right now it just blasts the VMs around blindly)
  • Automatically roll through the whole cluster
  • Everything I haven’t thought of …

The most interesting function here is the evac-host function. This is actually the meat and bones of this script.

function evac-host() {
    param(
        [string] $ClusterName,
        [string] $vHost
    )

    # Every host in the cluster is a potential migration target
    $AliveHosts = Get-VMHost -Location $ClusterName

    $toHosts = @()

    # ...except the host we are about to drain
    foreach ($h in $AliveHosts) {
        if ($h.Name -ne $vHost) {
            $toHosts += $h.Name
        }
    }

    $svr = 0
    $vms = get-vm -Location $vHost | where {$_.PowerState -ne "PoweredOff"}
    $m_loc = @{}

    if ($vms.Count -gt 0) {
        foreach ($v in $vms) {
            # Round-robin the running VMs across the remaining hosts,
            # recording where each one went
            Move-VM -vm $v.Name -Destination $toHosts[$svr] | Out-Null
            $m_loc.Add($v.Name, $toHosts[$svr])
            if ($svr -ne $toHosts.Length - 1) {
                $svr += 1
            } else {
                $svr = 0
            }
        }

        # Dump the VM -> host map to disk, just in case
        $m_loc.GetEnumerator() | Sort-Object Name | export-csv C:\Users\gbeech.STACKEXCHANGE\locations.csv

        # Return the map so the caller can move everything back later
        return $m_loc
    } else {
        write-host "No Powered on VMs on the Host"
    }
}

The 10,000 foot view is that this function takes a cluster name (wildcards are acceptable) and an ESX host name as arguments. It then goes through and moves every running VM off that host. As it does this, it records each VM’s name and the host it was sent to in a hashtable, then writes those results out to a CSV just in case. It also returns the hashtable so we can work with it later, avoiding having to read the CSV back in.
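
To show how it fits together, a patching pass on one host looks roughly like this (the host and cluster names are made up, Set-VMHost is the stock PowerCLI cmdlet for maintenance mode, and the move-back loop is just a sketch of the “send the VMs home” step):

# Drain the host and keep the VM -> host mapping the function returns.
$locations = evac-host -ClusterName "NY-*" -vHost "ny-vmhost01"

# With nothing powered on, maintenance mode completes immediately.
Get-VMHost "ny-vmhost01" | Set-VMHost -State Maintenance

# ... patch, reboot, and bring the host back ...
Get-VMHost "ny-vmhost01" | Set-VMHost -State Connected

# Send everything back to the host that was drained.
foreach ($entry in $locations.GetEnumerator()) {
    Move-VM -VM $entry.Name -Destination "ny-vmhost01"
}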

Written by George Beech

June 16th, 2014 at 7:41 pm

LOPSA Board – My Candidate Statement


A little about me

Hi, I’m George and I’m running for one of the open seats on the LOPSA Board.

I’ve been involved in LOPSA for about four years now. I’ve been involved with the NYC local since it started, and have been part of the LOPSA-EAST (formerly PICC) committee for the past three years.

I’ve been working as a System Administrator for the last 14 years. Currently I’m an SRE at Stack Exchange.

I believe that this organization has a lot of promise to be a leader in the field of system administration, and I would like to be among those entrusted by the membership to guide the organization to that position.

Ideas for Change

I think you would agree with me that LOPSA as an organization needs to evolve and change so that it is still around, and more importantly still relevant, in the coming years.

The best ways to make sure an organization stays relevant are to grow and to have a louder voice. “How do we grow?” “How do we have a louder voice?” The former comes up almost every year at election time; the latter I’ve heard come up more and more in the past year.

Below are my ideas on how to make LOPSA better.

How do we grow

1. Keep working on locals

I’m not going to spend much time on this, but obviously we need to continue the work on growing, and starting new local groups. Everyone says this every year, and I think we do a pretty good job of starting new locals and growing the ones we already have.

2. Online membership

Personally, I think that by not having some sort of “Online Local”, some viable form of online membership, we are leaving a lot on the table. As of 2012, the US Bureau of Labor Statistics says there are 366,400 people classified as Network and Computer Systems Administrators. They are not all in the 15 localities where we have locals.

We need to set up a good online community that all members can access, and it should be geared towards replicating some of the experience of a physical local.

  • Simulate the talks
    • Monthly Articles
    • Monthly Video Presentation
    • Dedicated talk area – forum, possibly comments section – for the above items
  • A way to emulate the before and after hang out sessions

How do we become more vocal

As an organization we need to become more vocal. We need to be putting out statements and positions on the important issues of the day. SOPA, Snowden, and Net Neutrality are all perfect examples of topics on which LOPSA should have a vocal and strong opinion that represents the majority of our membership.

One thing we need to make sure of is that we don’t get stuck trying to get everyone to agree. Any group will have differing opinions on every topic. A very important part of being vocal and putting our ideals out there is timeliness: you can’t put out a statement three weeks after the news has broken. What we need is a group of people trusted by the membership to convey the majority opinion.

Finally, Focus

We need to look at everything that we are trying to do as an organization with a critical eye. We should examine everything we are doing and see if it fits one of three categories: 1) bettering the sysadmin community, 2) being a leader in the community, 3) growing the organization. If anything we are working on does not fit into one of those three categories, we need to take a hard look at it and decide if we should continue with that venture.

It doesn’t need to be a hard “we will never pick this back up later” stop, but we should stop spending limited resources on things that don’t move us closer to our goals. As we grow we will have more time and money, and we can start re-introducing some of the initiatives that had to be stopped.

Written by George Beech

May 2nd, 2014 at 7:38 pm


How Stack Exchange gets the most out of HAProxy


At Stack Exchange we like to do two, well, no, three things. One, we love living on the bleeding edge and making use of awesome new features in software. Two, we love configuring the hell out of everything we run, which leads to three: getting the absolute most performance out of the software that we run.

HAProxy is no exception to this. We have been running HAProxy since just about day one. We have been on the 1.5dev branch of code since almost the day it came out.

Of course, most people would ask why you would do that. You open yourself up to a whole lot of issues with dev code. The answer, of course, is that there are features in the dev code that we want. The less selfish answer is that we want to make the internet a better place for everyone, and what better way to do that than running bleeding-edge code and finding the issues for you?

I’m going to go through our HAProxy setup and how we are using the features. I would highly recommend reading through the HAProxy documentation for the version you are running. There is a ton of good information in there.


[Diagram: HAProxy Flow - ERD]

This is a high level overview of what our network looks like from the cloud to the web front ends. Yes, there is a lot more to us serving you a request, but this is enough for this post.

The basics are that a request comes into our network from the internet. Once it passes our edge routers it goes on to our load balancers: CentOS 6 Linux boxes running HAProxy 1.5dev. The request comes into HAProxy and, depending on which tier it hits, is processed and sent to a backend. After the packet makes its way through HAProxy it gets routed to one of the web servers in our IIS farm.

One of the reasons that HAProxy is so damn good at what it does is that it is single-minded, as well as (mostly) single-threaded. This has led it to scale very, very well for us. One of the nice things about the software being single-threaded is that we can buy a decent-sized multi-core server, and as things need more resources we just split them out to their own tier, which is another HAProxy instance running on a different core.

Things get a bit more interesting with SSL, as there is a multi-threaded bit there to handle the transaction overhead. Going deeper into how HAProxy handles threading is out of the scope of this post, though, so I’ll just leave it at that.

Phew, we’ve got the introductory stuff out of the way now. Let’s dive into what our HAProxy config actually looks like!

The first bit is our global defaults, and some setup: users, a bit of tuning, and some other odds and ends. All of these options are very well documented in the HAProxy docs, so I won’t bore you by explaining what each one of them does.

For this post, all but one example (our websocket config) comes out of what we call “Tier 1”. This is our main tier; it’s where we serve the Q&A sites and other critical services from.

userlist stats-auth
    group admin users <redacted>
    user supa_dupa_admin insecure-password <redacted>
    group readonly users <redacted>
    user cant_touch_this insecure-password <redacted>

global
    stats socket /var/run/haproxy-t1.stat level admin
    stats bind-process 1
    maxconn 100000
    pidfile /var/run/
    log local0
    log local0
    tune.bufsize 16384
    tune.maxrewrite 1024
    spread-checks 4
    nbproc 4

defaults
    errorfile 503 /etc/haproxy-shared/errors/503.http
    errorfile 502 /etc/haproxy-shared/errors/502.http
    mode http
    timeout connect 15s
    timeout client 60s
    timeout server 150s
    timeout queue 60s
    timeout http-request 15s
    timeout http-keep-alive 15s
    option redispatch
    option dontlognull
    balance source

Nothing all that crazy here: some tweaks for scale, setting up some users, timeouts, logging options, and the default balance mode. Generally you want to tune your connection limits and your timeout values to fit your environment and your application. Other than that, the defaults should work for 98% of the people out there.

Now that we have our defaults set up, let’s look a little deeper into the really interesting parts of our configuration. I will point out the things we use that are only available in 1.5dev as I go.

First, our SSL termination. We used to use Nginx for SSL termination, but as our SSL deployment grew we knew that SSL support was coming to HAProxy, so we waited for it to land and then went in whole hog.

listen ssl-proxy-1
    bind-process 2 3 4
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/misc.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/wc-san.pem
    bind ssl crt /etc/haproxy-shared/ssl/misc.pem
    mode tcp
    server http send-proxy
    server http send-proxy
    server http send-proxy
    server http send-proxy
    server http send-proxy

    maxconn 100000

This is a 1.5dev feature.

[Diagram: HAProxy - Core Detail]

Once again, this is a pretty simple setup. The gist of what is going on here is that we set up a listener on port 443. It binds to the specified IP addresses as an SSL port, using the specified certificate file in PEM format: specifically the full chain, including the private key. This is actually a very clean way to set up SSL, since you have just one file to manage and one config line to write when setting up an SSL endpoint.

The next thing it does is set the target servers to the box itself (127.0.0.1, 127.0.0.2, 127.0.0.3, and so on) using the send-proxy directive, which tells the process to use the proxy protocol so that we don’t lose some of that tasty client information when the packet gets shipped to the plain HTTP front end.

Now hold on a second! Why are you using multiple localhost proxy connections?! Ahh, good catch. Most people probably won’t run into this, but it’s because we were running out of source ports when we used only one proxy connection. We ran into something called source port exhaustion. The quick story is that you can only have ~65k connections between one ip:port and another ip:port. This wasn’t an issue before we started using SSL, since we never got close to that limit.

What happened when we started using SSL? Well, we started proxying a large amount of traffic over loopback. I mean, we do have a feeewww more than 65k connections.

Total: 581558 (kernel 581926)
TCP:   677359 (estab 573996, closed 95478, orphaned 1237, synrecv 0, timewait 95475/0), ports 35043

So the solution here is to simply load balance between a bunch of IPs in the loopback space, giving us ~65k more source ports per entry.
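
To put rough numbers on it: each source IP only has about 64k usable source ports to any one destination ip:port, so a single loopback target caps you at roughly 65k concurrent proxied connections. With the five loopback server entries shown above, that ceiling becomes roughly 5 × 65k, or around 325k connections.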

The final thing I want to point out about the SSL front end is that we use the bind-process directive to limit the cores that particular front end is allowed to use. This allows us to have multiple HAProxy instances running without them stomping all over each other on a multi-core machine.

Our HTTP Frontend

The real meat of our setup is our http frontend. I will go through this piece by piece and at the end of this section you can see the whole thing if you would like.

frontend http-in
    bind name stackexchange
    bind name careers
    bind name
    bind name openid
    bind name misc
    bind name stackexchange
    bind name careers
    bind name
    bind name openid
    bind name misc
    bind accept-proxy name http-in
    bind accept-proxy name http-in
    bind accept-proxy name http-in
    bind accept-proxy name http-in
    bind accept-proxy name http-in
    bind-process 1

Once again, this is just setting up our listeners; nothing all that special or interesting here. This is where you will find the bindings that our SSL front end sends to, marked with the accept-proxy directive. Additionally, we give them a name so that they are easier to find in our monitoring solution.

stick-table type ip size 1000k expire $expire_time store gpc0,conn_rate($some_connection_rate)

## Example from HAProxy Documentation (not in our actual config)##
# Keep track of counters of up to 1 million IP addresses over 5 minutes
# and store a general purpose counter and the average connection rate
# computed over a sliding window of 30 seconds.
stick-table type ip size 1m expire 5m store gpc0,conn_rate(30s)

The first interesting piece is the stick-table line. What is going on here is that we are tracking the connection rate of incoming IPs to this frontend, and storing a general purpose counter (gpc0) alongside it. The example from the HAProxy docs on stick-tables, shown above, explains this pretty well.

    log global

    capture request header Referer               len 64
    capture request header User-Agent            len 128
    capture request header Host                  len 64
    capture request header X-Forwarded-For       len 64
    capture request header Accept-Encoding       len 64
    capture response header Content-Encoding     len 64
    capture response header X-Page-View          len 1
    capture response header X-Route-Name         len 64
    capture response header X-Account-Id         len 7
    capture response header X-Sql-Count          len 4
    capture response header X-Sql-Duration-Ms    len 7
    capture response header X-AspNet-Duration-Ms len 7
    capture response header X-Application-Id     len 5
    capture response header X-Request-Guid       len 36
    capture response header X-Redis-Count        len 4
    capture response header X-Redis-Duration-Ms  len 7
    capture response header X-Http-Count         len 4
    capture response header X-Http-Duration-Ms   len 7
    capture response header X-TE-Count           len 4
    capture response header X-TE-Duration-Ms     len 7

rspidel ^(X-Page-View|Server|X-Route-Name|X-Account-Id|X-Sql-Count|X-Sql-Duration-Ms|X-AspNet-Duration-Ms|X-Application-Id|X-Request-Guid|X-Redis-Count|X-Redis-Duration-Ms|X-Http-Count|X-Http-Duration-Ms|X-TE-Count|X-TE-Duration-Ms):

We are mostly doing some setup for logging here. As a request comes in or a response goes out, we capture specific headers using capture request or capture response, depending on the direction. HAProxy then takes those headers and inserts them into the syslog message that is sent to our logging solution. Once we have captured the headers we want from the response, we remove them using rspidel to strip them from the response sent to the client. rspidel uses a simple regex to find and remove the headers.

The next thing that we do is set up some ACLs. I’ll just show a few examples here since we have quite a few.

acl source_is_serious_abuse src_conn_rate(http-in) gt $some_number
acl api_only_ips src -f /etc/haproxy-shared/api-only-ips
acl is_internal_api path_beg /api/
acl is_area51 hdr(host) -i
acl is_kindle hdr_sub(user-agent) Silk-Accelerated

I would say that the first ACL here is one of the more important ones we have. Remember that stick-table we set up earlier? Well, this is where we use it. Your IP matches the source_is_serious_abuse ACL if its connection rate, tracked in that stick table, is greater than $some_number. I will show you what we do with this shortly, when I get to the routing in the config file.

The next few ACLs are just examples of different ways that you can set up ACLs in HAProxy. For example, we check whether your user agent contains ‘Silk-Accelerated’. If it does, you match the is_kindle ACL.

Now that we have those ACLs set up, what exactly do we use them for?

    tcp-request connection reject if source_is_serious_abuse !source_is_google !rate_limit_whitelist
    use_backend be_go-away if source_is_abuser !source_is_google !rate_limit_whitelist

The first thing we do is deal with the connections that make it onto our abuse ACLs. The first rule simply denies the connection if you are bad enough to hit our serious abuse ACL, unless you have been whitelisted or are Google. The second is a softer response that throws up a 503 error if you are just a normal abuser, once again unless you are Google or whitelisted.

The next thing we do is some request routing. We send different requests to different server backends.

    use_backend be_so_crawler if is_so is_crawler
    use_backend be_so_crawler if is_so is_crawler_ua
    use_backend be_so if is_so
    use_backend be_stackauth if is_stackauth
    use_backend be_openid if is_openid

    default_backend be_others

What this is doing is matching against the ACLs that were set up above and sending you to the correct backend. If you don’t match any of the ACLs you get sent to our default backend.

An Example Backend

Phew! That’s a lot of information so far. We really do have a lot configured in our HAProxy instances. Now that we have our defaults, general options, and front ends configured, what does one of our backends look like?

Well they are pretty simple beasts. Most of the work is done on the front end.

backend be_others
    mode http
    bind-process 1
    stick-table type ip size 1000k expire 2m store conn_rate($some_time_value)
    acl rate_limit_whitelist src -f /etc/haproxy-shared/whitelist-ips
    tcp-request content track-sc2 src
    acl conn_rate_abuse sc2_conn_rate gt $some_value
    acl mark_as_abuser sc1_inc_gpc0 gt $some_value
    tcp-request content reject if conn_rate_abuse !rate_limit_whitelist mark_as_abuser

    stats enable
    acl AUTH http_auth(stats-auth)
    acl AUTH_ADMIN http_auth_group(stats-auth) $some_user
    stats http-request auth unless AUTH
    stats admin if AUTH_ADMIN
    stats uri /my_stats
    stats refresh 30s

    option httpchk HEAD / HTTP/1.1\r\nUser-Agent:HAProxy\r\

    server ny-web01 check
    server ny-web02 check
    server ny-web03 check
    server ny-web04 check
    server ny-web05 check
    server ny-web06 check
    server ny-web07 check
    server ny-web08 check
    server ny-web09 check

There really isn’t too much to our backends. We set up some administrative auth at the beginning. The next thing we do is, I think, the most important part: with option httpchk we specify where we want to connect when checking a host to see if it’s up.

In this instance we are just checking ‘/’, but a lot of our backends have a ‘/ping’ route that gives our monitoring solutions more information about how the app is performing. To check those routes we simply change ‘HEAD /’ to ‘HEAD /ping’.

Final Words

Man, that sure was a lot of information to write, and to process. But this setup has given us a very stable, scalable, and flexible load balancing solution. We are quite happy with the way it is all set up, and it has been running smoothly for us.

Update 9/21/14: For those curious you can look at the full, sanitized tier one config we use.

Written by George Beech

March 25th, 2014 at 4:39 pm

Fun With PowerShell, WS-MAN, and Dell Servers


Recently I’ve been playing with using the WS-MAN protocol to gather information (and eventually run updates) on our Dell servers. It has actually been a fairly interesting project, once I got through the pretty high learning curve of getting started with WS-MAN.

First, what is WS-MAN? It’s a management standard developed by the DMTF. What it really boils down to is giving us the ability to access and manipulate CIM providers via HTTP calls.

One of the interesting things Dell did with their systems in the past two generations (Gen 11 and 12) is add something they call the Lifecycle Controller. They have not made much information available on what you can do with it, or even how to really use it.

Recently I have been exploring what you can do with the Life Cycle Controller. And, quite honestly, you can do a ton of good stuff with it. Everything from getting system information to setting boot options, all the way up to updating all of the firmware on your box. This is all done through the WS-MAN Protocol.

First I would suggest doing some reading so you can get the basic concepts of WS-MAN.

Phew, got through all that?

Let’s start off with a nice code snippet that I have been working on, and then step through what it is doing.

$DELL_IDS = @{
    "20137" = "DRAC";
    "18980" = "LCC";
    "25227" = "DRAC";
    "28897" = "LCC";
    "159" = "BIOS"
}

$pass = ConvertTo-SecureString "ThisIsMyPassword" -AsPlainText -Force
$creds = new-object System.Management.Automation.PSCredential ("root", $pass)
$wsSession = New-WSManSessionOption -SkipCACheck -SkipCNCheck

$svc_details = @{}

$base_subnet = "192.168.99."
$addrs = @(1..254)
foreach ($ip in $addrs) {
    # Echo the address we are working on, then resolve it to a hostname
    $base_subnet + $ip
    $s = [System.Net.Dns]::GetHostByAddress($base_subnet+$ip).HostName

    $fw_info = Get-WSManInstance 'cimv2/root/dcim/DCIM_SoftwareIdentity' -Enumerate -ConnectionURI https://$s/wsman -SessionOption $wsSession -Authentication basic -Credential $creds
    # Resource URI redacted here; 'cimv2/root/dcim/DCIM_SystemView' (or the full schema URL) goes in the quotes -- see the note below
    $svr_info = Get-WSManInstance '' -Enumerate -ConnectionURI https://$s/wsman -SessionOption $wsSession -Authentication basic -Credential $creds

    $svc_details.Add($s, @{})
    if ($svr_info -eq $null) {
        $svc_details[$s].Add("Generation", "unknown probably 11G")
    } else {
        $svc_details[$s].Add("Generation", $svr_info.SystemGeneration.Split(" ")[0])
    }

    foreach ($com in $fw_info) {
        #need to see if I can update this to account for the different
        #way drac6 and 7's format this string
        $inst_state = $com.InstanceID.Split("#")[0].Split(":")[1]
        if (($inst_state -ne "PREVIOUS") -AND ($inst_state -ne "AVAILABLE")) {
            $svc_details[$s].Add($DELL_IDS[$com.ComponentID], $com.VersionString)
        }
    }
}

The first part of this code is simply a hash table of Dell component IDs and an associated easy-to-remember name for each component. How did I get those? Well, I queried the cimv2/root/dcim/DCIM_SoftwareIdentity namespace and parsed the output by hand to grab those IDs. They match up to BIOS, LCC v1, LCC v2, iDRAC 6, and iDRAC 7.

$pass = ConvertTo-SecureString "ThisIsMyPassword" -AsPlainText -Force
$creds = new-object System.Management.Automation.PSCredential ("root", $pass)
$wsSession = New-WSManSessionOption -SkipCACheck -SkipCNCheck

This next section of code sets up our environment for Get-WSManInstance. First we need to convert our plaintext password into a secure string, then create a PSCredential object to use later so we don’t have to enter our username and password over and over. Finally, we set up a new WS-MAN session options object so that it doesn’t error out on the self-signed certificates we are using. If you are using fully trusted certificates on your DRACs you can skip this step and not specify the -SessionOption $wsSession flag later.

$fw_info = Get-WSManInstance 'cimv2/root/dcim/DCIM_SoftwareIdentity' -Enumerate -ConnectionURI https://$s/wsman -SessionOption $wsSession -Authentication basic -Credential $creds
$svr_info = Get-WSManInstance '' -Enumerate -ConnectionURI https://$s/wsman -SessionOption $wsSession -Authentication basic -Credential $creds

Note: You can specify either the DCIM path or the full schema URL; I’m showing both ways here. For the $svr_info variable, 'cimv2/root/dcim/DCIM_SystemView' would also work.

Now we move on to the meat of what we are doing. These two lines grab the system information that we want to parse. $fw_info contains an XML object listing all of the installed components as exposed by the DCIM_SoftwareIdentity endpoint, and the $svr_info variable contains an XML object that has some interesting system information – such as Server Generation, Express Service Code, Service Tag, and so on. I use these two pieces of information to parse out the Generation, DRAC, BIOS, and LCC firmware versions.

#need to see if I can update this to account for the different
#way drac6 and 7's format this string

$inst_state = $com.InstanceID.Split("#")[0].Split(":")[1]
if (($inst_state -ne "PREVIOUS") -AND ($inst_state -ne "AVAILABLE")) {
    $svc_details[$s].Add($DELL_IDS[$com.ComponentID], $com.VersionString)
}

One last tricky bit. When you get back the versions that are installed, you will actually have two different versions: one that is the active version and one that is the rollback version. Unfortunately you need to parse the InstanceID string to figure out which is which, and different DRACs use different string formats.

  • Drac6: DCIM:INSTALLED:PCI:14E4:1639:0236:1028:5.0.13
  • Drac7: DCIM:INSTALLED#802__DriverPack.Embedded.1:LC.Embedded.1
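
As a quick illustration, here is what the split in the loop above does to those two example strings; both yield the install state, even though the rest of the string differs (results shown in comments):

# Same parse as in the loop above, applied to each format.
"DCIM:INSTALLED:PCI:14E4:1639:0236:1028:5.0.13".Split("#")[0].Split(":")[1]
# -> INSTALLED   (Drac6: no '#', so the whole string is split on ':')

"DCIM:INSTALLED#802__DriverPack.Embedded.1:LC.Embedded.1".Split("#")[0].Split(":")[1]
# -> INSTALLED   (Drac7: everything after '#' is dropped first)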

Once I have this information in my nested hashtable I can create reports and manipulate the information to tell me exactly what version each of my servers is at.
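
As a rough sketch of what that reporting can look like (not part of the script above; the component names are just the keys populated by $DELL_IDS):

# Flatten $svc_details (server -> component -> version) into one row per server.
$report = foreach ($server in $svc_details.Keys) {
    $d = $svc_details[$server]
    [PSCustomObject]@{
        Server     = $server
        Generation = $d["Generation"]
        BIOS       = $d["BIOS"]
        DRAC       = $d["DRAC"]
        LCC        = $d["LCC"]
    }
}

$report | Sort-Object Server | Format-Table -AutoSize
$report | Export-Csv -NoTypeInformation firmware-versions.csv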

Sweet! Step one of automating our firmware updates is complete! Next up: figuring out how to automate the deployment and installation of new firmware.

Written by George Beech

February 26th, 2014 at 10:17 pm