ktower's kontemplations

Tuesday, August 12, 2014

Wreck Diving Class

It has been entirely too long since I have posted here; let's see about fixing that.

Over the previous weekend, I took the PADI Wreck Diver Specialty class from TL Sea Diving. The vast majority of the class was spent diving a couple of wrecks up in Nanaimo, BC. It was an incredible amount of fun, although admittedly a little nerve-wracking for me.

The wrecks we dove were the HMCS Cape Breton and the HMCS Saskatchewan. Both are not only wreck dives but also deep dives, with the decks of the ships at somewhere around 100 fsw. I actually haven't done very much deep diving, with my deepest dive to date being 112 ft, so triple-digit depths don't make me entirely comfortable. Add to that the fact that access to these dives was via a fairly small, motion-sickness-inducing boat, while I'm more used to simple shore dives (or better yet, the diving platform at the Aquarium!), and the whole environment made for a bit of discomfort.

Despite all this, I had a wonderful time. The visibility was fantastic on all of these dives (I'd say typically 50 feet or more), and there was a ton of life growing on the boats. Various species of rockfish, lingcod, and cabezon were among the fish representing themselves, and just about every surface of the boats was covered with giant plumose anemones, sponges, and hydroids.

Of course, we didn't just explore the exterior of these ships, as full of life as they were. The class also teaches the skills necessary to safely penetrate a wreck (for the non-divers reading this, any environment that prevents you from having direct access to the surface, otherwise known as an "overhead environment," is generally a Bad Thing unless you have the proper training). The skills we practiced were:

Swim-through a 3-walled environment. The Cape Breton has partially enclosed hallways/decks that are enclosed on 3 sides (top, bottom, and one side). We swam through this just to get used to the feeling of having something over our heads and practice good finning technique in a safe environment (a poor kick wouldn't silt up the area).
Limited penetration without a line. These ships are great for this because before they were sunk, large access holes were cut out of the hull, meaning that you can swim through many of the hallways and always have an exit easily within view. We swam into the Saskatchewan's bridge and through a few different rooms, including the location where the ashes of Raymond Goode, a former crew member of the HMCS Saskatchewan, are kept.
Penetration on a line. Again on the Saskatchewan, we wandered into the interior, but following a line strung by our instructor.

On the fourth wreck dive, we ended up doing a full-length penetration of the Saskatchewan, starting from the aft munitions bay and swimming through the halls until just before the bow. It was a little spooky, but also very cool.

And finally, while it wasn't a wreck dive, we were also given a chance to dive a nearby site named Snake Island Wall. This was another spectacular dive, with tons of life to see (probably the largest cabezon I've ever run across), and amazing visibility.

Unfortunately, I didn't get any pictures or video of any of these dives. I decided that I didn't need the additional task loading of taking pictures and opted instead to focus on the skills. I wish I had some pictures, though, because the views were really spectacular. I guess that just means I'll have to go back. In the meantime, though, here are a couple of videos that someone else has taken of their dives on the sites.

(Courtesy of YouTube user Michael Meagher)

(Courtesy of YouTube user Jack Beslanwitch)

Finally, if you care to look, as is my practice, the dive logs for these dives are available online as well.

All in all, this was a fantastic trip. I will definitely need to revisit at some point so I can take a closer look at what is there.

Monday, July 8, 2013

ROV Competition

Recently, I had the pleasure of assisting at the International MATE ROV competition, held this year at the King County Aquatic Center in Federal Way, WA. I served as a diver -- I helped set up the underwater "courses" for the student teams to run their robots through, and then spent one day at the competition assisting.

Red Hat Enterprise Linux, IPv6, and SLAAC

If you are just starting to get into IPv6, you might notice in your research that a very large number of web sites "solve" IPv6 issues by describing how to disable the IPv6 protocol on a host entirely. With the pool of IPv4 addresses having been exhausted for quite some time now, I feel this is a very short-sighted approach. In the end, it is better to understand the problem and solve it than to simply sweep it under the carpet. What follows is a description of a problem I spent the better part of a day investigating, namely trying to get a Red Hat Enterprise Linux (RHEL) server to properly stop auto-assigning its own IPv6 address.

Disclaimer

I should mention that I am by no means an "expert" on the topic. Most of what I have learned has been by piecing together bits of information from various sources that were either vague, incomplete, or outdated (as is much of the IPv6 information out there, sadly), and by brute-force experimentation on my own RHEL systems. What I write here is completely of an "it worked for me, but YMMV" nature.

IPv6 Addressing Primer

IPv6 is still "new" enough to many people that I feel a quick primer is warranted.

If you have a Red Hat Enterprise Linux host on a network and haven't taken any steps to disable IPv6, you will probably observe that out of the box, it will have a "link-local" address assigned to every network interface. These addresses start with "fe80", are automatically generated based on the interface's MAC address, and provide connectivity only on the local network segment.

As soon as global IPv6 connectivity is added to a connected network, you will likely see another address pop up on the interface associated with it. This will be a globally routable address that is automatically determined via "Stateless Address Autoconfiguration" or "SLAAC". Like the link-local address, it is "self-assigned" by the client system, and is typically based on the interface's MAC address (unless privacy extensions are enabled). To build it, however, the client needs some information from the network, such as the IPv6 prefix of the subnet and the router to use. These are provided via "router advertisements" or RAs, which are ICMPv6 multicast packets containing these data (IPv6 has no broadcast; RAs go to the all-nodes multicast group instead). RAs are sent periodically by a router (typically about every 10 minutes), but can also be explicitly requested by the client via a "Router Solicitation" ICMPv6 request.
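To make the MAC-based addressing above a little more concrete, here is a minimal sketch of the "modified EUI-64" derivation that, to my understanding, produces the host part of both the link-local and the SLAAC address when privacy extensions are off. The function name, the example MAC address, and the 2001:db8:1:2 prefix (taken from the documentation range) are all made up for illustration.

```shell
#!/bin/bash
# Sketch: derive the modified EUI-64 interface identifier from a MAC address.
# Assumes a 48-bit MAC written as six colon-separated hex octets.
eui64_iid() {
    local IFS=:
    # Split the MAC into its six octets ($1..$6).
    set -- $1
    # Flip the universal/local bit (0x02) of the first octet, then
    # insert ff:fe between the third and fourth octets.
    printf '%x:%x:%x:%x\n' \
        $(( ((0x$1 ^ 0x02) << 8) | 0x$2 )) \
        $(( (0x$3 << 8) | 0xff )) \
        $(( 0xfe00 | 0x$4 )) \
        $(( (0x$5 << 8) | 0x$6 ))
}

# Hypothetical MAC address and prefix, for illustration only.
iid=$(eui64_iid 00:16:3e:12:34:56)
echo "link-local: fe80::$iid"          # fe80::216:3eff:fe12:3456
echo "SLAAC:      2001:db8:1:2:$iid"   # 2001:db8:1:2:216:3eff:fe12:3456
```

The only difference between the two addresses is the first 64 bits: fe80:: is fixed, while the global prefix is whatever the router advertises.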

There are other ways of accomplishing this, such as via DHCPv6, but for the purposes of this article, SLAAC is what we care about. Plus, my impression is that DHCPv6 is still rather new and isn't used as much.

The Problem

This stateless auto-configuration works great for client systems -- systems where a user is typically sitting at a keyboard and interacting directly with the computer. Much in the same way a client system frequently receives an IPv4 address automatically via a DHCP server, SLAAC provides a way to automatically assign IPv6 addresses: turn on IPv6 on the network and things "just work."

The same can't necessarily be said for IPv6-connected servers, however. Just as it is fairly uncommon to let your servers auto-assign an IPv4 address via DHCP (yes, I know it can be done in a perfect world; perhaps I'm just old-fashioned), you probably don't want an IPv6 address to auto-assign either. You may not be ready to offer IPv6 services on the system, or you may not have a firewall configuration ready for it. No matter what your reason, you probably don't want a server to obtain a globally routable IPv6 address until you are ready to support it, so you don't want SLAAC.

One way to accomplish this is to simply turn off router advertisements for the entire network. However, this leads to its own problems -- what if you have some systems that need IPv6 and others that don't? Or, in some cases this may lead to reduced functionality of the network as a whole (for example, I'm told by the network engineers at work that disabling RAs on a subnet will break VRRP for the subnet, effectively eliminating router redundancy).

The better solution would be to simply configure the server to ignore router advertisements, effectively disabling SLAAC. Then you can add an IPv6 configuration when you are ready to support it.

Googling will tell you that on a Linux system the way to do this is to set some kernel tunables in /etc/sysctl.conf and restart your networking stack:

net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.default.accept_ra = 0

However, on Red Hat Enterprise Linux (or any of its various clones), you might be in for a nasty surprise. Despite these settings being in sysctl.conf, after a reboot you may find:

# sysctl -a | grep "accept_ra ="
net.ipv6.conf.default.accept_ra = 1
net.ipv6.conf.all.accept_ra = 1
net.ipv6.conf.eth0.accept_ra = 1
net.ipv6.conf.lo.accept_ra = 1

The active configuration still has the settings enabled! And if you look at your network interfaces, you'll note that they do in fact still have globally routable IPv6 addresses. What is going on?

The Solution

The key lies in the /etc/sysconfig/network file. The following lines need to be present:

NETWORKING_IPV6=yes
IPV6_AUTOCONF=no

After the lines have been added, you can activate the configuration with a standard network stack restart ("/sbin/service network restart"). If all goes well, you should see the SLAAC-assigned v6 address disappear and the kernel tunables change to more desirable values.
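If you would rather not eyeball the sysctl output after every restart, something like the following can do the check. This is only a sketch: the function name is my own invention, and it simply reads the same accept_ra values that sysctl reports, straight from /proc.

```shell
#!/bin/bash
# Sketch: list interfaces that would still accept router advertisements.
# The optional argument exists only so the function can be pointed at a
# different directory; by default it reads the live kernel settings.
ra_enabled_interfaces() {
    local base=${1:-/proc/sys/net/ipv6/conf}
    local f
    for f in "$base"/*/accept_ra; do
        [ -r "$f" ] || continue
        # Any non-zero value means RAs are still honored.
        if [ "$(cat "$f")" != "0" ]; then
            basename "$(dirname "$f")"
        fi
    done
}

ra_enabled_interfaces
```

No output means every entry is set to 0. Note that "all" and "default" show up in the listing as if they were interfaces, just as they do in the sysctl output.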

Without these lines, RHEL seems to actively revert to the default case of enabling SLAAC, as seen above. Also, both lines are required. Unless the first is present and enabled, any other IPv6 directives appear to be ignored.

Once you are ready to actually support IPv6 on your server, you can then proceed to manually assign an IP address in the appropriate /etc/sysconfig/network-scripts/ifcfg-* file (using the IPV6ADDR and IPV6_DEFAULTGW variables). Or, I suppose, you could turn on SLAAC again by setting IPV6_AUTOCONF=yes, but I wouldn't recommend it.
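For illustration, the IPv6 part of a static configuration might look roughly like this. Treat it as a sketch: the interface name is assumed, the addresses are placeholders from the 2001:db8::/32 documentation range, and IPV6INIT is a variable I am adding on the assumption that the interface needs it to bring up IPv6 at all, so check the initscripts documentation for your release.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 (IPv6-related lines only)
# Placeholder addresses from the 2001:db8::/32 documentation range.
# IPV6INIT is an assumption; it is not mentioned in the text above.
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6ADDR=2001:db8:1:2::10/64
IPV6_DEFAULTGW=2001:db8:1:2::1
```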

Wednesday, November 28, 2012

XenServer Management and Jumbo Frames

In a word, don't do it.

Perhaps some additional background would help. :)

We maintain many XenServer pools, most of which consist of four "compute" servers attached to a shared storage array. Each server has two ethernets acting as a management network bond, as well as two ethernets acting as a bond for VM traffic. The VM traffic is VLAN-tagged, the management traffic is not.

We had recently upgraded all of our pools to XenServer 6.1, a little faster than we typically would have, so that we could gain access to some of the cool new features (e.g., inter-pool VM migration). Life was good and everything worked fine, until it came time to apply a couple of patches. After applying a patch I would reboot the server, at which point it would momentarily re-contact the pool and then disappear. The Xapi services on the host would not respond, and the pool master would not acknowledge the node's presence. SSH connectivity to the node worked, however.

This issue proved to be pre-existing; that is, the patches were not what caused the problem. I tried rebooting a node that had vanilla XS 6.1 and it exhibited similar symptoms. It was just coincidental that the servers had not been rebooted until it came time to apply patches.

After some trial and error, I was able to reliably get the node back online by performing an "emergency network reset" and rebooting. However, the node would stay in the pool only until the next reboot, whereupon it became a case of lather, rinse, repeat.

Further trial and error showed that if I removed the management bond entirely and ran all management traffic through a single interface, reboots worked properly and as expected (i.e., the system would seamlessly rejoin the pool). Recreating the bond brought the problem right back.

Hrm.

After a period of tearing out my hair over this, I noticed the MTU setting. We typically configure our VM traffic bonds with an MTU of 9000 so that customers can use so-called "jumbo frames" within their VMs. Without putting too much thought into it, we had been configuring our management bonds with MTU=9000 as well. On a hunch, I re-created the management bond, but this time with the default MTU of 1500. I rebooted the node and... SUCCESS! It correctly re-joined the pool after a reboot.

So, the moral of the story seems to be that if you have XenServer 6.1 installed on a system with a bonded management interface, ensure that bond has the default MTU of 1500. Jumbo frames seem to make it unhappy for reasons unknown to me. We've had these bonds enabled for quite some time -- this behavior seems to be new with version 6.1. I haven't yet contacted Citrix to see if they are aware of the issue or not, but I thought I would at least document the issue here, in case someone else out there runs into similar problems. I know that my many, many Google searches on the matter ended up being fruitless.
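If you want a quick way to spot jumbo-frame interfaces on a host, a sketch along these lines should work on any reasonably modern Linux, which I assume includes the XenServer control domain. The function name is hypothetical, and it only inspects the running state via /sys; it does not look at or change anything in XenServer's own configuration.

```shell
#!/bin/bash
# Sketch: report network interfaces whose MTU exceeds a limit (default 1500).
# The second argument exists only so the function can be pointed at a
# different directory; by default it reads /sys/class/net.
jumbo_interfaces() {
    local limit=${1:-1500}
    local base=${2:-/sys/class/net}
    local f mtu
    for f in "$base"/*/mtu; do
        [ -r "$f" ] || continue
        mtu=$(cat "$f")
        if [ "$mtu" -gt "$limit" ]; then
            echo "$(basename "$(dirname "$f")") $mtu"
        fi
    done
}

jumbo_interfaces
```

Expect the loopback interface to show up in the output, since its MTU is normally far above 1500; the entries of interest are the bond and its member interfaces.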

The silver lining in this particular cloud is that throughout all this mess, all of our virtual machines stayed online and had no issues whatsoever, so our customers were never even aware there was a problem! That has to count for something...

Saturday, June 30, 2012

Linux Kernel and Leap Seconds

We had several systems at $WORK become somewhat unresponsive tonight. CPU usage was pegged, and interactive response times were typically abysmal, with keystrokes sometimes taking several seconds or even minutes to be acknowledged.

In addition, some systems were responding well enough, but had very high context switch rates. The lowest rate I saw was about 500,000 context switches a second, but the highest I saw was over 2.8 million switches a second!
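For anyone wanting to check their own systems, here is one way to get a rough context switch rate. It is a sketch rather than the method used for the numbers above (the "cs" column of vmstat gives the same sort of figure): it samples the cumulative "ctxt" counter in /proc/stat twice and divides by the interval. The function names are made up.

```shell
#!/bin/bash
# Sketch: estimate context switches per second from the "ctxt" counter.
# The file argument defaults to /proc/stat; it is a parameter only so the
# parsing can be tried against a saved copy.
ctxt_total() {
    awk '$1 == "ctxt" { print $2 }' "${1:-/proc/stat}"
}

# Sample the counter twice, "interval" seconds apart (default 5).
ctxt_rate() {
    local interval=${1:-5}
    local before after
    before=$(ctxt_total)
    sleep "$interval"
    after=$(ctxt_total)
    echo $(( (after - before) / interval ))
}

# Only meaningful on Linux, where /proc/stat exists.
if [ -r /proc/stat ]; then
    ctxt_rate 1
fi
```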

Almost all of the poorly-performing systems were virtual servers, and the performing-okay-but-high-context-switch-rate systems were all physical servers. I hypothesized that the problem was the same one, but the physicals had more CPU power available to them -- most of our VMs don't have more than 2-3 vCPUs, whereas the physicals have upwards of 16 in some cases, if you count hyperthreads.

As it turns out, it was being caused by some weird (sorry, I'll try to keep from adding any more of this technical jargon...) kernel interaction when it processed the leap second that occurred today. For more details, I turn you now to the blog entry that helped me narrow the problem down and provided me with a simple fix:

http://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/

For the record, should this link ever stop working, he said:

The fix is quite simple – simply set the date. Alternatively, you can restart the machine, which also works. Restarting MySQL (or Java, or whatever) does NOT fix the problem. We put the following into puppet to run on all our machines:

$ cat files/bin/leap-second.sh
#!/bin/bash
# this is a quick-fix to the 6/30/12 leap second bug

if [ ! -f /tmp/leapsecond_2012_06_30 ]
then
    /etc/init.d/ntpd stop; date -s "`date`" && /bin/touch /tmp/leapsecond_2012_06_30
fi

His solution was a lot more elegant than mine, which was to simply reboot the system. :) It was also a lot easier to apply prophylactically to our entire fleet.

Wednesday, March 14, 2012

Piecaken for Pi Day

A while ago I ran across a Flickr picture of a "piecaken." For the uninitiated, a piecaken is a cake with one or more pies baked into it. The version in the link is a two-layer variety, but I realized that a single layer wouldn't actually be too difficult. I also made the mistake of mentioning it at the office shortly after the announcement of the annual pi day celebration, where we bring in pies of all shapes to share with each other in celebration of this well-known mathematical constant. Unfortunately, I mentioned it within earshot of my director, who also happened to be the organizer of this party, and she virtually insisted that I had to bring one. Despite my trepidation, both about my rather lacking skill set in this area (I'm not exactly the world's greatest baker) and about whether or not it would taste any good, I set forth to create one.

Goodbye Cable TV...Almost

I've been working for the past few weeks to prepare for the severing of relationships with our current cable TV provider -- signing up for Hulu, building an HTPC (Home Theater Personal Computer, for the uninitiated), etc. Last night, I completed the last few steps -- powered down the TiVo and removed the CableCARDs in it. Then, this morning, I called to cancel my account.

Except I didn't do it.

As I had expected, I was passed over to the "retentions" department and I was given a few options to entice me to stay. What I wasn't expecting, however, were the plans that didn't have any promotional pricing, and pretty reasonable prices at that. I ended up scaling us back to the "bare bones minimum" plan that only gives us the network channels plus a smattering of basic cable channels (Discovery being the only one he listed that we watch with any frequency). This plan is about $21/mo (probably $25 after taxes), which, compared to the $98 we had been paying before (plus the $13/month going to TiVo, which is also going to get cancelled in favor of the HTPC), is still a pretty reasonable savings -- about $86 per month, or just over $1000 a year!

I'm not viewing this as "caving in" to their attempts to retain me as a paying customer, and part of me was expecting this to be the end outcome anyway, for the following reasons:

While I have managed to locate online alternatives to almost all of our shows, the ones we like from Discovery do not really seem to be online anywhere, other than paying per episode from Amazon. Plus, Discovery can be one of those great put-it-on-and-forget-about-it channels, for example when I'm sick on the couch.
Part of my plan had included getting the network stations "OTA" (over the air) with an antenna. Unfortunately, I don't yet have an antenna in place, nor do I have the structured cabling installed yet to get the signal from it to the TV. Since this new cable plan has no commitment period, this can act, if nothing else, as a stop-gap until we can get the antenna set up.

So, we shall see how this works....