Monday, July 8, 2013

ROV Competition

Recently, I had the pleasure of assisting at the International MATE ROV competition, held this year at the King County Aquatic Center in Federal Way, WA.  I served as a diver -- I helped set up the underwater "courses" for the student teams to run their robots through, and then spent one day assisting at the competition itself.

Friday, February 22, 2013

Redhat Enterprise Linux, IPv6, and SLAAC

If you are just starting to get into IPv6, you might notice in your research that a very large number of web sites out there "solve" IPv6 issues by describing how to outright disable the IPv6 protocol on a host entirely.  With the pool of IPv4 addresses having been exhausted for quite some time now, I feel this is a very short-sighted approach.  In the end, it is better to understand the problem and solve it rather than simply sweep it under the carpet.  What follows is a description of a problem I spent the better part of a day investigating: getting a Red Hat Enterprise Linux (RHEL) server to stop auto-assigning its own IPv6 address.

Disclaimer

I should mention that I am by no means an "expert" on the topic.  Most of what I have learned has come from piecing together bits of information from various sources that were vague, incomplete, or outdated (as is much of the IPv6 information out there, sadly), and from brute-force experimentation on my own RHEL systems.  Everything I write here is of an "it worked for me, but YMMV" nature.

IPv6 Addressing Primer

IPv6 is still "new" enough to many people that I feel a quick primer is warranted.

If you have a Red Hat Enterprise Linux host on a network and haven't taken any steps to disable IPv6, you will probably observe that, out of the box, it has a "link-local" address assigned to every network interface.  These addresses start with "fe80", are automatically generated from the interface's MAC address, and provide connectivity only on the local network segment.
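
You can see these link-local addresses with the "ip" command, and in case you are curious where they come from, here is a sketch of the EUI-64 derivation the kernel performs (the MAC address below is a made-up example):

# ip -6 addr show dev eth0 scope link

  MAC address:                        00:1e:67:9a:1b:2c
  1. split the MAC in half:           00:1e:67 / 9a:1b:2c
  2. insert ff:fe in the middle:      00:1e:67:ff:fe:9a:1b:2c
  3. flip the universal/local bit:    02:1e:67:ff:fe:9a:1b:2c
  4. prepend the fe80::/64 prefix:    fe80::21e:67ff:fe9a:1b2c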

As soon as global IPv6 connectivity is added to a connected network, you will likely see another address pop up on the interface attached to it.  This will be a globally routable address that is automatically determined via "Stateless Address Autoconfiguration" or "SLAAC".  Like the link-local address, it is "self-assigned" by the client system and is typically based on the interface's MAC address (unless privacy extensions are enabled).  However, the client still needs network-specific information, such as the IPv6 prefix and router address.  These are provided via "router advertisements" or RAs: ICMPv6 packets, sent to the all-nodes multicast address, containing these data.  A router sends these RAs periodically (typically about every 10 minutes), but a client can also explicitly request one via a "Router Solicitation" ICMPv6 request.
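
If you want to watch these packets for yourself, tcpdump can filter on the ICMPv6 type field (134 is a router advertisement, 133 a router solicitation).  A quick sketch, assuming your interface is eth0:

# tcpdump -n -i eth0 'icmp6 and (ip6[40] == 134 or ip6[40] == 133)'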

There are other ways of accomplishing this, such as via DHCPv6, but for the purposes of this article, SLAAC is what we care about.  Plus, my impression is that DHCPv6 is still rather new and isn't used as much.

The Problem

This stateless auto-configuration works great for client systems -- systems where a user is typically sitting at a keyboard and interacting directly with the computer.  Much in the same way a client system frequently receives an IPv4 address automatically via a DHCP server, SLAAC provides a way to automatically assign IPv6 addresses:  turn on IPv6 on the network and things "just work."

The same can't necessarily be said for IPv6-connected servers, however.  Just as it is fairly uncommon to let your servers auto-assign an IPv4 address via DHCP (yes, I know it can be done in a perfect world; perhaps I'm just old fashioned), you probably don't want an IPv6 address to auto-assign either.  You may not be ready to offer IPv6 services on the system, or you may not have a firewall configuration ready to support it.  No matter what your reason, you probably don't want a server to obtain a globally-routable IPv6 address until you are ready to support it -- which means you don't want SLAAC.

One way to accomplish this is to simply turn off router advertisements for the entire network.  However, this leads to its own problems -- what if you have some systems that need IPv6 and others that don't?  Or, in some cases this may lead to reduced functionality of the network as a whole (for example, I'm told by the network engineers at work that disabling RAs on a subnet will break VRRP for the subnet, effectively eliminating router redundancy).

The better solution would be to simply configure the server to ignore router advertisements, effectively disabling SLAAC.  Then you can add an IPv6 configuration when you are ready to support it.

Googling will tell you that on a Linux system the way to do this is to set some kernel tunables in /etc/sysctl.conf and restart your networking stack:

net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.default.accept_ra = 0
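
Normally, you would then load the new values into the running kernel (or simply reboot):

# sysctl -p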

However, on Red Hat Enterprise Linux (or any of its various clones), you might be in for a bit of a surprise.  Despite these settings being present in sysctl.conf, upon a reboot you may find the following:

# sysctl -a | grep "accept_ra ="
net.ipv6.conf.default.accept_ra = 1
net.ipv6.conf.all.accept_ra = 1
net.ipv6.conf.eth0.accept_ra = 1
net.ipv6.conf.lo.accept_ra = 1

The active configuration still has router advertisement acceptance enabled!  And if you look at your network interfaces, you'll note that they do in fact still have globally routable IPv6 addresses.  What's going on?

The Solution

The key lies in the /etc/sysconfig/network file.  The following lines need to be present:

NETWORKING_IPV6=yes
IPV6_AUTOCONF=no

After the lines have been added, you can activate the configuration with a standard network stack restart ("/sbin/service network restart").  If all goes well, you should see the SLAAC-assigned v6 address disappear, and the kernel tunables change to more desirable values.
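
As a quick sanity check, the tunables should now read 0 across the board (the per-interface entries will of course vary with your interface names):

# sysctl -a | grep "accept_ra ="
net.ipv6.conf.default.accept_ra = 0
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.eth0.accept_ra = 0
net.ipv6.conf.lo.accept_ra = 0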

Without these lines, RHEL seems to actively revert to the default case of enabling SLAAC, as seen above.  Also, both lines are required.  Unless the first is present and enabled, any other IPv6 directives appear to be ignored.

Once you are ready to actually support IPv6 on your server, you can then proceed to manually assign an IPv6 address in the appropriate /etc/sysconfig/network-scripts/ifcfg-* file (using the IPV6ADDR and IPV6_DEFAULTGW variables).  Or, you could turn SLAAC back on by setting IPV6_AUTOCONF=yes, I suppose, but I wouldn't recommend it.
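
For illustration, a minimal static setup in an ifcfg-* file might look something like the following.  The address and gateway here are placeholders from the 2001:db8::/32 documentation prefix, so substitute your own values:

IPV6INIT=yes
IPV6ADDR=2001:db8:1::10/64
IPV6_DEFAULTGW=2001:db8:1::1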

Wednesday, November 28, 2012

XenServer Management and Jumbo Frames

In a word, don't do it.

Perhaps some additional background would help. :)

We maintain many XenServer pools, most of which consist of four "compute" servers attached to a shared storage array.  Each server has two Ethernet interfaces bonded for management traffic, as well as two more bonded for VM traffic.  The VM traffic is VLAN-tagged; the management traffic is not.

We had recently upgraded all of our pools to XenServer 6.1, a little faster than we typically would have, so that we could gain access to some of the cool new features (e.g., inter-pool VM migration).  Life was good and everything worked fine -- until it came time to apply a couple of patches.  After applying a patch I would reboot the server, at which point it would momentarily re-contact the pool and then disappear.  The Xapi services on the host would not respond, and the pool master would not acknowledge the node's presence.  SSH connectivity to the node still worked, however.

This issue proved to be pre-existing -- the patches were not what caused the problem.  I tried rebooting a node with vanilla XS 6.1 and it exhibited the same symptoms.  It was just coincidence that the servers had not been rebooted until it came time to apply patches.

After some experimentation and trial and error, I was able to reliably get the node back online by performing an "emergency network reset" and rebooting.  However, the node would rejoin successfully only until the next reboot, whereupon it became a case of lather, rinse, repeat.
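
(For anyone who hasn't run into it before: the emergency network reset in XenServer 6.x is the xe-reset-networking utility, run directly on the affected host.  It prompts for the details and requires a reboot to take effect.)

# xe-reset-networking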

Further trial and error showed that if I removed the management bond entirely and ran all management traffic through a single interface, reboots worked properly and as expected (i.e., the system would seamlessly rejoin the pool).  Recreating the bond made the problem re-manifest.

Hrm.

After a period of tearing out my hair over this, I noticed the MTU setting.  We typically configure our VM traffic bonds with an MTU of 9000 so that customers can use so-called "jumbo frames" within their VMs.  Without putting too much thought into it, we had been configuring our management bonds with MTU=9000 as well.  On a hunch, I re-created the management bond, but this time with the default MTU of 1500.  I rebooted the node and... SUCCESS!  It correctly re-joined the pool after the reboot.
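
Incidentally, the MTU is a property of the XenServer network object, so in principle the change can also be made with the xe CLI rather than re-creating the bond.  A sketch with placeholder UUIDs -- the PIFs need to be replugged (or the host rebooted, which is safer for a management interface) before the change takes effect:

# xe network-list                           (find the UUID of the bond's network)
# xe network-param-set uuid=<network-uuid> MTU=1500
# xe pif-unplug uuid=<pif-uuid>
# xe pif-plug uuid=<pif-uuid>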

So, the moral of the story seems to be that if you have XenServer 6.1 installed on a system with a bonded management interface, ensure that the bond has the default MTU of 1500.  Jumbo frames seem to make it unhappy for reasons unknown to me.  We've had these bonds enabled for quite some time -- this behavior seems to be new with version 6.1.  I haven't yet contacted Citrix to see if they are aware of the issue, but I thought I would at least document it here in case someone else out there runs into similar problems.  I know that my many, many Google searches on the matter ended up being fruitless.

The silver lining in this particular cloud is that throughout all this mess, all of our virtual machines stayed online and had no issues whatsoever, so our customers were never even aware there was a problem!  That has to count for something...

Saturday, June 30, 2012

Linux Kernel and Leap Seconds

We had several systems at $WORK tonight become somewhat unresponsive.  CPU usage was pegged, and interactive response times were abysmal -- we were sometimes waiting several seconds, or even minutes, for keystrokes to be acknowledged.

In addition, some systems were responding well enough, but had very high context switch rates.  The lowest rate I saw was about 500,000 context switches a second, but the highest I saw was over 2.8 million switches a second!
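
For reference, the context switch rate is the "cs" column that vmstat reports; I was watching it with one-second samples:

# vmstat 1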

Almost all of the poorly-performing systems were virtual servers, and the performing-okay-but-high-context-switch-rate systems were all physical servers.  I hypothesized that both were experiencing the same problem, but that the physicals had more CPU power available to them -- most of our VMs don't have more than 2-3 vCPUs, whereas the physicals have upwards of 16 in some cases, if you count hyperthreads.

As it turns out, it was being caused by some weird (sorry, I'll try to keep from adding any more of this technical jargon...) kernel interaction when it processed the leap second that occurred today.  For more details, I turn you now to the blog entry that helped me narrow the problem down and provided me with a simple fix:

http://blog.mozilla.org/it/2012/06/30/mysql-and-the-leap-second-high-cpu-and-the-fix/

For the record, should this link ever stop working, he said:

The fix is quite simple – simply set the date. Alternatively, you can restart the machine, which also works. Restarting MySQL (or Java, or whatever) does NOT fix the problem. We put the following into puppet to run on all our machines:
$ cat files/bin/leap-second.sh 
#!/bin/bash
# this is a quick-fix to the 6/30/12 leap second bug

if [ ! -f /tmp/leapsecond_2012_06_30 ]
then
/etc/init.d/ntpd stop; date -s "`date`" && /bin/touch /tmp/leapsecond_2012_06_30
fi

His solution was a lot more elegant than mine, which was to simply reboot the system. :)  It was also a lot easier to apply prophylactically to our entire fleet.

Wednesday, March 14, 2012

Piecaken for Pi Day

A while ago I ran across a Flickr picture of a "piecaken."  For the uninitiated, a piecaken is a cake with one or more pies baked into it.  The version in the link is a two-layer variety, but I realized that a single layer wouldn't actually be too difficult.  I also made the mistake of mentioning it at the office shortly after the announcement of the annual Pi Day celebration, where we bring in pies of all shapes to share with each other in celebration of this well-known mathematical constant.  Unfortunately, I mentioned it within earshot of my director, who also happened to be the organizer of the party, and she virtually insisted that I bring one.  Despite my trepidation -- not only about my rather lacking skill set in this area (I'm not exactly the world's greatest baker), but also about whether it would taste any good -- I set forth to create one.

Wednesday, February 22, 2012

Goodbye Cable TV...Almost

I've been working for the past few weeks to prepare for severing the relationship with our current cable TV provider -- signing up for Hulu, building an HTPC (Home Theater Personal Computer, for the uninitiated), etc.  Last night, I completed the last few steps -- powered down the TiVo and removed the CableCARDs from it.  Then, this morning, I called to cancel my account.

Except I didn't do it.

As I had expected, I was passed over to the "retentions" department and given a few options to entice me to stay.  What I wasn't expecting, however, were plans without any promotional pricing, and pretty reasonable prices at that.  I ended up scaling us back to the "bare bones minimum" plan that only gives us the network channels plus a smattering of basic cable channels (Discovery being the only one he listed that we watch with any frequency).  This plan is about $21/mo (probably $25 after taxes), which, compared to the $98 we had been paying before (plus the $13/month going to TiVo, which is also getting cancelled in favor of the HTPC), is still a pretty reasonable savings -- about $86 per month, or just over $1000 a year!

I'm not viewing this as "caving in" to their attempts to retain me as a paying customer, and part of me was expecting this to be the outcome anyway, for the following reasons:

  • While I have managed to locate online alternatives to almost all of our shows, the ones we like from Discovery do not really seem to be online anywhere, other than paying per episode from Amazon.  Plus, Discovery can be one of those great put-it-on-and-forget-about-it channels, for example when I'm sick on the couch.
  • Part of my plan had included getting the network stations "OTA" (over the air) with an antenna.  Unfortunately, I don't yet have an antenna in place, nor do I have the structured cabling installed yet to get the signal from it to the TV.  Since this new cable plan has no commitment period, this can act, if nothing else, as a stop-gap until we can get the antenna set up.
So, we shall see how this works....

Thursday, February 9, 2012

XenServer Network Bonding

I've just had a bit of a learning experience with respect to Citrix XenServer and what it refers to as network bonding, and thought I would share here.

All of what I am about to say comes with the disclaimer that I am not a network engineer, but rather a server engineer, so my networking terms may not be 100% accurate.

I've been working to deploy a new XenServer pool here at $WORK recently, and we've been working under the assumption that we would connect it to our new datacenter networking standard, which we refer to locally as a "virtual chassis": two in-rack switches "stacked" together, each connected independently via 10GbE to two stacked aggregator switches, with network traffic tagged using standard 802.1Q VLAN tags.  In order for this to work as a fault-tolerant configuration, each server must have a connection to each of the in-rack switches.  On our normal, non-virtualized Linux server deployments we have been using standard link aggregation (otherwise known as bonding or NIC teaming) with LACP (Link Aggregation Control Protocol).

According to the documentation, XenServer, which is really just Linux under the hood, supports network bonding, so this should be the same, right?

Wrong.

Rather than using LACP, XenServer actually uses something called "Source Level Balancing," which is based on the standard Linux "Adaptive Load Balancing" bonding mode (see Citrix's KB article CTX124421).  The really cool part of this bonding mode is that it requires no knowledge on the switch side of the connection to make it work.  Instead, the hypervisor migrates VM network traffic from port to port by sending a "gratuitous ARP" through the new port with the MAC address of the VM being moved.  In an active-active configuration, XenServer monitors the traffic of each running VM and rebalances as needed every few seconds (according to the manual, every 10 seconds).
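
A fun side effect is that you can actually watch the rebalancing happen from another machine on the same segment by capturing ARP traffic as VMs move between ports.  Something along these lines, substituting the interface that faces the segment:

# tcpdump -e -n -i eth0 arp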

So, don't do what I did and configure the switch ports that XenServer uses as LAG groups.  Otherwise you'll end up with a bonding mismatch and waste time trying to figure out why you are having weird networking issues.