skarnet.org downtime

Past outages

2018-04-08: VM crash at Gandi. It can happen, no big deal. Except that alyss does not boot back up. It was a Sunday, so no help from tech support. The next day, tech support takes a couple of hours to answer but points me in the right direction: the kernel boots, but can't find the rootfs. Investigation shows Gandi changed the way their Xen PV installation presents hard disks to the guests, so my grub configuration was obsolete. Problem solved after about 24 hours of downtime. My main gripe here is that I cannot make sure that it doesn't happen again: if the disk configuration changes again, I have to modify grub.cfg by hand, unless I install the whole gandi-vm-config machinery, which is a Python monster and a way for Gandi to backdoor your machine as they please - which I obviously won't do.
2016-11-05: scheduled downtime for maintenance: alyss was switched from Gandi's Xen hypervisor infrastructure to their new KVM hypervisor infrastructure. I could not make the "boot on a raw disk and have your custom kernel" feature work, so it's still using a stock Gandi kernel for now.
2013-09-02: switch from antah to alyss, a virtual server at Gandi. A few hiccups while fixing the last bugs, but no major downtime. Complete switch to a homemade distribution. No more hardware failures, no more distribution failures, no more OpenSSH failures. The future is bright!
2007-07-20: antah hardware failure. For several reasons, there's one month of downtime. My apologies. Read the story here.
2006-02-28: Power failure at RedBus. antah doesn't boot when the power comes back. Analysis shows that the last Debian upgrade messed up the lilo configuration, and the kernel can't be found. Sigh. And they ask why I don't trust Linux distributions. lilo installed manually, problem fixed. Kernel upgraded to 2.6.15.4.
2005-03-04: antah's main disk has been having major problems for a few days. I go to RedBus, take the disk home, and dump it onto another one before it's too late. The machine is back up on 2005-03-06, 9h50 (GMT+1). Kernel upgraded to 2.6.11.
2004-08-16, 17h (GMT+2): unable to log in, so I immediately go to RedBus and reboot. I can then log in and analyze. Problem: sshd didn't like /dev/pts/100. Linux developers claim it's a userland problem and OpenSSH developers claim it's a Linux kernel problem. Great. Kernel upgraded to 2.6.8.1. I'll try to write a workaround for the /dev/pts/100 problem before it arises again.
2004-05-04, 9h50 (GMT+2) to 2004-05-07, 16h00 (GMT+2): scheduled ISP change. The whole story can be read here.
2004-02-25: 13h30 - 18h30 (Paris time, GMT+1): scheduled kernel and boot system upgrade. No problems.
2003-08-27: the whole skarnet.org site was down, in support of the FFII's online demonstration against software patents. Downtime from 2003-08-27 at 03:00 GMT+2 to 2003-08-28 at 15:00 GMT+2. Kernel upgraded to 2.4.22, init system upgraded.
2003-02-19: 6h - 8h (Paris time, GMT+1): scheduled electrical upgrade. ClaraNet warns their users only a day in advance, claiming that the previous upgrade was incomplete and must be fixed immediately. The outage actually starts at 6:10 and ends at 10:35.
2002-12-12: 6h - 8h (Paris time, GMT+1): scheduled electrical upgrade. Kernel upgraded to 2.4.20. No boot problems.
2002-10-08: Power outage at ClaraNet. antah doesn't boot properly when the power comes back. Cause: modutils failure - hardcoded /sbin paths in binaries and in the kernel. Sigh. Upgraded to 2.4.19, without module support; modutils thrown out.

Scheduled outages

None planned for the moment.