================
 An okay Sunday
================

Today is going pretty well for me so far, ignoring awful background
things (including that there's still a war going on, Internet access is
being crippled more and more, and apparently TSPU recently messed up
IPsec around here for a while, possibly in an attempt to block more VPNs).

I didn't sleep well, but that was because I was rather excited about
possibly finally setting up a high-availability cluster at work: things
keep breaking there (they malfunctioned a bit today, too), the users are
unhappy, so the managers asked me to look into increasing reliability,
which is something I'd gladly do. I started by asking fellow
programmers to compose lists of services they develop and maintain
(I composed such a list myself as an example, and a UML deployment
diagram with data flows too, in a few hours), and the managers -- for
lists of past major issues. I haven't received anything in almost 2
weeks though, and I also asked the managers about reliability
requirements on top of that. But I am familiar with some parts of the
system (those that I maintain, interact with, or helped to debug), as
well as some of the issues, and can guess that it's best to minimize
downtime of all the services as much as possible with reasonable effort
and hardware.

So now I'm considering options: eyeing Pacemaker, PAF for PostgreSQL
with streaming replication (or maybe pgpool-II), DRBD to handle the
flow of files uploaded via FTP, possibly with GFS2 and in an
active/active configuration (with that fancy multicast + sorting out
on hosts); configuring failover of systemd services should be
easy. Maybe I could do DNS-based load balancing (and hopefully a kind of
failover) too, with clusters in different places and behind different
addresses.

Though before all that, I'll have to try poking everyone to
simplify the system, since currently the issues tend to arise on a
long path through which data goes, composed of hacks, legacy bits,
not-quite-ready-though-in-development-for-a-long-time things that
should replace the legacy bits, and so on: sorting it out should
increase reliability by itself, and such a messy system would be hard
to set up for redundancy.

So, that's the stuff I'm rather excited to play with, and I'm somewhat
hopeful that the system at work will be simplified on top of that.

Other than that, I did some physical exercise today, sorted out some
clothes, did a bit of cleaning and laundry, had a nice steak for
lunch. Now it's sunny outside, which is fairly uncommon for winter
here, so it is relatively notable.

Hopefully this day will stay about as nice.


----

:Date: 2023-02-12