Installation
Roughly, the steps for someone setting up the system are:
- Install the Walrus package from your favorite source
- Configure a database somewhere (MySQL? SQLite?)
- Do the base configuration of Walrus: the database config file and authentication
- Install the Walrus database into the database configured above
- Install the web front-end somewhere
  - This is important, as it lets someone set up Apache auth any way they like, and Walrus doesn't have to deal with it.
  - Of course, maybe we do want to deal with it, if we want to allow multiple types of people access to the UI?
- Start the master daemon on the machine
Configuration
Everything can now be configured through the web UI... everything except authentication credentials and the database location. Those are hardcoded in the configuration file that we distribute out to the cluster.
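As a rough illustration of how small that distributed file could be, here is a minimal sketch in Python. The INI layout, the section and key names, and the load_bootstrap_config helper are all assumptions made for this example; nothing above fixes the actual file format.

```python
import configparser

# Hypothetical contents of the distributed configuration file. The INI layout
# and every section/key name here are assumptions made for illustration.
SAMPLE_CONFIG = """
[database]
url = sqlite:///walrus.db

[auth]
username = walrus
password = change-me
"""


def load_bootstrap_config(text):
    """Parse the two things that cannot be set through the web UI."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "database_url": parser["database"]["url"],
        "username": parser["auth"]["username"],
        "password": parser["auth"]["password"],
    }
```

Everything else (roles, checks, schedules) would live in the database that this file points at.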
Some initial configuration has to be done first to decide which modules/plugins you want. I expect that almost everybody will use every component of Walrus, but you can ignore any part of it simply by not configuring it. If you don't want to use the notification system, don't configure any checks for it. All good.
New Machine
Adding a new machine to the cluster should be as easy as typing in the name and then specifying what roles to give the machine. For example: you type db01, and select 'database' and then specify what cloud the machine is in, 'main-colo' in this case. Since you have previously configured the roles, that should be all you have to do.
Walrus will now schedule the machine to be inherited. That is the action taken when a new machine needs to be configured for Walrus. We assume that the package is already installed and the daemon is running as root in 'new machine' mode. The first step is then for Walrus to feed it the configuration file (with the authentication information) so that it is no longer wide open.
Once the basic configuration is done, the machine can now hang out until it's time to be given orders.
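The new-machine flow above could be modeled along these lines. This is only a sketch: the Machine class, the state names, and the add_machine and inherit helpers are hypothetical, and push_config stands in for whatever mechanism actually delivers the configuration file to the daemon.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Machine:
    name: str
    cloud: str
    roles: List[str] = field(default_factory=list)
    state: str = "awaiting-inherit"  # hypothetical state name


def add_machine(name, roles, cloud, known_roles):
    """Register a machine; every role must have been configured beforehand."""
    unknown = sorted(set(roles) - set(known_roles))
    if unknown:
        raise ValueError("roles not configured yet: " + ", ".join(unknown))
    return Machine(name=name, cloud=cloud, roles=list(roles))


def inherit(machine, push_config):
    """Feed the machine its configuration file so it is no longer wide open."""
    push_config(machine.name)  # assumed delivery hook, not a real Walrus API
    machine.state = "idle"     # now waiting to be given orders
    return machine
```

With this shape, the db01 example is add_machine("db01", ["database"], "main-colo", known_roles) followed by a scheduled inherit call.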
Configuration Management
The workflow for most configuration management tasks is built around checks and actions. For example, maintaining or installing packages:
- Check Package Installed: foo, version >= 2.3
- If true, Do Nothing
- If false, Install Package foo
- Schedule: 60 minutes
When you define a check, you pick one of the checker plugins from the list. The plugin defines various input arguments (in this case, the package and version requirement) and then defines an output (in this case, a boolean).
You are then able to choose which actions to take given those results. In this example, true does nothing, and false installs the given package. (To be technical, it schedules a command to install the package.) Either way, this check is then marked as completed for the next 60 minutes, at which point it runs again.
You can designate a check as one-shot or repeating. Typically checks will repeat, since you want to ensure that things don't get messed up on the servers.
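To make the check/action/schedule shape concrete, here is a minimal sketch in Python. The Check class, the package_installed plugin, and the in-memory installed dictionary (standing in for the real package database) are all assumptions for illustration, and the version comparison only handles plain dotted integers.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


def parse_version(text):
    """Turn '2.3' into (2, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def package_installed(package, min_version, installed):
    """Hypothetical checker plugin: boolean output, as described above."""
    current = installed.get(package)
    if current is None:
        return False
    return parse_version(current) >= parse_version(min_version)


@dataclass
class Check:
    plugin: Callable[..., Any]
    args: Dict[str, Any]
    actions: Dict[Any, Optional[Callable[[], None]]]  # result -> action
    interval_minutes: Optional[int] = 60  # None means one-shot
    next_run: float = 0.0

    def run(self, now):
        result = self.plugin(**self.args)
        action = self.actions.get(result)
        if action is not None:
            action()  # e.g. schedule the install command
        if self.interval_minutes is None:
            self.next_run = math.inf  # one-shot: never runs again
        else:
            self.next_run = now + self.interval_minutes * 60
        return result
```

The package example then maps True to None ("Do Nothing") and False to a callable that schedules the install; either way next_run moves 60 minutes ahead.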
Monitoring
This is very similar to the above. You designate a check, then you designate the actions to take depending on the result. Of course, you often don't have to designate any actions, since the checks that are designed for monitoring are going to return a state value.
Checks that are being used for monitoring have their state values fed into the monitoring system, which uses them to send out alerts when something needs attention.
However, since you can hinge actions on the checks, you CAN try to take corrective action when a certain alert happens. For example:
- Check Disk Space /, warn < 15%, critical < 5%.
- If OK, Do Nothing
- If Warn or Crit, Action: Try to Free Space, Reschedule.
- Schedule: 60 minutes.
This instructs the system to run the disk space check on / every 60 minutes, treating less than 15% free as the point where warnings start and less than 5% as critical. If either state is encountered, it schedules the defined action (presumably a script of some sort). Once that script has been executed, the check is rescheduled.
Only when the rescheduled check has also failed will it alert. This can save your admins a lot of time if you set up little scripts to take care of random situations that happen every once in a while, while still giving you the peace of mind of knowing that if there is a real problem, it will be alerted on.
(And note that if the action does not complete in a reasonable amount of time, the alert goes out anyway.)
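The alert-only-after-remediation logic can be sketched as follows. This collapses the scheduling into one synchronous function for readability, which is an assumption of the example; the function names, the state strings, and the convention that remediate returns False when it fails or times out are likewise hypothetical.

```python
def disk_state(free_percent, warn=15.0, crit=5.0):
    """Map free space to a monitoring state using the thresholds above."""
    if free_percent < crit:
        return "CRIT"
    if free_percent < warn:
        return "WARN"
    return "OK"


def check_with_remediation(read_free_percent, remediate, alert,
                           warn=15.0, crit=5.0):
    """Run the check, try the corrective action once, then re-check.

    remediate() is assumed to return True if the action completed and
    False if it failed or did not finish in a reasonable amount of time.
    """
    state = disk_state(read_free_percent(), warn, crit)
    if state == "OK":
        return state  # Do Nothing
    if remediate():
        # The rescheduled check: only its result decides whether to alert.
        state = disk_state(read_free_percent(), warn, crit)
        if state == "OK":
            return state
    # Either the re-check still failed or the action never completed.
    alert(state)
    return state
```

Passing the disk reader, the action, and the alert sink in as callables keeps the sketch independent of how Walrus would actually run scripts or deliver alerts.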
