Slight change in server ping metrics method

This post documents a change to how ping metric data is collected, made to give a more accurate measure of ping times.

Two days ago, on 19 December 2023, a public Hostfurl status page was launched at https://status.hostfurl.com.au.

The page has four status bars showing the status of various services: two for ‘Email and Web’ services (one in Australia, the other in Europe), one for ‘Billing and Support’ services and one for DNS services. A bar does not necessarily represent a single server.

There are also two system metrics graphs. Currently they show ping times from a server in Australia to another server in Australia, and from the same Australian server to a server in Europe.

The ping measurement method was changed on 21 December 2023.

The previous method was to send one ping to each of the two servers every ten minutes, launched from cron, and, if the ping command succeeded, record the ping time and pass the metric on to the status site. If a ping fails, an immediate alert is issued instead. No alerts have been raised (other than during testing).

The ping time to Europe is around 293 milliseconds and the ping time to Australia is around 2 milliseconds. Nothing unexpected, except that occasionally the ping time to Australia exceeded 10 milliseconds, making the moving average of the metric move up and down misleadingly.
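To illustrate why one slow reply stands out, here is a minimal Python sketch of a simple moving average. It assumes a six-sample window and made-up readings; the post does not say what kind of moving average the status site uses or over how many samples.

```python
def moving_average(samples, window):
    """Simple moving average: the mean of each run of `window` consecutive samples."""
    return [
        sum(samples[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(samples))
    ]

# Hypothetical ping times in ms: steady at 2 ms with a single 12 ms outlier.
pings = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 12.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
averages = moving_average(pings, window=6)
for avg in averages:
    print(f"{avg:.2f}")
```

With these numbers, the single 12 ms reading lifts six consecutive averages from 2.00 to 3.67 ms before they drop back to 2.00.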

The new method is to send three pings, discard the first measurement and average the second and third. So if you notice that the Australian ping metrics are now steadier and under two milliseconds, this is the reason. No other changes have been made.
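The averaging step itself is small. Below is a minimal Python sketch, assuming the three round-trip times have already been extracted as numbers in milliseconds; the function name `ping_metric` and the readings are hypothetical (the post's scripts do this step with awk).

```python
def ping_metric(rtts_ms):
    """Discard the first of three ping times and average the second and third."""
    if len(rtts_ms) != 3:
        # Refuse to guess when the input is not exactly three readings.
        raise ValueError(f"expected 3 ping times, got {len(rtts_ms)}")
    second, third = rtts_ms[1], rtts_ms[2]
    return (second + third) / 2

# Hypothetical readings in ms: a slow first reply followed by two normal ones.
print(ping_metric([11.5, 1.75, 1.25]))  # 1.5
```

Raising an error on anything other than three readings is a deliberate choice here, so that a short result cannot quietly turn into a bogus metric.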

The cron job runs simple bash scripts that use sed and awk to extract and process the data (grep could be used instead of sed). If ping succeeds, sed extracts the three ping times, awk averages the last two and curl passes the result on to the status site.

What if, say, only one of the three pings succeeds? We have tested this scenario. The ping command still appears to report success, but the current awk scripting reads the second and third ping times as 0, so their average, also 0, gets passed through as the ping metric! Yikes! This needs improvement, and it will be easier to replace the scripts with a Perl script.
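The planned replacement is a Perl script; the sketch below uses Python instead, purely to illustrate the missing guard. It assumes Linux (iputils-style) ping output, where each reply line carries `icmp_seq=` and `time=` fields. The function and variable names are hypothetical, and 192.0.2.10 is a documentation address. Instead of letting missing replies become 0, it returns `None` when the reply to the second or third ping is absent, so the caller can skip reporting or raise an alert.

```python
import re

# Matches reply lines such as:
# "64 bytes from 192.0.2.10: icmp_seq=2 ttl=56 time=1.75 ms"
REPLY_RE = re.compile(r"icmp_seq=(\d+)\b.*?\btime=([\d.]+) ms")

def ping_metric_from_output(output):
    """Average the replies to pings 2 and 3, or return None if either is missing."""
    times = {}  # sequence number -> round-trip time in ms
    for line in output.splitlines():
        match = REPLY_RE.search(line)
        if match:
            times[int(match.group(1))] = float(match.group(2))
    if 2 not in times or 3 not in times:
        # Too few replies: report nothing rather than a misleading 0.
        return None
    return (times[2] + times[3]) / 2

full = """PING 192.0.2.10 (192.0.2.10) 56(84) bytes of data.
64 bytes from 192.0.2.10: icmp_seq=1 ttl=56 time=11.4 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=56 time=1.75 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=56 time=1.25 ms
"""
partial = """PING 192.0.2.10 (192.0.2.10) 56(84) bytes of data.
64 bytes from 192.0.2.10: icmp_seq=1 ttl=56 time=11.4 ms
"""
print(ping_metric_from_output(full))     # 1.5
print(ping_metric_from_output(partial))  # None
```

Keying the times by sequence number, rather than by position, means a lost first reply does not shift the later readings into the wrong slots.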