Problems Database, Uptime Robot, and Logging Overhaul
1.7.0 brings the first release of the new Problems Database functionality, developed by our partners at Amazee Labs, along with the first couple of integrations. We’ll talk more shortly about the great potential for the Problems Database, but for now we’ve included the ability to collate image vulnerability scans from Trivy (integrated with our Harbor install) and to parse site audit and remediation activity generated by Drutiny.
We’ve also done a heap of work behind the scenes on a couple of other subsystems. To date, the logging ecosystem has used a fluentd-based log collector to forward logs to our internally hosted Elasticsearch and make them available to users via Kibana. With the work done to support Kubernetes, we’ve needed to overhaul this process, and we’ve taken the opportunity to extend it a little. We now use a fully featured logging operator to collect, enrich and transfer the logs, which makes the system more extensible, customizable, and scalable, and gives us (and potentially end-users) more control over log destinations, formats, and retention policies.
The third area of updates has to do with our service monitoring. To date, we have used StatusCake to provide us with real-time monitoring and alerting of site status across all of our clusters. As we’ve grown and complexity has increased, it has become harder and harder for us to manage this integration ourselves. To address this, we’ve selected an Ingress Monitor Controller to automatically watch all of our ingresses (routes) and add or remove monitors as required. We’ve also assessed the various monitoring options available and selected Uptime Robot as the most suitable for our needs. At this stage, it won’t mean much to end-users, but as we make more features available, we’ll be sure to let you know.
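To give a feel for what "add/remove monitors as required" means in practice, here is a minimal sketch of the reconciliation idea: compare the hosts currently exposed by ingresses with the monitors that already exist, and work out the difference. This is not the Ingress Monitor Controller's code or the Uptime Robot API. The function name `plan_monitor_changes` and the example hostnames are hypothetical and exist only for illustration.

```python
from typing import Iterable, List, Set, Tuple


def plan_monitor_changes(
    ingress_hosts: Iterable[str],
    existing_monitors: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (hosts that need a new monitor, monitors that should be removed).

    Hypothetical helper: a real controller would read ingresses from the
    cluster and talk to the monitoring provider instead of taking plain lists.
    """
    wanted: Set[str] = set(ingress_hosts)
    current: Set[str] = set(existing_monitors)

    # Hosts with an ingress but no monitor yet.
    to_add = sorted(wanted - current)
    # Monitors whose ingress no longer exists.
    to_remove = sorted(current - wanted)
    return to_add, to_remove


if __name__ == "__main__":
    add, remove = plan_monitor_changes(
        ["a.example.com", "b.example.com"],
        ["b.example.com", "old.example.com"],
    )
    print("add monitors for:", add)
    print("remove monitors for:", remove)
```

Running a comparison like this whenever an ingress changes is what lets the set of monitors follow the cluster without anyone maintaining it by hand.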
API & Authentication subsystem
Admin & User Interfaces subsystem
- Active/Standby transitions are now better managed through the UI: they are actioned from the Standby environment, with a confirmation prompt.
- The system used to restrict a cluster’s ability to run specific tasks has been renamed from blacklist to blocklist, as this aligns more closely with our position against racism, and we believe that even small changes like this can make a difference.
Build & Deploy subsystem
- The Active/Standby process has now been updated to work on Kubernetes as well as OpenShift. The Harbor integration with Lagoon has been reimplemented to allow the build process to cooperate better. We’ve also done some tweaking to the Helm templates used to deploy projects into Kubernetes.
Logging & Reporting subsystem
- Whilst most of the focus has been on the logging overhaul, we’ve made some improvements to the current log forwarders to make them more resilient and reduce the risk of logs accumulating and getting dropped. The Lagoon logging upgrades we’ve made for Kubernetes have also been backported to OpenShift.
Base Images & Testing subsystem
- We’ve added in a couple more tidy-ups here: pinning some more Alpine versions to 3.11 whilst we evaluate 3.12, fixing an interesting Twig render cache issue, and implementing a centralized versioning system for the PHP tools we add into images (Drush, Drupal Console, Composer & NewRelic). Also, Varnish will no longer cache large (10MB) files, which could potentially fill the Varnish cache. These files can be served directly from NGINX instead.
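The Varnish change boils down to a size check before caching. The sketch below shows that idea in Python purely for illustration. The real rule lives in the Varnish configuration shipped with the images, which is not reproduced here. The name `should_cache` and the reading of 10MB as a cut-off of exactly 10 * 1024 * 1024 bytes are assumptions.

```python
# Assumed threshold: 10MB expressed in bytes. The exact cut-off used by the
# real Varnish configuration may differ.
MAX_CACHEABLE_BYTES = 10 * 1024 * 1024


def should_cache(content_length: int) -> bool:
    """Return True if a response of this size would be cached.

    Hypothetical helper: responses at or above the threshold skip the cache
    and are served straight from the backend (NGINX in the text above).
    """
    return content_length < MAX_CACHEABLE_BYTES


if __name__ == "__main__":
    for size in (1024, 10 * 1024 * 1024, 50 * 1024 * 1024):
        verdict = "cache" if should_cache(size) else "bypass cache"
        print(f"{size} bytes: {verdict}")
```

Keeping large files out of the cache leaves room for the many small responses that benefit most from being cached.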
Documentation & Examples / DX subsystem
- After realizing that a number of users (including some of our team) were using a user-contributed PR to guide them through local k8s setup for Lagoon, we figured that we needed to merge it — thanks Salvo! We’ve also started to move some docs around to better match GitHub’s expectations of us!
Automation, Services & Helpers subsystem
- The usual tweaks here to scripts and tools we regularly use to keep sites running and manageable. We’ve also modified our auto-idler to ensure that the cron jobs persist across restarts, by running them inside the auto-idler pod instead of as cron job pods.
- We’ve made a couple of updates here to the Trivy vulnerability scanner bundled with Harbor: an update that includes a fix for empty composer files, which was impacting the scanning of Drupal sites, and a change to get it running more smoothly on OpenShift.
Missed the last round of updates? See what changed in Lagoon 1.6.
See the full release notes here.