My almost-effortless upgrade experience (and lessons learned) with Elastic 8

Recently I decided to upgrade my home Elastic stack to Elastic version 8. There was nothing urgent driving this decision: everything worked just fine with Elastic 7.17, and there was at least one reason not to upgrade, namely that an external plugin I rely on does not yet work with version 8.7 (more on that below). In the past, though, I’ve upgraded through every major version of Elastic since version 1, so I felt I had to take the plunge.

Every upgrade I’ve gone through has gotten easier and easier. A brief recap of my prior attempts:

  • 1 to 2: Gave up due to too many errors, and just decided to start over from scratch.
  • 2 to 5: Had some serious upgrade troubles, mainly around unhelpful error logs from a Ruby Logstash plugin, but after a few days of effort got it working.
  • 5 to 6: Had some issues thanks to the .security index being in the wrong format, but was ultimately able to fix it by selectively downgrading then upgrading again. Overall, took a few hours.
  • 6 to 7: No problems at all thanks to the advance work done as suggested by the Upgrade Assistant.

Thanks to that previously-mentioned Upgrade Assistant, which identifies things that need to be fixed before upgrading, I anticipated that the upgrade from 7 to 8 would be fairly straightforward. The upgrade itself was painless: all I had to do was point apt to the Elastic 8.x repos and upgrade over the top. Soon enough, everything was up and running again. I did take advantage of the upgrade to make a few other changes to my setup, which, due mainly to my not reading the documentation carefully, turned out to be more of an adventure than anticipated, but in the end I got everything working. To learn more about the particular issues I ran into, and how to avoid them, read on…
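
For anyone curious what “point apt to the 8.x repos” looks like in practice, this is roughly it on a Debian/Ubuntu box. The repo file name below is the one Elastic’s install docs use, so treat it as an assumption and adjust for your own host:

  # swap the 7.x apt repo for the 8.x one (file name may differ on your system)
  sudo sed -i 's|packages/7.x/apt|packages/8.x/apt|' /etc/apt/sources.list.d/elastic-7.x.list
  sudo apt-get update
  sudo apt-get install --only-upgrade elasticsearch kibana logstash metricbeat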

Fun with data streams

I mainly use Logstash to ingest log streams from network devices, my website, and syslog. I did this with a straightforward Elasticsearch output configuration that used an index pattern to create daily indices. Since these are append-only log sources, and after a couple of days no logs should ever come in again, this is a perfect use case for Elastic data streams. Combining data streams with lifecycle policies would totally automate my index management: creating new indices, moving them to read-only after a couple of days, and deleting them according to my own retention policy.
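
As a rough sketch of the lifecycle side, a policy like the following (created in Kibana Dev Tools) covers the rollover, read-only, and delete steps; the policy name, ages, and retention period are placeholders rather than my real values:

  PUT _ilm/policy/home-logs-policy
  {
    "policy": {
      "phases": {
        "hot": {
          "actions": {
            "rollover": { "max_age": "1d" }
          }
        },
        "warm": {
          "min_age": "2d",
          "actions": { "readonly": {} }
        },
        "delete": {
          "min_age": "30d",
          "actions": { "delete": {} }
        }
      }
    }
  }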

Naïvely turning on the toggle for data streams in the index template without changing anything else did not work, though. Immediately I got errors that Logstash was unable to send logs to those indices. It turns out that in order to get Logstash to write to data streams, you need to add action => "create" to the output configuration. Once I did that and removed the now-redundant daily index pattern, everything worked as expected.
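
Here’s a minimal sketch of what the adjusted output block looks like; the host, credentials, and data stream name are placeholders, not my actual configuration:

  output {
    elasticsearch {
      hosts    => ["https://elastic.example.com:9200"]
      user     => "logstash_writer"
      password => "${LOGSTASH_PW}"
      # data streams only accept "create" operations
      action   => "create"
      # one data stream name instead of a daily pattern like "syslog-%{+YYYY.MM.dd}"
      index    => "logs-syslog-default"
    }
  }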

New template options

Did I mention templates? I used to use legacy templates in multiple layers to create indices. Since you can specify an order on a legacy template, and higher-ordered templates override lower-ordered ones, you can make a low-order generic template with things like the shard and replica settings you want applied to all indices, then layer more granular, higher-order templates on top that add, say, a few custom fields for one data type.
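
To make that layering concrete, here’s roughly what it looked like with the legacy _template API; the names, patterns, and fields are illustrative, not my actual templates:

  # low-order template: generic settings for everything
  PUT _template/base-settings
  {
    "index_patterns": ["logs-*"],
    "order": 0,
    "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
  }

  # higher-order template: extra fields for one data type, merged on top
  PUT _template/firewall-fields
  {
    "index_patterns": ["logs-firewall-*"],
    "order": 10,
    "mappings": {
      "properties": { "destination_port": { "type": "integer" } }
    }
  }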

As you can see, these are now referred to as “legacy”, which means there’s a preferred option: composable templates. These let you do much the same thing, but in a different way: you define component templates with certain fields or settings, then selectively include them in composable templates. In this way, you keep the DRY (Don’t Repeat Yourself) approach that layered legacy templates gave you.
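
The composable equivalent of the earlier sketch would look something like this; again, the names, pattern, and priority are placeholders:

  # component template holding the shared settings
  PUT _component_template/base-settings
  {
    "template": {
      "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
    }
  }

  # composable index template that pulls the component in and adds its own mappings
  PUT _index_template/logs-firewall
  {
    "index_patterns": ["logs-firewall-*"],
    "priority": 200,
    "composed_of": ["base-settings"],
    "data_stream": {},
    "template": {
      "mappings": {
        "properties": { "destination_port": { "type": "integer" } }
      }
    }
  }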

Since composable templates have a “priority” field, I mistakenly thought they worked like legacy templates, with a higher-priority template simply layering over a lower-priority one. Nope: when multiple templates match, the highest-priority template is the only one that gets applied. Instead of stacking options like layers of a sandwich, it replaces the whole sandwich with a steak.
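
One way to catch this before indexing anything is the template simulation API, which shows which single template would win for a given index name and which overlapping templates get ignored; the index name here is just an example:

  # shows the resolved template for a hypothetical index, plus any overlapping
  # (and therefore ignored) lower-priority templates
  POST _index_template/_simulate_index/logs-firewall-test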

This caused the biggest problem with the built-in monitoring templates I wanted to change. My first approach, a composable template containing just the few settings I wanted layered over the built-in system template, failed miserably: because I gave it a higher priority than the built-in template, it was the only template applied. Instead, I copied the built-in template, added my settings to its existing field definitions, and gave that new template the higher priority. Voilà, I was no longer missing field definitions!

Also: don’t mess with the system-managed legacy templates. I thought I would get cute and delete some of the older ones that had been carried over through a couple of upgrades. Most of these were fine to remove, but deleting the templates behind the Metricbeat 7 monitoring indices (.monitoring-es-7-*, .monitoring-kibana-7-*, etc.) caused a lot of log spam, because Metricbeat really, really wants to keep them around: it tries to auto-create them if it doesn’t see that they exist, and they conflicted with the new templates I had set up. So just leave them be when you upgrade.

Metricbeat updates

I also modified my Metricbeat setup while I was at it, which, combined with the template issues above, really stymied my cluster monitoring for a while until I got everything right.

Previously, I sent data from Beats like Metricbeat and Winlogbeat to Logstash first, but since Logstash wasn’t really doing anything with that data, I removed the pointless hop and sent it directly to the Elasticsearch node. I had a few problems with permissions when making that change, however, mainly because I didn’t fully read the documentation. In particular, you can’t just create the writer role: when the docs say “This section assumes that you’ve run the setup,” they mean it! In order for Metricbeat to create all the templates and dashboards it wants, it needs elevated permissions at first. Then, once setup is complete, you can drop to a lower, ingest-only role.
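
For what it’s worth, the end state is just a plain Elasticsearch output in metricbeat.yml; the host and the writer user are placeholders, and the exact privileges for that role come from the Beats “grant privileges” documentation, so check those for your version:

  # metricbeat.yml – run "metricbeat setup" once with a privileged user first,
  # then switch to a minimal ingest-only user for day-to-day shipping
  output.elasticsearch:
    hosts: ["https://elastic.example.com:9200"]
    username: "metricbeat_writer"
    password: "${METRICBEAT_PW}"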

Proper roles created, I then ran into some conflicting information about enabling cluster monitoring. If you are using Metricbeat to monitor your cluster, you don’t need “xpack.monitoring.collection.enabled” set to “true” (again, something mentioned in the documentation). Because of the template issues I wasn’t getting any monitoring data, so I thought I had to enable it, and enabling it even worked, but that’s the deprecated, non-Metricbeat way of collecting metrics. Don’t use it.
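
If you’ve already flipped it on, as I had, it’s a dynamic cluster setting, so turning it back off is a one-liner in Dev Tools (shown here as a sketch):

  PUT _cluster/settings
  {
    "persistent": {
      "xpack.monitoring.collection.enabled": false
    }
  }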

Final thoughts

Although everything is working as expected after the above troubleshooting, as I mentioned earlier one external plugin sadly has not been updated to work with Kibana 8.7 yet: the Kibana Enhanced Table plugin. This incredibly useful plugin creates what are essentially pivot tables, which I use to better display firewall block events based on a port/rule combination. If this weren’t my home setup and I truly needed that functionality, I would have upgraded to 8.6 instead of 8.7, since 8.6 is supported by Enhanced Table as of this writing. Hopefully the plugin will be updated soon so I can use it again.

While this is it for my upgrade story, of course like any techie I couldn’t leave well enough alone after a mere upgrade! I had been running Kibana on the default configuration of port 5601 and unencrypted HTTP up until now, and I finally decided that was no longer good enough. I’ll have more to share about how I enabled TLS for Kibana and put it behind nginx in a later post.
