How to back up and restore the vRealize Log Insight master node configuration?

 

Even when configured as a cluster, vRealize Log Insight does not provide high availability for the stored data, for all functions, or for the configuration data.

What does it mean exactly?

In a vRLI cluster there are basically two types of nodes:
  • one master node
  • up to 11 worker nodes (as of vRLI 4.5)
What will happen if one (or more) of those nodes fails?
  • worker node

If it is a worker node, the cluster remains fully accessible, but we will not be able to access the data that was stored on this particular node. If this node happened to hold the VIP, the cluster will elect a new node to hold it. But what if this node cannot be restored anymore?

  1. You have a full backup of this node: everything will be fine, just run your restore procedure and you're back in business.
  2. You, for some reason, don’t have any backups. You will certainly lose the data, but “restoring” the node is as easy as removing the failed node from the cluster (https://docs.vmware.com/en/vRealize-Log-Insight/4.3/com.vmware.log-insight.administration.doc/GUID-116098CD-6F9D-4FC9-A037-CB2CAE035B29.html) and adding a new node to the cluster. You can simply reuse the same name and IP.
  • master node

If it is the master node, the same applies again: the cluster remains fully accessible, but we will not be able to access the data that was stored on this particular node. If this node happened to hold the VIP, the cluster will elect a new node to hold it. BUT you will not be able to view or change the cluster configuration, the cluster status will be unavailable, and so on.

But what if this node cannot be restored anymore?

  1. You have a full backup of this node: everything will be fine, just run your restore procedure and you're back in business.
  2. You, for some reason, don’t have any backups, nothing, not even a single file. You’re screwed!

But there is good news even if you cannot back up the whole node (maybe it is just too big, or for whatever other reason): just back up the right data to make a master node restore as easy as restoring a worker node.


These are the steps to back up and restore the master node (minimal version). For a regular full backup, please follow the official VMware documentation (vRLI 4.5): https://docs.vmware.com/en/vRealize-Log-Insight/4.5/com.vmware.log-insight.administration.doc/GUID-FB70EF83-7E6B-4AEE-9522-CD6173F52FA0.html

  • Back up the following files on your master node on a regular basis:
/storage/core/loginsight/config/loginsight-config.xml#number

/storage/core/loginsight/config/node-token

/storage/var/loginsight/apache-tomcat/conf/keystore

The conf folder holding the keystore is the target of the symlink /usr/lib/loginsight/application/3rd_party/apache-tomcat-8.5.15/conf.
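
The regular backup of these files could be scripted. The following is only a minimal sketch, assuming that "#number" stands for a numeric suffix that a glob can match; the function name `backup_master_config`, the `root` parameter and the archive name are hypothetical and not part of the product.

```python
import tarfile
from pathlib import Path

# Paths from the list above, written relative to the filesystem root so the
# sketch can also be tried against a test directory.
CONFIG_DIR = Path("storage/core/loginsight/config")
KEYSTORE = Path("storage/var/loginsight/apache-tomcat/conf/keystore")


def backup_master_config(root: Path, archive: Path) -> list:
    """Pack all loginsight-config.xml#* files, node-token and keystore into a tar.gz."""
    # Assumption: "#number" is a suffix such as "#12", matched here by a glob.
    files = sorted((root / CONFIG_DIR).glob("loginsight-config.xml#*"))
    files += [root / CONFIG_DIR / "node-token", root / KEYSTORE]
    stored = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in files:
            name = path.relative_to(root).as_posix()
            tar.add(path, arcname=name)  # raises an error if a file is missing
            stored.append(name)
    return stored
```

On the master node itself this might be called as `backup_master_config(Path("/"), Path("/tmp/vrli-master-config.tar.gz"))`; the resulting archive should then be copied off the appliance.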

The restore procedure after the master node is lost:

  1. Spin up a new Log Insight server with the same IP and FQDN as the old master node
  2. Stop the loginsight service (connected via SSH): service loginsight stop
  3. Create the config folder: /storage/core/loginsight/config
  4. Copy all loginsight-config.xml#number files and the node-token file to /storage/core/loginsight/config/ (Note: the “config” folder has to be created manually, see step 3)
  5. Copy the keystore file to /storage/var/loginsight/apache-tomcat/conf/
  6. Check the symlink: /usr/lib/loginsight/application/3rd_party/apache-tomcat-8.5.15/conf -> /storage/var/loginsight/apache-tomcat/conf
  7. Reboot the Log Insight appliance
  8. Log in to the UI and check the cluster status
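
Steps 3 to 6 of the restore could be sketched in a small script. This is only an illustration under assumptions: the saved files are expected to lie flat in one backup folder, and the names `restore_master_config`, `symlink_ok`, `backup_dir` and `root` are hypothetical. Stopping the service and rebooting the appliance remain manual steps.

```python
import shutil
from pathlib import Path

# Paths from the steps above, relative to the filesystem root.
CONFIG_DIR = Path("storage/core/loginsight/config")
TOMCAT_CONF = Path("storage/var/loginsight/apache-tomcat/conf")
TOMCAT_LINK = Path("usr/lib/loginsight/application/3rd_party/apache-tomcat-8.5.15/conf")


def restore_master_config(backup_dir: Path, root: Path) -> None:
    """Steps 3 to 5: create the config folder and copy the saved files into place."""
    config_dir = root / CONFIG_DIR
    config_dir.mkdir(parents=True, exist_ok=True)  # step 3
    for src in backup_dir.glob("loginsight-config.xml#*"):  # step 4
        shutil.copy2(src, config_dir / src.name)
    shutil.copy2(backup_dir / "node-token", config_dir / "node-token")
    (root / TOMCAT_CONF).mkdir(parents=True, exist_ok=True)
    shutil.copy2(backup_dir / "keystore", root / TOMCAT_CONF / "keystore")  # step 5


def symlink_ok(root: Path) -> bool:
    """Step 6: check that the Tomcat conf symlink points at the restored conf folder."""
    link = root / TOMCAT_LINK
    return link.is_symlink() and link.resolve() == (root / TOMCAT_CONF).resolve()
```

On the new appliance, `root` would be `Path("/")`. If `symlink_ok` returns False, the symlink from step 6 should be inspected by hand before rebooting.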

 

Disclaimer: This method does not replace a proper backup of the system as described in the VMware documentation!

vROps Super Metrics using logical expressions

vRealize Operations Super Metrics are a very flexible and powerful way to extend the capabilities of the product way beyond the OOB content.

There are many blog articles out there explaining the basic use of super metrics, but only very few sources give examples of how to put logical expressions into your formulas. So the question is, how does this work?

Using some simple examples, I am going to explain how the magic of logical expressions works in vROps Super Metrics.

First of all some fundamentals:
  • A Super Metric working on the selected object itself, an ESXi cluster in this example, which simply returns the actual metric (we will need it soon):
avg(${this, metric=summary|total_number_hosts})
  • A Super Metric working on the direct descendants of the selected object, in this case the ESXi hosts in a cluster, which counts the powered-on hosts:
count(${adaptertype=VMWARE, objecttype=HostSystem, metric=sys|poweredOn, depth=1})
Now, let’s put the pieces together and build a super metric which follows this pattern:
condition ? inCaseOfTrue : elseCase;

Even though the following example does not make much sense semantically, the syntax of such an expression might look like this:

count(${adaptertype=VMWARE, objecttype=HostSystem, metric=sys|poweredOn, depth=1}) == avg(${this, metric=summary|total_number_hosts}) && avg(${adaptertype=VMWARE, objecttype=HostSystem, metric=cpu|usage_average, depth=1} as cpuUsage) > 40 ? avg(cpuUsage) : 5

One could translate this formula into the following statement:

If all ESXi hosts in a given cluster are powered on AND the cluster's average CPU usage is greater than 40, THEN show me the average CPU usage, ELSE show me 5
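
To make the logic of the statement above easier to follow, here is a small Python model of it. This is not how vROps evaluates super metrics; it is only a sketch, and the function name `cluster_super_metric` and the input format (one pair of power state and CPU usage per host) are assumptions made for illustration.

```python
from statistics import mean


def cluster_super_metric(total_hosts, hosts):
    """Model of: condition ? avg(cpuUsage) : 5

    `hosts` is a list of (powered_on, cpu_usage) pairs, one per ESXi host;
    `total_hosts` plays the role of summary|total_number_hosts.
    """
    # First half of the condition: all hosts in the cluster are powered on.
    powered_on = sum(1 for on, _ in hosts if on)
    if not hosts or powered_on != total_hosts:
        return 5  # else case (an empty cluster is treated as "else" here, by assumption)
    # Second half of the condition: average CPU usage is greater than 40.
    cpu_usage = mean(usage for _, usage in hosts)
    return cpu_usage if cpu_usage > 40 else 5
```

For three powered-on hosts with CPU usage 50, 60 and 70, the model returns their average, 60. If one of the hosts is powered off, or the average is 40 or lower, it returns 5.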