You should use Elasticsearch client nodes wherever it makes sense. Wait, what is a client node? Quite simply, it is an Elasticsearch node with no master duties (“node.master: false”) and no data (“node.data: false”). Why? Better overall performance and scaling potential. Here is why in more detail.
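In elasticsearch.yml, that definition is just two settings (as quoted above; these apply to the Elasticsearch versions that use the node.master/node.data flags):

```yaml
# elasticsearch.yml for a client node: it joins the cluster and
# coordinates requests, but holds no data and is never elected master
node.master: false
node.data: false
```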
When a client application (Kibana, Logstash, etc.) reads from or writes to Elasticsearch, it sends the request to a single Elasticsearch node. That node distributes the request to the rest of the cluster, waits for the responses, and then replies to the client application. For a read query from Kibana, the receiving node distributes the query to the individual Elasticsearch nodes that hold the needed shards, those nodes run the query and respond to the originating node, and the originating node assembles the results and responds to Kibana. A write works the same way: the node that received the write from Logstash is responsible for distributing it to the respective nodes, collecting the acknowledgement responses from the cluster, and responding to Logstash.

Best case, this coordination load is spread fairly evenly across all your data nodes, but it is still consuming resources on the data nodes that could be used for other purposes. Worst case, you overload one or a handful of data nodes and hit performance issues even though most of your Elasticsearch data nodes have no resource contention at all. One or two overloaded Elasticsearch data nodes can slow the whole cluster down.
Admittedly, writing directly to a data node in smaller ELK environments that are only ingesting tens to a couple hundred logs per second has relatively minimal effect. The overhead of doing it this way just doesn’t add up to much. Once you are ingesting thousands to tens of thousands of logs per second or more, the overhead you are putting on the Elasticsearch data nodes will slow everything down.
Don’t list multiple Elasticsearch data nodes in your Logstash config to round-robin writes across, and don’t run your Logstash/Kibana traffic through a load balancer backed by the Elasticsearch data nodes. Instead, use an Elasticsearch client node. This offloads all of the overhead of coordinating read and write queries from your Elasticsearch data nodes, which in turn lets your data nodes handle more queries.
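In the Logstash elasticsearch output, that means pointing hosts at the client node(s) rather than the data nodes (the hostnames here are illustrative):

```
output {
  elasticsearch {
    # send writes to the client node(s), not the data nodes;
    # "es-client-1"/"es-client-2" are example hostnames
    hosts => ["es-client-1:9200", "es-client-2:9200"]
  }
}
```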
There are two ways to deploy client nodes, and each has its benefits and drawbacks.
The first is to install dedicated client nodes on servers that do nothing else. These would typically sit behind a load balancer, and Kibana/Logstash/xyz_app would connect to the Elasticsearch client nodes through the load balancer pool.
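A minimal sketch of that pool, here using HAProxy purely as an example (any TCP/HTTP load balancer works; names and ports are illustrative):

```
# haproxy.cfg sketch: client apps connect to the balancer on 9200,
# which spreads requests across the dedicated client nodes
frontend es_front
    bind *:9200
    default_backend es_clients

backend es_clients
    balance roundrobin
    server client1 es-client-1:9200 check
    server client2 es-client-2:9200 check
```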
The second is to install a client node directly on each of the Kibana/Logstash/xyz_application servers, with each of those services connecting to its local Elasticsearch install. This has the added benefit that, since everything connects to localhost:9200 for access to Elasticsearch, port 9200 can be blocked by default on all Elasticsearch servers for tighter security.
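On such a co-located client node, the HTTP port can be kept off the network entirely while the cluster transport port stays reachable. A sketch (again using the node.master/node.data-era settings the post quotes):

```yaml
# elasticsearch.yml on the local client node
node.master: false
node.data: false
# serve HTTP (9200) only on loopback for the local Kibana/Logstash;
# the transport port (9300) still needs to reach the rest of the cluster
http.host: 127.0.0.1
```

The local Kibana then simply points at `http://localhost:9200` in its configuration.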