This article will discuss guidelines on tuning your Interset 5.x environment, which will assist in improving stability and reducing the run time of the Interset Analytics jobs.
- YARN memory usage should generally not exceed ~85-90% of its available maximum in order to account for spikes in requests/overhead.
- YARN sees its available resources as an amalgamation of resources on all managed nodes. While you may have 6 Compute nodes, potentially with varying hardware, YARN will simply see X number of CPU cores, and Y GB of RAM as a pool that it can allocate containers into.
- HBase tuning recommendations will have no affect in environments with only a single Region Server.
- The impact seen on run time of Analytics will be far more profound with large(r) data sets. For example, these tuning adjustments have had scenarios where run time for a data set was reduced from ~40 hours, to ~26 hours, whereas on a data set where analytics takes only 15 minutes to complete it is possible that these tuning changes will have no appreciable affect.
- Remember that resources on the Interset Compute nodes are shared between HBase and YARN/Spark, so the memory (and CPU) overhead from HBase should be considered in your tuning settings.
- Keep in mind that these are guidelines based on our experience, and are not necessarily the definitive resource for the tuning of these components. Performance can, an likely will, vary depending on your specific use case and data set.
HBase uses the concept of Regions for distribution of data across its Region Servers. Each of these Regions makes up a section of data that may or may not entirely contain an entire Table of data. With HBase's default load balancing, it is possible that an entire Table, or a majority of one, will be stored on a single Region Server instead of being split across all available Region Servers. In practice, this means that all of the queries being run against HBase referencing that Table must be run on that single Region Server - this is both inefficient and can cause stability issues. The term for this occurring is called "hot spotting".
With this in mind, we recommend the following parameter be added to Custom hbase-site in Ambari:
This parameter will ensure that each Table is evenly distributed across Region Servers, as opposed to HBase only balancing its Regions. While Region balance is important, for our use case Table balance is even more important.
This should provide the most consistent run times, and will ensure that the distribution of Tables across the Region Servers stays consistent even after multiple Analytics executions.
Note that there is a similar parameter, that may be the preferred approach in the future, called hbase.master.balancer.stochastic.tableSkewCost. We are investigating this and this document will be updated once we have concrete recommendations using it.
For YARN we are primarily concerned with two parameters:
- Minimum Container Size (Memory) (or yarn.scheduler.minimimum-allocation-mb)
- Maximum Container Size (Memory) (or yarn.scheduler.maximum-allocation-mb)
When a Spark job is run, it will execute within a YARN container that has a minimum and maximum memory size governed by this parameter. Each Spark Executor and driver will run within a single YARN container, so ensuring that the tuning parameters between these two services are balanced is very important. The value for the Maximum Container Size will be dependant on the Memory allocated for all YARN containers on a node setting, which is dependant on the amount of memory available on your system. Given that this is variable, we will use abstract values or examples in this article.
It is important also to properly set the value of the Minimum Container Size as this is the smallest size that a container will be created at. Note that since the Spark Driver requires its own YARN container, the minimum size should be roughly equal to the Spark Driver Memory (discussed below) to avoid allocating resources that will not be used.
For Spark we are concerned with several parameters:
- spark.executor.memoryOverhead (set in Advanced spark-defaults in Ambari)
- spark.executor.memory (set as executorMem in interset.conf)
- Number of Executors (set as numExecutors in interset.conf)
- Executor Cores (set as executorCores in interset.conf)
- Spark Driver Memory (set as driverMem in interset.conf)
- parallelism (set as parallelism in interset.conf)
As noted above, the Spark Executor will run in a single YARN container, so the following calculation is used:
spark.executor.memory + spark.executor.memoryOverhead <= yarn.scheduler.maximum-allocation-mb
Breaking from the abstract values, we have generally found that a sufficient value for spark.executor.memoryOverhead is 1GB (or 1024 for the parameter value). This may vary depending on your data set, and by and large is a trial and error process.
spark.executor.memory should be set to the remainder of the YARN container size. For example, if your maximum container size is 16GB, and your spark.executor.memoryOverhead is 1GB, the spark.executor.memory can be set to 15g (15GB).
The Number of Executors is the total number of Spark Executors (and as such, YARN containers) that will be spawned across all Compute nodes in the environment. If this is set to 6, and you have 3 compute nodes, you should see 2 Executors on each node (assuming available resources). Note that you may not always see equal allocation.
Executor Cores is how many cores each Spark job will try to use. This setting is dependant on the Number of Virtual Cores value within YARN. Effectively, the formula for this is numExecutors * executorCores <= # of Compute Nodes * Number of Virtual Cores. Note that this will also effectively govern the number of Spark Executors that can be spawned, as you may use all available Virtual Cores available to YARN before reaching your maximum number of Executors.
Spark Driver Memory can generally be set to 2g. In nearly all cases we have seen this to be a sufficient value in our testing.
Parallelism, as outlined in interset.conf, should be set to 2-3x the number of CPU cores available across all Compute nodes. For example, if you have 3 compute nodes with 16 CPU cores each, this value should be set between 32 and 48.