By default, Interset suggests running Analytics once every 24 hours. It is, however, possible to increase the frequency of this execution depending on the analytics run time in your environment. We do this by calculating the complete run time of the Analytics applications within Spark (completion time of final application - start time of application 1), and modifying the cron task accordingly.
We recommend a gap between executions of at least 100% of the Analytics run time, to prevent a pile-up of overlapping jobs. This means that if your Analytics execution takes 1 hour to complete, we recommend scheduling it to run no more often than every 2 hours.
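The interval calculation above can be sketched as a short shell snippet. The run-time value here is a hypothetical example; substitute the value you measure in your own environment:

```shell
run_time_min=49                      # measured Analytics run time in minutes (hypothetical value)
min_gap_min=$((run_time_min * 2))    # 100% gap between runs => interval is twice the run time

# Round up to a whole number of hours for a clean cron schedule
interval_h=$(( (min_gap_min + 59) / 60 ))
echo "Minimum interval: ${min_gap_min} min; schedule every ${interval_h} hour(s)"
```

With a 49-minute run time this yields a 98-minute minimum gap, which rounds up to the 2-hour schedule recommended above.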
To determine the run time of Analytics, perform the following steps:
- Open a web browser.
- Navigate to the Ambari console (e.g. http://ambarinode:8080).
- Click on "YARN" on the left-hand side of the screen.
- Under "Quick Links", select "Resource Manager UI". If you have multiple Resource Managers, select the link from the "active" node.
- Note the "FinishTime" of the most recent "PhoenixToElasticSearchJob: WORKING_DAYS" application (e.g. "Thu Aug 25 00:49:03 -0400 2016").
- Note the "StartTime" of the most recent "com.interset.analytics.aggregation.EntityMappingJob" application (e.g. "Thu Aug 25 00:00:00 -0400 2016").
- Calculate the difference between the StartTime of "com.interset.analytics.aggregation.EntityMappingJob" and the FinishTime of "PhoenixToElasticSearchJob: WORKING_DAYS"; this is the complete Analytics run time. In this example, it works out to 49 minutes and 3 seconds.
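The subtraction in the final step can be done with GNU date, which parses the timestamp format shown in the Resource Manager UI. The two timestamps below are the hypothetical example values from the steps above:

```shell
# Timestamps copied from the Resource Manager UI (example values)
start="Thu Aug 25 00:00:00 -0400 2016"    # EntityMappingJob StartTime
finish="Thu Aug 25 00:49:03 -0400 2016"   # PhoenixToElasticSearchJob FinishTime

# Convert both to epoch seconds, then subtract
start_s=$(date -d "$start" +%s)
finish_s=$(date -d "$finish" +%s)
elapsed=$((finish_s - start_s))

printf 'Analytics run time: %d min %d sec\n' $((elapsed / 60)) $((elapsed % 60))
```

This prints a run time of 49 minutes and 3 seconds for the example timestamps. Note that the "-d" flag requires GNU date; on BSD/macOS systems the equivalent parsing flags differ.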
In the example outlined above, we would suggest that Analytics be scheduled for no more often than every 2 hours given the proximity to the 1 hour mark. Note that this run time is for example purposes, and may vary considerably based on your cluster's hardware, data volume, and data density.
To modify the execution times of Analytics, you will need to edit the crontab of the Spark user on the machine where Interset Analytics is installed. To do so, perform the following steps:
- Log in to the machine via SSH as root, or any user with sudo privileges.
- Change to the Spark user (e.g. sudo su - spark).
- Run crontab -e to edit the cron jobs.
- Modify the Analytics job accordingly. For example, if Analytics were to run every two hours, the entry would read: "0 0,2,4,6,8,10,12,14,16,18,20,22 * * * /opt/interset/analytics/bin/analytics.sh /opt/interset/analytics/conf/interset.conf"
- Commit the change and quit. Note: crontab opens your default editor (typically vi), so exit normally (e.g. Escape -> :wq).
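After saving, the schedule can be confirmed from any account with sudo privileges. The crontab entry below uses the every-two-hours example from the steps above; "*/2" in the hour field is standard cron step syntax equivalent to the explicit list 0,2,4,6,8,10,12,14,16,18,20,22:

```shell
# Verify the Spark user's crontab without switching shells
sudo su - spark -c 'crontab -l'

# Expected entry for an every-two-hours schedule (either hour-field form works):
# 0 */2 * * * /opt/interset/analytics/bin/analytics.sh /opt/interset/analytics/conf/interset.conf
```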