Spark /work directory using lots of disk space

By default, Spark stores per-application execution logs, along with the associated application JAR, in its <spark>/work directory (e.g. /opt/interset/spark/work). This can consume large amounts of disk space over time, as the JAR is roughly 40 MB and is replicated at least once per day.
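
To gauge how much space the work directory is currently consuming (assuming the default location above), a quick check is:

    du -sh /opt/interset/spark/work          # total size of the work directory
    ls -lt /opt/interset/spark/work | head   # most recently created application directories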

This behaviour can be changed by modifying the /opt/interset/spark/conf/spark-env.sh file as follows:

  1. Create a backup of the spark-env.sh file. 
  2. Open the file in a text editor (e.g. vi) and locate the line referencing "SPARK_WORKER_OPTS".
  3. Immediately below this line, add:

    SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=172800"

    This enables periodic cleanup of the worker's application directories, retaining them for no more than 48 hours (172800 seconds). The worker checks for expired directories every 30 minutes by default; this interval can be tuned with the spark.worker.cleanup.interval property (in seconds). Note that only the directories of stopped applications are cleaned up.

  4. Restart Spark by running /opt/interset/spark/sbin/stop-all.sh, then /opt/interset/spark/sbin/start-all.sh. (A consolidated shell sketch of these steps follows this list.)
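
For reference, the steps above amount to something like the following shell session. This is a sketch, not a verified procedure: it assumes the default /opt/interset installation paths, and it appends the setting to the end of spark-env.sh rather than placing it below the SPARK_WORKER_OPTS comment (either location works, as the file is a plain shell script sourced at startup).

    # 1. Back up the existing configuration
    cp /opt/interset/spark/conf/spark-env.sh /opt/interset/spark/conf/spark-env.sh.bak

    # 2-3. Append the cleanup settings (or insert them below the existing
    #      SPARK_WORKER_OPTS comment with a text editor)
    echo 'SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=172800"' >> /opt/interset/spark/conf/spark-env.sh

    # 4. Restart Spark so the worker picks up the new options
    /opt/interset/spark/sbin/stop-all.sh
    /opt/interset/spark/sbin/start-all.sh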

If you wish to change the actual location of the work directory, set the "SPARK_WORKER_DIR" environment variable, also in spark-env.sh.
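
For example, to relocate the work directory to a larger volume, a line such as the following could be added to spark-env.sh (the /data/spark-work path here is purely illustrative) before restarting Spark:

    SPARK_WORKER_DIR=/data/spark-work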

More information on both of these topics can be found at https://spark.apache.org/docs/1.3.1/spark-standalone.html.
