Ingest (Flume) may stop due to modified files or duplicate filenames

The Flume service, which performs data ingest, may halt in either of the following scenarios:

- A file currently being ingested is modified.
- A file with the same name as a previously ingested file is placed into the source directory (e.g. p4audit.log is placed into /tmp/ingest, is ingested and renamed to p4audit.log.COMPLETED, and another file named p4audit.log is then placed into /tmp/ingest).

To confirm that you have hit this behaviour, tail or grep the /var/log/flume/* log files for the following message:

FATAL: Spool Directory source dirSource: { spoolDir: /tmp/ingest }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
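
The log check can also be scripted. The following is a minimal sketch in Python, assuming the logs are plain text under /var/log/flume/ and that matching on a substring of the message is sufficient; the function name find_fatal_lines is hypothetical and not part of Flume.

```python
import glob

# Substring of the fatal message quoted in this article.
FATAL_MARKER = "Uncaught exception in SpoolDirectorySource thread"


def find_fatal_lines(log_glob="/var/log/flume/*"):
    """Return (path, line_number, text) for every log line containing the marker."""
    hits = []
    for path in sorted(glob.glob(log_glob)):
        try:
            with open(path, "r", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if FATAL_MARKER in line:
                        hits.append((path, number, line.rstrip("\n")))
        except OSError:
            # Skip directories and files that cannot be read.
            continue
    return hits


if __name__ == "__main__":
    for path, number, text in find_fatal_lines():
        print(f"{path}:{number}: {text}")
```

If the script prints nothing, the message was not found in the files it could read, and the halt may have another cause.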

To recover once the issue has occurred, go to the Ambari console (e.g. http://<MASTER>:8080/) and restart the Flume service.

To prevent duplicate filenames, it is recommended that files for ingest be created with a timestamp as part of their name.
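
One way to apply this recommendation is sketched below in Python. It addresses both scenarios: the file gets a timestamped name, and it is fully written in a separate directory before being moved into the source directory, so it is not modified while being ingested. The function name stage_with_timestamp and the staging directory /tmp/ingest-staging are hypothetical; the sketch assumes the staging directory is on the same filesystem as /tmp/ingest so that the final rename is a single atomic step.

```python
import os
import shutil
import time


def stage_with_timestamp(src_path, spool_dir="/tmp/ingest",
                         staging_dir="/tmp/ingest-staging", now=None):
    """Copy src_path into spool_dir under a timestamped name; return the new path.

    staging_dir is a hypothetical scratch directory, assumed to be on the
    same filesystem as spool_dir. now is seconds since the epoch (None
    means the current time).
    """
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    base, ext = os.path.splitext(os.path.basename(src_path))
    final_name = f"{base}.{stamp}{ext}"
    final_path = os.path.join(spool_dir, final_name)

    # Refuse to reuse a name that is pending or already ingested.
    if os.path.exists(final_path) or os.path.exists(final_path + ".COMPLETED"):
        raise FileExistsError(final_path)

    # Write the complete copy outside the source directory first.
    tmp_path = os.path.join(staging_dir, final_name)
    shutil.copy2(src_path, tmp_path)

    # Move the finished file in one step; os.rename fails with OSError
    # if the two directories are on different filesystems.
    os.rename(tmp_path, final_path)
    return final_path
```

For example, stage_with_timestamp("/var/log/p4audit.log") would place a file such as p4audit.20240101T120000.log into /tmp/ingest (the input path here is illustrative only). The timestamp has one-second resolution, so two files with the same base name staged within the same second are rejected by the existence check rather than silently colliding.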
