The Flume service, which performs data ingest, may halt in either of the following scenarios:
- A file currently being ingested is modified.
- A file with the same filename as one previously ingested is placed into the source directory (e.g. p4audit.log is placed into /tmp/ingest, is ingested and renamed to p4audit.log.COMPLETED, and another file named p4audit.log is then placed into /tmp/ingest).
To confirm that you have hit this behaviour, tail or grep the /var/log/flume/* log files for the following message:
FATAL: Spool Directory source dirSource: { spoolDir: /tmp/ingest }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
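If you would rather script the check than grep by hand, the search can be sketched as follows. This is a minimal sketch, not part of the product: the function name find_halt_messages and the constant FATAL_MARKER are hypothetical, and it assumes the logs are plain-text files directly under /var/log/flume and readable by the user running the script.

```python
import glob
import os

# Substring of the FATAL message quoted above; matching on a substring avoids
# depending on the exact source name or spool directory in the log line.
FATAL_MARKER = "Uncaught exception in SpoolDirectorySource thread"


def find_halt_messages(log_dir="/var/log/flume"):
    """Return (path, line_number, line) for every log line containing the marker."""
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, "*"))):
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # errors="replace" keeps the scan going if a log holds undecodable bytes
        with open(path, errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if FATAL_MARKER in line:
                    hits.append((path, line_no, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for path, line_no, line in find_halt_messages():
        print(f"{path}:{line_no}: {line}")
```

An empty result means the message was not found in the files scanned. It does not by itself prove that Flume is running.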
To mitigate this issue once it has been hit, go into the Ambari console (e.g. http://<MASTER>:8080/) and restart the Flume service.
To prevent duplicate filenames, it is recommended that every file created for ingest include a timestamp as part of its name.
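One way to follow this recommendation is sketched below. It is illustrative only: the helper names timestamped_name and stage_for_ingest, the timestamp format, and the use of a separate staging directory are all assumptions rather than part of the product. The sketch writes the file completely in the staging directory and then renames it into the ingest directory, so that Flume should never see a file that is still being written. This assumes both directories are on the same filesystem.

```python
import os
import shutil
from datetime import datetime, timezone


def timestamped_name(filename, now=None):
    """Insert a UTC timestamp before the extension, e.g. p4audit.20240102T030405.log."""
    now = now or datetime.now(timezone.utc)
    stem, ext = os.path.splitext(filename)
    return f"{stem}.{now.strftime('%Y%m%dT%H%M%S')}{ext}"


def stage_for_ingest(src_path, staging_dir, ingest_dir):
    """Copy src_path into staging_dir, then rename it into ingest_dir.

    staging_dir is a hypothetical directory outside the ingest directory and
    is assumed to be on the same filesystem, so the final rename is one step.
    """
    name = timestamped_name(os.path.basename(src_path))
    final = os.path.join(ingest_dir, name)
    # Refuse to reuse a name that is pending or already ingested (.COMPLETED).
    if os.path.exists(final) or os.path.exists(final + ".COMPLETED"):
        raise FileExistsError(final)
    staged = os.path.join(staging_dir, name)
    shutil.copy2(src_path, staged)  # the complete copy happens outside ingest_dir
    os.rename(staged, final)        # the file appears in ingest_dir fully written
    return final
```

For example, stage_for_ingest("/var/log/p4audit.log", "/tmp/staging", "/tmp/ingest") would place a uniquely named copy into /tmp/ingest. The source path and /tmp/staging here are placeholders. Because the timestamp has one-second resolution, two calls for the same file within the same second raise FileExistsError rather than overwrite.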