
My data ingest is failing

Issue

My data ingest is failing 

Cause

Data ingest can fail for various reasons. It is best to perform the following steps to investigate the error that is causing data ingest to fail:

  • Step 1: Check ingest directory and content ownership
  • Step 2: Check raw events in Elasticsearch via Kibana
  • Step 3: Verify relations generated in HBase
  • Step 4: Check Flume logs

NOTE: These steps are provided as a reference point to help with troubleshooting ingest issues. These steps do not encompass all possible solutions. If these steps do not resolve the issue, please contact Interset support: support@interset.com

Resolution Steps

NOTE: This information only applies to CSV data ingest using Flume.

Step 1: Check ingest directory and content ownership

If no raw index is created in Elasticsearch or data is not in HBase, please ensure the ingest file(s) are located on the STREAM NODE and accessible by Flume.

  1. SSH to the STREAM NODE as the Interset User.
  2. Type in the following commands to list the ownership of the ingest directory (e.g. /data/auth) and its content:
    • EXAMPLE:
      • list ownership of ingest directory:
        • ls -al /data
      • list content and ownership inside ingest directory:
        • ls -al /data/auth
  3. Ensure the ownership of the ingest directory and its content is set to the following:
    • user: flume
    • group: hadoop
  4. If the ownership of the ingest directory or its content is not set to the above, please type in the following command to set ingest directory (e.g. /data/auth) ownership:
    • EXAMPLE: sudo chown -R flume:hadoop /data/auth
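As a quick sanity check after setting ownership, re-run the ls command; each entry should show flume as the user and hadoop as the group. The output below is purely illustrative (file names, sizes, and dates will differ in your environment):

    ls -al /data/auth
    drwxr-xr-x 2 flume hadoop     4096 May 30 22:58 .
    drwxr-xr-x 5 flume hadoop     4096 May 30 22:58 ..
    -rw-r--r-- 1 flume hadoop 10485760 May 30 22:58 auth_events.csv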

Step 2: Check raw events in Elasticsearch via Kibana

  1. In a web browser, navigate to the Reporting UI URL:
    • http<s>://<Reporting_Node_FQDN>/search
  2. Log in with the credentials for your tenant. The default Reporting Admin username/password is:
    • username: admin
    • password: password
  3. In Discover, click the ▼ to select the appropriate index name/pattern. Depending on the data type being ingested, the index name/pattern will differ.
    • EXAMPLE: If ingesting web proxy data, the index pattern will look like:
      • interset_webproxy_rawdata_<tid>
        • NOTE: If no index name/pattern is configured, Kibana will prompt to add one. Please reference Interset documentation for details on how this can be configured.
  4. In the top-right corner, where the time-range filter is defined, adjust the filter to represent the time-range reflected in your dataset.
    • If data does NOT exist, this indicates that Flume did not push the dataset to Elasticsearch. Please continue to Step 4: Check Flume logs.
    • If data does exist, this indicates that Flume is pushing the dataset to Elasticsearch. It is best to verify that the corresponding relations were also generated in HBase (Step 3).
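If Kibana is not available, the raw indices can also be checked directly against the Elasticsearch REST API. The commands below are a minimal sketch; the host, port (9200 is the Elasticsearch default), and index name (interset_webproxy_rawdata_0) are assumptions and should be adjusted for your environment:

    • EXAMPLE:
      • list raw data indices and their document counts:
        • curl -s 'http://localhost:9200/_cat/indices/interset_*_rawdata_*?v'
      • count documents in a specific raw index:
        • curl -s 'http://localhost:9200/interset_webproxy_rawdata_0/_count?pretty'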

Step 3: Verify relations generated in HBase

  1. SSH to the ANALYTICS NODE as the Interset User.
  2. Type in the following command to navigate to the /opt/interset/analytics/bin directory:
    • cd /opt/interset/analytics/bin
  3. Type in the following command to load the Phoenix Console:
    • ./sql.sh --action console --dbServer <server>
  4. Once the Phoenix console has loaded, run the following query:
    • SELECT MIN(DATEMINUTE), MAX(DATEMINUTE) FROM OBSERVED_ENTITY_RELATION_MINUTELY_COUNTS WHERE TID = '0' AND DATEMINUTE > TO_DATE('0', 'S');
  5. The output will be similar to the following:
    MIN(DATEMINUTE)              MAX(DATEMINUTE)
    2016-04-01 13:00:00.000      2016-05-30 22:58:00.000
  6. The MAX(DATEMINUTE) should be consistent with the time-range reflected in your dataset.
    • If the MAX(DATEMINUTE) is NOT consistent, this indicates that Flume did not push the dataset to HBase. Please continue to Step 4: Check Flume logs.
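To narrow down where ingestion stopped, it can also help to count relation records after a given point in time, using the same table and columns as the query above. The date and tenant ID below are placeholders; substitute a date near the end of your dataset and your own tenant ID:

    • EXAMPLE: SELECT COUNT(*) FROM OBSERVED_ENTITY_RELATION_MINUTELY_COUNTS WHERE TID = '0' AND DATEMINUTE > TO_DATE('2016-05-29', 'yyyy-MM-dd');

A count of 0 for a date range that should contain data suggests that ingestion stopped before that point.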

Step 4: Check Flume logs

There are several stages defined in a Flume configuration, which extract, transform, and load data into Elasticsearch/HBase. Errors may occur during these stages, and they require investigation in the Flume logs.

  1. SSH to the STREAM NODE as the Interset User.
  2. Type in the following to list all Flume logs and output files within the flume log (/var/log/flume) directory:
    • ls -al /var/log/flume
  3. The following Flume logs are generally the logs to check for any errors that may be output:
    • flume-interset_<ds>_events_<did>_<tid>_csv_transform.log
    • flume-interset_<ds>_events_<did>_<tid>_es.log
    • flume-interset_<ds>_events_<did>_<tid>_hbase.log
    • flume-interset_<ds>_raw_<did>_<tid>_csv_multiline_extract.log
      • NOTE:
        • <ds> - denotes the data source type ingested (e.g. auth, repo, webproxy)
        • <did> - denotes the data source instance ID the data is ingested to
        • <tid> - denotes the tenant ID the data is ingested for
  4. Each Flume log contains its own respective set of errors that may be output. To view a log, please see the example command below:
    • EXAMPLE: less <flume_log_name>.log
  5. In the log file, press the following key combination to jump to the end of the log:
    • Shift + G
  6. Below is a sample list of potential errors that may be output in the Flume logs:
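Regardless of the specific error, a quick way to surface error-level messages across all Flume logs is to search for them directly. The command below is a minimal sketch assuming the default log directory /var/log/flume and the log naming convention shown above:

    • EXAMPLE: grep -i -E 'error|exception' /var/log/flume/flume-interset_*.log | tail -n 50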

Applies To

  • Interset 5.4.x or higher 