How can I tell if data is being ingested

Question

How can I tell if data is being ingested? 

Summary

This how-to article describes how to tell whether data is being ingested. The high-level steps are:

  • Step 1: Check ingest directory
  • Step 2: Check Flume logs
  • Step 3: Check Elasticsearch indices
  • Step 4: Check HBase counts

NOTE: It is only possible to confirm that ingestion has finished for a historical (static) dataset. Streaming data is ingested continually, so its ingestion has no end point. You must also know the data type being ingested and the number of events in the dataset.

The following nodes will be accessed (via web or SSH):

  • REPORTING (web)
  • STREAM (SSH)
  • ANALYTICS (SSH) 

Steps

NOTE: These steps apply only to CSV data ingested using Flume.

Step 1: Check ingest directory

The first step is to confirm that the dataset has been read from the ingest directory. Flume appends “.COMPLETED” to a CSV filename once the file has been read. Follow the steps below to confirm that this has occurred:

  1. SSH to the STREAM NODE as the Interset User.
  2. List the contents of the directory where the data is stored (e.g., /data/auth) using the following command:
    • EXAMPLE:
      • ls /data/auth/
  3. Verify that “.COMPLETED” has been appended to the CSV filename:
    • EXAMPLE:
      • csv.COMPLETED
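
When the ingest directory holds many files, the check in step 3 can be scripted. The following is a minimal sketch, assuming that files still waiting to be read end in .csv and that files already read end in .csv.COMPLETED. The function name pending_csv_files is hypothetical and is not part of the product.

```shell
# List CSV files in an ingest directory that have not yet been
# renamed with the .COMPLETED suffix.
# Assumption: unread files end in ".csv"; read files end in ".csv.COMPLETED".
pending_csv_files() {
    for f in "$1"/*.csv; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        basename "$f"
    done
}

# Example (directory taken from the steps above):
#   pending_csv_files /data/auth
```

No output suggests that every CSV file in the directory has already been marked as read.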

Step 2: Check Flume logs

Confirm that the dataset has moved on to the next stage of the Flume ingest process. Follow the steps below to confirm:

  1. SSH to the STREAM NODE as the Interset User.
  2. Type in the following command to navigate to the /var/log/flume directory:
    • cd /var/log/flume
  3. Flume creates several .log and .out files in this directory. Check the following three files to verify that data is being extracted and ingested properly:
    • interset_auth_raw_<did>_<tid>_line_extract.out
    • flume-interset_auth_raw_<did>_<tid>_line_extract.log
    • flume-interset_auth_events_<did>_<tid>_csv_transform.log
  4. For each file, type in the following command to view it:
    • sudo less <flume_file_name>.<extension>
  5. In each .log or .out file, press the following key combination to jump to the end of the file:
    • Shift + G
  6. In the .out file, look for the following lines:
    • “Interset Data Gateway 5.5.0.181
    • Starting…
    • Rolled file to /opt/interset/sampledata/authentication/sample_auth_data.csv.COMPLETED
    • All data read.”
  7. In the .log files, look for “EventPutSuccessCount”. This value should keep increasing until ingestion is complete.
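
Steps 6 and 7 can also be checked without paging through each file. The following is a minimal sketch, assuming the .out message matches the example in step 6. The function names all_data_read and recent_put_counts are hypothetical, and no particular log line format is assumed: whole matching lines are printed so the values can be compared by eye.

```shell
# Succeeds if the line-extract .out file reports that all data was read.
# Assumption: the message text matches the example in step 6.
all_data_read() {
    grep -q 'All data read' "$1"
}

# Print the most recent lines that mention EventPutSuccessCount.
# $1 = log file, $2 = number of lines to show (default 5).
recent_put_counts() {
    grep 'EventPutSuccessCount' "$1" | tail -n "${2:-5}"
}

# Example (file names follow the patterns listed in step 3):
#   all_data_read interset_auth_raw_<did>_<tid>_line_extract.out && echo "read complete"
#   recent_put_counts flume-interset_auth_events_<did>_<tid>_csv_transform.log
```

The steps above view these files with sudo, so the functions may need to run with the same privileges.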

Step 3: Check Elasticsearch indices

To further validate that data is being ingested properly, check Elasticsearch for a raw index. Follow the steps below to verify:

  1. SSH to the ANALYTICS NODE as the Interset User
  2. Type in the following command to list all indices created in Elasticsearch:
    • curl -ks -X GET "http<s>://<SEARCH_NODE_FQDN>:9200/_cat/indices?v"
  3. Locate the docs.count column and the corresponding interset_<ds>_rawdata_0-<date>-000001 raw index.
    • NOTE: <ds> denotes the data source (e.g., auth) and <date> denotes the ingest date (e.g., 2017-12-20).
  4. The docs.count value should increase until all ingested data has been pushed to Elasticsearch.
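
To avoid scanning the full index listing by eye, the output can be filtered down to the raw index and its docs.count value. The following is a minimal sketch, assuming the listing includes the header row produced by ?v and that every row has a value in every column (rows for closed indices may not). The function name raw_index_counts is hypothetical.

```shell
# Read `_cat/indices?v` output on stdin and print "<index> <docs.count>"
# for every raw index of the given data source ($1, e.g. auth).
# Column positions are taken from the header row rather than hard-coded.
raw_index_counts() {
    awk -v prefix="interset_${1}_rawdata_" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "index") idx = i
                if ($i == "docs.count") cnt = i
            }
            next
        }
        # Keep only rows whose index name starts with the raw-index prefix.
        idx && cnt && index($idx, prefix) == 1 { print $idx, $cnt }
    '
}

# Example, using the command from step 2:
#   curl -ks -X GET "http<s>://<SEARCH_NODE_FQDN>:9200/_cat/indices?v" | raw_index_counts auth
```

Running the example a few times during ingest shows whether the count is still rising.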

If no index has been created for the ingested dataset, refer to the “My data ingest is failing” KB article.

Step 4: Check HBase counts

Verify that the ingested data has been pushed to HBase. Follow the steps below to verify:

  1. SSH to the ANALYTICS NODE as the Interset User
  2. Type in the following command to launch the Phoenix SQL console:
    • phoenix-sqlline
  3. Once phoenix-sqlline loads, type in the following command to return the sum of the COUNTS column for the tenant in the OBSERVED_ENTITY_RELATION_MINUTELY_COUNTS table:
    • SELECT SUM(COUNTS) FROM OBSERVED_ENTITY_RELATION_MINUTELY_COUNTS WHERE TID = '<tid>';
      • NOTE: <tid> denotes the tenant ID that the ingested data is associated with.
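
When checking several tenants, it can help to generate the query text instead of editing it by hand. The following is a minimal sketch that only prints the statement from step 3 for pasting into the phoenix-sqlline prompt; it does not run the query. The function name hbase_count_query is hypothetical, and rejecting IDs that contain a single quote is an added safeguard, not a documented product rule.

```shell
# Print the step 3 count query for one tenant ID ($1).
# Rejects empty IDs and IDs containing a single quote, which would
# break the quoted SQL literal.
hbase_count_query() {
    case "$1" in
        ''|*"'"*) echo "invalid tenant ID" >&2; return 1 ;;
    esac
    printf "SELECT SUM(COUNTS) FROM OBSERVED_ENTITY_RELATION_MINUTELY_COUNTS WHERE TID = '%s';\n" "$1"
}

# Example:
#   hbase_count_query <tid>
```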

If there is no data in HBase for the ingested dataset, refer to the “My data ingest is failing” KB article.

Applies To

  • Interset 5.4.x or above

 
