Distributing an Ingest Across Multiple Stream Nodes

Parallelism can be achieved through Splunk search queries by appending the following to the searchFilter:

`| where floor(_time/60)-(N*floor(floor(_time/60)/N))=M`, where `N` is the number of parallel agents and `M` is the agent identifier (ranging from 0 to N-1).

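This combines a conversion of epoch seconds to whole minutes, `floor(_time/60)`, with a modulo function, `a-(n*floor(a/n))`, where `a` is the input and `n` is the divisor. Each minute therefore maps to exactly one value between 0 and N-1, so each event is picked up by exactly one agent.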
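
To sanity-check the arithmetic outside of Splunk, the two pieces (minute conversion and floor-based modulo) can be sketched in Python. This is only an illustration; the function names `minute_bucket`, `floor_modulo`, and `agent_for` are made up for this example and are not part of Splunk or the agent.

```python
import math


def minute_bucket(epoch_seconds: float) -> int:
    """Mirror floor(_time/60): epoch seconds to whole minutes."""
    return math.floor(epoch_seconds / 60)


def floor_modulo(a: int, n: int) -> int:
    """Mirror a-(n*floor(a/n)): a modulo built only from floor()."""
    return a - n * math.floor(a / n)


def agent_for(epoch_seconds: float, n_agents: int) -> int:
    """Return the agent identifier M (0 to N-1) that would match this timestamp."""
    return floor_modulo(minute_bucket(epoch_seconds), n_agents)


if __name__ == "__main__":
    # Show which of 3 hypothetical agents owns each of six consecutive minutes.
    for t in range(0, 360, 60):
        print(t, agent_for(t, 3))
```

Because every timestamp within the same minute falls into the same bucket, an agent receives whole minutes of data rather than an interleaved mix of individual events.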

For example, to run 2 Agents, you can have one Agent capture only even minutes with the following added to the searchFilters config:
`| where floor(_time/60)-(2*floor(floor(_time/60)/2))=0`

Then add the following to the other Agent to capture the odd minutes:
`| where floor(_time/60)-(2*floor(floor(_time/60)/2))=1`
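
For more than two Agents, writing each filter by hand gets error-prone. A minimal sketch of a helper that fills `N` and `M` into the template above follows; `search_filter` and `all_filters` are hypothetical names used only for illustration, and the output still has to be pasted into each Agent's configuration manually.

```python
def search_filter(n_agents: int, agent_id: int) -> str:
    """Build the where clause for one agent by filling N and M into the template."""
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    if not 0 <= agent_id < n_agents:
        raise ValueError("agent_id must be between 0 and n_agents-1")
    return (
        f"| where floor(_time/60)-({n_agents}*floor(floor(_time/60)/{n_agents}))"
        f"={agent_id}"
    )


def all_filters(n_agents: int) -> list[str]:
    """Return one filter per agent, in order of agent identifier."""
    return [search_filter(n_agents, m) for m in range(n_agents)]


if __name__ == "__main__":
    # Print the filters for a hypothetical three-agent setup.
    for line in all_filters(3):
        print(line)
```

With `n_agents=2`, the two generated strings are the same as the even-minute and odd-minute filters shown above.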


Note that each of these configurations will require a separate config group within Ambari so that you don't end up with the same machines running duplicate configurations.
