Agentless sensor data ingestion with Splunk via HEC

It’s been a month since I joined Splunk and I’m enjoying every single minute: nice team, great company and, overall, cool technology. Every day I find something new that delights me, and I cannot wait to play around with it.

From a technical standpoint, being a sales engineer is not only about playing with technology; there are also non-technical duties to deal with. However, I’ve managed to find some time to link something I did in the past with something I’m doing now, which I’ll forget in the future 🙂 and that’s the reason for this blog post: telling my future self what I’m doing now.

This is the deal:

  1. At Hortonworks I had the privilege of working with awesome people in different areas and, as I’ve mentioned in previous posts, had access to multiple demo environments. Hortonworks’ commitment to open source resulted in a lot of free collateral ready to be used by the rest of the world.
  2. On the Data in Motion side, there was a data generator making up sensor data for truck fleets. This sensor data was ingested, analyzed on the fly, persisted as time series data and finally visualized, all relying on open source frameworks (NiFi, Storm, Kafka, Druid, Hive, HBase, Superset, …).
  3. I’ve grabbed that data generator and put it into a Docker image to generate fake sensor data in an easy way. The result is in my GitHub repo; as usual, all the details are documented in the Dockerfile itself, and the image is also published on Docker Hub.
  4. At Splunk I’m realizing how easy it is to ingest, index, store and visualize data; all with one single platform, that’s it!
  5. As with the data generator, I want a simple Splunk instance that is easy to deploy; nothing fancy or elaborate, just the core functionality to see it in action. It turns out these guys at Splunk have done part of the work for me, and there are official Docker images ready to be used by anyone. I’m talking about an easy way to use the trial version of Splunk Enterprise… awesome!

Now that all “the actors” are introduced, it’s time to provide you with the commands you have to execute to make everything work:

  1. As there will be two Docker containers interacting with each other, a virtual network will make the interaction between both containers much easier; let’s call it datasim_net:
    docker network create --ipam-driver default \
                          --subnet=172.28.0.0/16 datasim_net
  2. Now it’s time to start the Splunk container and attach it to the datasim_net network:
    docker run -dt --rm -p 8000:8000/tcp -p 8088:8088/tcp \
               --hostname splunk --domainname datasim.raulmarin.me \
               --network datasim_net --name splunk \
               -e "SPLUNK_PASSWORD=_StrongP4ss" \
               -e "SPLUNK_START_ARGS=--accept-license" \
               splunk/splunk:latest

    It takes a few seconds to be ready; check the container logs (docker logs -f splunk) to see when it’s done.

  3. Let’s create the HEC (HTTP Event Collector) endpoint in Splunk; the data generator will send events to it and Splunk will ingest, index and store them right away… nothing else is needed! Isn’t this super cool?

    [Image: hec]

    We need to write the Token Value down because we’ll use it in the next step.

  4. Finally, I’ll start the data generator and tell it, via two environment variables, where the Splunk HEC endpoint is available and which Token Value to use:
    docker run -dt --rm \
               --hostname trucking --domainname datasim.raulmarin.me \
               --network datasim_net --name trucking_data \
               -e HEC_URL='http://splunk:8088' \
               -e HEC_TOKEN='174580a0-e37d-44aa-9d11-6d8434343e48' \
               raulmarinperez/trucking_data_sim
  5. Right away, you’ll see sensor data available in Splunk, ready to be analyzed and visualized:

    [Image: sensor_data]
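
To check the HEC endpoint by hand, independently of the data generator, something like the following could be run from the host. This is only a minimal sketch and not part of the original setup: the helper names (retry, hec_is_up, hec_payload, send_test_event) and the manual_test sourcetype are made up for illustration. It assumes HEC answers over plain HTTP on the published port 8088, as in the HEC_URL above; if SSL is enabled in the HEC global settings, the scheme would be https instead. It relies on Splunk’s /services/collector/health and /services/collector/event endpoints.

```shell
#!/bin/sh
# Sketch only: all helper names below are hypothetical.

# retry MAX_TRIES DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or MAX_TRIES attempts have failed.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# hec_is_up BASE_URL
# Succeeds once the HEC health endpoint answers with a 2xx status.
hec_is_up() {
    curl -sf -o /dev/null "$1/services/collector/health"
}

# hec_payload MESSAGE
# Prints a minimal HEC event envelope. No JSON escaping is done,
# so MESSAGE must not contain double quotes or backslashes.
# The "manual_test" sourcetype is an arbitrary, assumed name.
hec_payload() {
    printf '{"event": "%s", "sourcetype": "manual_test"}' "$1"
}

# send_test_event BASE_URL TOKEN MESSAGE
# Posts one event to HEC using the "Splunk <token>" authorization scheme.
send_test_event() {
    curl -s "$1/services/collector/event" \
         -H "Authorization: Splunk $2" \
         -d "$(hec_payload "$3")"
}

# Example usage (assumes HEC_TOKEN holds the Token Value from step 3):
#   retry 30 2 hec_is_up http://localhost:8088 &&
#       send_test_event http://localhost:8088 "$HEC_TOKEN" "hello from the host"
```

If the test event is accepted, it should be searchable in Splunk in the same way as the generator’s sensor data.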

Last but not least, here is the video showing the whole process live, plus some additional steps to extract fields, visualize a very simple metric in real time and add it to a dashboard (the simplest dashboard ever, though):

[Video]

Happy memory refreshing :p

Raúl
