analytics - Nocturnalknight's Lair

Building a Log-Management & Analytics Solution for Your StartUp

By Ramkumar Sundarakalatharan | November 5, 2019

Background:

As described in an earlier post, I run Engineering at an early-stage #traveltech #startup called Itilite. One of my responsibilities is to architect, build and manage the cloud infrastructure for the company. Even though I had designed, built and maintained cloud infrastructure in my previous roles, this one was really challenging and interesting, due in part to the fact that the organisation is a high-growth #traveltech startup and hence:

The architecture landscape is still evolving,
The performance criteria for one month look like the minimum acceptable criteria the next,
The sheer growth in user volume and in traffic per user,
The addition of partner inventories, which increases the capacity by an order of magnitude.

And several others. Somewhere down the line, after the infrastructure, code pipeline and CI are set up, you reach a point where managing logs (read: triggering interventions, analysis, storage, archival, retention) across several sets of infrastructure clusters such as development/testing, staging and production becomes a bit overwhelming.

Enter Log Management & Analytics

I had worked my way up from a simple tail/multitail to Graylog aggregation of 18 server logs, including app servers, database servers, API endpoints and everything in between. But, as my honoured former colleague Mr. Naveen Venkat (CPO of Zarget) used to say in my days with Zarget: there are no “Go-To” persons in a start-up. You “Go-Figure” yourself!

There is definitely no “one size fits all” solution and, in a start-up environment especially, you are always chasing features, timelines or customers (scope, timeline or cost in the conventional PMI model).

So, after some due research to account for the recent advances in Logstash and Beats, I narrowed down the possible contenders that could power our little log management system. They are:

ELK Stack — Build it from scratch, but have flexibility.
Graylog — Out of the box functionality, but you may have to tune up individual components to suit your needs.
Fluentd — Entirely new log-management paradigm, interesting and we explored it a bit.

(I did not consider anything exotic, or anything that would involve us paying more in future than what we pay for it in the first year. So some great tools like Splunk, Nagios, LogPacker and LogRhythm were not considered.)

Evaluation Process:

I wrote an Ansible script to create a replica environment and pull in the necessary configurations, and used a previously written load-test job to simulate a typical work hour. This configuration was used for each of the frameworks/tools considered.

I started experimenting with Graylog, due to my familiarity with the tool, and configured it in the way I felt was most appropriate at that point in time.

Slight setback:

However, the collector I had used (Sidecar with Filebeat) had a major problem sending files over 255KB when the interval was less than 5 seconds. The packets that were to be sent to Elasticsearch never made it, and the pile-up caused a major issue for application stability.

One of the main use cases for us is to ingest XML/JSON data from multiple sources. (We run a polynomial regression across multiple sources, and use the nth derivatives to do further business operations.)

When the daily logs you need to export run upwards of 5GB for one app (JSON logs), and you add multiple APIs, some micro-service application logs, web servers, load balancers, CI (Jenkins), database query logs, bin-logs, Redis and … yes, you get the point?

Upon further investigation, the Sidecar collector was actually not the culprit. Our architecture had accounted for several things but, by design, we used to hit momentary peaks in CPU utilisation for the “Merges”. And all of these were “NICE” loads! (in our defence)

So, once the CPU hit the 100% mark, Sidecar started behaving very differently. Ultimately, we fixed it with a patched version of Sidecar and by actually shifting to NXLog.
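One common way to stop a log pile-up from hurting the application is to bound the buffer and shed the oldest lines when the shipper cannot keep up. The sketch below only illustrates that idea; it is not how Sidecar, Filebeat or NXLog work internally, and the class name `BoundedLogBuffer` and its parameters are hypothetical.

```python
from collections import deque


class BoundedLogBuffer:
    """Hypothetical in-memory buffer that drops the oldest lines when full."""

    def __init__(self, max_lines):
        # A deque with maxlen discards items from the left once it is full.
        self._lines = deque(maxlen=max_lines)
        self.dropped = 0

    def append(self, line):
        if len(self._lines) == self._lines.maxlen:
            self.dropped += 1  # the oldest line is about to be discarded
        self._lines.append(line)

    def drain(self, batch_size):
        """Remove and return up to batch_size lines, oldest first."""
        batch = []
        while self._lines and len(batch) < batch_size:
            batch.append(self._lines.popleft())
        return batch


buffer = BoundedLogBuffer(max_lines=3)
for i in range(5):
    buffer.append(f"line {i}")
batch = buffer.drain(batch_size=2)
```

The trade-off is deliberate: under sustained pressure some log lines are lost, but memory use stays flat and the application is never blocked waiting on the log pipeline.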

Experimenting with ELK was a different beast in itself, as provisioning and configuring took a lot more time than I was comfortable with. So, I switched to the AWS “packaged service”. We deployed the ES domain in AWS, fired up a couple of Kibana and Logstash instances and connected them; after what appeared to be forever, it worked like a charm. I was able to get all the required information in Kibana. One downside is that you need to plan the Elasticsearch indices according to how your log sources will grow. For us, it was impractical.
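To give a feel for why index planning depends so heavily on growth, here is a rough back-of-the-envelope sketch. It assumes one index per day and a target shard size that you pick yourself; the 30 GB target, the function names and the "doubling every month" growth rate are illustrative assumptions, and only the 5GB-per-day figure comes from this post.

```python
import math


def primary_shards_needed(daily_gb, target_shard_gb=30.0):
    """Primary shards for one daily index so each shard stays near the target size."""
    if daily_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(daily_gb / target_shard_gb))


def projected_daily_gb(current_gb, monthly_growth, months):
    """Compound growth projection; monthly_growth=1.0 means doubling every month."""
    return current_gb * (1 + monthly_growth) ** months


today = primary_shards_needed(5)
# Hypothetical growth assumption: log volume doubles every month for 4 months.
later_gb = projected_daily_gb(5, monthly_growth=1.0, months=4)
later = primary_shards_needed(later_gb)
```

With these assumed numbers, a layout that needs a single shard today needs three within four months, and the answer shifts again with every new log source, which is the kind of moving target that made up-front planning impractical for us.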

Fluentd was an excellent platform for normalising logs, but it too depended on Kibana/ES as the ultimate analysis frontend.

So, finally we settled on good old Graylog.

Advantages of Graylog

The tool perfectly fit into our workflow and evolving environment:

Graylog is free and open-source software, so we won't have to pay now or in future.
Its trigger actions and notifications are a good complement to Graylog monitoring, just a bit deeper!
With error stack traces received from Graylog, engineers understand the context of any issue in the source code. This saves time and effort in debugging, troubleshooting and bug fixing.
The tool has a powerful search syntax, so it is easy to find exactly what you are looking for, even if you have terabytes of log data. Search queries can be saved. For really complex scenarios, you can write an Elasticsearch query and save it in the dashboard as a function.
Graylog offers archiving functionality, so everything older than 30 days can be stored on slow storage and re-imported into Graylog when the need arises (for example, when the dev team needs to investigate a certain event from the past).
Java, Python & Ruby applications can be easily connected to Graylog, as there are out-of-the-box libraries for this.
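As a rough illustration of what such a client library does under the hood, the sketch below builds a GELF 1.1 message and sends it over UDP using only the Python standard library. It assumes a GELF UDP input is listening on the Graylog side; the server name `graylog.example.internal`, the helper names and the `service` field are placeholders, and 12201 is the conventional GELF port rather than anything specific to our setup.

```python
import json
import socket
import time


def build_gelf(short_message, host, level=6, **extra):
    """Build a GELF 1.1 payload; additional fields must be prefixed with '_'."""
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,  # syslog severity: 3 = error, 6 = informational
    }
    for key, value in extra.items():
        payload["_" + key] = value
    return json.dumps(payload).encode("utf-8")


def send_gelf(data, server, port=12201):
    """Send one uncompressed, unchunked GELF datagram (small messages only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(data, (server, port))


message = build_gelf("booking search failed", host="app-01", level=3, service="search-api")
# send_gelf(message, "graylog.example.internal")  # placeholder server; enable with a real input
```

In practice you would normally use a maintained GELF handler for your logging framework, which also takes care of chunking and compression for larger messages.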

#logmanagement #analytics #startup #hustle #opensource #graylog #elk

Organisers of Brazil Protest use Analytics to Measure Attendance

By Ramkumar Sundarakalatharan | March 15, 2016

Organizers of yesterday’s massive demonstration in São Paulo against the Brazilian government have employed an analytics tool to get accurate attendance data.
Opposition group Movimento Brasil Livre (MBL) was offered the technology by Israeli startup StoreSmarts for free through its Brazilian distributor SmartLok in exchange for the marketing exposure linked to the anti-government demo.

The technology used in the protest is all readily available and has been in use for at least 3 years now. It is a combination of a portable router and an application that is usually employed by retailers to monitor, analyze and provide insights on shopper behavior by detecting WiFi signals from mobile devices in a designated area.
In order to estimate the number of people in any given area, the system only takes smartphones into account while ignoring other WiFi signals from devices such as laptops or routers. The calculations are carried out in real-time, so the system can also provide insight on its web dashboard into the peak hours of the protests.
By measuring the device’s received signal strength indication (RSSI), the system can also tell how long the smartphone – and therefore its owner – spent in the area that is being mapped. However, the system does not track or store data on individual users.
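The counting logic described here can be sketched roughly as follows. This is not StoreSmarts' implementation: the observation format, the RSSI cutoff of -75 dBm and the function name are assumptions made purely for illustration, and the device identifiers are assumed to be anonymised before they reach this code.

```python
from collections import defaultdict


def summarise(observations, min_rssi=-75):
    """observations: iterable of (device_id, device_type, timestamp_s, rssi_dbm)."""
    seen = defaultdict(list)
    for device_id, device_type, ts, rssi in observations:
        # Count only smartphones whose signal is strong enough to be inside the area.
        if device_type == "smartphone" and rssi >= min_rssi:
            seen[device_id].append(ts)
    # Dwell time is the span between the first and last in-range sighting.
    dwell = {dev: max(times) - min(times) for dev, times in seen.items()}
    return len(seen), dwell


sample = [
    ("a1", "smartphone", 0, -60),
    ("a1", "smartphone", 600, -65),
    ("b2", "laptop", 30, -50),       # ignored: not a smartphone
    ("c3", "smartphone", 100, -90),  # ignored: signal too weak, likely outside the area
    ("d4", "smartphone", 200, -70),
]
count, dwell = summarise(sample)
```

Only the aggregate count and the dwell durations would need to leave this function, which is consistent with a system that reports attendance without storing data on individual users.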
Typically, protest organizers in Brazil, like their counterparts across the world, have to rely on data provided by the local authorities and large media organisations to get insights on attendance. These media organisations themselves rely on local bodies. Those numbers are often believed to be inaccurate for political reasons. The StoreSmarts system suggests that 1.4 million people attended yesterday’s demonstration, a number that matches what has been provided by the local police.
When asked why it is interesting to provide the technology free of charge, the startup founder says that his Brazilian partner has been piloting StoreSmarts’ analytics tool with some retailers in São Paulo – so getting the extra attention is helpful.
“We believe in taking data driven decisions, whether it’s politics or retail. The exposure we get by supporting such requests is very important for us and our partner, as we see Brazil as a very important market,” Eliyahu says.