Author: Ramkumar Sundarakalatharan

Building a Log-Management & Analytics Solution for Your StartUp

Background:

As described in an earlier post, I am working with an early-stage startup, and one of my responsibilities is to architect, build and manage the cloud infrastructure for the company. Even though I had designed, built and maintained cloud infrastructure in my previous roles, this one was really challenging and interesting, in part because the organisation is a high-growth #traveltech startup, and hence:

  1. The architecture landscape is still evolving.
  2. The performance criteria for one month look like the minimum acceptable criteria for the next (in terms of itineraries automated, rating, mappings, etc.).
  3. The sheer volume of user growth.
  4. The addition of partner inventories, which increases the capacity by an order of magnitude.

And several others. Somewhere down the line, after the infrastructure, code pipeline and CI are set up, you reach a point where managing logs (read: triggering interventions, analysis, storage, archival, retention) across several sets of infrastructure clusters, such as development/testing, staging and production, becomes more than you can handle by hand.

Enter Log Management & Analytics

I had worked my way up from a simple tail/multitail to Graylog aggregation of logs from 18 servers, including app servers, database servers, API endpoints and everything in between. But, as my honoured former colleague Mr. Naveen Venkat (CPO of Zarget) used to say in my days with Zarget, there are no “Go-To” persons in a start-up. You “Go-Figure” yourself!
There is definitely no “one size fits all” solution, and in a start-up environment especially, you are always running behind features, timelines or customers (scope, timeline or cost in the conventional PMI model).
So, after some due research to account for the recent advances in Logstash and Beats, I narrowed down the possible contenders that could power our little log management system. They are:

  1. ELK Stack
  2. Graylog
  3. Logstash 

(I did not consider anything exotic, or anything that would involve us paying more in future than what we pay in the first year. So some great tools like Splunk, Nagios, LogPacker and LogRhythm were not considered.)

Evaluation Process:

I started experimenting with Graylog, due to my familiarity with the tool, and configured it in the way I felt was most appropriate at that point in time. However, the collector I had used (Sidecar) had a major problem sending files when they were over 255KB and the interval was less than 5 secs.
One of the main use cases for us is to ingest the actual JSON data from multiple sources. (We run a polynomial regression across multiple sources, and use the nth derivatives for further business operations.) When the daily logs you need to export run upwards of 500MB for one app (JSON logs), and you add other application logs, web servers, load balancers, CI (Jenkins), database, Redis and … yes, you get the point?
Upon further investigation, the Sidecar collector was actually not the culprit. Our architecture had accounted for several things but, by design, we used to hit momentary peaks in CPU utilisation for the “Merges”.
So, once the CPU hit the 100% mark, Sidecar started behaving very differently. Ultimately, we fixed it with a patched version of Sidecar.
 
 

MySQL on AWS: RDS vs EC2

We recently went through a very familiar dilemma while moving the MySQL database server of a nascent solution from a hybrid on-premise and Azure environment to AWS infrastructure: whether to go with a couple of EC2 instances in a master-slave configuration, or to go with AWS' packaged offering for the RDBMS, viz. RDS.
This is, in fact, not a new question at all. It would have occurred to anyone getting plugged into AWS/cloud infrastructure. Personally, this was something like my 5th or 6th datacenter setup in AWS. Still, I had this question in part because the solution I was trying to port was very nascent and did not have super high write-rate requirements.
So, in the absence of high-throughput requirements, did it really make sense to go for a full-blown RDS setup? Or was it worth the time to install and configure a 2-node cluster and write some automation scripts for backup, recovery, etc.?
Question: What does RDS really offer that you cannot get with a self-configured MySQL on EC2?
Answer: It really boils down to the situation and the preference you have for control, flexibility and performance/high availability.
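To give a feel for what "write some automation scripts for backup" means on the self-managed EC2 route, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the script we actually ran: the function names, the `/backups` directory, the host `db.internal`, the `backup` user and the 7-day retention are all hypothetical. It assumes `mysqldump` is installed and that the password comes from a client option file rather than the command line.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List


def backup_path(backup_dir: Path, now: datetime) -> Path:
    """Return a timestamped dump file name inside backup_dir (naming is an assumption)."""
    return backup_dir / f"mysql-{now:%Y%m%d-%H%M%S}.sql"


def build_dump_command(host: str, user: str, out_file: Path) -> List[str]:
    """Build a mysqldump invocation for a full logical backup.

    The password is deliberately not passed here; it is assumed to be
    supplied through a client option file.
    """
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB without locking tables
        "--all-databases",
        f"--result-file={out_file}",
    ]


def expired_backups(
    mtimes: Dict[str, datetime], now: datetime, retention_days: int
) -> List[str]:
    """Return the names of backups older than the retention window, sorted."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, mtime in mtimes.items() if mtime < cutoff)


if __name__ == "__main__":
    now = datetime.now()
    target = backup_path(Path("/backups"), now)  # hypothetical directory
    # Only print the command; a real job would run it (e.g. via subprocess.run)
    # and then delete whatever expired_backups() reports.
    print(" ".join(build_dump_command("db.internal", "backup", target)))
```

With RDS, automated backups and retention of this kind are part of the managed service, which is a large part of the trade-off in the question above.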
 
 
 
