The Fork in the Road: The Curveball that Redis Pitched

On March 20th, 2024, Redis, the ubiquitous in-memory data store, sent shockwaves through the tech world with a significant shift in its licensing model. Previously distributed under the permissive BSD license, Redis moved to a dual-license approach that lets users choose between the Redis Source Available License (RSALv2) and the Server Side Public License (SSPLv1). This move, while strategic for Redis Labs, has created ripples of concern in the SaaS ecosystem and the open-source community at large.

The Split: From Open to Source-Available

At its core, the change restricts how users, particularly cloud providers offering managed Redis services, can use the software commercially. The SSPL, outlined in the March 20th announcement, stipulates that anyone offering the functionality of Redis as a service must also release the source code of that service under the same license. This directly impacts companies like Amazon (ElastiCache) and DigitalOcean, potentially forcing them to alter their service models or acquire commercial licenses from Redis Labs.

A History of Licensing Shifts

This isn’t the first time Redis Labs has ruffled feathers with licensing changes. As a 2019 TechCrunch article [1] highlights, Redis Labs has a history of tweaking its open-source license, sparking similar controversies. Back then, the company argued that cloud providers were profiting from Redis without giving back to the open-source community. The new SSPL appears to be an extension of this philosophy, aiming to compel greater contribution from commercial users.

SaaS Providers in a Squeeze

For SaaS providers, the new licensing throws a wrench into established business models. Modifying core functionality to comply with the SSPL might not be feasible, and open-sourcing their entire platform could expose proprietary code. This could lead to increased costs for SaaS companies, potentially impacting end-user pricing.

Open Source Community Divided

The open-source world is also grappling with the implications. While the core Redis code remains source-available under RSAL, the philosophical shift towards a more restrictive model has some worried. The Linux Foundation even announced a fork, Valkey, as an alternative, backed by tech giants like Google and Oracle. This fragmentation could create confusion and slow down innovation within the open-source Redis ecosystem.

The Road Ahead: Uncertainty and Innovation

The long-term effects of Redis’s licensing change remain to be seen. It might pave the way for a new model for open-source software sustainability, where companies can balance community development with commercial viability. However, it also raises concerns about control and potential fragmentation within open-source projects.

In conclusion, Redis’s licensing shift presents a complex scenario. While it aims to secure Redis Labs’ financial future, it disrupts the SaaS landscape and creates uncertainty in the open-source world. Only time will tell if this is a necessary evolution or a roadblock to future innovation.

References & Further Reading:

Building a Log-Management & Analytics Solution for Your StartUp

Background:

As described in an earlier post, I run Engineering at an early-stage #traveltech #startup called Itilite. One of my responsibilities is to architect, build and manage the cloud infrastructure for the company. Even though I had designed, built and maintained cloud infrastructure in my previous roles, this one was really challenging and interesting, due in part to the fact that the organisation is a high-growth #traveltech startup and hence:

  1. The architecture landscape is still evolving.
  2. The performance criteria of one month become the minimum acceptable criteria of the next.
  3. The sheer volume of user growth and the growth of traffic per user.
  4. The addition of partner inventories, which increases the capacity by an order of magnitude.

And several others. Somewhere down the line, after the infrastructure, code pipeline and CI are set up, you reach a point where managing logs (read: triggering interventions, analysis, storage, archival, retention) across several sets of infrastructure clusters, such as development/testing, staging and production, becomes a bit too much to handle by hand.

Enter Log Management & Analytics

I had worked my way up from a simple tail/multitail to a Graylog aggregation of 18 server logs, including app servers, database servers, API endpoints and everything in between. But, as my honoured former colleague Mr. Naveen Venkat (CPO of Zarget) used to say in my days with Zarget: there are no “Go-To” persons in a start-up. You “Go-Figure” yourself!

There is definitely no “one size fits all” solution and, especially in a start-up environment, you are always chasing features, timelines or customers (scope, timeline or cost in the conventional PMI model).

So, after some due research to account for the recent advances in Logstash and Beats, I narrowed down the possible contenders that could power our little log management system. They are:

  1. ELK Stack  — Build it from scratch, but have flexibility.
  2. Graylog  — Out of the box functionality, but you may have to tune up individual components to suit your needs.
  3. Fluentd — Entirely new log-management paradigm, interesting and we explored it a bit.

(I did not consider anything exotic, or anything that would involve us paying more in the future than what we pay in the first year. So some great tools like Splunk, Nagios, LogPacker and LogRhythm were not considered.)

Evaluation Process:

I wrote an Ansible script to create a replica environment and pull in the necessary configurations, and used a previously written load-test job to simulate a typical work hour. This configuration was used for each of the frameworks/tools considered.

I started experimenting with Graylog, due to my familiarity with the tool, and configured it in the way I felt was most appropriate at that point in time.

Slight setback:

However, the collector I had used (Sidecar with Filebeat) had a major problem sending files over 255KB with a send interval of less than 5 seconds. The packets that were to be sent to Elasticsearch never made it, and the pile-up caused a major issue for application stability.

One of our main use cases is to ingest XML/JSON data from multiple sources. (We run a polynomial regression across multiple sources, and use the nth derivatives to do further business operations.)

When the daily logs you need to export are upwards of 5GB for an app (JSON logs), add multiple APIs and some microservice application logs, web servers, load balancers, CI (Jenkins), database query logs, bin-logs, Redis and … yes, you get the point?

Upon further investigation, the sidecar collector was actually not the culprit. Our architecture had accounted for several things but, by design, we used to hit momentary peaks in CPU utilisation for the “Merges”. And all of these were “NICE” loads! (in our defence)

So, once the CPU hit the 100% mark, Sidecar started behaving very differently. We ultimately fixed it with a patched version of Sidecar and by actually shifting to NXLog.

Experimenting with ELK is a different beast in itself, as provisioning and configuring took a lot more time than I was comfortable with. So we switched to the AWS “packaged service”. We deployed the ES domain in AWS, fired up a couple of Kibana and Logstash instances and connected them; after what appeared to be forever, it worked like a charm. I was able to get all the required information in Kibana. One downside is that you need to plan the Elasticsearch indices according to how your log sources will grow. For us, that was impractical.
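
To give a feel for what that index planning involves, here is a rough back-of-the-envelope sketch in Python. It is not something we ran; the function name, the replica count and the monthly growth rate are assumptions made up for illustration, and the 5GB/day input is only the per-app JSON log volume mentioned earlier (the size of an index on disk will differ from the raw log size).

```python
def estimate_index_storage_gb(daily_gb, retention_days, replicas=1, monthly_growth=0.0):
    """Rough storage needed for daily indices kept for `retention_days`.

    Assumptions (hypothetical, for illustration only):
    - one index per day, each roughly the size of that day's raw logs
    - log volume compounds by `monthly_growth` every 30 days
    - every primary copy has `replicas` additional copies
    """
    total_primary_gb = 0.0
    for day in range(retention_days):
        # Day 0 is today; later days are scaled by the compounded growth.
        growth_factor = (1.0 + monthly_growth) ** (day / 30.0)
        total_primary_gb += daily_gb * growth_factor
    return total_primary_gb * (1 + replicas)


# One app at 5GB/day, 30 days of retention, one replica, no growth.
flat = estimate_index_storage_gb(5, 30)

# Same app, but with volume growing 20% a month (an invented figure).
growing = estimate_index_storage_gb(5, 30, monthly_growth=0.2)

print(flat, growing)
```

Even this toy version shows the problem: the answer swings a lot depending on the growth rate you guess, and in a high-growth start-up that guess is rarely good.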

Fluentd was an excellent platform for normalising logs, but it also depended on Kibana/ES for the ultimate analysis frontend.

So, finally we settled on good old Graylog.

Advantages of Graylog

The tool fit perfectly into our workflow and evolving environment:

  1. Graylog is free and open-source software, so we won't have to pay now or in the future.
  2. Its trigger actions and notifications are a good complement to Graylog monitoring, just a bit deeper!
  3. With error stack traces received from Graylog, engineers understand the context of any issue in the source code. This saves time and effort in debugging, troubleshooting and bug fixing.
  4. The tool has a powerful search syntax, so it is easy to find exactly what you are looking for, even if you have terabytes of log data. Search queries can be saved. For really complex scenarios, you can write an Elasticsearch query and save it in the dashboard as a function.
  5. Graylog offers an archiving functionality, so everything older than 30 days can be stored on slow storage and re-imported into Graylog when the need arises (for example, when the dev team needs to investigate a certain event from the past).
  6. Java, Python & Ruby applications can be easily connected with Graylog, as there are out-of-the-box libraries for this.
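
To illustrate the last point, here is a minimal sketch of what such a library does under the hood for a Python application: it builds a GELF (Graylog Extended Log Format) 1.1 message and ships it as JSON over UDP. It assumes a GELF UDP input listening on the default port 12201; the helper names, host name and extra field are hypothetical, and in practice you would use a maintained GELF logging handler rather than hand-rolling this.

```python
import json
import socket
import time


def build_gelf_message(short_message, host, level=6, **extra_fields):
    """Build a GELF 1.1 payload as a plain dict.

    `level` follows syslog severities (3 = error, 6 = informational).
    Extra fields are sent with a leading underscore, as GELF requires.
    """
    message = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),
        "level": level,
    }
    for key, value in extra_fields.items():
        if key == "id":
            # The GELF spec reserves "_id", so refuse it here.
            raise ValueError("'id' is not allowed as an additional field")
        message["_" + key] = value
    return message


def send_gelf_udp(message, server="127.0.0.1", port=12201):
    """Send one uncompressed GELF message to a GELF UDP input (assumed address)."""
    payload = json.dumps(message).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
```

A call such as send_gelf_udp(build_gelf_message("Booking failed", "app-01", level=3, service="search")) would then show up in Graylog as an error with a searchable "service" field, assuming the input is configured as described.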

#logmanagement #analytics #startup #hustle #opensource #graylog #elk

Open Source Vs Open Governance: The state and Future of Open Source Movement

Last week, DataStax announced that it was jettisoning its role in maintaining the Planet Cassandra community resource site, even as the project lead, Jonathan Ellis, made it known that DataStax would be doubling down on its commercial product, rather than Cassandra. Though the DataStax team put a brave face on the changes, the real question is why DataStax had to change at all.
Similarly, when Sun and then Oracle bought MySQL AB, the company behind the original development, the governance of MySQL open source database development gradually closed. Now, only Oracle writes updates, patches and features. Updates from other sources, whether individuals or other companies, are ignored.
These are two opposite extremes in the open-source movement. When an open source project reaches a critical threshold and following, it grows bigger than its chief contributor, whether that is a company or a group of people. It takes shape in such a way that the chief contributor can stop committing resources and the community will still take care of everything: features, development, support, documentation and everything in between. This is the real essence of open source development.
However, an initial sponsor is absolutely necessary for the project to thrive in its infancy.
MySQL is still open source, but it has a closed governance.
In the case of MySQL, the source code was forked by the community, and the MariaDB project started from there. Nowadays, when somebody says they are “using MySQL”, they are in fact probably using MariaDB, which has evolved from where MySQL stopped in time.
Take a look at the GitHub page of MySQL for reference.

MySQL repository with its core contributors: a project which powers 1 in 3 web sites and apps, having 51 developers!
MySQL developers commit summary: all core committers are from Oracle!
Cassandra is still open source, but now it has open governance.
The Cassandra question is ultimately about control. As the ASF board noted in the minutes from its meeting with DataStax representatives, “The Board expressed continuing concern that the PMC was not acting independently and that one company had undue influence over the project.” Given that DataStax has been Cassandra’s primary development engine since the day it spun out of Facebook, this “undue influence” is hardly new.
And, according to some of those closest to Cassandra, like former Cassandra MVP Kelly Sommers, that “undue influence” has borne exceptional fruit. Sommers clearly feels this way, insisting that the ASF “is really out of line in their actions with Cassandra,” ultimately concluding that the ASF might be hostile to the very people most responsible for a project’s success.
In her view, the ASF’s search for diversity in the Cassandra project should have started with expanding its existing leadership, rather than cutting it out: “The ASF forced DataStax to reduce their role in Cassandra rather than forming a long-term strategy to grow diversity around theirs,” Sommers said.
Though Sommers doesn’t directly comment on the trademark issues, she didn’t pull any punches in her disdain for project process over code results: “Politics is off the rails when focus is lost on success of the thing it runs and all that matters is process. This is how I feel ASF operates.”
So, for companies hoping to monetise open source, the Cassandra blow-up is a not-so-subtle reminder that community can be inimical to commercial interests, however much it can fuel adoption. It may also be a signal to the ASF that less corporate influence on projects could yield less code.
Now, back to our agenda.

Open source vs. open governance

Open source software’s momentum serves as a powerful insurance policy for the investment of time and resources an individual or enterprise user will put into it. This is the true benefit behind Linux as an operating system, Samba as a file server, Apache HTTPD as a web server, Hadoop, Docker, MongoDB, PHP, Python, jQuery, Bootstrap and other hyper-essential open source projects, each on its own level of the stack. Open source momentum is the safe antidote to technology lock-in. Having learned that lesson over the last decade, enterprises are now looking for the new functionalities that are gaining momentum: cloud management software, big data, analytics, integration middleware and application frameworks.
In the open domain, the only two non-functional things that matter in the long term are whether the software is open source and whether it has attained momentum in the community and industry. Neither of these is related to how the software is being written, but this is exactly what open governance is concerned with: the how.

The value of momentum

Open governance alone does not guarantee that the software will be good, popular or useful (though formal open governance only happens on projects that have already captured some attention of IT industry leaders). A few examples of open source projects that have formal open governance are CloudFoundry, OpenStack, jQuery and all the projects under the Apache Software Foundation umbrella.
For users, the indirect benefit of open governance is only related to the speed at which the open source project reaches momentum and high popularity.
In conclusion, it is a very delicate act of balancing open-source development and open governance of that development. Oracle failed to get diversity, thereby giving rise to MariaDB, while the ASF ejected DataStax to avoid a repeat of the former!
 

GitHub Opensources its Load Balancer

GitHub will release as open source the GitHub Load Balancer (GLB), its internally developed load balancer.
GLB was originally built to accommodate GitHub’s need to serve billions of HTTP, Git, and SSH connections daily. Now the company will release components of GLB via open source, and it will share design details. This is seen as a major step in building scalable infrastructure using commodity hardware. For more details, please refer to the GitHub Engineering post.

GE & Bosch to leverage open source to deliver IoT tools

Partnerships that could shape the internet of things for years are being forged just as enterprises fit IoT into their long-term plans.

Representation of an IoT & IIoT Convergence

A vast majority of organisations have included #IoT as part of their strategic plans for the next two to three years. No single vendor can meet the diverse #IoT needs of all customers, so vendors are joining forces and also trying to foster broader ecosystems. General Electric and Bosch recently announced their intention to do both.
The two companies, both big players in #IIoT, said they will establish a core IoT software stack based on open-source software. They plan to integrate parts of GE’s #Predix operating system with the #Bosch IoT Suite in ways that will make complementary software services from each available on the other.
The work will take place in several existing open-source projects under the #Eclipse Foundation. These projects are creating code for things like messaging, user authentication, access control and device descriptions. Through the Eclipse projects, other vendors also will be able to create software services that are compatible with Predix and Bosch IoT Suite, said Greg Petroff, executive director of platform evangelism at GE Software.

If enterprises can draw on a broader set of software components that work together, they may look into doing things with IoT that they would not have considered otherwise, he said. These could include linking IoT data to ERP or changing their business model from one-time sales to subscriptions.
GE and Bosch will keep the core parts of Predix and IoT Suite unique and closed, Petroff said. In the case of Predix, for example, that includes security components. The open-source IoT stack will handle fundamental functions like messaging and how to connect to IoT data.
Partnerships and open-source software both are playing important roles in how IoT takes shape amid expectations of rapid growth in demand that vendors want to be able to serve. Recently, IBM joined with Cisco Systems to make elements of its Watson analytics available on Cisco IoT edge computing devices. Many of the common tools and specifications designed to make different IoT devices work together are being developed in an open-source context.

GitLab hits back on GitHub with a visual issue tracking board!

The startup today introduced an issue tracking feature for its fast-growing code hosting platform that promises to help development teams organize the features, enhancements and other items on their to-do lists more effectively.

The GitLab Issue Board is a sleek graphical panel that provides the ability to display tasks as digital note cards and sort them into neat columns each representing a different part of the application lifecycle. By default, new panels start only with a “Backlog” section for items still in the queue and a “Done” list that shows completed tasks, but users are able to easily add more tabs if necessary. GitLab says that the tool makes it possible to break up a view into as many as 10 different segments if need be, which should be enough for even the most complex software projects.

An enterprise development team working on an internal client-server service, for instance, could create separate sections to hold backend tasks, issues related to the workload’s desktop client and user experience bugs. Users with such crowded boards can also take advantage of GitLab’s built-in tagging mechanism to label each item with color-coded tags denoting its purpose. The feature not only helps quickly make sense of the cards on a given board but also makes it easier to find specific items in the process. When an engineer wants to check if there are new bug fix requests concerning their part of a project, they can simply filter the board view based on the appropriate tags.

Google makes its TensorFlow artificial intelligence platform available on iOS

Google this week has published a new version of its TensorFlow machine learning software that adds support for iOS. Google initially teased that it was working on iOS support for TensorFlow last November, but said it was unable to give a timeline. An early version of TensorFlow version 0.9 was released yesterday on GitHub, however, and it brings iOS support.
For those unfamiliar, TensorFlow is Google’s incredibly powerful artificial intelligence software that powers many of Google’s services and initiatives, including AlphaGo. Google describes TensorFlow as “neural network” software that processes data in a way that’s similar to how our brain cells process data (via CNET).
With Google adding iOS support to TensorFlow, developers will be able to integrate its neural network capabilities into their apps, ultimately making them considerably smarter and more capable.
At this point, it’s unclear when the final version of TensorFlow 0.9 will be released, but the early pre-release version is available now on GitHub. In the release notes, Google points out that because TensorFlow is now open source, 46 people from outside the company contributed to TensorFlow version 0.9.
In addition to adding support for iOS, TensorFlow 0.9 adds a handful of other new features and improvements, as well as plenty of smaller bug fixes and performance enhancements. You can read the full change log below and access TensorFlow on GitHub.
 
Major Features and Improvements

  • Python 3.5 support and binaries
  • Added iOS support
  • Added support for processing on GPUs on MacOS
  • Added makefile for better cross-platform build support (C API only)
  • fp16 support for many ops
  • Higher level functionality in contrib.{layers,losses,metrics,learn}
  • More features in TensorBoard
  • Improved support for string embedding and sparse features
  • TensorBoard now has an Audio Dashboard, with associated audio summaries.

Bug Fixes and Other Changes

  • Turned on CuDNN Autotune.
  • Added support for using third-party Python optimization algorithms (contrib.opt).
  • Google Cloud Storage filesystem support.
  • HDF5 support
  • Add support for 3d convolutions and pooling.
  • Update gRPC release to 0.14.
  • Eigen version upgrade.
  • Switch to eigen thread pool
  • tf.nn.moments() now accepts a shift argument. Shifting by a good estimate of the mean improves numerical stability. Also changes the behavior of the shift argument to tf.nn.sufficient_statistics().
  • Performance improvements
  • Many bugfixes
  • Many documentation fixes
  • TensorBoard fixes: graphs with only one data point, Nan values, reload button and auto-reload, tooltips in scalar charts, run filtering, stable colors
  • TensorBoard graph visualizer now supports run metadata. Clicking on nodes while viewing stats for a particular run will show runtime statistics, such as memory or compute usage. Unused nodes will be faded out.
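
The note about the shift argument of tf.nn.moments() is easier to see with numbers. The sketch below is plain Python, not TensorFlow's implementation; the function name and sample values are made up for illustration. It computes the variance as E[(x - s)^2] - (E[x - s])^2, which does not depend on the shift s mathematically, but is numerically far better behaved when s is close to the mean.

```python
def shifted_moments(values, shift=0.0):
    """Return (mean, variance) computed from shifted sufficient statistics."""
    n = len(values)
    shifted = [v - shift for v in values]
    mean_shifted = sum(shifted) / n
    mean_sq_shifted = sum(d * d for d in shifted) / n
    mean = shift + mean_shifted
    # Population variance: E[(x - s)^2] - (E[x - s])^2
    variance = mean_sq_shifted - mean_shifted ** 2
    return mean, variance


# Large values with a small spread: the true population variance is 22.5.
samples = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

naive_mean, naive_var = shifted_moments(samples)           # no shift
good_mean, good_var = shifted_moments(samples, shift=1e9)  # shift near the mean

print(naive_var)  # far from 22.5: the two huge terms cancel badly
print(good_var)   # 22.5
```

Without a shift, the squares are around 1e18 and double precision cannot hold the small difference between them; with a shift near the mean, the intermediate values stay small and the result is exact here.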