Tag: opensource

Hidden Threats in PyPI and NPM: What You Need to Know

Introduction: Dependency Dangers in the Developer Ecosystem

Modern software development is fuelled by open-source packages, from Python (PyPI) and JavaScript (npm) to PHP (Packagist/Composer). These packages have revolutionised development cycles by providing reusable components, thereby accelerating productivity and creating a rich ecosystem for innovation. However, this very reliance comes with a significant security risk: these widely used packages have become an attractive target for cybercriminals. As developers seek to expedite the development process, they may overlook the necessary due diligence on third-party packages, opening the door to potential security breaches.

Faster Development, Shorter Diligence: A Security Conundrum

Today, shorter development cycles and agile methodologies demand speed and flexibility. Continuous Integration/Continuous Deployment (CI/CD) pipelines encourage rapid iterations and frequent releases, leaving little time for the verification of every dependency. The result? Developers often choose dependencies without conducting rigorous checks on package integrity or legitimacy. This environment creates an opening for attackers to distribute malicious packages by leveraging popular repositories such as PyPI, npm, and others, making them vectors for harmful payloads and information theft.

Malicious Package Techniques: A Deeper Dive

While typosquatting is a common technique used by attackers, there are several other methods employed to distribute malicious packages:

  • Supply Chain Attacks: Attackers compromise legitimate packages by gaining access to the repository or the maintainer’s account. Once access is obtained, they inject malicious code into trusted packages, which then get distributed to unsuspecting users.
  • Dependency Confusion: This technique involves uploading packages with names identical to internal, private dependencies used by companies. When developers inadvertently pull from the public repository instead of their internal one, they introduce malicious code into their projects. This method exploits package-manager resolution behaviour that can prioritise public packages over private ones (a simple pre-install vetting sketch follows this list).
  • Malicious Code Injection: Attackers often inject harmful scripts directly into a package’s source code. This can be done by compromising a developer’s environment or using compromised libraries as dependencies, allowing attackers to spread the malicious payload to all users of that package.
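
To make such verification concrete, here is a minimal, hedged sketch in Python that queries the public PyPI JSON API for a package's metadata before the package is adopted as a dependency. The heuristics and the example package names are illustrative assumptions, not a vetted detection tool:

    import json
    import urllib.error
    import urllib.request

    def package_metadata(name: str):
        """Fetch a package's metadata from the public PyPI JSON API."""
        url = f"https://pypi.org/pypi/{name}/json"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return json.load(resp)
        except urllib.error.HTTPError:
            return None  # the package does not exist on the public index

    def red_flags(name: str):
        """Collect simple warning signs before adding a dependency."""
        meta = package_metadata(name)
        if meta is None:
            return [f"{name}: not found on PyPI"]
        warnings = []
        info = meta["info"]
        if len(meta.get("releases", {})) <= 2:
            warnings.append(f"{name}: very few releases")
        if not info.get("home_page") and not info.get("project_urls"):
            warnings.append(f"{name}: no homepage or project URLs")
        return warnings

    if __name__ == "__main__":
        for pkg in ["requests", "reqeusts"]:  # the second is a deliberate typo
            print(red_flags(pkg) or [f"{pkg}: no obvious red flags"])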

These methods are increasingly sophisticated, leveraging the natural behaviours of developers and package management systems to spread malicious code, steal sensitive information, or compromise entire systems.

Timeline of Incidents: Malicious Packages in the Spotlight

A series of high-profile incidents have demonstrated the vulnerabilities inherent in unchecked package installations:

  • May 2022: The popular ctx package was compromised to steal environment variables, including AWS keys, and send them to a remote server. This instance affected many developers and highlighted the scale of potential data leakage through dependencies. (BleepingComputer)
  • June 2022: Malicious Python packages such as loglib-modules, pyg-modules, pygrata, pygrata-utils, and hkg-sol-utils were caught exfiltrating AWS credentials and sensitive developer information to unsecured endpoints. These packages were disguised to look like legitimate tools and fooled many unsuspecting developers. (BleepingComputer)
  • December 2022: A malicious package masquerading as a SentinelOne SDK was uploaded to PyPI, with malware designed to exfiltrate sensitive data from infected systems. (The Register)
  • September 2023: An extended campaign involving malicious npm and PyPI packages targeted developers to steal SSH keys, AWS credentials, and other sensitive information, affecting numerous projects globally. (BleepingComputer)
  • November 2024: The fabrice package, a typosquat of the legitimate fabric library, was caught harvesting AWS credentials after racking up more than 37,000 downloads; it is examined in detail in the case study below. (Developer-Tech)

The Impact: Scope of Compromise

The estimated number of affected companies and products is difficult to pin down precisely due to the widespread usage of open-source packages in both small-scale and enterprise-level applications. Given that some of these malicious packages garnered tens of thousands of downloads, the potential damage stretches across countless software projects. With popular packages like ctx and others reaching a substantial audience, the economic and reputational impact could be significant, potentially costing affected businesses millions in breach recovery and remediation costs.

Real-world Impact: Consequences of Malicious Packages

The real-world impact of malicious packages is profound, with consequences ranging from data breaches to financial loss and severe reputational damage. The following are some of the key impacts:

  • British Airways and Ticketmaster Data Breach: In 2018, the Magecart group exploited vulnerabilities in third-party scripts used by British Airways and Ticketmaster. The attackers injected malicious code to skim payment details of customers, leading to significant data breaches and financial loss. British Airways was fined £20 million for the breach, which affected over 400,000 customers. (BBC)
  • Codecov Bash Uploader Incident: In April 2021, Codecov, a popular code coverage tool, was compromised. Attackers modified the Bash Uploader script, which is used to send coverage reports, to collect sensitive information from Codecov’s users, including credentials, tokens, and keys. This supply chain attack impacted hundreds of customers, including notable companies like HashiCorp. (GitGuardian)
  • Event-Stream NPM Package Attack: In 2018, the popular JavaScript library event-stream was hijacked by a malicious actor who added code to steal cryptocurrency from applications using the library. The compromised version was downloaded millions of times before the attack was detected, affecting numerous developers and projects globally. (Snyk)

These incidents highlight the potential repercussions of malicious packages, including severe financial penalties, reputational damage, and the theft of sensitive customer information.

Fabrice: A Case Study in Typosquatting

The recent incident involving the fabrice package is a stark reminder of how easy it is for attackers to deceive developers. The fabrice package, designed to mimic the legitimate fabric library, employed a typosquatting strategy, exploiting typographical errors to infiltrate systems. Since its release, the package was downloaded over 37,000 times and covertly collected AWS credentials using the boto3 library, transmitting the stolen data to a remote server via VPN, thereby obscuring the true origin of the attack. The package contained different payloads for Linux and Windows systems, utilising scheduled tasks and hidden directories to establish persistence. (Developer-Tech)

Lessons Learned: Importance of Proactive Security Measures

The cases highlighted in this article offer important lessons for developers and organisations:

  1. Dependency Verification is Crucial: Typosquatting and dependency confusion can be avoided by carefully verifying package authenticity. Implementing strict naming conventions and utilising internal package repositories can help prevent these attacks (a simple name-proximity check is sketched after this list).
  2. Security Throughout the SDLC: Integrating security checks into every phase of the SDLC, including automated code reviews and security testing of modules, is essential. This ensures that vulnerabilities are identified early and mitigated before reaching production.
  3. Use of Vulnerability Scanning Tools: Tools like Snyk and OWASP Dependency-Check are invaluable in proactively identifying vulnerabilities. Organisations should make these tools a mandatory part of the development process to mitigate risks from third-party dependencies.
  4. Security Training and Awareness: Developers must be educated about the risks associated with third-party packages and taught how to identify potentially malicious code. Regular training can significantly reduce the likelihood of falling victim to these attacks.
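
As a small illustration of lesson 1, the sketch below uses Python's standard-library difflib to flag proposed dependency names that sit suspiciously close to packages a team already trusts; the fabrice/fabric pair from the case study would be caught this way. The KNOWN_GOOD list and the 0.85 cutoff are arbitrary assumptions for demonstration:

    import difflib

    # A short allowlist of packages the team actually depends on; in practice
    # this would come from your requirements file or an internal registry.
    KNOWN_GOOD = ["fabric", "requests", "boto3", "numpy", "django"]

    def typosquat_candidates(name: str, cutoff: float = 0.85):
        """Return known-good packages that a proposed name suspiciously resembles."""
        matches = difflib.get_close_matches(name, KNOWN_GOOD, n=3, cutoff=cutoff)
        return [m for m in matches if m != name]

    for candidate in ["fabrice", "reqests", "boto3"]:
        near = typosquat_candidates(candidate)
        if near:
            print(f"WARNING: '{candidate}' resembles {near}; verify before installing")
        else:
            print(f"'{candidate}': no near-collisions with known dependencies")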

By recognising these lessons, developers and organisations can better safeguard their software supply chains and mitigate the risks associated with third-party dependencies.

Prevention Strategies: Staying Safe from Malicious Packages

To mitigate the risks associated with malicious packages, developers and startups must adopt a multi-layered defence approach:

  1. Verify Package Authenticity: Always verify package names, descriptions, and maintainers. Opt for well-reviewed and frequently updated packages over relatively unknown ones.
  2. Review Source Code: Whenever possible, review the source code of the package, especially for dependencies with recent uploads or unknown maintainers.
  3. Use Package Scanners: Employ tools like Sonatype Nexus, npm audit, or PyUp to identify vulnerabilities and malicious code within packages.
  4. Leverage Lockfiles: Tools like package-lock.json (npm) or Pipfile.lock (Pipenv) can help prevent unintended updates by locking dependencies to specific versions.
  5. Implement Least Privilege: Limit the permissions assigned to development environments to reduce the impact of compromised keys or credentials.
  6. Regular Audits: Conduct regular security audits of dependencies as part of the CI/CD pipeline to minimise risk (a minimal version-drift audit is sketched after this list).
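
As one hedged example of item 6, the sketch below compares the versions actually installed in a Python environment against simple name==version pins in a requirements file, using only the standard library. The file path and pin format are assumptions; real projects may prefer lockfiles with hashes:

    from importlib.metadata import distributions

    def load_pins(path: str = "requirements.txt"):
        """Parse simple 'name==version' pins; anything else is ignored."""
        pins = {}
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if "==" in line and not line.startswith("#"):
                    name, _, version = line.partition("==")
                    pins[name.lower()] = version
        return pins

    def audit(pins):
        """Report installed distributions that drift from the pinned versions."""
        findings = []
        for dist in distributions():
            name = dist.metadata["Name"].lower()
            if name in pins and dist.version != pins[name]:
                findings.append(f"{name}: installed {dist.version}, pinned {pins[name]}")
        return findings

    if __name__ == "__main__":
        for finding in audit(load_pins()):
            print("DRIFT:", finding)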

Software Security: Embedding Security in the Development Lifecycle

To mitigate the risks associated with malicious packages and other vulnerabilities, it is essential to integrate security into every phase of the Software Development Lifecycle (SDLC). This practice, known as the Secure Software Development Lifecycle (SSDLC), emphasises incorporating security best practices throughout the development process.

Key Components of SSDLC

  • Automated Code Reviews: Leveraging tools that automatically scan code for vulnerabilities and flag potential issues early in the development cycle can significantly reduce the risk of security flaws making it into production. Tools like SonarQube, Checkmarx, and Veracode help in ensuring that security is built into the code from the beginning.
  • Security Testing of Modules: Security testing should be conducted on third-party modules before integrating them into the project. Tools like Snyk and OWASP Dependency-Check can identify vulnerabilities in dependencies and provide remediation advice (a CI gating sketch follows below).
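
For instance, a CI step can gate the build on a dependency scan. The sketch below shells out to pip-audit, a vulnerability scanner maintained by the PyPA (assumptions here: the tool is installed in the CI image, and the JSON output shape matches recent releases); Snyk or OWASP Dependency-Check could fill the same role:

    import json
    import subprocess
    import sys

    def scan_requirements(path: str = "requirements.txt") -> int:
        """Run pip-audit against pinned requirements; return a non-zero
        exit code when dependencies with known vulnerabilities are found."""
        result = subprocess.run(
            ["pip-audit", "-r", path, "-f", "json"],
            capture_output=True, text=True,
        )
        report = json.loads(result.stdout or "{}")
        vulnerable = [d for d in report.get("dependencies", []) if d.get("vulns")]
        for dep in vulnerable:
            ids = ", ".join(v["id"] for v in dep["vulns"])
            print(f"{dep['name']} {dep['version']}: {ids}")
        return 1 if vulnerable else 0

    if __name__ == "__main__":
        sys.exit(scan_requirements())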

Deep Dive into Technical Details

  • Malicious Package Techniques: As discussed earlier, typosquatting is just one of the many attack techniques. Supply chain attacks, dependency confusion, and malicious code injection are also common methods attackers use to compromise software projects. It is essential to understand these techniques and incorporate checks that can prevent such attacks during the development process.
  • Vulnerability Analysis Tools:
    • Snyk: Snyk helps developers identify vulnerabilities in open-source libraries and container images. It scans the project dependencies and cross-references them with a constantly updated vulnerability database. Once vulnerabilities are identified, Snyk provides detailed remediation advice, including fixing the version or applying patches.
    • OWASP Dependency-Check: OWASP Dependency-Check is an open-source tool that scans project dependencies for known vulnerabilities. It works by identifying the libraries used in the project, then checking them against the National Vulnerability Database (NVD) to highlight potential risks. The tool also provides reports and actionable insights to help developers remediate the issues.
    • Sonatype Nexus: Sonatype Nexus offers a repository management system that integrates directly with CI/CD pipelines to scan for vulnerabilities. It uses machine learning and other advanced techniques to continuously monitor and evaluate open-source libraries, providing alerts and remediation options.

Best Practices for Secure Dependency Management

  • Dependency Pinning: Pinning dependencies to specific versions helps prevent unexpected updates that may contain vulnerabilities. By using tools like package-lock.json (npm) or Pipfile.lock (Pipenv), developers can ensure they are not inadvertently upgrading to a compromised version of a dependency (see the hashing sketch after this list).
  • Use of Private Registries: Hosting private package registries allows organisations to maintain tighter control over the dependencies used in their projects. By using tools like Nexus Repository or Artifactory, companies can create a trusted repository of dependencies and mitigate risks associated with public registries.
  • Robust Security Policies: Organisations should implement strict policies around the use of open-source components. This includes performing security audits, using automated tools to scan for vulnerabilities, and enforcing review processes for any new dependencies being added to the codebase.
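
To make hash pinning concrete, here is a small sketch that emits --hash=sha256:... fragments for artifacts fetched ahead of time (the vendored/ directory name is made up for this example), producing lines usable with pip's --require-hashes mode:

    import hashlib
    import pathlib

    def hash_fragment(artifact: pathlib.Path) -> str:
        """Produce a '--hash=sha256:...' fragment for a downloaded artifact,
        suitable for a requirements file installed with --require-hashes."""
        digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
        return f"    --hash=sha256:{digest}"

    # Assumes the artifacts were fetched beforehand, for example with:
    #   pip download -r requirements.txt -d vendored/
    for artifact in sorted(pathlib.Path("vendored").iterdir()):
        if artifact.is_file():
            print(artifact.name)
            print(hash_fragment(artifact))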

By integrating these practices into the development process, organisations can build more resilient software, reduce vulnerabilities, and prevent incidents involving malicious dependencies.

Conclusion

As the developer community continues to embrace rapid innovation, understanding the security risks inherent in third-party dependencies is crucial. Adopting preventive measures and enforcing better dependency management practices are vital to mitigate the risks of malicious packages compromising projects, data, and systems. By recognising these threats, developers and startups can secure their software supply chains and build more resilient products.

Why Did Elastic Decide to Go Open Source Again?

Elastic’s Return to Open Source: The Knight is back to the Pavilion

Elastic, the company behind Elasticsearch, recently decided to return to an open-source licensing model after more than three years under a proprietary license. This decision reflects a shift in strategy that emphasizes community-driven innovation and collaboration. In 2021, Elastic adopted a proprietary model to protect its intellectual property from cloud providers like Amazon Web Services (AWS), which were benefiting from Elasticsearch without contributing to its development. However, the move away from open source posed its own challenges, including alienating the developer community that had helped build Elasticsearch into a widely used tool.

In 2024, Elastic CEO Shay Banon announced the company’s return to an open-source framework. He explained that this decision stems from the belief that open collaboration fosters innovation and better serves the long-term interests of both the company and its user base. “We believe that the best products are built together,” Banon stated, emphasizing the value of community engagement in product development.

Recent Changes in Open-Source Licensing Models

Elastic’s decision is not an isolated incident. Over the past few years, several other technology companies have reconsidered their licensing models in response to the changing dynamics of software development and cloud service providers. These companies have struggled with how to balance open-source principles with the need to protect their commercial interests.

  1. Redis Labs
    Redis Labs initially licensed Redis under a permissive open-source license, but in 2018, the company adopted the Commons Clause to prevent cloud providers from offering Redis as a service without contributing to its development. However, after facing backlash from the developer community, Redis Labs adjusted its approach by introducing Redis Stack under more community-friendly terms, highlighting the difficulty of maintaining open-source integrity while ensuring business protection.
  2. HashiCorp
    In 2023, HashiCorp, known for popular tools like Terraform, adopted a Business Source License (BSL), which restricts the usage of its software in certain commercial contexts. HashiCorp’s move was driven by concerns over cloud providers monetizing its tools without contributing back to the open-source community. While BSL is not a traditional open-source license, HashiCorp continues to maintain a balance between openness and protecting its intellectual property, showing how companies are navigating complex market dynamics.
  3. MongoDB
    MongoDB’s shift to the Server Side Public License (SSPL) in 2018 was another major development in the open-source licensing debate. The SSPL aims to prevent cloud service providers from exploiting MongoDB’s open-source code without contributing back. While the SSPL is more restrictive than traditional open-source licenses, MongoDB’s goal was to retain the open-source ethos while ensuring that cloud vendors could not commercialize the software without contributing to its development.
  4. Chef Software
    Chef, an automation tool provider, switched all of its products to open-source in 2019 after years of operating under a mixed licensing model. This shift was largely a response to the growing demand for transparency and community collaboration. Chef’s decision allowed it to rebuild trust within its user base and align its business strategy with the broader trends in software development.

Impact on the Average Software Developer

For the average software developer, these licensing model changes can profoundly impact their work, career growth, and day-to-day development practices.

  1. Access to Cutting-Edge Tools
    When companies like Elastic and MongoDB return to open-source models, developers gain unrestricted access to powerful tools and frameworks. This democratizes the technology, allowing developers from small companies, startups, and even personal projects to leverage the same tools that major enterprises use, without the barrier of expensive proprietary licenses. For many developers, open-source provides not just tools, but an entire ecosystem for experimentation, learning, and rapid prototyping.
  2. Contributing to Open-Source Communities
    Open-source contributions are an essential career-building tool for many developers. By contributing to open-source projects, developers can gain real-world experience, build portfolios, and even influence the direction of widely-used technologies. When companies like HashiCorp and Redis Labs shift their focus back to open-source, it increases opportunities for developers to become part of a larger, global development community.
  3. Career and Learning Opportunities
    Exposure to open-source projects allows developers to work with cutting-edge technology and methodologies. This can accelerate learning, as open-source projects are often evolving quickly with input from diverse and global teams. Additionally, contributing to popular open-source projects like Elastic or Kubernetes can greatly enhance a developer’s resume and open doors to career opportunities, including job offers and consulting roles.
  4. Navigating Licensing Restrictions
    Developers must also become more adept at navigating the complexities of new licenses like SSPL and BSL. These licenses place restrictions on how open-source software can be used, especially in cloud environments. Understanding the fine print is crucial for developers working in enterprise environments or launching their own SaaS products, as improper use of open-source software can lead to legal complications. This makes legal and compliance knowledge increasingly important in modern software development roles.

Open Source vs. Open Governance: A Crucial Distinction

Elastic’s journey highlights a key debate in the software development world: the difference between open source and open governance. While many companies have embraced open-source models, few have transitioned to open governance frameworks, which involve community-driven decision-making for the project’s future direction.

As highlighted in my previous article, “Open Source vs. Open Governance: The State and Future of the Movement,” the distinction lies in control. In open-source projects, the code is freely available, but decisions regarding the project’s roadmap and key developments may still be controlled by a single entity, such as a company. In contrast, open governance ensures that decision-making is decentralized, often involving multiple stakeholders, including developers, users, and companies that contribute to the project.

For Elastic and others, returning to open source doesn’t necessarily mean embracing open governance. Although Elastic’s code will be open for contributions, the strategic direction will still be managed by the company, a common approach in high-profile open-source projects. By contrast, Kubernetes, which originated at Google, is governed by a diverse group of stakeholders under the Cloud Native Computing Foundation, ensuring the project’s direction isn’t controlled by a single entity, and projects like OpenStack follow an even more open governance approach, with broad community involvement in decision-making.

Understanding the difference between open-source and open governance is critical as the software industry evolves. Companies are beginning to realize that open-source alone doesn’t always translate into the collaborative, community-driven development they seek. Open governance provides a framework for more inclusive decision-making, but it also presents challenges in terms of efficiency and control.

Looking Ahead: Open Source as a Business Strategy

The return of Elastic and other companies to more open models indicates a growing recognition of the importance of open-source in the software industry. For Elastic, this decision is about more than just licensing; it’s about reconnecting with a developer community that thrives on transparency and collaboration. By embracing open-source again, Elastic hopes to accelerate product development and foster stronger relationships with users.

This broader trend shows that while companies are still cautious about cloud providers exploiting their software, they are increasingly finding ways to leverage open-source models as a business strategy. These recent changes to licensing frameworks highlight the evolving nature of software development and the role open-source plays in it.

For organizations navigating the complex decision between proprietary and open-source models, the key lesson from Elastic’s experience is that the long-term benefits of community-driven development and innovation can outweigh the short-term protection of proprietary models. As more companies follow suit, it’s clear that open-source is not just a technical choice—it’s a business strategy.

Further Reading:

  1. Why Open Source Matters for Innovation – Alan Turing Institute
  2. The Future of Open Source: What to Expect in 2024 and Beyond – MIT Technology Review
  3. Why Every Company Should be Open-Source Aligned – Forbes


The Fork in the Road: The Curveball that Redis Pitched

In a move announced on March 20th, 2024, Redis, the ubiquitous in-memory data store, sent shockwaves through the tech world with a significant shift in its licensing model. Previously boasting a permissive BSD license, Redis transitioned to a dual-license approach, combining the Redis Source Available License (RSAL) and the Server Side Public License (SSPL). This move, while strategic for Redis Labs, has created ripples of concern in the SAAS ecosystem and the open-source community at large.

The Split: From Open to Source-Available

At its core, the change restricts how users, particularly cloud providers offering managed Redis services, can leverage the software commercially. The SSPL, outlined in the March 20th press release, stipulates that any derivative work offering the “same functionality as Redis” as a service must also be open-sourced. This directly impacts companies like Amazon (ElastiCache) and DigitalOcean, forcing them to potentially alter their service models or acquire commercial licenses from Redis Labs.

A History of Licensing Shifts

This isn’t the first time Redis Labs has ruffled feathers with licensing changes. As a 2019 TechCrunch article [1] highlights, Redis Labs has a history of tweaking its open-source license, sparking similar controversies. Back then, the company argued that cloud providers were profiting from Redis without giving back to the open-source community. The new SSPL appears to be an extension of this philosophy, aiming to compel greater contribution from commercial users.

SaaS Providers in a Squeeze

For SaaS providers, the new licensing throws a wrench into established business models. Modifying core functionality to comply with the SSPL might not be feasible, and open-sourcing their entire platform could expose proprietary code. This could lead to increased costs for SaaS companies, potentially impacting end-user pricing.

Open Source Community Divided

The open-source world is also grappling with the implications. While the core Redis functionality remains source-available under the RSAL, the philosophical shift towards a more restrictive model has some worried. The Linux Foundation even announced a fork, Valkey, as an alternative, backed by tech giants like Google and Oracle. This fragmentation could create confusion and slow down innovation within the open-source Redis ecosystem.

The Road Ahead: Uncertainty and Innovation

The long-term effects of Redis’s licensing change remain to be seen. It might pave the way for a new model for open-source software sustainability, where companies can balance community development with commercial viability. However, it also raises concerns about control and potential fragmentation within open-source projects.

In conclusion, Redis’s licensing shift presents a complex scenario. While it aims to secure Redis Labs’ financial future, it disrupts the SaaS landscape and creates uncertainty in the open-source world. Only time will tell if this is a necessary evolution or a roadblock to future innovation.


Building a Log-Management & Analytics Solution for Your StartUp

Background:

As described in an earlier post, I run Engineering at an early-stage #traveltech #startup called Itilite, so one of my responsibilities is to architect, build, and manage the cloud infrastructure for the company. Even though I had designed, built, and maintained cloud infrastructure in my previous roles, this one was genuinely challenging and interesting, in part because the organisation is a high-growth #traveltech startup, and hence:

  1. The architecture landscape is still evolving,
  2. Last month’s performance criteria look like next month’s minimum acceptable criteria,
  3. The sheer volume of user growth, and the growth of traffic per user,
  4. The addition of partner inventories, which increases capacity requirements by an order of magnitude,

And several others. Somewhere down the line, after the infrastructure, code pipeline, and CI are set up, you reach a point where managing logs (read: triggering interventions, analysis, storage, archival, retention) across several infrastructure clusters, such as development/testing, staging, and production, becomes overwhelming.

Enter Log Management & Analytics

We had worked our way up from a simple tail/multitail to Graylog aggregation of 18 server logs, covering app servers, database servers, API endpoints, and everything in between. But, as my honoured former colleague Mr. Naveen Venkat (CPO of Zarget) used to say in my days with Zarget, there are no “go-to” persons in a start-up. You “go figure” yourself!

There is definitely no “one size fits all” solution, and especially in a start-up environment, you are always running behind features, timelines, or customers (scope, timeline, or cost in the conventional PMI model).

So, after some due research to account for the recent advances in Logstash and Beats, I narrowed down the possible contenders that could power our little log-management system:

  1. ELK Stack — Build it from scratch, but retain full flexibility.
  2. Graylog — Out-of-the-box functionality, though you may have to tune individual components to suit your needs.
  3. Fluentd — An entirely new log-management paradigm; interesting, and we explored it a bit.

(I did not consider anything exotic, or anything that would cost us more in future than what we pay in the first year. So some great tools like Splunk, Nagios, LogPacker, and LogRhythm were not considered.)

Evaluation Process:

I wrote an Ansible script to create a replica environment and pull in the necessary configurations, then used a previously written load-test job to simulate a typical work hour. The same configuration was used for each of the frameworks/tools considered.

I started experimenting with Graylog, owing to familiarity with the tool, and configured it in the way that felt most appropriate at that point in time.

Slight setback:

However, the collector I had used (Sidecar with Filebeat) had a major problem sending files over 255 KB when the collection interval was under 5 seconds. Packets destined for Elasticsearch never made it, and the resulting pile-up caused a major application-stability issue.

One of our main use cases is to ingest XML/JSON data from multiple sources (we run a polynomial regression across the sources and use the nth derivatives for further business operations).

When the daily logs you need to export run upwards of 5 GB for one app (JSON logs), then you add multiple APIs, some microservice application logs, web servers, load balancers, CI (Jenkins), database query logs, bin-logs, Redis and … yes, you get the point?

Upon further investigation, the Sidecar collector was actually not the culprit. Our architecture had accounted for several things, but by design we used to hit momentary peaks in CPU utilisation during the “merges”, and all of these were “nice” loads (in our defence)!

Once the CPU hit the 100% mark, Sidecar started behaving very differently. We ultimately fixed it with a patched version of Sidecar and by shifting to NXLog.

The experiment with ELK was a different beast in itself, as provisioning and configuring took more time than I was comfortable with, so I switched to the AWS “packaged service”. We deployed an Elasticsearch domain in AWS, fired up a couple of Kibana and Logstash instances, and connected them (after what appeared to be forever); it worked like a charm, and I was able to get all the required information into Kibana. One downside is that you need to plan your Elasticsearch indices according to how your log sources will grow; for us, that was impractical.

Fluentd was an excellent platform for normalising logs, but it, too, depended on Kibana/Elasticsearch as the analysis front end.

So, finally, we settled on good old Graylog.

Advantages of Graylog

The tool fit perfectly into our workflow and evolving environment:

  1. Graylog is free and open-source software, so we won’t have to pay now or in the future.
  2. Its trigger actions and notifications complement our monitoring nicely, just a bit deeper!
  3. With error stack traces received from Graylog, engineers understand the context of an issue in the source code. This saves time and effort in debugging, troubleshooting, and bug fixing.
  4. The tool has a powerful search syntax, so it is easy to find exactly what you are looking for, even across terabytes of log data. Search queries can be saved, and for really complex scenarios you can write an Elasticsearch query and save it in the dashboard as a function.
  5. Graylog offers archiving functionality, so everything older than 30 days can be moved to slow storage and re-imported into Graylog when the need arises (for example, when the dev team needs to investigate a certain event from the past).
  6. Java, Python, and Ruby applications can easily be connected to Graylog, as out-of-the-box libraries exist for this (see the sketch below).
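
As an illustration of point 6, a Python service can ship structured logs to Graylog over GELF with the graypy library (assumptions in this sketch: graypy is installed via pip, and the hostname, port, and field names are placeholders for your own Graylog input):

    import logging

    import graypy  # pip install graypy; a GELF handler library for Python

    # Hypothetical Graylog GELF/UDP input listening on port 12201.
    logger = logging.getLogger("itilite.app")
    logger.setLevel(logging.INFO)
    logger.addHandler(graypy.GELFUDPHandler("graylog.internal", 12201))

    # Extra fields arrive in Graylog as searchable message attributes.
    logger.info(
        "booking search completed",
        extra={"request_id": "abc123", "latency_ms": 184},
    )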

#logmanagement #analytics #startup #hustle #opensource #graylog #elk

Open Source vs. Open Governance: The State and Future of the Open Source Movement

Last week, DataStax announced that it was jettisoning its role in maintaining the Planet Cassandra community resource site, even as the project lead, Jonathan Ellis, made it known that DataStax would be doubling down on its commercial product, rather than Cassandra. Though the DataStax team put a brave face on the changes, the real question is why DataStax had to change at all.
Similarly, when Sun and then Oracle acquired MySQL AB, the company behind the original development, governance of the MySQL open-source database gradually closed. Now only Oracle writes updates, patches, and features; updates from other sources, whether individuals or other companies, are ignored.
These are two opposite extremes in the open-source movement. When an open-source project reaches a critical threshold and following, it grows bigger than its chief contributor, be that a company or a group of people. It takes shape in such a way that even if the chief contributor can no longer commit resources, the community will take care of everything: features, development, support, documentation, and everything in between. This is the real essence of open-source development.
However, an initial sponsor is absolutely necessary for a project to thrive in its infancy.
MySQL is still open source, but it has closed governance.
In the case of MySQL, the source code was forked by the community, and the MariaDB project started from there. Nowadays, when somebody says they are “using MySQL”, they are in fact probably using MariaDB, which has evolved from where MySQL stopped in time.
Take a look at the GitHub page of MySQL for reference.

MySQL repository with its core contributors: a project which powers one in three websites and apps, with just 51 developers!

MySQL developers commit summary: all core committers are from Oracle!
Cassandra is still open source, but now it has open governance.
The Cassandra question is ultimately about control. As the ASF board noted in the minutes from its meeting with DataStax representatives, “The Board expressed continuing concern that the PMC was not acting independently and that one company had undue influence over the project.” Given that DataStax has been Cassandra’s primary development engine since the day it spun out of Facebook, this “undue influence” is hardly new.
And, according to some closest to Cassandra, like former Cassandra MVP Kelly Sommers, that “undue influence” has borne exceptional fruits. Sommers clearly feels this way, insisting that the ASF “is really out of line in their actions with Cassandra,” ultimately concluding that the ASF might be hostile to the very people most responsible for a project’s success.
In her view, the ASF’s search for diversity in the Cassandra project should have started with expanding its existing leadership, rather than cutting it out: “The ASF forced DataStax to reduce their role in Cassandra rather than forming a long-term strategy to grow diversity around theirs,” Sommers said.
Though Sommers doesn’t directly comment on the trademark issues, she didn’t pull any punches in her disdain for project process over code results: “Politics is off the rails when focus is lost on success of the thing it runs and all that matters is process. This is how I feel ASF operates.”
So, for companies hoping to monetise open source, the Cassandra blow-up is a not-so-subtle reminder that community can be inimical to commercial interests, however much it can fuel adoption. It may also be a signal to the ASF that less corporate influence on projects could yield less code.
Now, back to our agenda.

Open source vs. open governance

Open source software’s momentum serves as a powerful insurance policy for the investment of time and resources an individual or enterprise user will put into it. This is the true benefit behind Linux as an operating system, Samba as a file server, Apache HTTPD as a web server, Hadoop, Docker, MongoDB, PHP, Python, jQuery, Bootstrap and other hyper-essential open source projects, each on its own level of the stack. Open source momentum is the safe antidote to technology lock-in. Having learned that lesson over the last decade, enterprises are now looking for the new functionalities that are gaining momentum: cloud management software, big data, analytics, integration middleware and application frameworks.
In the open domain, the only two non-functional things that matter in the long term are whether the software is open source and whether it has attained momentum in the community and industry. Neither is related to how the software is written, but that is exactly what open governance is concerned with: the how.

The value of momentum

Open governance alone does not guarantee that the software will be good, popular or useful (though formal open governance tends to happen only on projects that have already captured the attention of IT industry leaders). A few examples of open-source projects with formal open governance are Cloud Foundry, OpenStack, jQuery and all the projects under the Apache Software Foundation umbrella.
For users, the indirect benefit of open governance is the speed at which the open-source project reaches momentum and high popularity.
In conclusion, balancing open-source development with open governance is a very delicate act. Oracle failed to foster diversity, which gave rise to MariaDB, while the ASF ejected DataStax to avoid a repeat of the same story!
 

GitHub Open-Sources its Load Balancer

GitHub will release its internally developed load balancer, the GitHub Load Balancer (GLB), as open source.
GLB was originally built to accommodate GitHub’s need to serve billions of HTTP, Git, and SSH connections daily. The company will release components of GLB via open source and share design details. This is seen as a major step in building scalable infrastructure using commodity hardware. For more details, please refer to the GitHub Engineering post.

GE & Bosch to leverage open source to deliver IoT tools

Partnerships that could shape the internet of things for years are being forged just as enterprises fit IoT into their long-term plans.

Representation of an IoT & IIoT convergence

A vast majority of organisations have included #IoT in their strategic plans for the next two to three years, and no single vendor can meet the diverse #IoT needs of all customers, so vendors are joining forces and trying to foster broader ecosystems. General Electric and Bosch both recently announced their intention to do the same.
The two companies, both big players in #IIoT, said they will establish a core IoT software stack based on open-source software. They plan to integrate parts of GE’s #Predix operating system with the #Bosch IoT Suite in ways that will make complementary software services from each available on the other.
The work will take place in several existing open-source projects under the #Eclipse Foundation. These projects are creating code for things like messaging, user authentication, access control and device descriptions. Through the Eclipse projects, other vendors also will be able to create software services that are compatible with Predix and Bosch IoT Suite, said Greg Petroff, executive director of platform evangelism at GE Software.

If enterprises can draw on a broader set of software components that work together, they may look into doing things with IoT that they would not have considered otherwise, he said. These could include linking IoT data to ERP or changing their business model from one-time sales to subscriptions.
GE and Bosch will keep the core parts of Predix and IoT Suite unique and closed, Petroff said. In the case of Predix, for example, that includes security components. The open-source IoT stack will handle fundamental functions like messaging and how to connect to IoT data.
Partnerships and open-source software both are playing important roles in how IoT takes shape amid expectations of rapid growth in demand that vendors want to be able to serve. Recently, IBM joined with Cisco Systems to make elements of its Watson analytics available on Cisco IoT edge computing devices. Many of the common tools and specifications designed to make different IoT devices work together are being developed in an open-source context.

GitLab hits back at GitHub with a visual issue tracking board!

The startup today introduced an issue tracking feature for its fast-growing code hosting platform that promises to help development teams organize the features, enhancements and other items on their to-do lists more effectively.

GitLab

The GitLab Issue Board is a sleek graphical panel that provides the ability to display tasks as digital note cards and sort them into neat columns each representing a different part of the application lifecycle. By default, new panels start only with a “Backlog” section for items still in the queue and a “Done” list that shows completed tasks, but users are able to easily add more tabs if necessary. GitLab says that the tool makes it possible to break up a view into as many as 10 different segments if need be, which should be enough for even the most complex software projects.

An enterprise development team working on an internal client-server service, for instance, could create separate sections to hold backend tasks, issues related to the workload’s desktop client and user experience bugs. Users with such crowded boards can also take advantage of GitLab’s built-in tagging mechanism to label each item with color-coded tags denoting its purpose. The feature not only helps quickly make sense of the cards on a given board but also makes it easier to find specific items in the process. When an engineer wants to check if there are new bug-fix requests concerning their part of a project, they can simply filter the board view based on the appropriate tags.

Read more here.


 

Google makes its TensorFlow artificial intelligence platform available on iOS

TensorFlow logo
Google this week has published a new version of its TensorFlow machine learning software that adds support for iOS. Google initially teased that it was working on iOS support for TensorFlow last November, but said it was unable to give a timeline. An early version of TensorFlow version 0.9 was released yesterday on GitHub, however, and it brings iOS support.
For those unfamiliar, TensorFlow is Google’s incredibly powerful artificial intelligence software that powers many of Google’s services and initiatives, including AlphaGo. Google describes TensorFlow as “neural network” software that processes data in a way that’s similar to how our brain cells process data (via CNET).
With Google adding iOS support to TensorFlow, developers will be able to integrate its neural network capabilities into their apps, ultimately making them considerably smarter and more capable.
At this point, it’s unclear when the final version of TensorFlow 0.9 will be released, but the early pre-release version is available now on GitHub. In the release notes, Google points out that because TensorFlow is now open source, 46 people from outside the company contributed to TensorFlow version 0.9.
In addition to adding support for iOS, TensorFlow 0.9 adds a handful of other new features and improvements, as well as plenty of smaller bug fixes and performance enhancements. You can read the full change log below and access TensorFlow on GitHub.
 
Major Features and Improvements

  • Python 3.5 support and binaries
  • Added iOS support
  • Added support for processing on GPUs on MacOS
  • Added makefile for better cross-platform build support (C API only)
  • fp16 support for many ops
  • Higher level functionality in contrib.{layers,losses,metrics,learn}
  • More features to Tensorboard
  • Improved support for string embedding and sparse features
  • TensorBoard now has an Audio Dashboard, with associated audio summaries.

Bug Fixes and Other Changes

  • Turned on CuDNN Autotune.
  • Added support for using third-party Python optimization algorithms (contrib.opt).
  • Google Cloud Storage filesystem support.
  • HDF5 support
  • Add support for 3d convolutions and pooling.
  • Update gRPC release to 0.14.
  • Eigen version upgrade.
  • Switch to eigen thread pool
  • tf.nn.moments() now accepts a shift argument. Shifting by a good estimate of the mean improves numerical stability. Also changes the behavior of the shift argument to tf.nn.sufficient_statistics().
  • Performance improvements
  • Many bugfixes
  • Many documentation fixes
  • TensorBoard fixes: graphs with only one data point, Nan values, reload button and auto-reload, tooltips in scalar charts, run filtering, stable colors
  • Tensorboard graph visualizer now supports run metadata. Clicking on nodes while viewing a stats for a particular run will show runtime statistics, such as memory or compute usage. Unused nodes will be faded out.