Category: software engineering

How To Measure Real Success In Software Engineering

Recently, while attending The Business Show in London, I engaged in a conversation with a CXO of an up-and-coming fintech company. The discussion began with cybersecurity implementation—a topic close to my heart—but quickly veered into the realm of engineering throughput. What followed was an incoherent rant: a frustrating narrative about firing their Delivery Director for refusing to scale the engineering team to meet deadlines for the company’s next shiny event. Despite my best efforts to pull this gentleman out of his rabbit hole, my reasoning fell on deaf ears.

Reflecting on this interaction over the past month, I’ve realized this episode was emblematic of a larger issue: the prevalent fallacy among CXOs that more engineers equals faster and better output. Surprisingly, this misconception thrives in part because of the silence of engineering leaders—CTOs, VPs, and Directors of Engineering—who often fail to push back against flawed assumptions at the executive level.

Inspired by my recent association with the Information Security Group (ISG) at the Royal Holloway University of London, I decided to don my “academic specs” and examine this fallacy more critically. The result is a deeper dive into the myths of scaling engineering teams, the science behind team efficiency, and a call for a cultural shift in how organizations measure productivity.

The Scaling Myth: Why More Isn’t Always Better

At the heart of this fallacy is a simplistic assumption: more engineers means more features, delivered faster. The notion seems logical, but it collides with Price’s Law, a principle that exposes the diminishing returns of team scaling.

Rediscovering Derek J. de Solla Price: From Antikythera to Engineering Efficiency

My journey to understanding Price’s Law began with a fascination for the Antikythera Mechanism—an ancient Greek marvel of engineering and astronomy. It was through this mechanism that I first encountered the work of Prof. Derek J. de Solla Price, a British physicist and historian whose curiosity and intellect extended far beyond antiquities. Inspired by the ingenuity of the Antikythera Mechanism, I was drawn to explore the origins of Damascus and Wootz steel and their roots in the south-western peninsula of India (as detailed in Aayutha Desam by R. Mannar Mannan). (More about that in another post!)

But it was Price’s insight into the uneven distribution of productivity in groups that struck a chord with my work in software engineering. His principle, now widely known as Price’s Law, asserts that in any team, 50% of the work is accomplished by the square root of the total number of participants.

  • In a team of 10 engineers, approximately 3 contributors (√10) are responsible for half the output.
  • In a team of 100 engineers, only 10 individuals (√100) produce as much as the remaining 90 combined.
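
To make the arithmetic concrete, here is a small illustrative Python sketch (the team sizes are arbitrary) that computes the “productive core” predicted by Price’s Law and shows why doubling output implies roughly quadrupling headcount:

```python
import math

def productive_core(team_size: int) -> float:
    """Price's Law: roughly sqrt(N) people produce half the output."""
    return math.sqrt(team_size)

for n in (10, 40, 100, 400):
    core = productive_core(n)
    print(f"team={n:4d}  core={core:5.1f}  core share={core / n:.0%}")

# Doubling the productive core means quadrupling the team: sqrt(4N) = 2 * sqrt(N)
```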

This principle highlights a counterintuitive but vital truth: as team size grows, the proportion of high contributors decreases, leading to inefficiencies that compound over time. This isn’t just an academic curiosity—it’s a critical insight for engineering leaders tasked with scaling teams and delivering results.

Price’s Law challenges a long-standing assumption in engineering leadership: that scaling teams proportionally scales productivity. By understanding this principle, CTOs, VPs, and engineering managers can rethink strategies for achieving efficiency and delivering value, even with constrained resources.

The Myth of Highly Motivated Teams

Some self-proclaimed visionary leaders advocate for hiring only highly motivated individuals, often overlooking how teams function in practice. In any organized group, work typically falls into three categories:

  1. Drudgery Work (Low Impact, High Intensity): Routine tasks like debugging or documentation, essential but unappealing.
  2. Intermediate Work (Medium Impact, Medium Intensity): Feature upgrades or system integrations, vital for sustaining operations.
  3. Challenging Work (High Impact, High Intensity): Complex, high-stakes initiatives that highly motivated individuals prefer.

The Problem

Highly motivated individuals often prioritize high-impact projects, leaving routine and intermediate work neglected. This creates:

  • Operational Bottlenecks: Accumulating technical debt and system fragility.
  • Imbalanced Workloads: Overburdened team members handling routine tasks.
  • Team Friction: Reduced cohesion and potential burnout.

The Solution: Balance Over Ambition

Effective teams thrive on diversity in skill sets and balanced task allocation. Leaders must:

  • Distribute Work Strategically: Ensure all types of work are addressed.
  • Value Contributions Equally: Recognize the importance of routine and intermediate tasks.
  • Foster Team Cohesion: Avoid over-prioritizing high-stakes projects at the expense of operational stability.

Conclusion: A truly visionary leader grounds ambition in pragmatism, creating teams that excel not just in high-impact projects but also in sustaining the essentials of day-to-day operations.

Implications for Team Expansion

For CTOs, VPs, and engineering managers, this dynamic presents a counterintuitive challenge: merely expanding the team does not guarantee proportional gains in productivity. Doubling headcount often introduces:

  1. Communication Overhead: Larger teams require more coordination, which consumes valuable time and resources.
  2. Dilution of Accountability: As teams grow, individual contributions become harder to track, potentially reducing ownership and engagement.
  3. Coordination Complexities: Increased interdependencies among team members can slow down decision-making and implementation.

To achieve a twofold increase in productivity, Price’s Law suggests that you may need to quadruple the team size, a move that is often impractical and financially untenable. Instead, engineering leaders must rethink productivity beyond the simplistic metric of team size.

Shifting Focus: Outcomes Over Outputs

Traditional productivity metrics, such as the number of features released or lines of code written, focus on outputs—tangible deliverables produced by the team. However, outputs do not inherently translate into value. Consider the distinction:

  • Outputs: Metrics like features delivered or tickets closed.
  • Outcomes: Measurable changes in user behaviour that drive business results, such as increased user retention or reduced churn.

Relying solely on outputs creates a misleading picture of productivity. A feature-rich application that fails to address user needs or business goals is ultimately unproductive. Instead, outcomes—which capture the real-world effectiveness of engineering efforts—offer a better lens to measure success.

Outcome vs. Impact

While outcomes focus on immediate effects (e.g., increased sign-ups from a new feature), impact delves deeper into long-term consequences. For example:

  • An outcome may be an increase in user sign-ups after a feature launch.
  • The impact would be sustained revenue growth and user satisfaction resulting from the feature’s value over time.

Engineering teams must aim for outcomes that align with strategic goals while keeping an eye on their long-term impacts.

Counterproductive Paradigm: The Threat Surface of Excessive Outputs

Emphasizing outputs over outcomes can be counterproductive, leading to what can be described as an expanding threat surface:

  1. Defects and Bugs: Adding more features often introduces unintended issues that require additional resources to resolve.
  2. Maintenance Burden: More code increases the risk of technical debt, making future development slower and more complex.
  3. Merge Conflicts and Regressions: Larger teams fixing bugs or implementing features in parallel can inadvertently cause regressions, especially when the main sprint continues uninterrupted.

This vicious cycle diverts focus from strategic initiatives, tying up engineers in a continuous loop of fixes. Instead of scaling output indiscriminately, teams should focus on ensuring that every deliverable contributes to meaningful outcomes.

Focusing on Impacts and Outcomes: A Leadership Imperative

For engineering leaders, the shift from outputs to impacts and outcomes is transformative. This approach emphasizes:

  1. Defining Clear Objectives: Establish measurable outcomes (e.g., reducing churn by 10%) that align with business goals.
  2. Prioritizing High-Impact Work: Evaluate tasks based on their potential to deliver meaningful results.
  3. Empowering Teams: Foster a culture where engineers understand and contribute to broader business objectives rather than just completing tickets.
  4. Continuous Feedback Loops: Regularly assess whether engineering efforts are driving intended outcomes.
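
To illustrate what a measurable outcome can look like in code, here is a hedged Python sketch; the objective name and numbers are hypothetical, and a real team would pull `current` from analytics rather than hard-coding it:

```python
from dataclasses import dataclass

@dataclass
class OutcomeObjective:
    name: str
    baseline: float  # metric value before the initiative
    target: float    # desired metric value
    current: float   # latest measured value

    def progress(self) -> float:
        # Fraction of the intended movement achieved so far
        return (self.current - self.baseline) / (self.target - self.baseline)

# Hypothetical objective: cut monthly churn from 8% to 7.2% (a 10% reduction)
churn = OutcomeObjective("monthly churn", baseline=0.08, target=0.072, current=0.076)
print(f"{churn.name}: {churn.progress():.0%} of the way to target")
```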

This shift not only enhances productivity but also aligns engineering work with the organization’s mission, fostering a sense of purpose within teams.

Conclusion: Redefining Productivity in Software Engineering

Price’s Law reminds us that productivity does not scale linearly with team size. Engineering leaders must navigate this reality by focusing on outcomes and impacts rather than outputs. This paradigm shift requires a cultural and strategic overhaul, but the rewards—greater efficiency, alignment, and value delivery—are well worth the effort.

By embracing this approach, organizations can ensure that their engineering efforts contribute directly to their strategic goals, transforming software development into a driver of sustainable business success.

References

  1. Sundarakalatharan, R. (2022). How to measure Engineering Productivity?. Retrieved from https://nocturnalknight.co/how-to-measure-engineering-productivity/
  2. Bohrmann, N. (2022). How Price’s Law Applies to Everything. Retrieved from https://nielsbohrmann.com/prices-law/
  3. LeadDev. (2022). Focus on outcomes over outputs. Retrieved from https://leaddev.com/velocity/focus-outcomes-over-outputs
  4. Monday Mornings. (2023). Productivity and Price’s Law. Retrieved from https://mondaymornings.madisoncres.com/productivity-and-prices-law-1
  5. TechRadar. (2023). Outcomes versus outputs: the real measure of developer productivity. Retrieved from https://www.techradar.com/pro/outcomes-versus-outputs-the-real-measure-of-developer-productivity
  6. Royal Holloway Information Security Group. (2024). https://pure.royalholloway.ac.uk/
  7. Wikipedia. (2024). Antikythera Mechanism. Retrieved from https://en.wikipedia.org/wiki/Antikythera_mechanism
  8. Wikipedia. (2024). Derek J. de Solla Price. Retrieved from https://en.wikipedia.org/wiki/Derek_J._de_Solla_Price
  9. Wikipedia. (2024). Wootz Steel. Retrieved from https://en.wikipedia.org/wiki/Wootz_steel
  10. Purple Book House. (2024). Aayutha Desam by R. Mannar Mannan. Retrieved from https://www.purplebookhouse.co.uk/product-page/aayutha-desam-book-type-katturaigal-history-by-r-mannar-mannan
The Truth About “Ghost Engineers”: A Critical Analysis

Disclaimer:
This article is not intended to discredit Boris Denisov, Stanford University, McKinsey, or any other entities referenced herein. I hold immense respect for their contributions to research and industry discourse. While findings like these may resonate with practices in FAANG companies, large organizations, and mature startups, this critique seeks to explore the broader implications of relying on narrow metrics to evaluate productivity in software engineering.

The “Ghost Engineer” Narrative

The term “ghost engineers,” popularized by a recent Stanford study, describes software engineers who allegedly contribute minimally to codebases. Analyzing data from over 50,000 engineers, the study concludes that 9.5% of engineers fall into this category, with the prevalence rising to 14% among remote workers.

While the findings spark interesting discussions, they rely heavily on the flawed assumption that code commit frequency equates to productivity. As I argued in No, McKinsey, You Got It All Wrong About Developer Productivity, this narrow perspective risks undervaluing critical aspects of software engineering that don’t leave a visible footprint in version control systems.

Unintended Amplification: The Snowball Effect

One of the most significant risks of such conclusions—especially before peer review—is their unintended amplification. Articles on Yahoo, TechCrunch, and Newsday have already simplified these findings, creating narratives that could ripple through the industry:

  1. Unnecessary Layoffs: Misinterpreting data might lead organizations to hastily classify engineers as unproductive, ignoring less visible but valuable contributions.
  2. Remote Work Stigma: By associating remote work with reduced productivity, these claims risk undermining one of the most effective workforce models when well-managed.
  3. Toxic Metrics Culture: Over-reliance on activity metrics like commit counts can encourage engineers to game the system by prioritizing volume over meaningful work, as discussed in Business Value Delivery by Engineering Teams in Startups (Part 2).

History offers cautionary examples, such as McKinsey’s controversial reliance on lines of code as a productivity measure—a practice criticized in my earlier article for ignoring the multifaceted nature of modern software engineering.

Engineering Productivity: Beyond Output Metrics

As outlined in Is the Myth of a 10x Developer Real?, productivity in software engineering extends far beyond raw output. Effective engineers don’t just code—they align stakeholders, resolve ambiguity, and reduce future risks. These invisible contributions often lead to:

  • Improved Collaboration: Engineers who mentor, review code, or resolve cross-team dependencies amplify the impact of their teams.
  • Strategic Outcomes: Refactoring technical debt or implementing security frameworks might reduce visible code output while significantly improving system health.

Commit Frequency Misses Critical Context

  • Quality Over Quantity: A single commit that eliminates 1,000 lines of redundant code can be more impactful than 10 minor feature updates.
  • Diverse Roles: Roles like DevOps, QA, and security often contribute indirectly to engineering success but rarely generate frequent commits.

By focusing solely on visible metrics, we risk reinforcing flawed incentives, a point I emphasized in Business Value Delivery by Engineering Teams in Startups (Part 1).

Analyzing the Stanford Study’s Claims

Claim 1: Engineers with Low Commit Activity Are Unproductive

Rebuttal: This assumption ignores the cognitive and collaborative aspects of engineering. As noted in No, McKinsey, You Got It All Wrong About Developer Productivity, activities like design discussions, documentation, and mentoring are essential but invisible in commit logs.

Claim 2: Remote Engineers Are More Likely to Be “Ghost Engineers”

Rebuttal: Remote work relies on asynchronous collaboration, where documentation and long-term planning take precedence over immediate outputs. Simplistic comparisons risk stigmatizing effective remote models.

Claim 3: Low Commit Activity Correlates with Poor Team Performance

Rebuttal: High-performing teams often include specialists whose contributions are less visible but critical. For example, a security engineer resolving vulnerabilities or a DevOps engineer optimizing CI/CD pipelines may not show up in commit logs.

Claim 4: Organizations Could Save Billions by Addressing the “Ghost Engineer” Problem

Rebuttal: Cost-cutting measures based on flawed metrics often lead to higher technical debt, increased turnover, and diminished morale. As argued in Business Value Delivery by Engineering Teams in Startups (Part 2), true cost efficiency lies in maximizing impact, not minimizing headcount.

Impact vs Code-Commits: Understanding the Misalignment

A recurring issue with productivity metrics like code-commit frequency is their inability to reflect the true impact of an engineer’s work. The volume of code changes often says little about the value delivered, as demonstrated by the following examples:

Example 1: A Cosmetic UI Change vs. A Critical API Update

Imagine a product manager requests a seemingly simple change: update a button’s color from purple to orange. While this may sound trivial, it could involve:

  • Updating CSS libraries: A cascade of dependencies might require 1,000+ lines of revisions.
  • Testing for accessibility: Ensuring compliance with color-contrast guidelines adds complexity.
  • Regression testing: Updating snapshot tests or fixing broken visual diffs.

This cosmetic change could result in dozens of commits, each addressing a specific dependency or edge case.

Contrast this with a backend engineer’s work on the API gateway to improve application concurrency. This might involve:

  • Identifying bottlenecks: Profiling existing workloads and implementing a solution to reduce latency.
  • Optimizing database connections: Reducing round trips or improving query performance.
  • Deploying with minimal disruption: A single, concise commit could encapsulate weeks of planning and testing.

Here, the backend change’s impact far outweighs the UI update, even though it appears smaller in terms of commit frequency.

Example 2: Bulk Refactoring vs. Precise Bug Fixing

A mid-level engineer is tasked with refactoring a legacy module, updating deprecated methods, and restructuring a monolithic codebase for better readability. This effort generates hundreds of commits and thousands of lines of changes, none of which immediately improve the product’s features.

On the other hand, a senior engineer identifies and fixes a critical bug that intermittently crashes the application. The solution, a one-line code change after hours of debugging, resolves a high-severity issue affecting thousands of users.

From a commit-count perspective, the refactoring task appears more productive. However, the senior engineer’s single-line fix has a far greater immediate impact.

Example 3: Feature Addition vs. Security Enhancement

A frontend developer introduces a new feature, such as a user profile editor. This entails:

  • New UI components: HTML and CSS for the form.
  • Frontend validations: JavaScript-based constraints for data inputs.
  • Integration tests: Mock API responses for various test cases.

The addition spans 2,000 lines of code across 20 commits.

Meanwhile, a DevSecOps engineer works on a critical security vulnerability. The task involves:

  • Rotating access tokens: Updating key secrets stored in the CI/CD pipeline.
  • Implementing security headers: Adding CSPs to prevent XSS attacks (a minimal sketch follows below).
  • Hardening configurations: Minor changes in deployment scripts to reduce attack surfaces.

Although the security enhancement generates fewer than 10 commits, its value in preventing potential breaches and compliance penalties is enormous.
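
As a concrete but deliberately minimal illustration of the security-header work described above, here is a sketch using Flask; the header values are illustrative defaults, not a complete hardening policy:

```python
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_security_headers(response):
    # A restrictive CSP mitigates XSS by limiting where scripts may load from
    response.headers["Content-Security-Policy"] = "default-src 'self'"
    # Prevent MIME-type sniffing and force HTTPS on returning visitors
    response.headers["X-Content-Type-Options"] = "nosniff"
    response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    return response

@app.route("/")
def index():
    return "hello"
```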

Key Takeaways

  • Context Matters: Evaluating productivity requires understanding the context and complexity of the task, not just the output volume.
  • Quality Over Quantity: High-impact changes often involve fewer commits, while low-value tasks may inflate commit counts.
  • Recognizing Diverse Contributions: Engineers working on performance, security, or architecture frequently produce less visible yet highly impactful work.

This misalignment underscores the need for organizations to adopt holistic evaluation metrics that consider both quantitative output and qualitative impact. By focusing on the latter, teams can better recognize and reward meaningful contributions.

The Danger of Flawed Productivity Metrics

Simplistic metrics can have cascading negative effects:

  1. Burnout: Engineers may feel pressured to prioritize activity over quality.
  2. Stifled Innovation: Overemphasis on visible output discourages experimentation and risk-taking.
  3. Loss of Talent: Talented engineers in specialized roles may leave if their contributions are undervalued.

As emphasized in Is the Myth of a 10x Developer Real?, effective engineering is about multiplying impact, not maximizing visible output.

A Holistic Approach to Productivity

To address these issues, organizations must adopt nuanced evaluation frameworks:

  1. Impact-Driven Metrics: Evaluate contributions based on outcomes, such as improved system reliability or customer satisfaction.
  2. Recognize Invisible Work: Acknowledge tasks like mentorship, technical debt reduction, and long-term strategic planning.
  3. Foster a Culture of Trust: Empower teams to experiment and innovate without fear of being misjudged by flawed metrics.

Conclusion

The “ghost engineer” narrative oversimplifies the multifaceted nature of software engineering. By relying on metrics like commit counts, it risks undervaluing critical contributions and fostering unhealthy workplace dynamics. As I’ve argued across multiple articles, effective engineering teams succeed by delivering value, not just output. The industry must move beyond flawed productivity metrics and adopt more comprehensive frameworks to recognize the true contributions of every engineer.


References and Further Reading

  1. Denisov-Blanch, Y. (2024). Twitter Thread on Ghost Engineers.
  2. Denisov-Blanch, Y. (2024). Stanford Research on Software Engineering Productivity. Stanford University.
  3. Polyakov, A. (2024). Ghost Engineers—Utter Non-Sense! Medium.
  4. No, McKinsey, You Got It All Wrong About Developer Productivity. Nocturnalknight.co.
  5. Is the Myth of a 10x Developer Real? Nocturnalknight.co.
  6. Bridgwater, A. (2024). Code Busters: Are Ghost Engineers Haunting DevOps Productivity? DevOps.com.
  7. Business Value Delivery by Engineering Teams in Startups (Part 1). Nocturnalknight.co.
  8. Business Value Delivery by Engineering Teams in Startups (Part 2). Nocturnalknight.co.
  9. Long, K. (2024). Are Ghost Engineers Undermining Tech Productivity? Business Insider.
  10. Passionate Geekz. (2024). Can a Company Increase Its Market Value by Laying Off Employees?
Do You Know What’s in Your Supply Chain? The Case for Better Security

I recently read an interesting report by CyCognito on the top three vulnerability classes in third-party products, and it sparked my interest in reexamining supply chain risks in software engineering. This article is an attempt at that.

The Vulnerability Trifecta in Third-Party Products

The CyCognito report identifies three critical areas where third-party products introduce significant vulnerabilities:

  1. Web Servers
    These foundational systems host countless applications but are frequently exploited due to misconfigurations or outdated software. According to the report, 34% of severe security issues are tied to web server environments like Apache, NGINX, and Microsoft IIS. Vulnerabilities like directory traversal or improper access control can serve as gateways for attackers.
  2. Cryptographic Protocols
    Secure communication relies on cryptographic protocols like TLS and HTTPS. Yet, 15% of severe vulnerabilities target these mechanisms. For instance, misconfigurations, weak ciphers, or reliance on deprecated standards expose sensitive data, with inadequate encryption ranking second on OWASP’s Top 10 security threats. (A quick probe sketch follows this list.)
  3. Web Interfaces Handling PII
    Applications that process PII—such as invoices or financial statements—are among the most sensitive assets. Alarmingly, only half of such interfaces are protected by Web Application Firewalls (WAFs), leaving them vulnerable to injection attacks, session hijacking, or data leakage.
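
On the cryptographic-protocols point, even a quick probe of what a server actually negotiates can surface weak ciphers or deprecated standards. A minimal Python sketch, with a placeholder hostname:

```python
import socket
import ssl

def probe_tls(hostname: str, port: int = 443) -> None:
    # The default context already refuses known-insecure protocol versions,
    # so a successful handshake also confirms the floor you are willing to accept.
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            print(hostname, "->", tls.version(), tls.cipher()[0])

probe_tls("example.com")  # placeholder hostname
```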

Beyond Web Servers: The Hidden Dependency Risks

You control your software stack, but do you actually know what runs beneath those flashy Web/Application servers?

Drawing parallels from my previous article on PyPI and NPM vulnerabilities, it’s clear that open-source dependencies amplify these threats. Attackers exploit the very trust inherent in supply chains, introducing malicious packages or exploiting insecure libraries.

For example:

  • Attackers have embedded malware into popular NPM and PyPI packages, which are then unknowingly incorporated into enterprise-grade software.
  • Dependency confusion attacks exploit naming conventions to inject malicious packages into CI/CD pipelines.

These risks share a core vulnerability with traditional third-party systems: an opaque supply chain with minimal oversight. The problem is compounded by ever-shrinking release cycles, which leave even strong engineering teams little to no time to audit the dependency graph of the packages underpinning the new, shiny things they are building.


Why Software Supply Chain Attacks Persist

As highlighted by Scientific Computing World, software supply chain attacks persist for several reasons:

  • Aggressive GTM Timelines: Most organisations now run quarterly or even monthly product roadmaps; a new SaaS product can be launched in days to weeks by layering IaaS, PaaS, or SaaS offerings with third-party libraries and frameworks, leaving little time for diligence.
  • Exponential Complexity: With organisations relying on layers of third-party and fourth-party services, the attack surface expands exponentially.
  • Insufficient Oversight: Organisations often focus on securing their environments while neglecting the vendors and libraries they depend on.
  • Lagging Standards: The industry’s inability to enforce stringent security protocols across the supply chain leaves critical gaps.
  • Sophistication of Attacks: From SolarWinds to MOVEit, attackers continually evolve, targeting blind spots in detection and remediation frameworks.

Recommended Steps to Mitigate Supply Chain Threats

To address these vulnerabilities and build resilience, organizations can take the following actionable steps:

1. Map and Assess Dependencies

  • Use tools like Dependency-Track or Sonatype Nexus to map and analyze all third-party and open-source dependencies.
  • Regularly perform software composition analysis (SCA) to detect outdated or vulnerable components.
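
As a complement to dedicated SCA tooling, a first-pass inventory of what is actually installed can be pulled from the Python runtime itself; a minimal sketch:

```python
from importlib import metadata

# Enumerate installed distributions: a crude first-pass dependency inventory.
# Dedicated tools (Dependency-Track, Sonatype Nexus) layer licence data,
# transitive graphs, and vulnerability matching on top of a list like this.
inventory = sorted(
    (dist.metadata["Name"] or "UNKNOWN", dist.version)
    for dist in metadata.distributions()
)
for name, version in inventory:
    print(f"{name}=={version}")
```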

2. Implement Zero-Trust Architecture

  • Leverage Zero-Trust frameworks like NIST 800-207 to ensure strict authentication and access controls across all systems.
  • Minimize the privileges of third-party integrations and isolate sensitive data wherever possible.

3. Strengthen Vendor Management

  • Evaluate vendor security practices using frameworks like the NCSC’s Supply Chain Security Principles or the Open Trusted Technology Provider Standard (OTTPS).
  • Demand transparency through detailed Service Level Agreements (SLAs) and regular vendor audits.

4. Prioritize Secure Development and Deployment

  • Train your development teams to follow secure coding practices like those outlined in the OWASP Secure Coding Guidelines.
  • Incorporate tools like Snyk or Checkmarx to identify vulnerabilities during the software development lifecycle.

5. Enhance Monitoring and Incident Response

  • Deploy Web Application Firewalls (WAFs) such as AWS WAF or Cloudflare to protect web interfaces.
  • Establish a robust incident response plan using guidance from the MITRE ATT&CK Framework to ensure rapid containment and mitigation.

6. Foster Collaboration

  • Work with industry peers and organizations like the Cybersecurity and Infrastructure Security Agency (CISA) to share intelligence and best practices for supply chain security.
  • Collaborate with academic institutions and research groups for cutting-edge insights into emerging threats.

7. Schedule a No-Obligation Consultation Call with Yours Truly

Struggling with supply chain vulnerabilities or need tailored solutions for your unique challenges? I offer consultation services to work directly with your CTO, Principal Architect, or Security Leadership team to:

  • Assess your systems and identify key risks.
  • Recommend actionable, budget-friendly steps for mitigation and prevention.

With years of expertise in cybersecurity and compliance, I can help streamline your approach to supply chain security without breaking the bank. Let’s collaborate to make your operations secure and resilient.

Schedule Your Free Consultation Today

Building a Resilient Supply Chain

The UK’s National Cyber Security Centre (NCSC) principles for supply chain security provide a pragmatic roadmap for businesses. Here’s how to act:

  1. Understand and Map Dependencies
    Organizations should create a detailed map of all dependencies, including direct vendors and downstream providers, to identify potential weak links.
  2. Adopt a Zero-Trust Framework
    Treat every external connection as untrusted until verified, with continuous monitoring and access restrictions.
  3. Mandate Secure Development Practices
    Encourage or require vendors to implement secure coding standards, frequent vulnerability testing, and robust update mechanisms.
  4. Regularly Audit Supply Chains
    Establish a routine audit process to assess vendor security posture and adherence to compliance requirements.
  5. Proactive Incident Response Planning
    Prepare for the inevitable by maintaining a robust incident response plan that incorporates supply chain risks.

Final Thoughts

The threat of supply chain vulnerabilities is no longer hypothetical—it’s happening now. With reports like CyCognito’s, research into dependency management, and frameworks provided by trusted institutions, businesses have the tools to mitigate risks. However, this requires vigilance, collaboration, and a willingness to rethink traditional approaches to third-party management.

Organisations must act not only to safeguard their operations but also to preserve trust in an increasingly interconnected world. 

Is your supply chain ready to withstand the next wave of attacks?


References and Further Reading

  1. Report Shows the Threat of Supply Chain Vulnerabilities from Third-Party Products – CyCognito
  2. Hidden Threats in PyPI and NPM: What You Need to Know
  3. Why Software Supply Chain Attacks Persist – Scientific Computing World
  4. Principles of Supply Chain Security – NCSC
  5. CyCognito Report Exposes Rising Software Supply Chain Threats

What’s your strategy for managing third-party risks? Share your thoughts in the comments!

Hidden Threats in PyPI and NPM: What You Need to Know

Introduction: Dependency Dangers in the Developer Ecosystem

Modern software development is fuelled by open-source packages, from Python (PyPI) and JavaScript (npm) to PHP (Composer). These packages have revolutionised development cycles by providing reusable components, thereby accelerating productivity and creating a rich ecosystem for innovation. However, this very reliance comes with a significant security risk: these widely used packages have become an attractive target for cybercriminals. As developers seek to expedite the development process, they may overlook the necessary due diligence on third-party packages, opening the door to potential security breaches.

Faster Development, Shorter Diligence: A Security Conundrum

Today, shorter development cycles and agile methodologies demand speed and flexibility. Continuous Integration/Continuous Deployment (CI/CD) pipelines encourage rapid iterations and frequent releases, leaving little time for the verification of every dependency. The result? Developers often choose dependencies without conducting rigorous checks on package integrity or legitimacy. This environment creates an opening for attackers to distribute malicious packages by leveraging popular repositories such as PyPI, npm, and others, making them vectors for harmful payloads and information theft.

Malicious Package Techniques: A Deeper Dive

While typosquatting is a common technique used by attackers, there are several other methods employed to distribute malicious packages:

  • Supply Chain Attacks: Attackers compromise legitimate packages by gaining access to the repository or the maintainer’s account. Once access is obtained, they inject malicious code into trusted packages, which then get distributed to unsuspecting users.
  • Dependency Confusion: This technique involves uploading packages with names identical to internal, private dependencies used by companies. When developers inadvertently pull from the public repository instead of their internal one, they introduce malicious code into their projects. This method exploits the default behaviour of package managers prioritising public over private packages (a detection sketch follows below).
  • Malicious Code Injection: Attackers often inject harmful scripts directly into a package’s source code. This can be done by compromising a developer’s environment or using compromised libraries as dependencies, allowing attackers to spread the malicious payload to all users of that package.

These methods are increasingly sophisticated, leveraging the natural behaviours of developers and package management systems to spread malicious code, steal sensitive information, or compromise entire systems.
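
To make the dependency-confusion risk concrete, here is a hedged Python sketch; the internal package names are hypothetical, and it simply flags internal names that are shadowed by a public project on PyPI:

```python
import requests

# Hypothetical internal package names; in practice, list your private registry.
INTERNAL_PACKAGES = ["acme-billing-core", "acme-auth-utils"]

def exists_on_pypi(name: str) -> bool:
    # PyPI's JSON API returns 200 for any published project name
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200

for pkg in INTERNAL_PACKAGES:
    if exists_on_pypi(pkg):
        print(f"WARNING: '{pkg}' exists publicly and could be pulled by mistake")
```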

Timeline of Incidents: Malicious Packages in the Spotlight

A series of high-profile incidents have demonstrated the vulnerabilities inherent in unchecked package installations:

  • June 2022: Malicious Python packages such as loglib-modules, pyg-modules, pygrata, pygrata-utils, and hkg-sol-utils were caught exfiltrating AWS credentials and sensitive developer information to unsecured endpoints. These packages were disguised to look like legitimate tools and fooled many unsuspecting developers. (BleepingComputer)
  • December 2022: A malicious package masquerading as a SentinelOne SDK was uploaded to PyPI, with malware designed to exfiltrate sensitive data from infected systems. (The Register)
  • January 2023: The popular ctx package was compromised to steal environment variables, including AWS keys, and send them to a remote server. This instance affected many developers and highlighted the scale of potential data leakage through dependencies. (BleepingComputer)
  • September 2023: An extended campaign involving malicious npm and PyPI packages targeted developers to steal SSH keys, AWS credentials, and other sensitive information, affecting numerous projects globally. (BleepingComputer)
  • October 2023: The fabrice package, a typosquat of the legitimate fabric library, was caught covertly harvesting AWS credentials from developers’ machines after being downloaded over 37,000 times. It is examined in detail in the case study below. (Developer-Tech)

The Impact: Scope of Compromise

The estimated number of affected companies and products is difficult to pin down precisely due to the widespread usage of open-source packages in both small-scale and enterprise-level applications. Given that some of these malicious packages garnered tens of thousands of downloads, the potential damage stretches across countless software projects. With popular packages like ctx and others reaching a substantial audience, the economic and reputational impact could be significant, potentially costing affected businesses millions in breach recovery and remediation costs.

Real-world Impact: Consequences of Malicious Packages

The real-world impact of malicious packages is profound, with consequences ranging from data breaches to financial loss and severe reputational damage. The following are some of the key impacts:

  • British Airways and Ticketmaster Data Breach: In 2018, the Magecart group exploited vulnerabilities in third-party scripts used by British Airways and Ticketmaster. The attackers injected malicious code to skim payment details of customers, leading to significant data breaches and financial loss. British Airways was fined £20 million for the breach, which affected over 400,000 customers. (BBC)
  • Codecov Bash Uploader Incident: In April 2021, Codecov, a popular code coverage tool, was compromised. Attackers modified the Bash Uploader script, which is used to send coverage reports, to collect sensitive information from Codecov’s users, including credentials, tokens, and keys. This supply chain attack impacted hundreds of customers, including notable companies like HashiCorp. (GitGuardian)
  • Event-Stream NPM Package Attack: In 2018, a popular JavaScript library event-stream was hijacked by a malicious actor who added code to steal cryptocurrency from applications using the library. The compromised version was downloaded millions of times before the attack was detected, affecting numerous developers and projects globally. (Snyk)

These incidents highlight the potential repercussions of malicious packages, including severe financial penalties, reputational damage, and the theft of sensitive customer information.

Fabrice: A Case Study in Typosquatting

The recent incident involving the fabrice package is a stark reminder of how easy it is for attackers to deceive developers. The fabrice package, designed to mimic the legitimate fabric library, employed a typosquatting strategy, exploiting typographical errors to infiltrate systems. Since its release, the package was downloaded over 37,000 times and covertly collected AWS credentials using the boto3 library, transmitting the stolen data to a remote server via VPN, thereby obscuring the true origin of the attack. The package contained different payloads for Linux and Windows systems, utilising scheduled tasks and hidden directories to establish persistence. (Developer-Tech)
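
A crude but useful defence against this class of attack is to compare requested package names against a shortlist of popular ones; a toy Python sketch (the shortlist is illustrative, and a real check would use the top-N PyPI packages):

```python
from difflib import get_close_matches

POPULAR = ["fabric", "requests", "numpy", "boto3", "pandas"]  # illustrative

def possible_typosquat(name: str) -> str | None:
    # A near-miss against a well-known name is a red flag worth a manual look
    matches = get_close_matches(name, POPULAR, n=1, cutoff=0.8)
    return matches[0] if matches and matches[0] != name else None

print(possible_typosquat("fabrice"))   # -> fabric
print(possible_typosquat("requests"))  # -> None (exact match is fine)
```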

Lessons Learned: Importance of Proactive Security Measures

The cases highlighted in this article offer important lessons for developers and organisations:

  1. Dependency Verification is Crucial: Typosquatting and dependency confusion can be avoided by carefully verifying package authenticity. Implementing strict naming conventions and utilising internal package repositories can help prevent these attacks.
  2. Security Throughout the SDLC: Integrating security checks into every phase of the SDLC, including automated code reviews and security testing of modules, is essential. This ensures that vulnerabilities are identified early and mitigated before reaching production.
  3. Use of Vulnerability Scanning Tools: Tools like Snyk and OWASP Dependency-Check are invaluable in proactively identifying vulnerabilities. Organisations should make these tools a mandatory part of the development process to mitigate risks from third-party dependencies.
  4. Security Training and Awareness: Developers must be educated about the risks associated with third-party packages and taught how to identify potentially malicious code. Regular training can significantly reduce the likelihood of falling victim to these attacks.

By recognising these lessons, developers and organisations can better safeguard their software supply chains and mitigate the risks associated with third-party dependencies.

Prevention Strategies: Staying Safe from Malicious Packages

To mitigate the risks associated with malicious packages, developers and startups must adopt a multi-layered defence approach:

  1. Verify Package Authenticity: Always verify package names, descriptions, and maintainers. Opt for well-reviewed and frequently updated packages over relatively unknown ones.
  2. Review Source Code: Whenever possible, review the source code of the package, especially for dependencies with recent uploads or unknown maintainers.
  3. Use Package Scanners: Employ tools like Sonatype Nexus, npm audit, or PyUp to identify vulnerabilities and malicious code within packages.
  4. Leverage Lockfiles: Tools like package-lock.json (npm) or Pipfile.lock (pip) can help prevent unintended updates by locking dependencies to a specific version (see the sketch after this list).
  5. Implement Least Privilege: Limit the permissions assigned to development environments to reduce the impact of compromised keys or credentials.
  6. Regular Audits: Conduct regular security audits of dependencies as part of the CI/CD pipeline to minimise risk.
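
As a lightweight illustration of the pinning point in item 4, here is a hedged Python sketch; it assumes a plain requirements.txt and flags any dependency not pinned to an exact version:

```python
import re
from pathlib import Path

# Flag requirement lines that are not pinned to an exact version.
# Assumes a plain requirements.txt; lockfile formats vary by ecosystem.
unpinned = []
for line in Path("requirements.txt").read_text().splitlines():
    line = line.split("#", 1)[0].strip()  # drop comments
    if not line:
        continue
    if not re.search(r"==\s*\d", line):
        unpinned.append(line)

print("unpinned dependencies:", ", ".join(unpinned) or "none")
```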

Software Security: Embedding Security in the Development Lifecycle

To mitigate the risks associated with malicious packages and other vulnerabilities, it is essential to integrate security into every phase of the Software Development Lifecycle (SDLC). This practice, known as the Secure Software Development Lifecycle (SSDLC), emphasises incorporating security best practices throughout the development process.

Key Components of SSDLC

  • Automated Code Reviews: Leveraging tools that automatically scan code for vulnerabilities and flag potential issues early in the development cycle can significantly reduce the risk of security flaws making it into production. Tools like SonarQube, Checkmarx, and Veracode help in ensuring that security is built into the code from the beginning.
  • Security Testing of Modules: Security testing should be conducted on third-party modules before integrating them into the project. Tools like Snyk and OWASP Dependency-Check can identify vulnerabilities in dependencies and provide remediation advice.

Deep Dive into Technical Details

  • Malicious Package Techniques: As discussed earlier, typosquatting is just one of the many attack techniques. Supply chain attacks, dependency confusion, and malicious code injection are also common methods attackers use to compromise software projects. It is essential to understand these techniques and incorporate checks that can prevent such attacks during the development process.
  • Vulnerability Analysis Tools:
    • Snyk: Snyk helps developers identify vulnerabilities in open-source libraries and container images. It scans the project dependencies and cross-references them with a constantly updated vulnerability database. Once vulnerabilities are identified, Snyk provides detailed remediation advice, including fixing the version or applying patches.
    • OWASP Dependency-Check: OWASP Dependency-Check is an open-source tool that scans project dependencies for known vulnerabilities. It works by identifying the libraries used in the project, then checking them against the National Vulnerability Database (NVD) to highlight potential risks. The tool also provides reports and actionable insights to help developers remediate the issues.
    • Sonatype Nexus: Sonatype Nexus offers a repository management system that integrates directly with CI/CD pipelines to scan for vulnerabilities. It uses machine learning and other advanced techniques to continuously monitor and evaluate open-source libraries, providing alerts and remediation options.

Best Practices for Secure Dependency Management

  • Dependency Pinning: Pinning dependencies to specific versions helps in preventing unexpected updates that may contain vulnerabilities. By using tools like package-lock.json (npm) or Pipfile.lock (pip), developers can ensure that they are not inadvertently upgrading to a compromised version of a dependency.
  • Use of Private Registries: Hosting private package registries allows organisations to maintain tighter control over the dependencies used in their projects. By using tools like Nexus Repository or Artifactory, companies can create a trusted repository of dependencies and mitigate risks associated with public registries.
  • Robust Security Policies: Organisations should implement strict policies around the use of open-source components. This includes performing security audits, using automated tools to scan for vulnerabilities, and enforcing review processes for any new dependencies being added to the codebase.

By integrating these practices into the development process, organisations can build more resilient software, reduce vulnerabilities, and prevent incidents involving malicious dependencies.

Conclusion

As the developer community continues to embrace rapid innovation, understanding the security risks inherent in third-party dependencies is crucial. Adopting preventive measures and enforcing better dependency management practices are vital to mitigate the risks of malicious packages compromising projects, data, and systems. By recognising these threats, developers and startups can secure their software supply chains and build more resilient products.
