The Truth About “Ghost Engineers”: A Critical Analysis

By Ramkumar Sundarakalatharan | December 7, 2024 | Comments 0 Comment

Disclaimer:
This article is not intended to discredit Boris Denisov, Stanford University, McKinsey, or any other entities referenced herein. I hold immense respect for their contributions to research and industry discourse. While findings like these may resonate with practices in FAANG companies, large organizations, and mature startups, this critique seeks to explore the broader implications of relying on narrow metrics to evaluate productivity in software engineering.

The “Ghost Engineer” Narrative

The term “ghost engineers,” popularized by a recent Stanford study, describes software engineers who allegedly contribute minimally to codebases. Analyzing data from over 50,000 engineers, the study concludes that 9.5% of engineers fall into this category, with the prevalence rising to 14% among remote workers.

While the findings spark interesting discussions, they rely heavily on the flawed assumption that code commit frequency equates to productivity. As I argued in No, McKinsey, You Got It All Wrong About Developer Productivity, this narrow perspective risks undervaluing critical aspects of software engineering that don’t leave a visible footprint in version control systems.

Unintended Amplification: The Snowball Effect

One of the most significant risks of such conclusions—especially before peer review—is their unintended amplification. Articles on Yahoo, TechCrunch, and Newsday have already simplified these findings, creating narratives that could ripple through the industry:

Unnecessary Layoffs: Misinterpreting data might lead organizations to hastily classify engineers as unproductive, ignoring less visible but valuable contributions.
Remote Work Stigma: By associating remote work with reduced productivity, these claims risk undermining one of the most effective workforce models when well-managed.
Toxic Metrics Culture: Over-reliance on activity metrics like commit counts can encourage engineers to game the system by prioritizing volume over meaningful work, as discussed in Business Value Delivery by Engineering Teams in Startups (Part 2).

History offers cautionary examples, such as McKinsey’s controversial reliance on lines of code as a productivity measure—a practice criticized in my earlier article for ignoring the multifaceted nature of modern software engineering.

Engineering Productivity: Beyond Output Metrics

As outlined in Is the Myth of a 10x Developer Real?, productivity in software engineering extends far beyond raw output. Effective engineers don’t just code—they align stakeholders, resolve ambiguity, and reduce future risks. These invisible contributions often lead to:

Improved Collaboration: Engineers who mentor, review code, or resolve cross-team dependencies amplify the impact of their teams.
Strategic Outcomes: Refactoring technical debt or implementing security frameworks might reduce visible code output while significantly improving system health.

Commit Frequency Misses Critical Context

Quality Over Quantity: A single commit that eliminates 1,000 lines of redundant code can be more impactful than 10 minor feature updates.
Diverse Roles: Roles like DevOps, QA, and security often contribute indirectly to engineering success but rarely generate frequent commits.

By focusing solely on visible metrics, we risk reinforcing flawed incentives, a point I emphasized in Business Value Delivery by Engineering Teams in Startups (Part 1).

Analyzing the Stanford Study’s Claims

Claim 1: Engineers with Low Commit Activity Are Unproductive

Rebuttal: This assumption ignores the cognitive and collaborative aspects of engineering. As noted in No, McKinsey, You Got It All Wrong About Developer Productivity, activities like design discussions, documentation, and mentoring are essential but invisible in commit logs.

Claim 2: Remote Engineers Are More Likely to Be “Ghost Engineers”

Rebuttal: Remote work relies on asynchronous collaboration, where documentation and long-term planning take precedence over immediate outputs. Simplistic comparisons risk stigmatizing effective remote models.

Claim 3: Low Commit Activity Correlates with Poor Team Performance

Rebuttal: High-performing teams often include specialists whose contributions are less visible but critical. For example, a security engineer resolving vulnerabilities or a DevOps engineer optimizing CI/CD pipelines may not show up in commit logs.

Claim 4: Organizations Could Save Billions by Addressing the “Ghost Engineer” Problem

Rebuttal: Cost-cutting measures based on flawed metrics often lead to higher technical debt, increased turnover, and diminished morale. As argued in Business Value Delivery by Engineering Teams in Startups (Part 2), true cost efficiency lies in maximizing impact, not minimizing headcount.

Impact vs Code-Commits: Understanding the Misalignment

A recurring issue with productivity metrics like code-commit frequency is their inability to reflect the true impact of an engineer’s work. The volume of code changes often says little about the value delivered, as demonstrated by the following examples:

Example 1: A Cosmetic UI Change vs. A Critical API Update

Imagine a product manager requests a seemingly simple change: update a button’s color from purple to orange. While this may sound trivial, it could involve:

Updating CSS libraries: A cascade of dependencies might require 1,000+ lines of revisions.
Testing for accessibility: Ensuring compliance with color-contrast guidelines adds complexity.
Regression testing: Updating snapshot tests or fixing broken visual diffs.

This cosmetic change could result in dozens of commits, each addressing a specific dependency or edge case.

Contrast this with a backend engineer’s work on the API gateway to improve application concurrency. This might involve:

Identifying bottlenecks: Profiling existing workloads and implementing a solution to reduce latency.
Optimizing database connections: Reducing round trips or improving query performance.
Deploying with minimal disruption: A single, concise commit could encapsulate weeks of planning and testing.

Here, the backend change’s impact far outweighs the UI update, even though it appears smaller in terms of commit frequency.

Example 2: Bulk Refactoring vs. Precise Bug Fixing

A mid-level engineer is tasked with refactoring a legacy module, updating deprecated methods, and restructuring a monolithic codebase for better readability. This effort generates hundreds of commits and thousands of lines of changes, none of which immediately improve the product’s features.

On the other hand, a senior engineer identifies and fixes a critical bug that intermittently crashes the application. The solution, a one-line code change after hours of debugging, resolves a high-severity issue affecting thousands of users.

From a commit-count perspective, the refactoring task appears more productive. However, the senior engineer’s single-line fix has a far greater immediate impact.

Example 3: Feature Addition vs. Security Enhancement

A frontend developer introduces a new feature, such as a user profile editor. This entails:

New UI components: HTML and CSS for the form.
Frontend validations: JavaScript-based constraints for data inputs.
Integration tests: Mock API responses for various test cases.

The addition spans 2,000 lines of code across 20 commits.

Meanwhile, a DevSecOps engineer works on a critical security vulnerability. The task involves:

Rotating access tokens: Updating key secrets stored in the CI/CD pipeline.
Implementing security headers: Adding CSPs to prevent XSS attacks.
Hardening configurations: Minor changes in deployment scripts to reduce attack surfaces.

Although the security enhancement generates fewer than 10 commits, its value in preventing potential breaches and compliance penalties is enormous.

Key Takeaways

Context Matters: Evaluating productivity requires understanding the context and complexity of the task, not just the output volume.
Quality Over Quantity: High-impact changes often involve fewer commits, while low-value tasks may inflate commit counts.
Recognizing Diverse Contributions: Engineers working on performance, security, or architecture frequently produce less visible yet highly impactful work.

This misalignment underscores the need for organizations to adopt holistic evaluation metrics that consider both quantitative output and qualitative impact. By focusing on the latter, teams can better recognize and reward meaningful contributions.

The Danger of Flawed Productivity Metrics

Simplistic metrics can have cascading negative effects:

Burnout: Engineers may feel pressured to prioritize activity over quality.
Stifled Innovation: Overemphasis on visible output discourages experimentation and risk-taking.
Loss of Talent: Talented engineers in specialized roles may leave if their contributions are undervalued.

As emphasized in Is the Myth of a 10x Developer Real?, effective engineering is about multiplying impact, not maximizing visible output.

A Holistic Approach to Productivity

To address these issues, organizations must adopt nuanced evaluation frameworks:

Impact-Driven Metrics: Evaluate contributions based on outcomes, such as improved system reliability or customer satisfaction.
Recognize Invisible Work: Acknowledge tasks like mentorship, technical debt reduction, and long-term strategic planning.
Foster a Culture of Trust: Empower teams to experiment and innovate without fear of being misjudged by flawed metrics.

Conclusion

The “ghost engineer” narrative oversimplifies the multifaceted nature of software engineering. By relying on metrics like commit counts, it risks undervaluing critical contributions and fostering unhealthy workplace dynamics. As I’ve argued across multiple articles, effective engineering teams succeed by delivering value, not just output. The industry must move beyond flawed productivity metrics and adopt more comprehensive frameworks to recognize the true contributions of every engineer.

References and Further Reading

Denisov-Blanch, Y. (2024). Twitter Thread on Ghost Engineers. Retrieved from link.
Denisov-Blanch, Y. (2024). Stanford Research on Software Engineering Productivity. Stanford University. Retrieved from link.
Polyakov, A. (2024). Ghost Engineers—Utter Non-Sense! Medium. Retrieved from link.
No, McKinsey, You Got It All Wrong About Developer Productivity. Nocturnalknight.co. Retrieved from link.
Is the Myth of a 10x Developer Real? Nocturnalknight.co. Retrieved from link.
Bridgwater, A. (2024). Code Busters: Are Ghost Engineers Haunting DevOps Productivity? DevOps.com. Retrieved from link.
Business Value Delivery by Engineering Teams in Startups (Part 1). Nocturnalknight.co. Retrieved from link.
Business Value Delivery by Engineering Teams in Startups (Part 2). Nocturnalknight.co. Retrieved from link.
Long, K. (2024). Are Ghost Engineers Undermining Tech Productivity? Business Insider. Retrieved from link.
Passionate Geekz. (2024). Can a Company Increase Its Market Value by Laying Off Employees? Retrieved from link.

Do You Know What’s in Your Supply Chain? The Case for Better Security

By Ramkumar Sundarakalatharan | December 2, 2024 | Comments 0 Comment

I recently read an interesting report by CyCognito on the top 3 vulnerabilities on third-party products and it sparked my interest to reexamine the supply chain risks in software engineering. This article is an attempt at that.

The Vulnerability Trifecta in Third-Party Products

The CyCognito report identifies three critical areas where third-party products introduce significant vulnerabilities:

Web Servers
These foundational systems host countless applications but are frequently exploited due to misconfigurations or outdated software. According to the report, 34% of severe security issues are tied to web server environments like Apache, NGINX, and Microsoft IIS. Vulnerabilities like directory traversal or improper access control can serve as gateways for attackers.
Cryptographic Protocols
Secure communication relies on cryptographic protocols like TLS and HTTPS. Yet, 15% of severe vulnerabilities target these mechanisms. For instance, misconfigurations, weak ciphers, or reliance on deprecated standards expose sensitive data, with inadequate encryption ranking second on OWASP’s Top 10 security threats.
Web Interfaces Handling PII
Applications that process PII—such as invoices or financial statements—are among the most sensitive assets. Alarmingly, only half of such interfaces are protected by Web Application Firewalls (WAFs), leaving them vulnerable to injection attacks, session hijacking, or data leakage.

Beyond Web Servers: The Hidden Dependency Risks

You control your software stack, but do you actually know what runs beneath those flashy Web/Application servers?

Drawing parallels from my previous article on PyPI and NPM vulnerabilities, it’s clear that open-source dependencies amplify these threats. Attackers exploit the very trust inherent in supply chains, introducing malicious packages or exploiting insecure libraries.

For example:

Attackers have embedded malware into popular NPM and PyPI packages, which are then unknowingly incorporated into enterprise-grade software.
Dependency confusion attacks exploit naming conventions to inject malicious packages into CI/CD pipelines.

These risks share a core vulnerability with traditional third-party systems: an opaque supply chain with minimal oversight. This is compounded by the ever-decreasing cycle-times for each software releases, giving little to no time for even great Software Engineering teams to doa decent audit and look into the dependency graph of the packages they are building their new, shiny/pointy things that is to transform the world.

Why Software Supply Chain Attacks Persist

As highlighted by Scientific Computing World, software supply chain attacks persist for several reasons:

Aggressive GTM Timelines: Most organisations now run quarterly or even monthly product roadmaps, so it is possible to launch a new SaaS product in a matter of days to weeks by leveraging other IaaS, PaaS or SaaS systems – in addition to any Libraries, frameworks and other constructs.
Exponential Complexity: With organisations relying on layers of third-party and fourth-party services, the attack surface expands exponentially.
Insufficient Oversight: Organisations often focus on securing their environments while neglecting the vendors and libraries they depend on.
Lagging Standards: The industry’s inability to enforce stringent security protocols across the supply chain leaves critical gaps.
Sophistication of Attacks: From SolarWinds to MOVEit, attackers continually evolve, targeting blind spots in detection and remediation frameworks.

Recommended Steps to Mitigate Supply Chain Threats

To address these vulnerabilities and build resilience, organizations can take the following actionable steps:

1. Map and Assess Dependencies

Use tools like Dependency-Track or Sonatype Nexus to map and analyze all third-party and open-source dependencies.
Regularly perform software composition analysis (SCA) to detect outdated or vulnerable components.

2. Implement Zero-Trust Architecture

Leverage Zero-Trust frameworks like NIST 800-207 to ensure strict authentication and access controls across all systems.
Minimize the privileges of third-party integrations and isolate sensitive data wherever possible.

3. Strengthen Vendor Management

Evaluate vendor security practices using frameworks like the NCSC’s Supply Chain Security Principles or the Open Trusted Technology Provider Standard (OTTPS).
Demand transparency through detailed Service Level Agreements (SLAs) and regular vendor audits.

4. Prioritize Secure Development and Deployment

Train your development teams to follow secure coding practices like those outlined in the OWASP Secure Coding Guidelines.
Incorporate tools like Snyk or Checkmarx to identify vulnerabilities during the software development lifecycle.

5. Enhance Monitoring and Incident Response

Deploy Web Application Firewalls (WAFs) such as AWS WAF or Cloudflare to protect web interfaces.
Establish a robust incident response plan using guidance from the MITRE ATT&CK Framework to ensure rapid containment and mitigation.

6. Foster Collaboration

Work with industry peers and organizations like the Cybersecurity and Infrastructure Security Agency (CISA) to share intelligence and best practices for supply chain security.
Collaborate with academic institutions and research groups for cutting-edge insights into emerging threats.

7. Schedule a No-Obligation Consultation Call with Yours Truly

Struggling with supply chain vulnerabilities or need tailored solutions for your unique challenges? I offer consultation services to work directly with your CTO, Principal Architect, or Security Leadership team to:

Assess your systems and identify key risks.
Recommend actionable, budget-friendly steps for mitigation and prevention.

With years of expertise in cybersecurity and compliance, I can help streamline your approach to supply chain security without breaking the bank. Let’s collaborate to make your operations secure and resilient.

Schedule Your Free Consultation Today

Building a Resilient Supply Chain

The UK’s National Cyber Security Centre (NCSC) principles for supply chain security provide a pragmatic roadmap for businesses. Here’s how to act:

Understand and Map Dependencies
Organizations should create a detailed map of all dependencies, including direct vendors and downstream providers, to identify potential weak links.
Adopt a Zero-Trust Framework
Treat every external connection as untrusted until verified, with continuous monitoring and access restrictions.
Mandate Secure Development Practices
Encourage or require vendors to implement secure coding standards, frequent vulnerability testing, and robust update mechanisms.
Regularly Audit Supply Chains
Establish a routine audit process to assess vendor security posture and adherence to compliance requirements.
Proactive Incident Response Planning
Prepare for the inevitable by maintaining a robust incident response plan that incorporates supply chain risks.

Final Thoughts

The threat of supply chain vulnerabilities is no longer hypothetical—it’s happening now. With reports like CyCognito’s, research into dependency management, and frameworks provided by trusted institutions, businesses have the tools to mitigate risks. However, this requires vigilance, collaboration, and a willingness to rethink traditional approaches to third-party management.

Organisations must act not only to safeguard their operations but also to preserve trust in an increasingly interconnected world.

Is your supply chain ready to withstand the next wave of attacks?

References and Further Reading

What’s your strategy for managing third-party risks? Share your thoughts in the comments!

How Will China’s Quantum Advances Change Internet Security?

By Ramkumar Sundarakalatharan | October 13, 2024 | Comments 0 Comment

Image Generated with Dalle 3

Introduction:

Chinese scientists have recently announced that they have successfully cracked military-grade encryption using a quantum computer with 372 qubits, a significant achievement that underscores the rapid evolution of quantum technology. This breakthrough has sparked concerns across global cybersecurity communities as RSA-2048 encryption—a widely regarded standard—was reportedly compromised. However, while this development signifies an important leap forward in quantum capabilities, its immediate implications are nuanced, particularly for everyday encryption protocols.

Drawing on technical insights from recent papers and analyses, this article delves deeper into the technological aspects of the breakthrough and explores why, despite this milestone, quantum computing still has limitations that prevent it from immediately threatening personal and business-level encryption.

The Quantum Breakthrough: Factoring RSA-2048

As reported by The Quantum Insider and South China Morning Post, the Chinese research team employed a 372-qubit quantum computer to crack RSA-2048 encryption, a cryptographic standard widely used to protect sensitive military information. RSA encryption relies on the difficulty of factoring large numbers, a task that classical computers would take thousands of years to solve. However, using quantum algorithms—specifically an enhanced version of Shor’s algorithm—the team demonstrated that quantum computers could break RSA-2048 in a much shorter time frame.

The breakthrough optimised Shor’s algorithm to function efficiently within the constraints of a 372-qubit machine. This marks a critical turning point in quantum computing, as it demonstrates the potential for quantum systems to tackle problems previously considered infeasible for classical systems. However, the paper from the Chinese Journal of Computers (2024) offers deeper insights into the quantum architecture and algorithmic refinements that made this breakthrough possible, highlighting both the computational power and limitations of the system.

Quantum Hardware and Algorithmic Optimisation

The technical aspects of the Chinese breakthrough, as detailed in the 2024 paper published in the Chinese Journal of Computers (CJC), emphasise the improvements in quantum hardware and algorithmic approaches that were key to this success. The paper outlines how the researchers enhanced Shor’s algorithm to mitigate the high error rates commonly associated with quantum computing, allowing for more stable computations over longer periods. This required optimising quantum gate operations, reducing quantum noise, and employing error-correction codes to preserve the integrity of qubit states.

Despite these improvements, the paper makes it clear that current quantum computers, including the 372-qubit machine used in this experiment, still suffer from several limitations. The system required an extremely controlled environment to maintain qubit coherence, and any deviation from ideal conditions would have introduced significant errors. Furthermore, the researchers faced challenges related to the scalability of the system, as error rates increase exponentially with the number of qubits involved. These limitations are consistent with the broader consensus in the field, as noted by Bill Buchanan and other experts, that practical quantum decryption on a global scale is not yet feasible.

The CJC paper also points out that while the breakthrough is impressive, it does not represent a complete realisation of quantum supremacy—the point at which quantum computers outperform classical computers across a wide range of tasks. The paper discusses the need for further advancements in quantum gate fidelity, qubit interconnectivity, and error correction to make quantum decryption scalable and applicable to broader, real-world encryption protocols.

Technical Analysis based on Li et al. (2024):

The paper explores two approaches for attacking RSA public key cryptography using quantum annealing:

1. Quantum Annealing for Combinatorial Optimization:

Method: This approach translates the mathematical attack method into a combinatorial optimization problem suited for the Ising model or QUBO model [1]. The Ising model represents a system of interacting spins, which can be mapped to the problem of factoring large integers used in RSA encryption.
Key Contribution: The paper proposes a high-level optimization model for multiplication tables and establishes a new dimensionality reduction formula. This formula reduces the number of qubits needed, thus saving resources and improving the stability of the Ising model [1]. The authors demonstrate this by successfully decomposing a two-million-level integer using a D-Wave Advantage system.
Comparison: This approach outperforms previous methods by universities and corporations like Purdue, Lockheed Martin, and Fujitsu [1]. This is achieved by significantly reducing the range of coefficients required in the Ising model, leading to a higher success rate in decomposition.
Focus: This technique represents a class of attack algorithms specifically designed for D-Wave quantum computers, known for their use of quantum annealing [1].

2. Quantum Annealing with Classical Methods:

Method: This approach combines the quantum annealing algorithm with established mathematical methods for cryptographic attacks, aiming to optimize attacks on specific cryptographic components [1]. It integrates the classical lattice reduction algorithm with the Schnorr algorithm.
Key Contribution: The authors leverage the quantum tunneling effect to adjust the rounding direction within the Babai algorithm, allowing for precise vector determination, a crucial step in the attack [1]. Quantum computing’s exponential acceleration capabilities address the challenge of calculating numerous rounded directions, essential for solving lattice problems [1]. Additionally, the paper proposes methods to improve search efficiency for close vectors, considering both qubit resources and time costs [1]. Notably, it demonstrates the first 50-bit integer decomposition on a D-Wave Advantage system, showcasing the algorithm’s versatility [1].
Comparison: The paper argues that D-Wave quantum annealing offers a more practical approach for smaller-scale attacks compared to Variational Quantum Algorithms (VQAs) on NISQ (Noisy Intermediate-Scale Quantum) computers. VQAs suffer from the “barren plateaus” problem, which can hinder algorithm convergence and limit effectiveness [1]. Quantum annealing is less susceptible to this limitation and offers an advantage when dealing with smaller-scale attacks.

Citations:

Li, Gao, et al. “A Novel Quantum Annealing Attack on RSA Public Key Cryptosystems.” WC 2024 (2024).

Implications for Civilian Encryption: Limited Immediate Impact

While the Chinese breakthrough is undeniably significant, it is essential to recognise that the decryption of military-grade encryption does not immediately translate to vulnerabilities in civilian encryption protocols. Most personal and business communications rely on RSA-1024, elliptic-curve cryptography (ECC), or other lower-bit encryption systems. These systems remain secure against the capabilities of today’s quantum computers.

Moreover, as highlighted in the paper by Buchanan and echoed in the CJC analysis, many organisations are already transitioning towards post-quantum cryptography (PQC). PQC algorithms are specifically designed to withstand quantum attacks, ensuring that even as quantum computers advance, encryption systems will evolve to meet new threats.

Another key point raised by the CJC paper is that quantum decryption requires an immense amount of resources and computational power. The system used to break RSA-2048 involved highly specialised hardware and extensive computational time. Scaling such an operation to break everyday encryption protocols, such as those used in internet banking or personal communications, would require quantum computers with far more qubits and error-correction capabilities than are currently available.

Preparing for a Quantum Future: Post-Quantum Cryptography

As quantum computing technology evolves, it is imperative that governments, companies, and cybersecurity professionals continue preparing for the eventual reality of quantum decryption. This preparation includes developing and implementing post-quantum cryptographic solutions that are immune to quantum attacks. The National Institute of Standards and Technology (NIST) has already initiated efforts to standardise post-quantum cryptographic algorithms, which are designed to be secure against both classical and quantum attacks. The CJC paper underlines the importance of this transition and suggests that PQC will likely become the new standard in encryption over the next decade.

In addition to PQC, the CJC paper highlights the need for ongoing research into hybrid encryption systems, which combine classical cryptographic techniques with quantum-resistant methods. These hybrid systems could provide a transitional solution, allowing existing infrastructure to remain secure while fully quantum-resistant algorithms are developed and implemented.

Conclusion: A Scientific Milestone with Limited Immediate Consequences

The Chinese research team’s quantum decryption of military-grade encryption is a groundbreaking scientific achievement, signalling that quantum computing is rapidly advancing towards practical applications. However, as emphasised in the technical analyses from the Chinese Journal of Computers and other sources, this breakthrough is not yet a direct threat to civilian encryption systems. Current quantum computers remain limited by their error rates, scalability challenges, and the need for controlled environments, preventing widespread decryption capabilities.

As organisations and governments prepare for a post-quantum future, the adoption of post-quantum cryptography and hybrid systems will be crucial in ensuring that encryption protocols remain robust against both classical and quantum threats. While the breakthrough highlights the potential power of quantum computing, its impact on everyday encryption is still years, if not decades, away.

References and Further Reading

Bill Buchanan, “A Major Advancement on Quantum Cracking,” Medium, 2024.
The Quantum Insider, “Chinese Scientists Report Using Quantum Computer to Hack Military-Grade Encryption,” October 11, 2024.
South China Morning Post, “Chinese Scientists Hack Military-Grade Encryption Using Quantum Computer,” October 2024.
Interesting Engineering, “China’s Scientists Successfully Hack Military-Grade Encryption with Quantum Computer,” October 2024.
Shor, P.W., “Algorithms for Quantum Computation: Discrete Logarithms and Factoring,” Proceedings of the 35th Annual Symposium on Foundations of Computer Science, 1994.
National Institute of Standards and Technology (NIST), “Post-Quantum Cryptography: Current Status,” 2024.
Chinese Journal of Computers, “Quantum Algorithmic Enhancements in Breaking RSA-2048 Encryption,” 2024.

Is the AI Boom Overhyped? A Look at Potential Challenges

By Ramkumar Sundarakalatharan | June 29, 2024 | Comments 0 Comment

Introduction:

The rapid development of Artificial Intelligence (AI) has fueled excitement and hyper-investment. However, concerns are emerging about inflated expectations, not just the business outcomes, but also from the revenue side of the things.. This article explores potential challenges that could hinder widespread AI adoption and slow down the current boom.

The AI Hype:

AI has made significant strides, but some experts believe we might be overestimating its near-future capabilities. The recent surge in AI stock prices, particularly Nvidia’s, reflects this optimism. Today, it’s the third-most-valuable company globally, with an 80% share in AI chips—processors central to the largest and fastest value creation in history, amounting to $8 trillion. Since OpenAI released ChatGPT in October 2022, Nvidia’s value has surged by $2 trillion, equivalent to Amazon’s total worth. This week, Nvidia reported stellar quarterly earnings, with its core business—selling chips to data centres—up 427% year-over-year.

Bubble Talk:

History teaches us that bubbles form when unrealistic expectations drive prices far beyond a company or a sector’s true value. The “greater fool theory” explains how people buy assets hoping to sell them at a higher price to someone else, even if the asset itself has no inherent value. This mentality often fuels bubbles, which can burst spectacularly. I am sure you’ve read about the Dutch Tulip Mania, if not please help yourself to an amusing read here and here.

AI Bubble or Real Deal?:

The AI market holds undeniable promise, but is it currently overvalued? Let’s look at past bubbles for comparison:

Dot-com Bubble: The Internet revolution was real, but many companies were wildly overvalued. While some thrived, others crashed. – Crazy story about the dotcom bubble
Housing Bubble: Underlying factors like limited land contributed to the housing bubble, but speculation inflated prices beyond sustainability.
Cryptocurrency Bubble: While blockchain technology has potential, some cryptocurrencies like Bored Apes were likely fueled by hype rather than utility.

The AI Bubble’s Fragility:

The current AI boom shares similarities with past bubbles:

Rapid Price Increases: AI stock prices have skyrocketed, disconnected from current revenue levels.
Speculative Frenzy: The “fear of missing out” (FOMO) mentality drives new investors into the market, further inflating prices.
External Factors: Low interest rates can provide cheap capital that fuels bubbles.

Nvidia’s rich valuation is ludicrous — its market cap now exceeds that of the entire FTSE 100, yet its sales are less than four per cent of that index

The Coming Downdraft?

While AI’s long-term potential is undeniable, a correction is likely. Here’s one possible scenario:

A major non-tech company announces setbacks with its AI initiatives. This could trigger a domino effect, leading other companies to re-evaluate their AI investments.
Analyst downgrades and negative press coverage could further dampen investor confidence.
A “stampede for the exits” could ensue, causing a rapid decline in AI stock prices.

Learning from History:

The dot-com bubble burst when economic concerns spooked investors. The housing bubble collapsed when it became clear prices were unsustainable. We can’t predict the exact trigger for an AI correction, but history suggests it’s coming.

The Impact of a Burst Bubble:

The collapse of a major bubble can have far-reaching consequences. The 2008 financial crisis, triggered by the housing bubble, offers a stark reminder of the potential damage.

Beyond the Bubble:

Even if a bubble bursts, AI’s long-term potential remains. Here’s a thought-provoking comparison:

Cisco vs. Amazon: During the dot-com bubble, Cisco, a “safe” hardware company, was seen as a better investment than Amazon, a risky e-commerce startup. However, Amazon ultimately delivered far greater returns.

Conclusion:

While the AI boom is exciting, it’s crucial to be aware of potential bubble risks. Investors should consider a diversified portfolio and avoid chasing short-term gains. Also please be wary of the aftershocks. Even if the market corrects by 20% or even 30% the impact won’t be restricted to AI portfolios. There would be a funding winter of sorts, hire freezes and all the broader ecosystem impacts.

The true value of AI will likely be revealed after the hype subsides.

References and Further Reading

Precedence Research – The Growing AI Chip Market
Bloomberg – AI Boom and Market Speculation
PRN – The AI Investment Surge
The Economist – AI Revenue Projections
Russel Investments – Understanding Market Bubbles
CFI – Dutch Tulip Market Bubble

Inside the Palantir Mafia: Secrets to Succeeding in the Tech Industry

By Ramkumar Sundarakalatharan | June 7, 2024 | Comments 0 Comment

In the world of technology, engineers are not just cogs in a machine; they are the builders, the dreamers, and the ones who solve the problems they see in the world. And sometimes, those solutions turn into billion-dollar businesses. This is the story of the “Palantir Mafia,” a group of former Palantir employees who have left the data analytics giant to found their own startups, just like the famed “PayPal Mafia” that produced companies like SpaceX, YouTube, LinkedIn, Palantir Technologies, Affirm, Slide, Kiva, and Yelp.

1. Introducing the Amazing People from Palantir

The “Palantir Mafia,” akin to the renowned “PayPal Mafia,” comprises former Palantir engineers and executives who left to tackle meaningful problems with technological innovation, creating substantial impact and wealth. Unlike ex-consultants from firms like McKinsey, BCG, or Bain, these tech leaders leverage their deep technical expertise to solve complex issues directly, resulting in profound advancements and successful ventures.

Key Figures and Their Ventures

Alex Karp – Palantir Technologies
- Former Role: Co-Founder and CEO
- Company: Palantir Technologies
- Focus: Data analytics
- Market Penetration: Widely used across government and commercial sectors
- Revenue: $1.5 billion annually
- Capital Raised: $3 billion (Wikipedia) (Business Insider)
Max Levchin – Affirm
- Former Role: Co-Founder (PayPal, associated with Palantir founders)
- Company: Affirm
- Focus: Buy now, pay later financial services
- Market Penetration: Significant presence in the consumer finance market
- Revenue: $870 million in fiscal 2021
- Capital Raised: $1.5 billion
Joe Lonsdale – 8VC
- Former Role: Co-Founder
- Company: 8VC
- Focus: Venture capital firm
- Market Penetration: Diverse portfolio, influential in tech sectors
- Assets Under Management: $3.6 billion
Palmer Luckey – Anduril Industries ( could be the blue blooded Musk of 2020-2030s)
- Former Role: Founder of Oculus VR, associated with Palantir through ventures
- Company: Anduril Industries
- Focus: Defense technology
- Innovation: Developed the Lattice AI platform for autonomous border surveillance and defense applications
- Market Penetration: Contracts with U.S. Department of Defense and border security agencies
- Revenue: $200 million annually
- Capital Raised: $700 million
Garrett Smallwood – Wag!
- Former Role: Executive roles at other startups before Wag!
- Company: Wag!
- Focus: On-demand pet care services
- Market Penetration: Operates in over 100 cities
- Revenue: $100 million annually
- Capital Raised: $361.5 million
Nima Ghamsari – Blend
- Former Role: Product Manager at Palantir
- Company: Blend
- Focus: Mortgage and lending software
- Market Penetration: Partners with major financial institutions
- Revenue: Estimated $100 million+ annually
- Capital Raised: $665 million
Stephen Cohen – Quantifind
- Former Role: Co-Founder of Palantir
- Company: Quantifind
- Focus: Risk and fraud detection using data science
- Market Penetration: Used by financial services and government sectors
- Capital Raised: $8.7 million
Vibhu Norby – B8ta
- Former Role: Engineer at Palantir
- Company: B8ta
- Focus: Retail-as-a-service platform
- Market Penetration: Transforming in-store retail experiences
- Capital Raised: $113 million
Joe Lonsdale – Addepar
- Former Role: Co-Founder of Palantir
- Company: Addepar
- Focus: Wealth management technology
- Market Penetration: Manages over $2 trillion in assets
- Capital Raised: $325 million
Raman Narayanan – SigOpt
- Former Role: Data Scientist at Palantir
- Company: SigOpt (acquired by Intel)
- Focus: Machine learning optimization
- Market Penetration: Utilized by top tech companies
- Capital Raised: $8.7 million (before acquisition)

2. Engineers Make Better Founders in the Tech Industry

Unlike ex-consultants from big 3 who may excel in strategy and communication but often lack the technical depth to truly understand the intricacies of building a tech product, these ex-Palantir engineers come armed with both the vision and the technical chops to bring their ideas to life. They’ve spent years wrestling with complex data problems at Palantir, and they’re now taking those hard-won lessons to solve new challenges across a wide range of industries.

Engineers bring a problem-solving mindset that focuses on creating practical, scalable solutions. This technical acumen has allowed former Palantir employees to launch transformative companies that push the boundaries of what’s possible in various industries.

3. Market Penetration and Success of Palantir Alumni

The success of these Palantir alumni is evident through their market penetration and revenue. For instance, Palantir Technologies itself is a major player in the data analytics field, with a revenue of $1.5 billion annually. Affirm, led by Max Levchin, has made significant inroads in the consumer finance market, generating $870 million in revenue in fiscal 2021. Anduril Industries, founded by Palmer Luckey, has secured substantial contracts with the U.S. Department of Defense, contributing to its $200 million annual revenue.

Other successful ventures include Blend, with its deep partnerships with major financial institutions, and Addepar, managing over $2 trillion in assets. These companies not only showcase the technical expertise of their founders but also highlight their ability to penetrate markets and achieve substantial financial success.

4. Engineers vs. Consultants: A Compelling Argument

The technical depth and problem-solving mindset of engineers make them particularly suited for founding and leading tech startups. Their ability to directly tackle complex problems contrasts with the approach of ex-consultants from firms like McKinsey, BCG, or Bain, who often focus more on financial and operational efficiencies.

While consultants excel in operations-heavy startups, where strategic planning, financial management, and operational efficiency are paramount, engineers thrive in tech startups that require innovative solutions and deep technical expertise. The success stories of the Palantir alumni underscore this distinction, demonstrating how their engineering backgrounds have enabled them to drive significant technological advancements and build successful companies.

Conclusion

The Palantir Mafia’s engineers have leveraged their technical expertise to create innovative solutions and successful ventures, driving significant impact across various industries. Their ability to tackle complex problems directly contrasts with the approach of ex-consultants from firms like McKinsey, BCG, or Bain, who often focus more on financial and operational efficiencies. This technical depth has enabled these former Palantir employees to become influential leaders, pushing the boundaries of technology and innovation.

References & Further Reading:

From Soaring High to Stalling Out: How Boeing Lost Its Engineering Edge

By Ramkumar Sundarakalatharan | March 23, 2024 | Comments 0 Comment

The world’s largest aerospace conglomerate turns 108 this year. Boeing’s 1st plane, a Boeing Model 1 officially took off on 15 July 1916 when Wong Tsu (A Chinese graduate from MIT) completed the construction at the Heath Shipyard. As of 2023 September, a total of 78000 aircraft have rolled out of Boeing factories (excluding license-produced models elsewhere) with a total of 500+ unique aircraft designed across civilian, military, concept, prototypes and experiential designs. Boeing, really used to be a powerhouse of aviation technologies.

Boeing, once synonymous with aviation innovation, has hit turbulence in recent years. The company’s gradual decline can be traced to a shift in focus, prioritising short-term profits over the long-term commitment to hardcore engineering excellence that built its reputation.

P&L trend of Boeing, Infographics source Statista

A Legacy of Innovation Tarnished

Boeing’s history is a testament to American ingenuity. From the iconic 747 “Jumbo Jet” revolutionising passenger travel (over 1,500 delivered) to the technologically advanced 787 Dreamliner boasting superior fuel efficiency (over 1,700 delivered) [1], the company consistently pushed the boundaries of aerospace engineering. However, a gradual cultural shift began prioritising financial goals over engineering rigour. A Harvard Business Review article [2] highlights the pressure placed on engineers to meet aggressive deadlines and cost-cutting measures, potentially contributing to the tragic crashes of the 737 MAX aircraft. IMHO, The Boeing engineering disaster had roots in Welch’s deeply flawed management doctrines which were spread across American industry by his acolytes.

Lost Market Share and a Bleak Future

This shift in priorities has had significant financial consequences. The 737 MAX grounding, coupled with production delays of the 787 Dreamliner, significantly eroded Boeing’s market share. In the single-aisle passenger jet market, the crown jewel of commercial aviation, Airbus, Boeing’s main competitor, now holds a commanding lead of over 60% [3]. While Boeing struggles with a backlog of unfulfilled orders (around 4,000), Airbus boasts a healthier backlog exceeding 7,000 aircraft [4]. This translates to a stark difference in profitability. In 2023, Airbus reported a net profit of €4.2 billion ($4.5 billion) compared to Boeing’s net loss of $3.7 billion.

Examples of Lost Focus:

737 MAX: The faulty design and subsequent crashes of the 737 MAX (over 100 undelivered orders due to grounding) exposed a culture that prioritised speed to market over thorough engineering review.
787 Dreamliner: Production problems with the Dreamliner, including issues with electrical wiring and fuselage construction (hundreds of delayed deliveries), further eroded trust in Boeing’s manufacturing capabilities.
X-32 JSF: The loss of the JSF contract to Lockheed Martin in 2001 was a major blow to Boeing, as it represented the most important international fighter aircraft project since the Lightweight Fighter program competition of the 1960s

Can Boeing Recover?

The road to recovery for Boeing will be long and arduous. Rebuilding trust with airlines and passengers will require a renewed commitment to safety and engineering excellence. This may involve significant changes in leadership and corporate culture, prioritizing long-term sustainability over short-term gains.

Boeing’s story serves as a cautionary tale for any company. While financial goals are important, sacrificing core values and engineering expertise can lead to devastating consequences. The future of this aviation giant remains uncertain, but one thing is clear: regaining its former glory will require a return to the principles that made it great in the first place.

References and Further Reading:

[1] Boeing: Our Story https://www.boeing.com/history
[2] Harvard Business Review: What can Corporate Boards learn from Boeing’s mistakes? – https://hbr.org/2021/06/what-corporate-boards-can-learn-from-boeings-mistakes
[3] Leeham News and Analysis: Airbus Market Share Trends – https://leehamnews.com/2024/02/15/airbus-hails-landmark-year-with-strong-2023-results-amid-delivery-ramp-up/
[4] Air Current: Airbus vs Boeing Order Book – https://flightplan.forecastinternational.com/2024/01/15/airbus-and-boeing-report-december-and-full-year-2023-commercial-aircraft-orders-and-deliveries/

Why Startups Need To Architect Cloud Agnostic Products

By Ramkumar Sundarakalatharan | November 26, 2023 | Comments 0 Comment

Nobody plans to leave AWS in the startup world, but as they say, “sh** happens.”

As engineers, when we write software, we’re taught to keep it elegant by never depending directly on external systems. We write wrappers for external resources, we encapsulate data and behaviour and standardise functions with libraries.

But, When it comes to the cloud… “eerie silence”

Companies have died because they needed to move off AWS or GCP but couldn’t do it in a reasonable and cost-effective timeline.

We (at Itilite) had a close call with GCP, which served as our brush with the fire. Google had arguably one of the best Distance Matrix capabilities out there. It was used in one of our core logic and ML models. And on one fine Monday afternoon, I have to set up a meeting with my CEO to communicate that we will have to spend ~250% more on our cloud service bill in about 60 days.

Actually, google increased the pricing by 1400% and gave 60 days to rewrite, migrate, move out or perish!

The closest competitor in terms of capability was DistanceMatrix and a reliable “Large” player was Bing. But, both left a lot for in the “Accuracy”. So, for us, the business decision was simple: make the entire product work in “Reduced Functionality” mode for all or start differential pricing for better accuracy! In either case, those APIs must be rewritten with a new adaptor.

It is not an enigma why we do this. It’s simple: there are no alternatives, there is no time to GTM, But maybe there is. I’ll explain why you should take cloud-agnostic architecture seriously and then show you what I do to keep my projects cloud-agnostic.

Cloud Service Rationalisation

The prime reason you should consider the ability to switch clouds and cloud services is so you can choose to use the cloud service that is price and performance-optimized for your use case.

When I first got into serverless, we wrote a transformative API on Oracle Cloud (Bcoz we were part of their Accelerator Program and had a huge credit.) but it fed part of the data that the customer-facing API relied on.

No prize for guessing what happened?

It was a horrible mistake. Our API had an insane latency problem. Cold start requests added additional latency of at least 2 seconds per request. The AWS team has worked hard to build a service that can do things that GCP’s Cloud Functions simply can’t, specifically around cold starts and latency.

I had to move my infrastructure to a different service and a revised network topology.

Guess we would have learned the problem by now, but as we will find out, we did not.

This time it was a combination of Kafka and the AWS Lambda that created an issue. We had relied on Confluent’s connectors for much of the workload interfaces and had to shell out almost $1000 per month per connector!

Avoiding the Cloud Provider Killswitch

Protect Your Business from Unexpected Termination

As a CXO, you may not be aware that cloud providers like AWS, GCP, and Azure reserve the right to terminate your account and destroy your infrastructure at any time, effectively shutting down your business operations. While this may seem like an extreme measure, it’s important to understand that cloud providers have strict terms of service that can lead to account termination for a variety of reasons, even if you’re not engaged in illegal or harmful activities.

A Chilling Example

I recently spoke with a friend who is the founder of a fintech platform. He shared a chilling incident that highlights the risks of relying on cloud providers. His team was using GCP’s Cloud Run, a container service, to host their API. They had a unique use case that required them to call back to their own API to trigger additional work and keep the service active. Unfortunately, GCP monitors this type of behaviour and flags it as potential crypto-mining activity.

On an ordinary Sunday, their infrastructure vanished, and their account was locked. It took them six days of nonstop effort to migrate to AWS.

Protect Your Business

This incident serves as a stark reminder that any business operating on cloud infrastructure is vulnerable to unexpected termination. While you may not be intentionally engaging in activities that violate cloud provider terms of service, it’s crucial to build your infrastructure with the possibility of termination in mind.

Here are some key steps you can take to protect your business from the cloud provider killswitch:

Read and understand the terms of service for each cloud provider you use.
Choose a cloud provider that aligns with your industry and business model.
Avoid relying on a single cloud provider.
Have a backup plan in place.
Regularly review your cloud usage and ensure compliance with cloud provider terms of service.

By taking these proactive measures, you can significantly reduce the risk of your business being disrupted by cloud provider termination and ensure the continuity of your operations.

Unleash the Power of Free Cloud Credits

For early-stage startups operating on a shoestring budget, free cloud credits can be a lifeline, shielding your runway from the scorching heat of cloud infrastructure costs. Acquiring these credits is a breeze, but the way most startups build their infrastructure – akin to an unbreakable blood oath with their cloud provider – restricts them to the credits granted by that single provider.

Why limit yourself to the generosity of one cloud provider when you could seamlessly switch between them to optimize your resource allocation? Imagine the possibilities:

AWS to GCP: Upon depleting your AWS credits, you could effortlessly migrate your infrastructure to GCP, taking advantage of their generous $200,000 credit offer.
Y Combinator: As a Y Combinator startup, you’re entitled to a staggering $150,000 in AWS credits and a mind-boggling $200,000 on GCP.
AI-Powered Startups: If you’re developing AI solutions, Azure welcomes you with open arms, offering $300,000 in free credits to fuel your AI models on their cloud.

By embracing cloud-agnostic architecture, you unlock the freedom to switch between cloud providers, potentially saving you a significant $200,000 upfront. Why constrain yourself to a single cloud provider when cloud-agnosticism empowers you to navigate the cloud landscape with flexibility and cost-efficiency?

Building Resilience: The Importance of Cloud Redundancy

In the ever-evolving world of technology, no system is immune to failure. Even industry giants like Silicon Valley Bank can outright disappear over a weekend or AWS’ main Datacenter can go offline due to a power fluctuation, highlighting the importance of proactively safeguarding your business operations.

Imagine the potential financial impact of a 12-hour outage on AWS for your company. The costs could be staggering, not only in lost revenue but also in reputational damage and customer dissatisfaction or even potential churn.

This is where cloud redundancy comes into play. By running parallel segments of your platform on multiple cloud providers, such as AWS and GCP, you’re essentially creating a fail-safe mechanism.

In the event of an outage on one cloud platform, the other can seamlessly pick up the slack, ensuring uninterrupted service for your customers and minimizing the impact on your business. Cloud redundancy is not just about disaster preparedness; it’s also about optimizing performance and scalability. By distributing your workload across multiple cloud providers, you can tap into the unique strengths and resources of each platform, maximizing efficiency and responsiveness.

In our case, we run the OCR packages, SAML, and Accounts service on Azure, our core “Recommendation engine” and “Booking Engine” on AWS. Yes, having a multi-cloud will involve initial costs that might be prohibitive, but in the long run, the benefits will far outweigh the costs.

Cloud Cost Negotiation: A Matter of Leverage

In the realm of business negotiations, the ultimate power lies in the ability to walk away. If the other party senses your lack of alternatives, they gain a significant advantage, effectively holding you hostage. Cloud cost negotiations are no exception.

Imagine you’ve built a substantial $10 million infrastructure on AWS, heavily reliant on their proprietary APIs like S3, Cognito, and SQS. In such a scenario, walking away from AWS becomes an unrealistic option. You’re essentially at their mercy, accepting whatever cloud costs they dictate.

While negotiating cloud costs may seem insignificant to a small company, for an organization with $10 million of AWS infrastructure, even a 3% discount translates into substantial savings.

To gain leverage in cloud cost negotiations, you need to establish a credible threat of walking away. This requires careful planning and strategic implementation of cloud-agnostic architecture, enabling you to seamlessly switch between cloud providers without disrupting your operations.

Cloud Agnosticism: Your Negotiating Edge

Cloud-agnostic architecture empowers you to:

Diversify your infrastructure: Run your applications on multiple cloud platforms, reducing reliance on a single provider.
Reduce switching costs: Design your infrastructure to minimize the effort and cost of migrating to a new cloud provider.
Strengthen your negotiating position: Demonstrate to cloud providers that you have alternative options, giving you more bargaining power.

By embracing cloud-agnosticism, you transform from a captive customer to a savvy negotiator, capable of securing favorable cloud cost terms.

Unforeseen Challenges: The Importance of Cloud Agnosticism

In the dynamic world of business, unforeseen challenges (and opportunities) can arise at any moment. We often operate with limited visibility, unable to predict every possible scenario that could impact our success. Here’s an actual scenario that highlights the importance of cloud-agnostic architecture:

Acquisition Deal Goes Through

This happened with One of my previous organisations, we tirelessly built this company from the ground up. Our hard work and dedication paid off when a large SaaS Unicorn approached us with an acquisition proposal.

However, during the due diligence, a critical issue emerged: Our company’s infrastructure was entirely reliant on AWS. The Acquiring company had a multi-year multi-million dollar deal with Azure and the M&A team made it clear that unless our platform can operate on Azure, the deal is off the table!

Our team faced the daunting task of migrating the entire infrastructure to Azure within a limited timeframe and budget. Unfortunately, the complexities of the migration proved time-consuming and the merger took 5 months to complete and the offer was reduced by $2 million!

The Power of Cloud Agnosticism

This story serves as a stark reminder of the risks associated with a single-cloud strategy. Had our company embraced cloud-agnostic architecture, we would have possessed the flexibility to seamlessly switch between cloud providers, potentially leading to a bigger exit for all of us!

Cloud-agnostic architecture offers several benefits:

Reduced Vendor Lock-in: Avoids dependence on a single cloud provider, empowering you to switch to more favourable options based on your needs.
Improved Negotiation Power: Gains leverage in cloud cost negotiations by demonstrating the ability to switch providers.
Increased Resilience: Protects your business from disruptions caused by cloud provider outages or policy changes.
Enhanced Scalability: Enables seamless expansion of your infrastructure across multiple cloud platforms as your business grows.

Embrace Cloud Agnosticism for Business Continuity

In today’s ever-changing technological landscape, cloud-agnostic architecture is not just a benefit; it’s a necessity for businesses seeking long-term success and resilience. By adopting a cloud-agnostic approach, you empower your company to navigate the complexities of the cloud landscape with agility, adaptability, and cost-efficiency, ensuring that unforeseen challenges don’t derail your journey.

My Solution

Here’s what I do about it, now after the lessons learnt. I use Multy. Multy is an open-source tool that simplifies cloud infrastructure management by providing a cloud-agnostic API. This means that developers can define their infrastructure configurations once and deploy them to any cloud provider without having to worry about the specific syntax or nuances of each cloud platform. While Multy provides an abstraction layer for deploying cross-cloud environments, you will also need to incorporate cloud-environment agnostic libraries to really make a difference.

References & Further Reading:

How to Manage Technical Debt in 2023: A Guide for Leadership

By Ramkumar Sundarakalatharan | October 8, 2023 | Comments 0 Comment

In this article, I will summarise effective strategies and best practices to tackle tech debt head-on.

Technical debt is an inevitable reality in software development. But, it can be leveraged just like a financial loan/debt can help you achieve your goals, if managed properly.
It can be used to drive competitive advantage by allowing companies to launch new products and features faster, experiment with new technologies, and improve the scalability and performance of their systems. However, like all loans, it need to be “Repaid” properly and at the right time, failing on it will create a downward spiral.

If you’re not careful, technical debt can quickly become a major burden that slows down development and makes it difficult to add new features or even fix bugs in a timely manner.

We will discuss how to identify technical debt and the signs of poorly managed debt, and then provide a strategy for reducing it. We will also discuss what a healthy level of technical debt looks like and how leaders can use it to their advantage.

Good Tech Debt Vs Bad Tech Debt

Robert Kiyosaki, the author of Rich Dad Poor Dad, famously said:

Bad debt takes money out of your pocket, while good debt puts money in your pocket.
– Robert Kiyosaki

The same is true of tech debt.

Technical debt is the cost of not doing things the right way the first time. Good technical debt is accrued when you make trade-offs to meet deadlines or deliver new features quickly. Bad technical debt is accrued when you make poor decisions or cut corners.

Bad tech debt will probably make your PMs, Sales and CEO happy for a quarter or two. But after that, they will be asking why everything is behind schedule and dealing with customer complaints because things aren’t working properly.

Now that I have presented the obvious in a familiar “Quadrant”, you can actually skip the terminologies and definitions part of this article! 😀

For my verbal brethren, Which is the Tech Debt you’d need to ruthlessly hunt down to extinction? Obviously, it is the untracked, undocumented ones. And the ones which are dragging your team on a downward spiral (immaterial of whether it is tracked or not)

Why does your Tech Debt keep accumulating?

Before we can think about building a strategy to solve tech debt, we need to understand how it gets out of control in the first place.

It’s called “impact visibility”.

Fixing code debt issues is impossible if:

1, You’ve no record of what technical debt issues you have

2, You’ve got a backlog, but you can’t see which issues are related to what code

In both cases, you can’t prioritise tech debt over shipping new features.

We need to get more granular about what impacts these two tech debt cases above.

Issue invisibility — There’s no source of shared knowledge. Codebase health info is locked in (few) engineers’ heads.
No code quality culture — Shipping fast, whatever the cost, like it’s going out of fashion.
Poor process — Tech debt work sucks. Nobody likes creating Jira tickets. “Jira” has become a dirty word.
Low-time investment — Justifying the time to fix tech debt or to refactor is a constant uphill battle. After a point, engineers become silent!

Lack of context — Issues in Jira are a world away from the hard reality of the codebase. They’re not related in any way.

So what’s the source of this? Let’s talk strategy.

Spoiler… It’s about changing organisational culture and developer behaviour to track issues properly.

Creating a strategy to reduce technical debt

Track. Issues. Properly.

Good tech debt management starts with team-wide excellence at tracking issues.

You can’t have a tech debt strategy without tracking.

The engineering leader’s job is to make that “issue tracking” easy for your team. There is supposed to be a software for that – Jira, Asana, Rally or something of that sort.

The problem is, I’ve never believed they really get to the bottom of the problem, and after speaking with scores of engineers and leaders about it, they usually don’t either. My personal belief is most companies suffer on the velocity after their Jira rollout! It is a bit like,

No two countries that both have a McDonald’s have ever fought a war against each other.
Thomas L. Friedman – in The Lexus and the Olive Tree!

As a leader, You need to find a way to…

Show engineers when they’re working on code with tech debt, without them having to jump thru 3 hoops.
Make it really easy for team members to report tech debt.
Create a natural way to discuss codebase issues.
Integrate tech debt work into your workflows and involve PMs if required.

There are multiple ways to achieve this, the easiest is to not address it. Ie: not address it intentionally, just tweak your existing pipeline. This can be done by,

A very robust linting & integration to the IDE
Tighter Git rules for commits
SAST which runs on the pipeline
and can feed into the IDE

Prioritising impactful tech debt

At this point, it should be obvious, but prioritising the right issues is only possible if you’re tracking the impact of these “issues” and it could be direct or indirect (Dependency, Sequencing, Rework avoidance etc) .

Once you’ve got them, you should regularly and consistently use them to decide what to address. This usually happens during the backlog grooming or sprint planning sessions. But, this decision-making process needs to be strategic. Not at all tactical, ie: DO NOT delegate it to the whims and whimsicals of your TL/PM or even EM.

You or someone with a context of the organisation and position on sales, clients, revenue etc., should be doing this.

A good way to start is by choosing a theme each time you prioritise issues. For example, you could prioritise issues that…

Are impacting a specific feature you need to work on in the next quarter
Are impacting the customer’s UX
Are affecting efficiency/morale on the team
Are impacting the security posture

This is often straightforward if you’ve got high-quality issues that traceable to code and tagged as such.

Most people wonder how to get the time for these “Tasks”. I have two recommendations.

Take an entire sprint every quarter to repay the tech debt (Will need high-level buy-in, It is slightly harder to align your CXOs)
Allocate 15-20% of bandwidth in every sprint. (Easier to achieve buy-in from CXOs, harder to drive with engineers)

Engineers generally won’t prioritise tech debt work by themselves because of the conflict of interest/pressure of shipping fast. This was evident from multiple high velocity/impact software engineering teams including ones at AirBnb, Netflix and Spotify. A commitment to code refactoring and maintenance work should be endorsed and supported from the top and reinforced regularly.

How much Tech Debt can you take on?

Managing technical debt is like managing financial debt. You can use it to your advantage, but you need to be careful not to let it get out of control.

Your technical debt budget is the amount of technical debt that you are willing to take on in order to achieve your business goals. You should not try to solve all of your technical debt at once, but instead focus on the most important items.

Prudent technical debt is debt that you take on deliberately and knowingly, in order to achieve a specific goal. For example, you might take on technical debt to launch a new product quickly, or to add a new feature that is in high demand by your customers.

If you manage your technical debt properly, it can be a powerful tool for gaining a competitive advantage. However, if you let your technical debt get out of control, it can lead to serious problems, such as increased costs, delays, and security vulnerabilities.

Concluding remarks:

Technical debt is one of the most neglected areas of software development. It is often only given priority when it is too late and has already caused serious problems.

However, when leaders work together and develop a consistent and process-driven strategy, technical debt can be effectively managed.

The best engineering teams are constantly thinking about how to use their technical debt budget to their advantage.

References and Further Reading:

TinyMCE blog on TechDebt – https://www.tiny.cloud/blog/technical-debt-tracking/
6 ways for PMs to help in managing Tech Debt – https://www.productplan.com/learn/manage-technical-debt/
Key Takeaways from AirBnB’s architecture – https://thenewstack.io/airbnbs-10-takeaways-moving-microservices/
How Google Measures Technical Debt – https://www.linkedin.com/pulse/how-google-measures-manages-tech-debt-abi-noda/

No McKinsey, You got it all wrong about developer productivity!

By Ramkumar Sundarakalatharan | September 23, 2023 | Comments 0 Comment

Disclaimer: I have been an enormous proponent of Developer Productivity and have tried to implement automated metrics collection in 3 orgs with varied success. In my Mentoring sessions with early-stage startup leaders as well, I (re)enforce the importance of being aware of Dev Productivity. So much so, that I have written a 2-part article on the same here, here and here. I have also been a huge fan of McKinsey and how they seem to get answers which eluded the attention and resources of mega-corporations or governments alike. However, this article is written to communicate an entirely different perspective. In my opinion, McKinsey has got this entire “framework” thing about “dev productivity” wrong.

Introduction:

About a month back, McKinsey published an article claiming that they have developed a framework to measure productivity. They also acknowledged the fact that they were simply rehashing some of the existing metrics (like DORA and SPACE), which were used by Engineering Leaders and have simplified it (without the context) and are pitching it to their traditional buyers, the C-Suite executives in Mega corporations. Actually, some of these metrics can be useful tools if used correctly -One example is Hand-offs. But, the main reason I have chosen to write this article is their central focus seems to be “Coders should code”. It also appears to have A) missed the context of every metric, OR B) Omitted the context so as not to burden their target audience.

Finally, there is a mix-max of things to track, metrics to monitor and Opportunities to Focus, which looks like

Captain Ramius Pointing to a young Jack Ryan that Admiral Halsey was reckless!

Captain Ramius Pointing to a young Jack Ryan that Admiral Halsey was Stupid!

The Legendary Kent Beck has written a deep 2-part piece on countering the conjectures presented by McKinsey and elaborating on the gaps that engineering orgs are traditionally bound to manifest. It is very well written and covers almost everything. There are also a bunch of other eminent Software Engineers who have written on this and I have tried to give a quick lot at the bottom of this article.

What Was I concerned about?

Focus On Activities

I was primarily concerned about the lack of focus on Outcomes and Impact and a focus on the “Activities” in the proposed framework!

Any engineering leader or manager will tell you that Code Review Velocity and Deployment Frequency have nothing to do with measuring outcomes. While I will not discount Cycle Time or MTTR (I take pride in building multiple teams with one of the lowest MTTR and Cycle times in the ecosystem). They are indicators of some process elements/activities that could lead to outcomes. If we want to measure something, it should be Outcomes, not activities!

Focus on Optimisation of Irrelevant Metrics

Code Review Velocity:

If you want to time-motion the code review process in the entire stream map, you’ll find that async code review is killing your productivity. Pairing improves that dramatically. Instead of trying to sub-optimize for code review, measure the thing we actually want to improve. Which will be “Cycel Time”.

Story Points Completed:

Let’s agree on a basic fact. A “story point” is a made-up number. It was conceived as yet another way to obfuscate estimates for thought work that is difficult to estimate. As originally conceived, it represented the number of mythical “ideal days” of effort. There’s so much time wasted on getting better at “story pointing,” arguing about the Fibonacci sequence, “planning poker,” and other story point nonsense. Frankly, it is one of the “Bad” elements of Scrum! As a leader, you should find and remove handoffs and wait times. Story points are useless for anything and even more useless for this goal. Track throughput instead.

Handoffs:

This is a good one. Good job, McKinsey. You got something right. Stop using testing teams, use pairing instead of code review, operate what you build, and don’t have any people doing anything manual to the right of development.

Contribution Analysis and Opportunities focus

In the other focus areas, they have listed metrics at the individual level that can be useful unless you measure “developer satisfaction,” “retention,” and “interruptions” at the individual level. These should only be measured in aggregates to prevent any cognitive bias. IMO, Things start getting really toxic in the “Opportunities focus” section, though.

I have been part of organisations and processes where there was a focus on tracking and measuring the outcomes of individuals. It did not play out well, ever. My Conclusion after reading the article for the second time is that McKinsey thinks their intended audience (CEOs and CFOs) cannot understand “systems thinking.” Now, If you roll out this or a similar framework and announce this and what do you think will happen?

You have a group of people all working on the same backlog but not acting as a team. Code review suffers, mentoring sufferers, pairing is hard, work breakdown suffers, etc. Anything that requires more than one person to conduct/conclude, including helping someone get unstuck, will get deprioritised!

Overall, The inferences seem to be based on hard facts, but the conjectures are all flawed.

Why This Now?

At this point, I want to highlight what “Triggered” me to write this, read the following.,

For example, one company found that its most talented developers were spending excessive time on noncoding activities such as design sessions or managing interdependencies across teams. In response, the company changed its operating model and clarified roles and responsibilities to enable those highest-value developers to do what they do best: code.
McKinsey’s Article on the purported Framework

Wow. I pray for that company.

So, I believe after McKinsey pointed to the fact, that developers are involved in irrelevant things like design, architecture etc. They created separate towers of responsibility for design. In that case, I am puzzled about who will be responsible for the minor things like dependency management, prerequisites, versioning, capacity planning, concurrency, scalability etc.

Did they get anything Right?

Yes. There are tonnes, but they are buried at the bottom. Their focus on Hand-offs and cycle times are really worth tracking in any engineering org. To the authors’ credit, they have also identified some of the core issues with measuring Developer Productivity. But, someone higher in the firm seem to have suggested to soften the blow. So, they have diluted and buried those sections. I will share 2 gems here.

To truly benefit from measuring productivity, leaders and developers alike need to move past the outdated notion that leaders “cannot” understand the intricacies of software engineering, or that engineering is too complex to measure.

The real problem is that in many large organisations, “The Management” doesn’t understand the work they manage. Management can understand the intricacies of software engineering if they become leaders and study the work they manage. In a large behemoth, not all managers are leaders. They want a framework and will enforce it with an iron fist. Now, McKinsey has delivered them a framework!

Learn the basics. All C-suite leaders who are not engineers or who have been in management for a long time will need a primer on the software development process and how it is evolving.

This one Nailed it! The primary reason “Management” finds it difficult to measure the right thing is because they sometimes do not understand the work they want to measure. Leaders who understand do measure the right things. My primary concern with this framework is, in trying to solve this, McKinsey has made the problem worse!

Just google “McKinsey developer productivity” and you’ll find more articles on how this framework is flawed than the original article link!

Anto’s Response to the Article and the purported Framework.

References & Further Reading/Watching:

1, Mc.Kinsey Article – https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/yes-you-can-measure-software-developer-productivity
2, Kent Beck’s rebuttal – https://newsletter.pragmaticengineer.com/p/measuring-developer-productivity
3, Redidit – https://www.reddit.com/r/programming/comments/1650595/measuring_developer_productivity_a_response_to/
4, Level Up Coding – https://levelup.gitconnected.com/the-developers-productivity-can-t-be-measured-in-mckinsey-s-way-an-analysis-4d81924279ae
5, Measuring Developers Productivity… McKinsey what’s the point? – https://www.youtube.com/watch?v=wjQn8nnkXTs
6, Can We Measure Developer Productivity? A Reaction to McKinsey’s Article – https://www.youtube.com/watch?v=ETa24ErdcwQ
7, HOW TO MEASURE ENGINEERING PRODUCTIVITY? – https://nocturnalknight.co/2022/11/how-to-measure-engineering-productivity/
8, Business Value delivery by Engineering Teams in StartUps – Part 1 – https://nocturnalknight.co/2021/10/business-value-delivery-by-engineering-teams-in-startups-part-1/#comment-773
9, Business Value Delivery by Engineering Teams in StartUps – Part 2 – https://nocturnalknight.co/2021/10/business-value-delivery-by-engineering-teams-in-startups-part-2/
10, Space Metrics – https://www.harness.io/blog/space-metrics-get-started
11, DORA Metrics – https://www.leanix.net/en/wiki/vsm/dora-metrics
12, Dave Farley’s Response To The NONSENSE McKinsey Article On Developer Productivity – https://www.youtube.com/watch?v=yuUBZ1pByzM

Why Engineers Hate your “Boiler Plate” Job Descriptions?

By Ramkumar Sundarakalatharan | August 21, 2022 | Comments 0 Comment

How to Attract the most relevant applicants with great job postings

One of the main problems in SaaS/Software engineering hiring is the way job descriptions are written. While I knew this for some time (read years) The problem is, I was too lazy to change anything about it! That is until recently, one of the candidates I was interviewing for an Engineering Manager Role said this in our introductory call,

Me: Hope you’ve had a discussion with Ms.ABC (our HR) regarding the Roles and Responsibilities. If there are questions on it I can answer them, or we can get into the agenda.

Candidate: Yes I had a discussion with Ms. ABC. But, quite frankly it was your boilerplate JD. I’d actually want to understand what exactly I’d be doing. What will I be in charge of? What will I move?

Needless to say, I spent the next ~30 minutes walking through the current team structure, where he’d come in, what will he own, what the growth trajectory looks like etc. Ultimately, we did 2 more calls before both of us were satisfied that there are mutual synergies and went ahead. It made me reevaluate all of our Job Descriptions over the weekend and rewrote almost half of them to include factual details on projects, outcomes expected, tools available, glimpse of growth among other things.

After this, I asked the HR to send this “revised” JD to the candidates once again.

And the result was visible from Monday!

Either candidates that the HR thought super suited started dropping voluntarily from the process or candidates started expressing interest, doing more research on our stack, infra, product proposition and competitor benchmarking, before the call. Some even did a cold reach-out on Linkedin.

So, I wanted to share the small titbit here.

Why General Descriptions Don’t work?

Most Job descriptions barely resemble “specifications” at all, but feel more like generic stubs. Sort of like the equivalent of shopping for a car with as much details as “red and goes fast” or “black and built to last.”

It leaves too much open to the imagination for it to be a successful criterion to enable fitment. With criteria as broad as this you’ll end up spending an inordinate amount of time executing the search, since so many things appear to be a match. For me, Red and goes fast is always a 1971 Ford Mustang, for you it could be a 1998 Ferrari 365.

A Red Ford Mustang & a Ferrari 365, What do you Fancy?

A Black Dodge Ram and Jeep Cherokee, What do you Prefer?

The truth is that statements like the above — or its equivalent in engineering hiring — “Get me a backed dev with OOPS in Python/Java/Go with 5 years experience” — guarantee a similarly frustrating shopping experience. You’ve made it needlessly difficult for yourself and your HR/TA team to identify the specific talent you want. In this trite example you’ve indicated that you’re looking for a mid-level engineer that knows OOPS, but that basically includes everyone that ever graduated with a CS degree in the past 5 (to 10?) years. Surprisingly, many job “specifications” we see contain rarely any more info. These are the “Boiler Plate” Job Descriptions.

How did we find ourselves in this mess?

I understand why hiring managers do this. Sometimes they’re not exactly sure what they want — after all, it takes real time and effort to work out the specific vision for the role. But instead of acknowledging this and then solving the real problem (their own laziness), they delegate the JD writing to their Team/HR and it turns into generic tech JD. But the hiring manager is unfazed — “I’ll know the right candidate when I see them,” they say. Really? It could be true in some instances. Sometimes, we start with a Backend developer, then we come across a candidate with experience in building a full pipeline or a payment system. Then we expand the role to cover wider scope and evaluate against it. But generally, How will they know the right candidate if they can’t write down specifically what the candidate looks like?

Another reason for generic job specs is because a hiring manager is recruiting in a talent-constrained market (sound familiar?), and it feels like a smart move to cast as wide of a net as possible. Theoretically, one should be able to get more candidates into the top of the funnel this way, right?

Perhaps. Theoretically. But in my experience, this approach usually, and utterly, backfires. Here’s why:

Boiler Plate Job Descriptionsaren’t designed to appeal to any engineer in particular

In the current job market, engineers are faced with a wide variety of options from some amazing established companies and lots of seemingly “sexy” startups. (after “the great resignation”)

You need to take your opportunityto stand out! The more specific you are about the challenges a specific engineer will get to work in a specific role, the more traction you’ll get with (the right) candidates.

The idea is to make an engineer excited when you describe what they will “Get to do” in the first 12-15 months in the role.

Be very specific,

Talk about the product/modules they will own/drive/be part of,
Talk about the outcomes and metrics they will own and drive,
Talk about the toolchains & frameworks they’ll use (or get to choose),
Talk about What’s hard/challenging about the role, How are they a great fit.
The more details you can squeeze into the spec to help them visualize their role and the projects they’ll be working on the better.

Boiler Plate Job Descriptions don’t arm others to help you

Another bad thing about generic JD is that they don’t help others help you. Think of how much reach could you get by using everyone in your network as a recruiter. But in today’s scenario, “everyone else” is already asking them if they know any Python Engineers or React Developers or Go Engineers. Why would they help you? Because you’ve taken the time to get specific about what you want? Maybe.

Take a look at the following Job Description. Anyone in software engineering who had some deployment, infra-planning & communication seems to be a candidate for the role.

I recently got a request from a founder friend of mine to refer a Sr.Tech Lead/Engineering Manager for an early-stage startup. When I saw the JD, it was so generic it did not even have the primary stack on it. Assume sharing it from your handle. I politely declined to share it and asked for some more information and said will come back once he shares. (I believe he is very busy and hence hasn’t come back)

The bottomline is, Make them want to help you — give them a JD that’s so amazing, well-written, specific, (even entertaining) — that they can’t help but pass it on, post it, tell their friends about it, etc. If you make it stand out — you’ll get more attention from the folks that can help because you’ll arm them with something interesting & effective that they can use to reach out to their network.

Boiler Plate Job Descriptions don’t enable you to know what success looks like

This is a very simple point — see above — if you can’t explain what the ideal candidate looks like, how will you know when you’ve found them? The JD shared by my friend looks as vague as this.

Actually, his company was looking for a guy who could not just do Infrastructure Architecture. They wanted someone who has architected/built a cloud native SaaS application. His team has built the application and has no idea how to convert it to truly cloud native format to scale without breaking the bank!

The main idea here is about not being willing to settle for less. I realise the market is tough right now, and maybe you’ll need to make compromises. But do you want to start at the wrong end of the pool? When you go out the door with a generic description, you preemptively give up the battle. If you need to settle — fine! — but know exactly what points you’re compromising on.

The larger problem is that if you don’t know what the best candidate really looks like then the other people involved in making a decision likely don’t know what s/he looks like either. A well-defined, specific description of the role enables everyone involved in the interviewing and hiring process to be on the same proverbial page.

Now go get started!

Fixing your job descriptions will take some work(as I found out). To get to specifics, you’ll need to dig in and make additional efforts. You might also need to do some retraining in your organization and teach others these principles, too.

But when you get through the hard work, your postings will turn into valuable weapons that will,

a.) appeal to the engineers you want to reach,

b.) enable others to help you expand your outreach, and

c.) get your hiring team on the same page to quickly come to the right decision.

If you’re curious to see what roles we currently hire for, we have a lot of openings in Product, Design and Engineering:

It ranges from Kubernetes Architect, React Native Mobile Dev, Sr.Backend Dev, Tech Lead-Mobile Technologies, Engineering Managers, Associate Product managers, Product Managers etc.

More details can be had at https://angel.co/company/itilite-1/jobs or you can ping me 🙂