Author: Ramkumar Sundarakalatharan

From Soaring High to Stalling Out: How Boeing Lost Its Engineering Edge

From Soaring High to Stalling Out: How Boeing Lost Its Engineering Edge

The world’s largest aerospace conglomerate turns 108 this year. Boeing’s 1st plane, a Boeing Model 1 officially took off on 15 July 1916 when Wong Tsu (A Chinese graduate from MIT) completed the construction at the Heath Shipyard. As of 2023 September, a total of 78000 aircraft have rolled out of Boeing factories (excluding license-produced models elsewhere) with a total of 500+ unique aircraft designed across civilian, military, concept, prototypes and experiential designs. Boeing, really used to be a powerhouse of aviation technologies.

Boeing, once synonymous with aviation innovation, has hit turbulence in recent years. The company’s gradual decline can be traced to a shift in focus, prioritising short-term profits over the long-term commitment to hardcore engineering excellence that built its reputation.

P&L of Boeing across last 10 years.

P&L trend of Boeing, Infographics source Statista

A Legacy of Innovation Tarnished

Boeing’s history is a testament to American ingenuity. From the iconic 747 “Jumbo Jet” revolutionising passenger travel (over 1,500 delivered) to the technologically advanced 787 Dreamliner boasting superior fuel efficiency (over 1,700 delivered) [1], the company consistently pushed the boundaries of aerospace engineering. However, a gradual cultural shift began prioritising financial goals over engineering rigour. A Harvard Business Review article [2] highlights the pressure placed on engineers to meet aggressive deadlines and cost-cutting measures, potentially contributing to the tragic crashes of the 737 MAX aircraft. IMHO, The Boeing engineering disaster had roots in Welch’s deeply flawed management doctrines which were spread across American industry by his acolytes.

Lost Market Share and a Bleak Future

This shift in priorities has had significant financial consequences. The 737 MAX grounding, coupled with production delays of the 787 Dreamliner, significantly eroded Boeing’s market share. In the single-aisle passenger jet market, the crown jewel of commercial aviation, Airbus, Boeing’s main competitor, now holds a commanding lead of over 60% [3]. While Boeing struggles with a backlog of unfulfilled orders (around 4,000), Airbus boasts a healthier backlog exceeding 7,000 aircraft [4]. This translates to a stark difference in profitability. In 2023, Airbus reported a net profit of €4.2 billion ($4.5 billion) compared to Boeing’s net loss of $3.7 billion.

Examples of Lost Focus:

  • 737 MAX: The faulty design and subsequent crashes of the 737 MAX (over 100 undelivered orders due to grounding) exposed a culture that prioritised speed to market over thorough engineering review.
  • 787 Dreamliner: Production problems with the Dreamliner, including issues with electrical wiring and fuselage construction (hundreds of delayed deliveries), further eroded trust in Boeing’s manufacturing capabilities.
  • X-32 JSF: The loss of the JSF contract to Lockheed Martin in 2001 was a major blow to Boeing, as it represented the most important international fighter aircraft project since the Lightweight Fighter program competition of the 1960s 

Can Boeing Recover?

The road to recovery for Boeing will be long and arduous. Rebuilding trust with airlines and passengers will require a renewed commitment to safety and engineering excellence. This may involve significant changes in leadership and corporate culture, prioritizing long-term sustainability over short-term gains.

Boeing’s story serves as a cautionary tale for any company. While financial goals are important, sacrificing core values and engineering expertise can lead to devastating consequences. The future of this aviation giant remains uncertain, but one thing is clear: regaining its former glory will require a return to the principles that made it great in the first place.

References and Further Reading:

Understanding The Implications Of The Data Breaches At Microsoft.

Understanding The Implications Of The Data Breaches At Microsoft.

Note: I started this article last weekend to try and explain the attack path  “Midnight Blizzard” used and what Azure admins should do to protect themselves from a similar attack. Unfortunately, I couldn't complete/publish it in time and now there is another breach at Microsoft. (🤦🏿) Now, I had to completely redraft it and change the focus to a summary of data breaches at Microsoft and a walkthrough on the current breach. I will publish the Midnight Blizzard defence later this week.
Microsoft Data Breach

The Timeline of the Breaches

  • 20th-25th September 2023: 60k State Department Emails Stolen in Microsoft Breach
  • 12th-25th January 2024: Microsoft breached by “Nation-State Actors”
  • 11th-14th February 2024: State-backed APTs are weaponising OpenAI models 
  • 16th-19th February 2024: Microsoft admits to security issues with Azure and Exchange servers.
Date/MonthBreach TypeAffected Service/AreaSource
February 2024Zero-day vulnerabilities in Exchange serversExchange serversMicrosoft Security Response Center blog
January 2024Nation State-sponsored attack (Russia)Email accountsMicrosoft Security Response Center blog
February 2024State-backed APTs are weaponising OpenAI modelsNot directly impacting MS services
July 2023Chinese Hackers Breach U.S. Agencies Via Microsoft CloudAzureThe New York Times, Microsoft Security Response Center blog
October 2022BlueBleed Data Leak, 0.5 Million user data leakedUser Data
December 2021Lapsus$ intrusionSource code (Bing, Cortana)The Guardian, Reuters
August 2021Hafnium attacks Exchange serversExchange serversMicrosoft Security Response Center blog
March 2021SolarWinds supply chain attackVarious Microsoft products (indirectly affected)The New York Times, Reuters
January 2020Misconfigured customer support databaseCustomer data (names, email addresses)ZDNet
This is a high-level summary of breaches and successful hacks that got reported in the public domain and picked up by tier 1 publications. There are at least a dozen more in the period, some are of negligible impact, and others are less probable

Introduction:

Today, The digital landscape is a battlefield, and even tech giants like Microsoft aren’t immune to cyberattacks. Understanding recent breaches/incidents and their root causes, and effective defence strategies is crucial for Infosec/IT and DevSecOp teams navigating this ever-evolving threat landscape. This blog post dives into the security incidents affecting Microsoft, analyzes potential attack paths, and equips you with actionable defence plans to fortify your infrastructure/network.

Selected Breaches:

  • January 2024: State actors, purported to be affiliated with Russia leveraged password spraying and compromised email accounts, including those of senior leadership. This highlights the vulnerability of weak passwords and the critical need for multi-factor authentication (MFA).
  • January 2024: Zero-day vulnerabilities in Exchange servers allowed attackers to escalate privileges. This emphasizes the importance of regular patching and prompt updates to address vulnerabilities before they’re exploited.
  • December 2021: Lapsus$ group gained access to source code due to misconfigured access controls. This underscores the importance of least-privilege access and regularly reviewed security configurations.
  • Other incidents: Supply chain attacks (SolarWinds, March 2021) and data leaks (customer database, January 2020) demonstrate the diverse threats organizations face.

Attack Paths:

Understanding attacker motivations and methods is key to building effective defences. Here are common attack paths:

  • Social Engineering: Phishing emails and deceptive tactics trick users into revealing sensitive information or clicking malicious links.
  • Software Vulnerabilities: Unpatched software with known vulnerabilities offers attackers an easy entry point.
  • Weak Passwords: Simple passwords are easily cracked, granting access to accounts and systems.
  • Misconfigured Access Controls: Overly permissive access rules give attackers more power than necessary to escalate privileges and cause damage.
  • Supply Chain Attacks: Compromising a vendor or partner can grant attackers access to multiple organizations within the supply chain.

Defence Plans:

Building a robust defense requires a multi-layered approach:

  • Patch Management: Prioritize timely patching of vulnerabilities across all systems and software.
  • Strong Passwords & MFA: Implement strong password policies and enforce MFA for all accounts.
  • Access Control Management: Implement least privilege access and regularly review configurations.
  • Security Awareness Training: Educate employees on phishing, social engineering, and secure password practices.
  • Threat Detection & Response: Deploy security tools to monitor systems for suspicious activity and respond promptly to incidents.
  • Incident Response Planning: Develop and test a plan to mitigate damage, contain breaches, and recover quickly.
  • Penetration Testing: Regularly test your defenses by simulating real-world attacks to identify and fix vulnerabilities before attackers do.
  • Network Segmentation: Segment your network to limit the potential impact of a breach by restricting access to critical systems.
  • Data Backups & Disaster Recovery: Regularly back up data and have a plan to restore it in case of an attack or outage.
  • Stay Informed: Keep up-to-date on the latest security threats and vulnerabilities by subscribing to security advisories and attending industry conferences.

Conclusion:

Cybersecurity is an ongoing battle, but by understanding the tactics employed by attackers and implementing these defence strategies, IT/DevOps admins can significantly reduce the risk of breaches and protect their networks and data. Remember, vigilance and continuous improvement are key to staying ahead of the curve in the ever-evolving cybersecurity landscape.

Disclaimer: This blog post is for informational purposes only and should not be considered professional security advice. Please consult with a qualified security professional for guidance specific to your organization or mail me for an obligation free consultation call.

References and Further Reading:

What Makes SM2 Encryption Special? China’s Recommended Algorithm

What Makes SM2 Encryption Special? China’s Recommended Algorithm

This article is intended for security enthusiasts or otherwise for people with an advanced understanding of Cryptography and some Programming. I have tried to give in some background theory a very basic implementation.

Are there backdoors in AES and what is China’s response to it?

The US NIST has been pushing AES as the standard for symmetric key encryption. However, many luminaries in cryptographic research and industry observers suspect that as possibly pushing a cipher with an NSA/ GCHQ backdoor. For Chinese entities (Government or commercial), the ShāngMì (SM) series of ciphers provide alternatives. The SM9 standards provide a family of algorithms which will perform the entire gamut of things that RSA or AES is expected to do. They include the following.

SM4 was developed by Lü Shuwang in 2007 and became a national standard (GB/T 32907–2016) in 2016 [RFC 8998].

Elliptic Curve Cryptography (ECC)

ECC is one of the most prevalent approaches to public-key cryptography, along with Diffie–Hellman, RSA & YAK

Public-key Cryptography

Public-key cryptography relies on the generation of two keys:

  • one private key which must remain private
  • one public key which can be shared with the world

It is impossible to know a private key from a public key (it takes more than centuries to compute – assuming a workable quantum computer is infeasible using existing material science). It is possible to prove the possession of a private key without disclosing it. This proof can be verified by using its corresponding public key. This proof is called a digital signature.

High-level Functions

ECC can perform signature and verification of messages (authenticity). ECC can also perform encryption and decryption (confidentiality), however, not directly. For encryption/decryption, it needs the help of a shared secret aka Key.

It achieves the same level of security as RSA (Rivest-Shamir-Adleman), the traditional public-key algorithm, using substantially shorter key sizes. This reduction translates into lower processing requirements and reduced storage demands. For instance, an ECC 256-bit key provides comparable security to an RSA 3072-bit key.

For brevity’s sake, I’d refer you to Hans Knutson’s very well-explained article on Hacker Noon

Theory Summary: A Look Inside SM2 Key Generation

This section aims to offer a simplified understanding of different parameters found in SM2 libraries and their corresponding meanings, drawing inspiration from the insightful guides by Hans Knutson on Hacker Noon and Svetlin Nakov’s CryptoBook. (links in the reference section)

Comparing RSA and ECC Key Generation:

  • RSA: Based on prime number factorization.
    • Private key: Composed of two large prime numbers (p and q).
    • Public key: Modulus (m) obtained by multiplying p and q (m = p * q).
    • Key size: Determined by the number of bits in modulus (m).
    • Difficulty: Decomposing m back into p and q is computationally intensive.
  • ECC: Leverages the discrete logarithm of elliptic curve elements.
    • Elliptic curve: Defined as the set of points (x, y) satisfying the equation y^2 = x^3 + ax + b.
    • Example: Bitcoin uses the curve secp256k1 with the equation y^2 = x^3 + 7.
    • Point addition: Defined operation on points of the curve.

Key Generation in SM2:

  1. Domain parameters:
    • A prime field p of 256 bits.
    • An elliptic curve E defined within the field p.
    • A base point G on the curve E.
    • Order n of G, representing the number of points in the subgroup generated by G.
  2. Private key:
    • Randomly chosen integer d (1 < d < n).
  3. Public key:
    • Point Q = d * G.

Understanding Parameters:

  • Prime field p: Defines the mathematical space where the curve operates.
  • Elliptic curve E: Provides a structure for performing cryptographic operations.
  • Base point G: Serves as a starting point for generating other points on the curve.
  • Order n: Represents the number of points in the subgroup generated by G, which dictates the security level of the scheme.
  • Private key d: Secret integer randomly chosen within a specific range.
  • Public key Q: Point obtained by multiplying the private key d with the base point G.

Visualization:

Imagine a garden with flowers planted on specific points (x, y) satisfying a unique equation. This garden represents the elliptic curve E. You have a special key (d) that allows you to move around the garden and reach a specific flower (Q) using a defined path. Each step on this path is determined by the base point G. While anyone can see the flower (Q), only you have the knowledge of the path (d) leading to it, thus maintaining confidentiality.

This analogy provides a simplified picture of key generation in SM2, illustrating the interplay between different parameters and their cryptographic significance.

Diving Deeper into SM2/SM3/SM4 Integration with Golang

This section focuses on the integration of the Chinese cryptographic standards SM2, SM3, and SM4 into Golang applications. It details the process of porting Java code to Golang and the specific challenges encountered.

Open-Source Implementations:

  • GmSSL: Main open-source implementation of SM2/SM3/SM4, stands for “Guomi.”
  • Other implementations: gmsm (Golang), gmssl (Python), CFCA SADK (Java).

Porting Java Code to Golang:

  • Goal: Reverse-engineer the usage of CFCA SADK in Java code and adapt the corresponding functionality in Golang using gmsm.
  • Approach:
    • Hashing (SM3) and encryption (SM4) algorithms were directly ported using equivalent functions across languages.
    • Security operations added to a classic REST API POST required specific attention.
    • Step 1:
      • Original parameters are concatenated in alphabetical order.
      • API key is appended.
      • The combined string is hashed using SM3.
      • The resulting hash is added as an additional POST parameter.
    • Step 2:
      • Original parameters are concatenated in alphabetical order.
      • The signature is generated using SM2.
      • Challenge: Golang library lacked PKCS7 formatting support for signatures, only supporting American standards.
      • Solution: Modification of the Golang library to support PKCS7 formatting for SM2 signatures.

Response Processing:

  • Response body is encrypted using SM4 with a key derived from the API key.
  • Response body includes both an SM3 hash and SM2 signature for verification.

Key Takeaways:

  • Porting cryptographic algorithms across languages requires careful consideration of specific functionalities.
  • Lack of standard support for specific formats (PKCS7 in this case) might necessitate library modification.
  • Integrating SM2/SM3/SM4 in Golang requires utilizing libraries like gmsm and potentially adapting them for specific needs.

Getting your Hands Dirty

Go to https://github.com/guanzhi/GmSSL/releases download the version for your OS and move to your working directory.

1 - $ unzip or tar -xvf GmSSL-master.zip/tar
2 - $ mkdir build
    $ cd build
    $ cmake ..
    $ make
    $ make test
    $ sudo make install
3 - $ gmssl version
    $ GmSSL 3.1.0 Dev
4 -
$ KEY=11223344556677881122334455667788
$ IV=11223344556677881122334455667788

$ echo hello | gmssl sm4 -cbc -encrypt -key $KEY -iv $IV -out sm4.cbc
$ gmssl sm4 -cbc -decrypt -key $KEY -iv $IV -in sm4.cbc

$ echo hello | gmssl sm4 -ctr -encrypt -key $KEY -iv $IV -out sm4.ctr
$ gmssl sm4 -ctr -decrypt -key $KEY -iv $IV -in sm4.ctr

$ echo -n abc | gmssl sm3
$ gmssl sm2keygen -pass 1234 -out sm2.pem -pubout sm2pub.pem
$ echo -n abc | gmssl sm3 -pubkey sm2pub.pem -id 1234567812345678
$ echo -n abc | gmssl sm3hmac -key 11223344556677881122334455667788

$ gmssl sm2keygen -pass 1234 -out sm2.pem -pubout sm2pub.pem

$ echo hello | gmssl sm2sign -key sm2.pem -pass 1234 -out sm2.sig #-id 1234567812345678
$ echo hello | gmssl sm2verify -pubkey sm2pub.pem -sig sm2.sig -id 1234567812345678

$ echo hello | gmssl sm2encrypt -pubkey sm2pub.pem -out sm2.der
$ gmssl sm2decrypt -key sm2.pem -pass 1234 -in sm2.der

$ gmssl sm2keygen -pass 1234 -out sm2.pem -pubout sm2pub.pem

$ echo hello | gmssl sm2encrypt -pubkey sm2pub.pem -out sm2.der
$ gmssl sm2decrypt -key sm2.pem -pass 1234 -in sm2.der

$ gmssl sm2keygen -pass 1234 -out rootcakey.pem
$ gmssl certgen -C CN -ST Beijing -L Haidian -O PKU -OU CS -CN ROOTCA -days 3650 -key rootcakey.pem -pass 1234 -out rootcacert.pem -key_usage keyCertSign -key_usage cRLSign
$ gmssl certparse -in rootcacert.pem

How to Get Keys

The private key used for SM2 signing was provided to us, along with a passphrase for testing purposes. Of course, in production systems, the private key is generated and kept private. The file extension is .sm2; the first step was to make use of it.

It can be parsed with:

$ openssl asn1parse -in file.sm2

    0:d=0  hl=4 l= 802 cons: SEQUENCE
    4:d=1  hl=2 l=   1 prim: INTEGER           :01
    7:d=1  hl=2 l=  71 cons: SEQUENCE
    9:d=2  hl=2 l=  10 prim: OBJECT            :1.2.156.10197.6.1.4.2.1
   21:d=2  hl=2 l=   7 prim: OBJECT            :1.2.156.10197.1.104
   30:d=2  hl=2 l=  48 prim: OCTET STRING      [HEX DUMP]:8[redacted]7
   80:d=1  hl=4 l= 722 cons: SEQUENCE
   84:d=2  hl=2 l=  10 prim: OBJECT            :1.2.156.10197.6.1.4.2.1
   96:d=2  hl=4 l= 706 prim: OCTET STRING      [HEX DUMP]:308[redacted]249

The OID 1.2.156.10197.1.104 means SM4 Block Cipher. The OID 1.2.156.10197.6.1.4.2.1 simply means data.

.sm2 files are an ASN.1 structure encoded in DER and base64-ed. The ASN.1 structure contains (int, seq1, seq2). Seq1 contains the SM4-encrypted SM2 private key x. Seq2 contains the x509 cert of the corresponding SM2 public key (ECC coordinates (x,y) of the point X). From the private key x, it is also possible to get X=x•P.

The x509 certificate is signed by CFCA, and the signature algorithm 1.2.156.10197.1.501 means SM2 Signing with SM3.

How to Sign with SM2

Now that the private key x is known, it is possible to use it to sign the concatenation of parameters and return the PKCS7 format expected.

As a reminder, ECC Digital Signature Algorithm takes a random number k. This is why it is important to add a random generator to the signing function. It is also difficult to troubleshoot: signing the same message twice will provide different outputs.

The signature will return two integers, r and s, as defined previously.

The format returned is PKCS7, which is structured with ASN.1. The asn1js tool is perfect for reading and comparing ASN.1 structures. For maximum privacy, it should be cloned and used locally.

The ASN.1 structure of the signature will follow:

  • The algorithm used as hash, namely 1.2.156.10197.1.401 (sm3Hash)
  • The data that is signed, with OID 1.2.156.10197.6.1.4.2.1 (data)
  • A sequence of the x509 certificates corresponding to the private keys used to sign (we can sign with multiple keys)
  • A set of the digital signatures for all the keys/certificates signing. Each signature is a sequence of the corresponding certificate information (countryName, organizationName, commonName) and finally the two integer r and s, in hexadecimal representation

To generate such signature, the Golang equivalent is:

import (
	"math/big"
	"encoding/hex"
	"encoding/base64"
	"crypto/rand"
	"github.com/tjfoc/gmsm/sm2"
	"github.com/pgaulon/gmsm/x509" // modified PKCS7
)

[...]

	PRIVATE, _ := hex.DecodeString("somehexhere")
	PUBLICX, _ := hex.DecodeString("6de24a97f67c0c8424d993f42854f9003bde6997ed8726335f8d300c34be8321")
	PUBLICY, _ := hex.DecodeString("b177aeb12930141f02aed9f97b70b5a7c82a63d294787a15a6944b591ae74469")

	priv := new(sm2.PrivateKey)
	priv.D = new(big.Int).SetBytes(PRIVATE)
	priv.PublicKey.X = new(big.Int).SetBytes(PUBLICX)
	priv.PublicKey.Y = new(big.Int).SetBytes(PUBLICY)
	priv.PublicKey.Curve = sm2.P256Sm2()

	cert := getCertFromSM2(sm2CertPath) // utility to provision a x509 object from the .sm2 file data
	sign, _ := priv.Sign(rand.Reader, []byte(toSign), nil)
	signedData, _ := x509.NewSignedData([]byte(toSign))
	signerInfoConf := x509.SignerInfoConfig{}
	signedData.AddSigner(cert, priv, signerInfoConf, sign)
	pkcs7SignedBytes, _ := signedData.Finish()
	return base64.StdEncoding.EncodeToString(pkcs7SignedBytes)

Key Takeaways: Demystifying SM2 Cryptography

  1. SM2 relies on Elliptic Curve Cryptography (ECC): This advanced mathematical method provides superior security compared to traditional RSA algorithms.
  2. ECC keys are unique: The public key is a point reached by repeatedly adding the base point to itself a specific number of times. This number acts as the private key and remains secret.
  3. ECC signatures are dynamic: Unlike static signatures, ECC signatures use a random element, ensuring they vary even for the same message. Each signature consists of two unique values (r and s).
  4. Troubleshooting tools: ASN.1 issues can be tackled with asn1js, while Java problems can be identified using jdb and jd-gui.
  5. Cryptography requires expertise: Understanding and implementing cryptographic algorithms like SM2 demands specialized knowledge and careful attention.

References & Further Reading:

  1. Elliptic Curve Cryptography (ECC) 
  2. What is the math behind elliptic curve cryptography? | HackerNoon 
  3. Releases · guanzhi/GmSSL
Why Startups Need To Architect Cloud Agnostic Products

Why Startups Need To Architect Cloud Agnostic Products

Nobody plans to leave AWS in the startup world, but as they say, “sh** happens.”

An image of multiple clouds over a desk

As engineers, when we write software, we’re taught to keep it elegant by never depending directly on external systems. We write wrappers for external resources, we encapsulate data and behaviour and standardise functions with libraries. 

But, When it comes to the cloud… “eerie silence”

Companies have died because they needed to move off AWS or GCP but couldn’t do it in a reasonable and cost-effective timeline.

We (at Itilite) had a close call with GCP, which served as our brush with the fire. Google had arguably one of the best Distance Matrix capabilities out there.  It was used in one of our core logic and ML models. And on one fine Monday afternoon, I have to set up a meeting with my CEO to communicate that we will have to spend ~250% more on our cloud service bill in about 60 days.

Actually, google increased the pricing by 1400% and gave 60 days to rewrite, migrate, move out or perish!  

The closest competitor in terms of capability was DistanceMatrix and a reliable “Large” player was Bing. But, both left a lot for in the “Accuracy”. So, for us, the business decision was simple: make the entire product work in “Reduced Functionality” mode for all or start differential pricing for better accuracy!  In either case, those APIs must be rewritten with a new adaptor. 

It is not an enigma why we do this. It’s simple: there are no alternatives, there is no time to GTM,  But maybe there is. I’ll explain why you should take cloud-agnostic architecture seriously and then show you what I do to keep my projects cloud-agnostic.

Cloud Service Rationalisation

The prime reason you should consider the ability to switch clouds and cloud services is so you can choose to use the cloud service that is price and performance-optimized for your use case.

When I first got into serverless, we wrote a transformative API on Oracle Cloud (Bcoz we were part of their Accelerator Program and had a huge credit.) but it fed part of the data that the customer-facing API relied on.

No prize for guessing what happened?

It was a horrible mistake. Our API had an insane latency problem. Cold start requests added additional latency of at least 2 seconds per request. The AWS team has worked hard to build a service that can do things that GCP’s Cloud Functions simply can’t, specifically around cold starts and latency.

I had to move my infrastructure to a different service and a revised network topology.

Guess we would have learned the problem by now, but as we will find out, we did not.

This time it was a combination of Kafka and the AWS Lambda that created an issue. We had relied on Confluent’s connectors for much of the workload interfaces and had to shell out almost $1000 per month per connector!

Avoiding the Cloud Provider Killswitch

Protect Your Business from Unexpected Termination

As a CXO, you may not be aware that cloud providers like AWS, GCP, and Azure reserve the right to terminate your account and destroy your infrastructure at any time, effectively shutting down your business operations. While this may seem like an extreme measure, it’s important to understand that cloud providers have strict terms of service that can lead to account termination for a variety of reasons, even if you’re not engaged in illegal or harmful activities.

A Chilling Example

I recently spoke with a friend who is the founder of a fintech platform. He shared a chilling incident that highlights the risks of relying on cloud providers. His team was using GCP’s Cloud Run, a container service, to host their API. They had a unique use case that required them to call back to their own API to trigger additional work and keep the service active. Unfortunately, GCP monitors this type of behaviour and flags it as potential crypto-mining activity.

On an ordinary Sunday, their infrastructure vanished, and their account was locked. It took them six days of nonstop effort to migrate to AWS.

Protect Your Business

This incident serves as a stark reminder that any business operating on cloud infrastructure is vulnerable to unexpected termination. While you may not be intentionally engaging in activities that violate cloud provider terms of service, it’s crucial to build your infrastructure with the possibility of termination in mind.

Here are some key steps you can take to protect your business from the cloud provider killswitch:

  1. Read and understand the terms of service for each cloud provider you use.
  2. Choose a cloud provider that aligns with your industry and business model.
  3. Avoid relying on a single cloud provider.
  4. Have a backup plan in place.
  5. Regularly review your cloud usage and ensure compliance with cloud provider terms of service.

By taking these proactive measures, you can significantly reduce the risk of your business being disrupted by cloud provider termination and ensure the continuity of your operations.

Unleash the Power of Free Cloud Credits

For early-stage startups operating on a shoestring budget, free cloud credits can be a lifeline, shielding your runway from the scorching heat of cloud infrastructure costs. Acquiring these credits is a breeze, but the way most startups build their infrastructure – akin to an unbreakable blood oath with their cloud provider – restricts them to the credits granted by that single provider.

Why limit yourself to the generosity of one cloud provider when you could seamlessly switch between them to optimize your resource allocation? Imagine the possibilities:

  • AWS to GCP: Upon depleting your AWS credits, you could effortlessly migrate your infrastructure to GCP, taking advantage of their generous $200,000 credit offer.
  • Y Combinator: As a Y Combinator startup, you’re entitled to a staggering $150,000 in AWS credits and a mind-boggling $200,000 on GCP.
  • AI-Powered Startups: If you’re developing AI solutions, Azure welcomes you with open arms, offering $300,000 in free credits to fuel your AI models on their cloud.

By embracing cloud-agnostic architecture, you unlock the freedom to switch between cloud providers, potentially saving you a significant $200,000 upfront. Why constrain yourself to a single cloud provider when cloud-agnosticism empowers you to navigate the cloud landscape with flexibility and cost-efficiency?

Building Resilience: The Importance of Cloud Redundancy

In the ever-evolving world of technology, no system is immune to failure. Even industry giants like Silicon Valley Bank can outright disappear over a weekend or AWS’ main Datacenter can go offline due to a power fluctuation, highlighting the importance of proactively safeguarding your business operations.

Imagine the potential financial impact of a 12-hour outage on AWS for your company. The costs could be staggering, not only in lost revenue but also in reputational damage and customer dissatisfaction or even potential churn.

This is where cloud redundancy comes into play. By running parallel segments of your platform on multiple cloud providers, such as AWS and GCP, you’re essentially creating a fail-safe mechanism.

In the event of an outage on one cloud platform, the other can seamlessly pick up the slack, ensuring uninterrupted service for your customers and minimizing the impact on your business. Cloud redundancy is not just about disaster preparedness; it’s also about optimizing performance and scalability. By distributing your workload across multiple cloud providers, you can tap into the unique strengths and resources of each platform, maximizing efficiency and responsiveness.

In our case, we run the OCR packages, SAML, and Accounts service on Azure, our core “Recommendation engine” and “Booking Engine” on AWS. Yes, having a multi-cloud will involve initial costs that might be prohibitive, but in the long run, the benefits will far outweigh the costs.

Cloud Cost Negotiation: A Matter of Leverage

In the realm of business negotiations, the ultimate power lies in the ability to walk away. If the other party senses your lack of alternatives, they gain a significant advantage, effectively holding you hostage. Cloud cost negotiations are no exception.

Imagine you’ve built a substantial $10 million infrastructure on AWS, heavily reliant on their proprietary APIs like S3, Cognito, and SQS. In such a scenario, walking away from AWS becomes an unrealistic option. You’re essentially at their mercy, accepting whatever cloud costs they dictate.

While negotiating cloud costs may seem insignificant to a small company, for an organization with $10 million of AWS infrastructure, even a 3% discount translates into substantial savings.

To gain leverage in cloud cost negotiations, you need to establish a credible threat of walking away. This requires careful planning and strategic implementation of cloud-agnostic architecture, enabling you to seamlessly switch between cloud providers without disrupting your operations.

Cloud Agnosticism: Your Negotiating Edge

Cloud-agnostic architecture empowers you to:

  1. Diversify your infrastructure: Run your applications on multiple cloud platforms, reducing reliance on a single provider.
  2. Reduce switching costs: Design your infrastructure to minimize the effort and cost of migrating to a new cloud provider.
  3. Strengthen your negotiating position: Demonstrate to cloud providers that you have alternative options, giving you more bargaining power.

By embracing cloud-agnosticism, you transform from a captive customer to a savvy negotiator, capable of securing favorable cloud cost terms.

Unforeseen Challenges: The Importance of Cloud Agnosticism

In the dynamic world of business, unforeseen challenges (and opportunities) can arise at any moment. We often operate with limited visibility, unable to predict every possible scenario that could impact our success. Here’s an actual scenario that highlights the importance of cloud-agnostic architecture:

Acquisition Deal Goes Through

This happened with One of my previous organisations, we tirelessly built this company from the ground up. Our hard work and dedication paid off when a large SaaS Unicorn approached us with an acquisition proposal.

However, during the due diligence, a critical issue emerged: Our company’s infrastructure was entirely reliant on AWS. The Acquiring company had a multi-year multi-million dollar deal with Azure and the M&A team made it clear that unless our platform can operate on Azure, the deal is off the table!

Our team faced the daunting task of migrating the entire infrastructure to Azure within a limited timeframe and budget. Unfortunately, the complexities of the migration proved time-consuming and the merger took 5 months to complete and the offer was reduced by $2 million!

The Power of Cloud Agnosticism

This story serves as a stark reminder of the risks associated with a single-cloud strategy. Had our company embraced cloud-agnostic architecture, we would have possessed the flexibility to seamlessly switch between cloud providers, potentially leading to a bigger exit for all of us!

Cloud-agnostic architecture offers several benefits:

  • Reduced Vendor Lock-in: Avoids dependence on a single cloud provider, empowering you to switch to more favourable options based on your needs.
  • Improved Negotiation Power: Gains leverage in cloud cost negotiations by demonstrating the ability to switch providers.
  • Increased Resilience: Protects your business from disruptions caused by cloud provider outages or policy changes.
  • Enhanced Scalability: Enables seamless expansion of your infrastructure across multiple cloud platforms as your business grows.

Embrace Cloud Agnosticism for Business Continuity

In today’s ever-changing technological landscape, cloud-agnostic architecture is not just a benefit; it’s a necessity for businesses seeking long-term success and resilience. By adopting a cloud-agnostic approach, you empower your company to navigate the complexities of the cloud landscape with agility, adaptability, and cost-efficiency, ensuring that unforeseen challenges don’t derail your journey.

My Solution

Here’s what I do about it, now after the lessons learnt. I use Multy. Multy is an open-source tool that simplifies cloud infrastructure management by providing a cloud-agnostic API. This means that developers can define their infrastructure configurations once and deploy them to any cloud provider without having to worry about the specific syntax or nuances of each cloud platform. While Multy provides an abstraction layer for deploying cross-cloud environments, you will also need to incorporate cloud-environment agnostic libraries to really make a difference.

References & Further Reading: 

  1. https://kobedigital.com/google-maps-api-changes/
  2. https://www.reddit.com/r/geoguessr/comments/cslpja/causes_of_google_api_price_increase_suggestion/ 
  3. https://multy.dev/
  4. https://github.com/multycloud/multy
  5. https://github.com/serverless/multicloud 
  6. https://aws.amazon.com/startups/credits
Achieve Peak Performance: AI Tools for Developers to Unlock Their Potential

Achieve Peak Performance: AI Tools for Developers to Unlock Their Potential

You were scrolling through Twitter or your favourite SubReditt on the latest tech trend and a sudden feeling of FOMO creeps in. You’re not alone.

While the notion of a “10x developer” has traditionally been considered aspirational, the emergence of AI-powered tools is levelling the playing field, empowering developers to achieve remarkable productivity gains. While there might be 1000s of possible “AI tools”, I’ll restrict to tools which could yield a direct productivity boost to a developer’s day-to-day work as well as the outcome.

1. AI Pair-Developer / Code Assistants

Sourcegraph Cody & Github Copilot — Read, write and understand code

If you have used GitHub Copilot. Think of a Cody as a Turbocharger for Copilot. If you have not used Copilot, you should first try it. Either of these can understand your entire codebase, code graphs, and documentation and help you write efficient code, write unit tests, and document the codebase for you.

While the claim of a 10x speed increase is not substantiated, it shows clear intent to improve productivity drastically. However, it’s in beta, and the tool acknowledges that it’s not always correct, though they’re making rapid improvements. Yes, GitHub Copilot X is there — but then, your organisation needs to be on the Enterprise plan or you might have to add an additional $10-20 per user per month, and Cody is already here.

2. AI Code reviews – Offload the often mundane task of code reviews

While CodeRabbit and DeepCode (now acquired by Snyk) are some of the trailblazers in this space, I have not had the opportunity to work with either of them for any stretch of time. If you know about their relative strengths or benefits, please add a comment, and I will incorporate it.
The tool I use most regularly is called Robin-AI-Reviewer, from the good folks at Integral Healthcare (funded by Haystack). My reasoning is two-fold, It is open-source and if it is good enough for HIPPA-compliant app development and certification assessment, it’s a good starting point.

3. AI Test writing – Delegate the task of writing tests to AI- CodiumAI

CodiumAI serves as an AI test-writing assistant. It analyses your code, docstrings, and comments to suggest tests intelligently. CodiumAI addresses a critical aspect of software development that often consumes valuable time: testing. While numerous tools prioritize code writing and optimization, ensuring code functionality is equally vital. CodiumAI seamlessly fills this gap, and its intelligent test generation capability can substantially enhance development efficiency and maintain superior code quality.

4. AI Documentation Assistant — Get AI to write docs for you

This is a no-brainer, who loves writing code walkthroughs and docs? No? Didn’t think so! Mintlify serves as your team’s technical writer. It reads and interprets your code, turning it into a clear, readable document. By all accounts, it is a definite must.

Disclaimer: I have not personally used this and have been mostly able to get this done with Cody, itself. And then, I am no longer doing the primary documentation as my main responsibility.

5. AI Comment Assistant – Readable AI — Never write comments again

Readable AI automates the process of generating comments for your source code. It’s compatible with several popular IDEs, like VSCode, Visual Studio, IntelliJ, and PyCharm, and it can read most languages.

6. AI Tech Debt Assistant – Grit.io

Grit.io is an automated technical debt management tool. Its prime function is auto-generating pull requests that manage code migrations and dependency upgrades. Grit is in beta and available for free till beta moves to RC1. But it actually has about 50+ pattern libraries and it is growing.

I absolutely love it and Grit alleviates a significant portion of the manual work involved in managing migrations and dependency upgrades. They say it 10x’s the refactoring and migration process. I’d say at 33% of what they say, It will still be 300% of what productivity increases. And it is a considerable gain. If you’re an Engineering Leader and you have a “Budget” for 1 tool only, It should be this!

7. AI Pull Request Assistant – An “AI” powered DIff tool

What The Diff AI is an AI-powered code review tool. It writes pull request descriptions, scrutinises pull requests, identifies potential risks, and more. What The Diff claims to be able to significantly speed up development timelines and improve code quality in the long run. It could take a great deal of pain out of the process.

Disclaimer: I have not personally used this

8. AI-driven residential Wizard – Adrenaline AI — Explain it to me

Adrenaline AI helps you understand your codebase. The tool leverages static analysis, vector search, and advanced language models to clarify how features function and explain anything about it to you. The thing I like about this tool very much is, it can be leveraged to automate the “How tos” for your software engineering teams!

9. AI collaboration companion for software projects

Stepsize AI by Stepsize is an AI companion for software projects. It seamlessly integrates with tools like Slack, Jira, and GitHub, providing insightful overviews of your activities and offering strategic suggestions.

The tool uses a complex AI agent architecture, providing long-term “memory” and a deep understanding of the context of your projects.

10. AI-Driven Dev Metrics Collection – Hivel.ai

While strictly speaking, not an AI-driven “assistant” to an average developer, I feel it is nevertheless a good tool for the Engineering org and Engineering leaders to keep track and make course corrections. It provides a Cockpit/Dashboard of all the metrics that matter.

Hivel is built by an awesome team of devs and led by Sudheer

How to Manage Technical Debt in 2023: A Guide for Leadership

How to Manage Technical Debt in 2023: A Guide for Leadership

In this article, I will summarise effective strategies and best practices to tackle tech debt head-on.

Technical debt is an inevitable reality in software development. But, it can be leveraged just like a financial loan/debt can help you achieve your goals, if managed properly.
It can be used to drive competitive advantage by allowing companies to launch new products and features faster, experiment with new technologies, and improve the scalability and performance of their systems. However, like all loans, it need to be “Repaid” properly and at the right time, failing on it will create a downward spiral.

If you’re not careful, technical debt can quickly become a major burden that slows down development and makes it difficult to add new features or even fix bugs in a timely manner.

We will discuss how to identify technical debt and the signs of poorly managed debt, and then provide a strategy for reducing it. We will also discuss what a healthy level of technical debt looks like and how leaders can use it to their advantage.

Good Tech Debt Vs Bad Tech Debt

Robert Kiyosaki, the author of Rich Dad Poor Dad, famously said:

Bad debt takes money out of your pocket, while good debt puts money in your pocket.

– Robert Kiyosaki

The same is true of tech debt.

Technical debt is the cost of not doing things the right way the first time. Good technical debt is accrued when you make trade-offs to meet deadlines or deliver new features quickly. Bad technical debt is accrued when you make poor decisions or cut corners.

Bad tech debt will probably make your PMs, Sales and CEO happy for a quarter or two. But after that, they will be asking why everything is behind schedule and dealing with customer complaints because things aren’t working properly.

Now that I have presented the obvious in a familiar “Quadrant”, you can actually skip the terminologies and definitions part of this article! 😀

For my verbal brethren, Which is the Tech Debt you’d need to ruthlessly hunt down to extinction? Obviously, it is the untracked, undocumented ones. And the ones which are dragging your team on a downward spiral (immaterial of whether it is tracked or not)

Why does your Tech Debt keep accumulating?

Before we can think about building a strategy to solve tech debt, we need to understand how it gets out of control in the first place.

It’s called “impact visibility”.

Fixing code debt issues is impossible if:

1, You’ve no record of what technical debt issues you have

2, You’ve got a backlog, but you can’t see which issues are related to what code

In both cases, you can’t prioritise tech debt over shipping new features.

We need to get more granular about what impacts these two tech debt cases above.

  • Issue invisibility — There’s no source of shared knowledge. Codebase health info is locked in (few) engineers’ heads.
  • No code quality culture — Shipping fast, whatever the cost, like it’s going out of fashion.
  • Poor process — Tech debt work sucks. Nobody likes creating Jira tickets. “Jira” has become a dirty word.
  • Low-time investment — Justifying the time to fix tech debt or to refactor is a constant uphill battle. After a point, engineers become silent!

Lack of context — Issues in Jira are a world away from the hard reality of the codebase. They’re not related in any way.

So what’s the source of this? Let’s talk strategy.

Spoiler… It’s about changing organisational culture and developer behaviour to track issues properly.

Creating a strategy to reduce technical debt

Track. Issues. Properly.

Good tech debt management starts with team-wide excellence at tracking issues.

You can’t have a tech debt strategy without tracking.

The engineering leader’s job is to make that “issue tracking” easy for your team. There is supposed to be a software for that – Jira, Asana, Rally or something of that sort.

The problem is, I’ve never believed they really get to the bottom of the problem, and after speaking with scores of engineers and leaders about it, they usually don’t either. My personal belief is most companies suffer on the velocity after their Jira rollout! It is a bit like,

No two countries that both have a McDonald’s have ever fought a war against each other.

Thomas L. Friedman – in The Lexus and the Olive Tree!

As a leader, You need to find a way to…

  • Show engineers when they’re working on code with tech debt, without them having to jump thru 3 hoops.
  • Make it really easy for team members to report tech debt.
  • Create a natural way to discuss codebase issues.
  • Integrate tech debt work into your workflows and involve PMs if required.

There are multiple ways to achieve this, the easiest is to not address it. Ie: not address it intentionally, just tweak your existing pipeline. This can be done by,

  • A very robust linting & integration to the IDE
  • Tighter Git rules for commits
  • SAST which runs on the pipeline
  • and can feed into the IDE

Prioritising impactful tech debt

At this point, it should be obvious, but prioritising the right issues is only possible if you’re tracking the impact of these “issues” and it could be direct or indirect (Dependency, Sequencing, Rework avoidance etc) .

Once you’ve got them, you should regularly and consistently use them to decide what to address. This usually happens during the backlog grooming or sprint planning sessions. But, this decision-making process needs to be strategic. Not at all tactical, ie: DO NOT delegate it to the whims and whimsicals of your TL/PM or even EM.

You or someone with a context of the organisation and position on sales, clients, revenue etc., should be doing this.

A good way to start is by choosing a theme each time you prioritise issues. For example, you could prioritise issues that…

  • Are impacting a specific feature you need to work on in the next quarter
  • Are impacting the customer’s UX
  • Are affecting efficiency/morale on the team
  • Are impacting the security posture

This is often straightforward if you’ve got high-quality issues that traceable to code and tagged as such.

Most people wonder how to get the time for these “Tasks”. I have two recommendations.

  • Take an entire sprint every quarter to repay the tech debt (Will need high-level buy-in, It is slightly harder to align your CXOs)
  • Allocate 15-20% of bandwidth in every sprint. (Easier to achieve buy-in from CXOs, harder to drive with engineers)

Engineers generally won’t prioritise tech debt work by themselves because of the conflict of interest/pressure of shipping fast. This was evident from multiple high velocity/impact software engineering teams including ones at AirBnb, Netflix and Spotify. A commitment to code refactoring and maintenance work should be endorsed and supported from the top and reinforced regularly.

How much Tech Debt can you take on?

Managing technical debt is like managing financial debt. You can use it to your advantage, but you need to be careful not to let it get out of control.

Your technical debt budget is the amount of technical debt that you are willing to take on in order to achieve your business goals. You should not try to solve all of your technical debt at once, but instead focus on the most important items.

Prudent technical debt is debt that you take on deliberately and knowingly, in order to achieve a specific goal. For example, you might take on technical debt to launch a new product quickly, or to add a new feature that is in high demand by your customers.

If you manage your technical debt properly, it can be a powerful tool for gaining a competitive advantage. However, if you let your technical debt get out of control, it can lead to serious problems, such as increased costs, delays, and security vulnerabilities.

Concluding remarks:

Technical debt is one of the most neglected areas of software development. It is often only given priority when it is too late and has already caused serious problems.

However, when leaders work together and develop a consistent and process-driven strategy, technical debt can be effectively managed.

The best engineering teams are constantly thinking about how to use their technical debt budget to their advantage.

References and Further Reading:

No McKinsey, You got it all wrong about developer productivity!

No McKinsey, You got it all wrong about developer productivity!

Disclaimer: I have been an enormous proponent of Developer Productivity and have tried to implement automated metrics collection in 3 orgs with varied success. In my Mentoring sessions with early-stage startup leaders as well, I (re)enforce the importance of being aware of Dev Productivity. So much so, that I have written a 2-part article on the same here, here and here. I have also been a huge fan of McKinsey and how they seem to get answers which eluded the attention and resources of mega-corporations or governments alike. However, this article is written to communicate an entirely different perspective. In my opinion, McKinsey has got this entire “framework” thing about “dev productivity” wrong.

Introduction:

About a month back, McKinsey published an article claiming that they have developed a framework to measure productivity. They also acknowledged the fact that they were simply rehashing some of the existing metrics (like DORA and SPACE), which were used by Engineering Leaders and have simplified it (without the context) and are pitching it to their traditional buyers, the C-Suite executives in Mega corporations. Actually, some of these metrics can be useful tools if used correctly -One example is Hand-offs. But, the main reason I have chosen to write this article is their central focus seems to be “Coders should code”. It also appears to have A) missed the context of every metric, OR B) Omitted the context so as not to burden their target audience.

Finally, there is a mix-max of things to track, metrics to monitor and Opportunities to Focus, which looks like

Captain Ramius Pointing to a young Jack Ryan that Admiral Halsey was reckless!

Captain Ramius Pointing to a young Jack Ryan that Admiral Halsey was Stupid!

The Legendary Kent Beck has written a deep 2-part piece on countering the conjectures presented by McKinsey and elaborating on the gaps that engineering orgs are traditionally bound to manifest. It is very well written and covers almost everything. There are also a bunch of other eminent Software Engineers who have written on this and I have tried to give a quick lot at the bottom of this article.

What Was I concerned about?

Focus On Activities

I was primarily concerned about the lack of focus on Outcomes and Impact and a focus on the “Activities” in the proposed framework!

Any engineering leader or manager will tell you that Code Review Velocity and Deployment Frequency have nothing to do with measuring outcomes. While I will not discount Cycle Time or MTTR (I take pride in building multiple teams with one of the lowest MTTR and Cycle times in the ecosystem). They are indicators of some process elements/activities that could lead to outcomes. If we want to measure something, it should be Outcomes, not activities!

Focus on Optimisation of Irrelevant Metrics

Code Review Velocity:

If you want to time-motion the code review process in the entire stream map, you’ll find that async code review is killing your productivity. Pairing improves that dramatically. Instead of trying to sub-optimize for code review, measure the thing we actually want to improve. Which will be “Cycel Time”.

Story Points Completed:

Let’s agree on a basic fact. A “story point” is a made-up number. It was conceived as yet another way to obfuscate estimates for thought work that is difficult to estimate. As originally conceived, it represented the number of mythical “ideal days” of effort. There’s so much time wasted on getting better at “story pointing,” arguing about the Fibonacci sequence, “planning poker,” and other story point nonsense. Frankly, it is one of the “Bad” elements of Scrum! As a leader, you should find and remove handoffs and wait times. Story points are useless for anything and even more useless for this goal. Track throughput instead. 

Handoffs:

This is a good one. Good job, McKinsey. You got something right. Stop using testing teams, use pairing instead of code review, operate what you build, and don’t have any people doing anything manual to the right of development.

Contribution Analysis and Opportunities focus

In the other focus areas, they have listed metrics at the individual level that can be useful unless you measure “developer satisfaction,” “retention,” and “interruptions” at the individual level. These should only be measured in aggregates to prevent any cognitive bias. IMO, Things start getting really toxic in the “Opportunities focus” section, though.

I have been part of organisations and processes where there was a focus on tracking and measuring the outcomes of individuals. It did not play out well, ever. My Conclusion after reading the article for the second time is that McKinsey thinks their intended audience (CEOs and CFOs) cannot understand “systems thinking.” Now, If you roll out this or a similar framework and announce this and what do you think will happen?

You have a group of people all working on the same backlog but not acting as a team. Code review suffers, mentoring sufferers, pairing is hard, work breakdown suffers, etc. Anything that requires more than one person to conduct/conclude, including helping someone get unstuck, will get deprioritised!

Overall, The inferences seem to be based on hard facts, but the conjectures are all flawed.

Why This Now?

At this point, I want to highlight what “Triggered” me to write this, read the following.,

For example, one company found that its most talented developers were spending excessive time on noncoding activities such as design sessions or managing interdependencies across teams. In response, the company changed its operating model and clarified roles and responsibilities to enable those highest-value developers to do what they do best: code.

McKinsey’s Article on the purported Framework

Wow. I pray for that company.

So, I believe after McKinsey pointed to the fact, that developers are involved in irrelevant things like design, architecture etc. They created separate towers of responsibility for design. In that case, I am puzzled about who will be responsible for the minor things like dependency management, prerequisites, versioning, capacity planning, concurrency, scalability etc.

Did they get anything Right?

Yes. There are tonnes, but they are buried at the bottom. Their focus on Hand-offs and cycle times are really worth tracking in any engineering org. To the authors’ credit, they have also identified some of the core issues with measuring Developer Productivity. But, someone higher in the firm seem to have suggested to soften the blow. So, they have diluted and buried those sections. I will share 2 gems here.

To truly benefit from measuring productivity, leaders and developers alike need to move past the outdated notion that leaders “cannot” understand the intricacies of software engineering, or that engineering is too complex to measure.

The real problem is that in many large organisations, “The Management” doesn’t understand the work they manage. Management can understand the intricacies of software engineering if they become leaders and study the work they manage. In a large behemoth, not all managers are leaders. They want a framework and will enforce it with an iron fist. Now, McKinsey has delivered them a framework!

Learn the basics. All C-suite leaders who are not engineers or who have been in management for a long time will need a primer on the software development process and how it is evolving.

This one Nailed it! The primary reason “Management” finds it difficult to measure the right thing is because they sometimes do not understand the work they want to measure. Leaders who understand do measure the right things. My primary concern with this framework is, in trying to solve this, McKinsey has made the problem worse!

Just google “McKinsey developer productivity” and you’ll find more articles on how this framework is flawed than the original article link!

Anto’s Response to the Article and the purported Framework.

References & Further Reading/Watching:

1, Mc.Kinsey Article – https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/yes-you-can-measure-software-developer-productivity
2, Kent Beck’s rebuttal – https://newsletter.pragmaticengineer.com/p/measuring-developer-productivity
3, Redidit – https://www.reddit.com/r/programming/comments/1650595/measuring_developer_productivity_a_response_to/
4, Level Up Coding – https://levelup.gitconnected.com/the-developers-productivity-can-t-be-measured-in-mckinsey-s-way-an-analysis-4d81924279ae
5, Measuring Developers Productivity… McKinsey what’s the point? – https://www.youtube.com/watch?v=wjQn8nnkXTs
6, Can We Measure Developer Productivity? A Reaction to McKinsey’s Article – https://www.youtube.com/watch?v=ETa24ErdcwQ
7, HOW TO MEASURE ENGINEERING PRODUCTIVITY? – https://nocturnalknight.co/2022/11/how-to-measure-engineering-productivity/
8, Business Value delivery by Engineering Teams in StartUps – Part 1 – https://nocturnalknight.co/2021/10/business-value-delivery-by-engineering-teams-in-startups-part-1/#comment-773
9, Business Value Delivery by Engineering Teams in StartUps – Part 2 – https://nocturnalknight.co/2021/10/business-value-delivery-by-engineering-teams-in-startups-part-2/
10, Space Metrics – https://www.harness.io/blog/space-metrics-get-started
11, DORA Metrics – https://www.leanix.net/en/wiki/vsm/dora-metrics
12, Dave Farley’s Response To The NONSENSE McKinsey Article On Developer Productivity – https://www.youtube.com/watch?v=yuUBZ1pByzM

Onam: When the Gods grew Jealous of a Righteous King “Mahabali” and Banished him to the Underworld.

Onam: When the Gods grew Jealous of a Righteous King “Mahabali” and Banished him to the Underworld.

Disclaimer: This is unlike any of my previous articles and is a commentary/observation of certain cultural and mythological traditions and practices and My attempt to uncover the root elements of these traditions to a possible Historical event(s)”. The intention is not to hurt any sentiments. But if it does, maybe one of us needs to read more 😀

For the uninitiated, There is a South Indian festival called Onam which is celebrated with much pomp and merriment, Majorly in the Southern State of Kerala and some adjourning areas (of Tamil Nadu and Karnataka). What is most intriguing about this festival (as compared to other Indian festivals) is, that there is no clear Good and Evil in this myth. Or should I say, the Gods in this myth do not possess all the Good virtues. While the recurring theme in the Vishnu avatars is vanquishing the so-called “Evil” Asura in support of the celestial King Indira. And interestingly enough, all of the Asuras start out as mostly righteous kings who do their penance to Lord Bhrama or Shiva and get their Boons and eventually grew powerful enough to become a threat to Indra – The divine king with a thousand vaginas.

In some of those cases, there could have been enough bias for action (as per the narratives), whereas, in the case of one of the avatars the Vamana, Lord Vishnu comes and Vanquishes his own devotee in Mahabali 😕 And frankly there is not much malice, bad behaviours attributable to the Great Mahabali!

It is the singular story where it is clearly admitted that the “Gods grew Jealous of Mahabali” and requested Vishnu to control him. Hence the story goes something like this, Vamana came, and asked for 3 parcels of land from the great king, The king’s advisor (Shukra) resisted, but the benevolent king agreed. Vamana became so large and measured the Heavens in one step, and the earth in another and asked the king where to keep his feet for the third feet of land, Mahabali offered his head and thus Vamana banished Mahabali into the underworld (in order to measure the third step).

But suddenly God became all remorseful for his deeds and offered Mahabali one boon and Mahabali chose to visit his people once a year, thus we have Onam which celebrates the Return of the King from the underworld to his beloved kingdom.

Now, let us get into the possible origin story of this myth. On the surface, it looks like a myth of a God vanquishing an Asura who had grown all too powerful. However, there is enough linguistic evidence that suggests the purported rule of Mahabali/Maveli to to one of Righteousness and justfulness. For instance, there is a very famous “Onam Song” which captures the rule of Mahabali, I have given the Malayalam version and a Google translation Below.

മാവേലി നാട് വാണിടും കാലം
മാനുഷ്യരെല്ലാരുമൊന്ന് പോലെ
ആമോദത്തോടെ വസിക്കും കാലം
ആപത്തങ്ങാര്‍കുമൊട്ടില്ല താനും

മാവേലി നാട് വാണിടും കാലം
മാനുഷ്യരെല്ലാരുമൊന്ന് പോലെ
ആമോദത്തോടെ വസിക്കും കാലം
ആപത്തങ്ങാര്‍കുമൊട്ടില്ല താനും

ആധികള്‍ വ്യാധികള്‍ ഒന്നുമില്ല
ബാലമരണങ്ങള്‍ കേള്‍ക്കാനില്ല
ദുഷ്ടരെ കണ്‍കൊണ്ട് കാണ്മാനില്ല
നല്ലവരല്ലാതെ ഇല്ല പാരില്‍… ഇല്ല പാരില്‍

മാവേലി നാട് വാണിടും കാലം
മാനുഷ്യരെല്ലാരുമൊന്ന് പോലെ
ആമോദത്തോടെ വസിക്കും കാലം
ആപത്തങ്ങാര്‍കുമൊട്ടില്ല താനും

കള്ളവുമില്ല ചതിയുമില്ല
എള്ളോളമില്ല പൊളിവചനം
വെള്ളികോലാദികള്‍ നാഴികളും
എല്ലാം കണക്കിന് തുല്യമായി… തുല്യമായി

മാവേലി നാട് വാണിടും കാലം
മാനുഷ്യരെല്ലാരുമൊന്ന് പോലെ
ആമോദത്തോടെ വസിക്കും കാലം
ആപത്തങ്ങാര്‍കുമൊട്ടില്ല താനും

കള്ളപ്പറയും ചെറുനാഴിയും
കള്ളത്തരങ്ങള്‍ മറ്റൊന്നുമില്ല
കള്ളവുമില്ല ചതിയുമില്ല
എള്ളോളമില്ല പൊളിവചനം…പൊളിവചനം

മാവേലി നാട് വാണിടും കാലം
മാനുഷ്യരെല്ലാരുമൊന്ന് പോലെ
ആമോദത്തോടെ വസിക്കും കാലം
ആപത്തങ്ങാര്‍കുമൊട്ടില്ല താനും

And the Google translation goes like this. (with two minor contextual changes by me)

Time to relive Maveli Nadu
Like all humans
Time to live happily
He is not in danger

There are no diseases
Child deaths are unheard of
The wicked are not seen with the eye
No peril except the good ones… no peril

Time to relive Maveli Nadu
Like all humans
Time to live happily
He is not in danger

No lies, no cheating
No nonsense
Silversmiths and Nazhis (Barbers)
Everything equals… equals

Time to relive Maveli Nadu
Like all humans
Time to live happily
He is not in danger

Tollapara and Cherunazhi
There are no fakes
No lies, no cheating
There is no nonsense…nonsense

Time to relive Maveli Nadu
Like all humans
Time to live happily
He is not in danger

 By all accounts, the Kingdom of Mahabali was a Welfare State. Bali’s welfare state was beneficial to everyone. So, how come a king’s reign was so good and he could still be vilified? One cannot be, hence Mahabali was never shown in a bad light in any scriptures. Just for context, Mahabali was the Grandson of Prahalada, who was the son of Hiranyakashipu. Prahalada was the one who famously “Converted” to Vaishnavism and Lord Narashima came to protect him and slain his father Hiranyakashibu. And Mahabali was depicted as a devout Vaishnavite.

Modern Interpretations:

There is a growing sentiment and appropriation from left-wing and social justice movements that Mahabali was a casualty of the victory of the Vaidheka or Varna System.

According to Jotiba Phule, the Varna system began to be established after the defeat of Bali and Jotiba writes about King Bali and Vamana in his book Gulamgiri (Slavery). [Read – Onam – What Jotiba Phule Said About King Bali and Vamana] Hence, it could be said that Onam also marks the beginning of the caste system and slavery in Kerala.

Further, according to Jotiba Phule, Bali was killed on the battlefield but his son Banasura defeated Vamana and Bali’s subjects celebrated the victory. To this day, women in Maharashtra remember Bali’s egalitarian rule and say,

“Ida Peeda Jao, Bali Raj Yeo” (May our troubles and sorrows go, and Bali’s rule return).

Depection and Cultural erasure.

If you’re from my age group or younger you’d have seen Mahabali as a Jolly fellow with a pot belly and an Umbrella walking down the street as the symbol of the Onam festival.(sometimes with a Poonul/Janeu) whether in one of the several hundred Satellite TVs or Corporate Cultural events or in shopping malls etc.

But, from the myth of the Mahabali, 1, He is an Asura King 2, He was unvanquished 3, He won all the battles he fought and became so powerful that Indra feared him. He would probably have to be depicted as a person of 6’1” or 6’2″ with a chiselled chest and six packs of abs with a full coat of armour. This is how Raja Ravi Varma has depicted it in his painting. (below)

Original from Raja Ravi Varma Press – https://artsandculture.google.com/asset/vaman-ravi-varma-press/ogGBXvH38gD26w?hl=en

The reverse is also happening, some elements of the RSS/Parivar have been promoting Onam as a Vamana Jayanthi and are tactfully erasing the Identity of the Great King Mahabali.

I personally do not have a problem with RSS celebrating Onam and calling it Vamana Jayanthi as long as people do not lose their heritage or forget their history. But a very disconnected “Hindized??” (do not know an equivalent word for Hollywood “Whitewashing”, but we seriously need one) version is the above image. It is clearly written in the renderings that Vamana did not sport a tuft but a very large top-knot with flowing hair and his complexion was said to be dark.

More like this.

Summary:

I seem to believe that this is a manifestation of one of those things that is unique to the Indian culture and tradition. Gods do falter and can behave preferentially if required. They can be remorseful or cruel when it suits them. Think of the following,

  • Rama had to kill B/Vaali by shooting from a hidden spot while he was engaged in one-on-one combat with his brother.
  • Rama had to kill Shambuka as he was doing penance while he was not accorded that privilege
  • Krishna had to do so many unpleasant things that would be hard to list here, but chief among them was instigating Kunti to ask for a promise from Karna to not kill anyone except Arjun and then Indra to ask for Kavach and Kundal and finally he masqueraded as a Brahmin and sought the benefits of all his Dharma!

As a closing remark, wanted to remind this often misquoted phrase –

Those who cannot remember the past are condemned to repeat it.

George Santayana

References & Further Reading

  • Asuras were the old Pre-Vedic Gods: Alain Daniélou (1991). The Myths and Gods of India: The Classic Work on Hindu Polytheism from the Princeton Bollingen Seriespp. 141–142. Inner Traditions / Bear 
  • Will the real Mahabali please turn up this time? – https://timesofindia.indiatimes.com/city/kochi/will-the-real-mahabali-please-turn-up-this-time/articleshow/71048573.cms
  • How Indra got Thousand Yonis – https://detechter.com/how-lord-indras-body-covered-with-vaginas-later-became-eyes/
  • Onam – Remembering The Murder of “Asura” King Mahabali – https://velivada.com/2019/09/11/onam-remembering-asura-king-mahabali/
  • Onam – What Jotiba Phule Said About King Bali and Vamana http://velivada.com/2019/09/11/jotiba-phule-king-bali-vamana-onam/
  • The false portrayal of Kerala king Mahabali like an Indian version of Santa Claus – https://www.thenewsminute.com/article/false-portrayal-kerala-king-mahabali-indian-version-santa-claus-154210
  • Who is King Mahabali? – https://www.thehindu.com/entertainment/art/who-is-king-mahabali-great-warrior-or-pot-bellied-king/article32465774.ece
  • In a Battle Fought in Mythology, The RSS Attempts to Rewrite the Tale of Onam; and What Everyone Else Has to Say – https://caravanmagazine.in/vantage/mythology-rss-rewrite-onam-kerala
If You Can Do Your Job From Home, Be Scared. Be Very Scared!

If You Can Do Your Job From Home, Be Scared. Be Very Scared!

Sometimes, It seems like most people have added “Remote Work” to a long list of taboo topics that no one should discuss at work. Topics like Religion, Politics, Sexual Orientation, Medical Issues, etc.

Most people know that remote work is a horrible idea for organisations of any size, but they are afraid to call it out because they don’t want to appear out of touch with their employees/colleagues. Others are afraid to be viewed as hostile figures who wish to create tension with employees or be perceived as a manager who doesn’t trust their employees.

So, employers stay silent, and now everyone thinks remote work is the best business idea of the last 200 years.

Most of these shenanigans use personal anecdotes to defend the benefit of remote work. They say they are more productive at home, with fewer distractions and more time for daily (personal) chores. I do not agree or disagree with that statement completely, as it could be dependent on specific roles.

When Remote Work Could be Beneficial:

If you are an individual contributor and mostly work without the need to collaborate in real time, you can be productive in remote work. For example, If you are a finance analyst who dives through ledgers, borough thru tonnes of CSVs and crunch numbers all day, you may be more productive in Home.

When Remote work could be less than Optimal

If you are a creative or knowledge worker (Designer/Developer), chances are you’d NEED to collaborate with, break down/delegate, get feedback etc. In these roles, it will be almost impossible to get a calendar from 6 different people to drive consensus. Whereas if all of your stakeholders are in the office, it is a mere “Shout” or “Wave” and 3 mins conversation following it.

Google has officially changed its mind about remote work

Google leadership publicly admitted that remote work no longer works for them, and that’s the reason they want all of their employees back behind their desks.

Last week, Fiona Cicconi, Google’s chief people officer, wrote an email to the entire company stating, “Going forward, we’ll consider new remote work requests by exception only.” This is horrible news for employees and some companies who believe that remote work is the greatest idea since the invention of the internet.

Let me declutter the above statement for you.

Google is the biggest tech company in the world that created 100s of resources and tools to enable employees to work remotely, admitting defeat.

Despite the release of many communications tools that enabled workers across the globe and all industries to work remotely, it is finally saying that remote work doesn’t work. Let that sink in. Google’s remote employees are unhappy, but Google’s leadership rarely pay attention to the feeling of their employees (after Larry left Alphabet inc). They only pay attention to the stock price and what is recommended at the shareholder’s meeting.

Google is not alone. Apple, Microsoft, Facebook, and Amazon also laid off most remote employees.

Remote work destroyed the most profitable industry

As you have seen above, Google leadership is no longer willing to offer their employees 100% remote jobs. They want their employees in the office and productive.

You might ask, why is this happening?

It is happening because you witnessed the destruction of the Tech industry in the last two years. The tech industry crumbled because most Leaders were afraid to tell their remote employees to come to the office, so they did the next best thing: they lost money and laid remote employees off (mostly).

According to the Layoffs tracker, more than 200,000 people were laid off in 2023, and over 164,000 employees were laid off in 2022. This is a clear message to anyone who works in the tech industry, stop working remotely or start interviewing soon.

This is exactly what the most innovative CEO in the world did when he bought Twitter.

When Elon Musk bought Twitter, he found the company in tatters. Musk gave them the same two options he gave his Tesla employees, “If you do not return to the office, you cannot remain at the company. End of story.”

That was the end of the story for many of Twitter’s employees, Musk fired anyone who refused to show up at the office, and his company is more productive and innovative than ever.

Musk ended remote work at Twitter, and most people hated him.

Not every executive had the guts to do what Elon Musk did, but now most wish they did because every company needs to lay off their unproductive employees.

  1. Remote employees are less visible. When it comes to your value to your company, you have to be visible. If you think visibility is not important, ask Kayne West.
  2. Remote employees don’t have a strong connection to other employees or their companies. Relationships are extremely important. I established my professional credibility and earned my colleagues’ respect by connecting with them face-to-face, not through a computer screen.
  3. Remote employees are less invested in their work or career. Since most employer link visibility and ability together, I understand why more than 3000 HR managers believe that.

These reasons led Google, Microsoft, Amazon, Facebook, and Zoom to lay off their remote employees first. This is a horrible trend for people who want to work remotely, especially since most remote workers say it could take more than six months to find a new job

In conclusion, I’d like to recommend Richard Baldwin’s video

Is the myth of a “10X Developer” Real?

Is the myth of a “10X Developer” Real?

If you’re a software engineer, manager or leader, I am sure you have heard the term ‘10x developer’ used in discussions. It refers to developers who are purportedly 10 times more productive, or capable, than their peers, while it is a hotly contested category. Some refer to it very liberally, others deny that it even exists. In the last 40+ years, the ‘10X developer` has become a ‘Loch Ness` of the tech world, fueled by the hype associated with Silicon Valley.

I’m not about to delude myself into thinking that writing a blog on it to pass my verdict will put these theories to rest, but the question has gained enough traction that it deserves a little articulation. 

Do 10x developers really exist, and if so, how would we distinguish them?

Framing the Issue

All of us can acknowledge that the range of skills in most human activities can be extensive. A marathon runner can cover roughly 10 times the distance that an untrained person could, while a professional chef can cook a 5-course meal in 1/5th the time it takes an average person to do a 2-course meal.

Coding, which is a hugely complex field unencumbered by physical limitations, should naturally show differences in the skill that vary by orders of magnitude. Thus, if by 10x developer we simply mean a person whose skill level is in a different league compared to someone else, then clearly they exist. 

How could anyone argue otherwise?

Here’s the rub though, in the data-driven and lexically precise world of modern tech, that’s not what 10x developer means. Instead, a 10x developer is supposed to be someone who genuinely outperforms others by 10 times or more on some quantifiable scale. That ‘quantifiable scale’ is where the problems start.

Where did the term actually come from? Enter Coding War Games.

Coding War Games

Tom DeMarco and Tim Lister have conducted the “Coding War Games” since 1977. This is a public productivity survey in which teams of software implementors from different organizations compete to complete a series of benchmarks in minimal time with minimal defects. They’ve had over 600 developers participate. Its results are publicly available and is very informative, to say the least. Jeff Lester published a wonderful piece on the origins of the 10X developer here

The top findings from these are,

1, Get your working environment Right

The overriding observation from this study is that quiet, private, dedicated working space with fewer interruptions led to groups that performed significantly better.

2, Remove the Net Negative Producing Programmer

Some developers are “net negative producing programmers” (NNPP), that is they produce so many defects that removing them from the team increases productivity. This is the opposite of a 10X developer, these people are the ones that make the team productivity go from bad to worse.

The Problem with Measuring ‘Skill’

Even where skill can vary wildly, differences will not necessarily be quantifiable. A talented artist may know how to create a painting that teleports you to a whole other world compared to an average artiste who can transport you to the scene.

But the question is, can you attach numbers to that painting’s beauty?

The work of a developer isn’t nearly as abstract, but not all of it can be reduced to metrics either, and definitely not the programming skill itself.

A less glamorous approach may be to judge a 10x dev not in terms of skill but in terms of productivity. Someone who can write 500 lines of code when it takes others to write 50 would then fall in that category.

If you know anything about programming, however, you’ve probably already spotted the problem with this line of thinking. Longer code isn’t necessarily more efficient, and for most people there tends to be a positive correlation between how quickly one works and how many bugs one creates.

This is not to say that programmers can’t produce bug-free code much faster than their peers. Where this statement proves fallacious though, is in trying to peg that difference to a single metric. There are myriad factors at play that will affect a developer’s productivity outside of their skill, including their team and the environment they find themselves. In fact, depending on the situation, the 10x tag may be inaccurate because a developer could be programming well over 10 times as much as another and still produce 1/10th of the ‘Outcomes’!

What should be our conclusion? There can be no doubt that the field of programming has its own Mozarts and Vincent Van Goghs, and few would object if these people were described as being ‘orders of magnitude’ better than the rest. But it is important to recognize that this is only a figure of speech, and not something meant to be used according to its precise quantitative meaning.

I can’t presume to speak for the tech industry as a whole, but I for one have noticed a worrying tendency to read the expression ‘10x developer’ literally.

Ultimately this does more harm than good, as it spreads the myth that there is some universal metric whereby every programmer’s value can always be quantified. 

Important qualities like creativity, client focus and teamwork are entirely omitted in this way of thinking, which is why my final suggestion is to stop worrying about lofty 10x developers and whether you are, aren’t, or may or not become one. 

Simply focus on being the best developer you can be. That will always be enough.

References & Further Reading

Origin of a 10X developer

https://medium.com/ingeniouslysimple/the-origins-of-the-10x-developer-2e0177ecef60 

https://gwern.net/doc/cs/algorithm/2001-demarco-peopleware-whymeasureperformance.pdf

https://news.ycombinator.com/item?id=22349531

Bitnami