On November 9th, 2015 the Apache Cassandra project released version 3.0 and, with it, a host of really big changes you would expect in a major version. Over the past six years, there have been releases inside the 3.x branch but never a major version release. Today we feel a lot of pride in our community with the release of Cassandra 4.0. It has been a long time coming, but we hope you will see it was worth the wait.
Six years is a long time between major releases and many in our community have asked why it has taken so long. The shortest and easiest answer is that the project decided to become uncompromising on one important feature: quality. As the project was gaining traction in the early years, there was an unfortunate general rule about Cassandra: Wait until the x.6 version or six months before upgrading your production clusters to a new version. Software projects have already had the user rule to never use x.0 until bugs that were missed are found in production and an update is released. For an open source project driven by community, this seemed like something we can avoid and set a new standard.
To get the quality required, we took a completely different approach to verify data correctness in Cassandra. The scale that Cassandra clusters can reach means that there is an enormous surface area for potential bugs or data corruption, so we purpose-built new tools to cover every requirement.
-
Property-based / fuzz testing
-
Replay testing
-
Upgrade / diff testing
-
Performance testing
-
Fault injection
-
Unit/dtest coverage expansion
Over the past six years, those tools were perfected and deployed to help meet our quality goals. This sets an important baseline for any future version of Cassandra and provides the needed infrastructure to ensure future releases maintain a high level of quality and correctness; this has resulted in over 1,000 bugs being identified and fixed. Many only surfaced in the largest scale production workloads, which are notoriously hard to find.
Cassandra is used as the database of record for some of the most critical applications running in the world today. From finance to healthcare and everything in between, the data that is stored in Cassandra has to have the highest guarantees of correctness and quality. Knowing this, the Project Management Committee (PMC) set an almost impossibly high bar of quality that no other database has been held to. The policy agreed upon was simply stated: The overarching goal of the 4.0 release is that Cassandra 4.0 should be at a state where major users would run it in production when it is cut. Consistent with that goal, Cassandra 4.0 is running in production today at Apple, DataStax, Instaclustr, Netflix, Orange, Pythian, Sky UK, and many more.