
Quantum, Classic, Still Stupid Data

By Michael Howard, CEO and Director, Protegrity
Apr 18, 2025

Summary

  • Quantum Disruption Meets Data Type Innovation
    As quantum computing advances, the need to rethink database type systems becomes urgent. This post explores how Protegrity’s concept of an “Uber Type” brings intelligence and context into traditional schemas—strengthening data integrity, trust, and utility across classical and quantum systems.

  • Rethinking Security with Encrypted Data Types
    Inspired by real-world challenges and Lord of the Rings analogies, the post introduces a new Encrypted Data Type that embeds identity, classification, and trust directly into data values. It’s a step toward smarter, regulation-aware systems that evolve with both AI and quantum-era demands.

The need for a new type system.

The age of quantum is about to begin, much like Sauron in The Lord of the Rings, who seeks to dominate Middle-earth and all its inhabitants. Its ascendance is just around the corner: still in the depths of Mordor, it is beginning to establish a new world order, ever-so magically manifesting itself on the heels of the Crypto Winter.

The connection between digital currency and quantum computing is rather obvious: if quantum can break wallets, blockchains, and current encryption methods, Sauron’s ambitions will be complete and havoc will be wreaked across Middle-earth. Algorithms such as Shor’s and Grover’s, not to mention Super Grover and the BHT algorithm, which exploits something called the “birthday paradox,” are all coming into view, like the Nazgul. What’s more, they are arriving just as digital currency makes a comeback and hyperscalers begin offering quantum services.

Just look at the resurrection of Bitcoin, which, despite the turbulence of current markets, has a whopping market cap of $1.6T – erasing any lingering memories of Mr. SBF and the FTX crash. At the same time, hyperscalers like Amazon and Google rent out quantum services to anyone who wants them or can afford them. Amazon’s service is called Braket, Azure’s and IBM’s are both called Quantum, Google’s quantum chip is called Willow, and all of them are trying to wrest power away from Nvidia’s dominance in GPUs and GenAI. As everyone builds their armies, it’s anyone’s guess who will win the battle for Middle-earth.

As quantum prepares for dominance, I can’t help but think of a place not mentioned in its battle plans. Like the fiery and lidless eye of Sauron overlooking Mordor, focused on his armies only to miss Frodo trekking to Mount Doom, quantum fails to see a member of the data community slowly creeping up on the horizon. I’m referring to the rather liminal topic of type systems.

When I say “type systems,” I’m specifically talking about databases, but more precisely, I’m referring to schemas that, for most of us in the data industry, are pretty lame. To dip back into The Lord of the Rings theme (which I will continue to do!), as armies of Orcs and Men and Elves battle for Middle-earth, database schemas tend to their gardens in The Shire. Lame. Right?

However, these lame type systems do make an impact, and after learning about Protegrity’s approach to them, my thinking has drastically changed, compelling me to write this blog. I don’t want to get too “sales guy” on you, but I’ve learned that Protegrity offers a unique and powerful way to correlate behavior to data in databases, which I will call an “Uber Type.”

What is a Type System?

A type system, so states Adabeat.com, can be formally defined as a mechanism in programming languages designed to prevent certain kinds of errors by classifying values and expressions into types and ensuring that operations are used correctly according to these classifications. In databases, give or take, there are roughly eight core data types: integer, floating point, character, string, boolean, enumerated type, array, and date. Here’s a graph of data types in SQL-based databases:
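The definition above can be made concrete with a tiny sketch. The schema and column names here are purely illustrative, not taken from any real database; the point is just what a type system does: classify values and reject operations that violate the classification.

```python
# A minimal sketch of what a database type system does: classify values
# into types and reject rows whose values don't match the declared type.
# The schema below is illustrative, not from any real database.
SCHEMA = {
    "last_name": str,    # CHAR / VARCHAR
    "net_revenue": int,  # INTEGER
    "signed_up": bool,   # BOOLEAN
}

def insert_row(row: dict) -> dict:
    """Accept the row only if every value matches its declared type."""
    for column, value in row.items():
        expected = SCHEMA[column]
        if not isinstance(value, expected):
            raise TypeError(
                f"{column!r} expects {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return row

insert_row({"last_name": "Levi", "net_revenue": 1200, "signed_up": True})
# insert_row({"last_name": 7, ...}) would raise TypeError
```

Note how little the system knows: it can tell a string from an integer, but nothing about what the value means or whether it is plausible, which is exactly the gap this post is about.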

So, a field in a database that represents “Last Name” is most likely a Char data type because names are typically just letters. Even if your child is named Seven, or you’re a Roman emperor called Secundus or Tertius, or you are Primo Levi, the name still falls under the Char data type. The same holds for Social Security Numbers, in that an SSN is not a number, just as a peanut is not a nut. It’s a legume! (I apologize, but I just had to work the word “legume” into this blog. I got excited.) Anyway… An SSN is not a number but rather a set of characters, and thus nominally characterized as a Char data type. I say “nominally” because an SSN still carries numeric meaning: yours is higher than your father’s or mother’s, something important in fraud analysis and synthetic data generation. And that is what sets up what I consider the lameness of the type system.
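A quick way to see why an SSN belongs in a Char column: treat it as an integer and the leading zero that some SSNs carry is silently destroyed.

```python
# Why an SSN is a Char, not a number: casting it to an integer silently
# drops leading zeros. (The SSN below is a made-up example value.)
ssn_as_text = "012-34-5678"

digits = ssn_as_text.replace("-", "")  # "012345678": nine characters, as issued
as_number = int(digits)                # 12345678: the leading zero is gone

print(len(digits))           # 9
print(as_number)             # 12345678
print(len(str(as_number)))   # 8, one digit lost in the round trip
```

The Char type preserves the value exactly; the integer type quietly corrupts it, which is the mildest possible preview of the “stupid system” story below.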

An Example of a Stupid System

Years back, while I was the CEO of a company called C9, which helped companies analyze their sales pipelines, I remember getting a call from a very upset customer: “The numbers don’t add up,” he said, and emphasized, “we have all 500 salespeople coming on the line, including management, looking at the numbers, and they’re wrong! This needs to be fixed immediately!” Even Sauron would be miffed.

After a tense hour or so, we found – somewhat gleefully – that the customer’s own database administrator had unknowingly deleted all sales data in the database for the third quarter. Gone. A simple computed field called net-revenue, and its siblings, all typed integer: gone. Up in smoke.

To me, the system, and how the data was created, should have prevented this from happening. Systems need to be smarter. An integer type can no longer be just a 1 or a 0; it has to offer shades of gray, with history and cadence, across a wide range of values, confronted by different users with different roles and different goals, such as one person adding data while another analyzes it. There are also concerns like regulation, provenance, LLM training, and brand integrity. The type should have known that zero revenue for Q3 is not possible, or at least signaled to the user or application that the data is not to be trusted.
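What might a “smarter integer” have looked like in the C9 story? Here is a hypothetical sketch: the class name, threshold, and plausibility rule are all my own illustration, not any real product’s behavior. The idea is simply that the type keeps a little history and flags values that are implausible given what it has seen, instead of silently accepting them.

```python
# A hypothetical "smarter integer" for a net-revenue field: it remembers
# prior quarters and flags implausible new values rather than trusting
# them blindly. Names and the 10% threshold are illustrative only.
class GuardedRevenue:
    def __init__(self, history):
        self.history = list(history)  # prior quarters' revenue figures

    def record(self, value):
        """Return (value, trusted). A quarter far below the historical
        baseline is flagged as untrusted rather than silently accepted."""
        baseline = sum(self.history) / len(self.history)
        trusted = value > 0.1 * baseline  # crude plausibility check
        self.history.append(value)
        return value, trusted

q = GuardedRevenue([1_200_000, 1_350_000, 1_100_000])
print(q.record(1_250_000))  # (1250000, True): plausible, trusted
print(q.record(0))          # (0, False): Q3 wiped out, signal, don't trust
```

A real system would of course layer on roles, provenance, and policy; the point is only that the trust signal lives next to the value instead of being discovered by 500 angry salespeople on a conference call.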

The only good news here is that someone caught the problem, and it was ultimately fixed. But what if it hadn’t been caught? What if an LLM had been trained on this gaping hole?

Why is Protegrity More Than a Security System?

What I find so interesting about the Protegrity product is not just that it protects and secures the data itself, but that it also characterizes and classifies the data – what data is sensitive, how best to protect it, where and how the data will be used, and by which users or machines – ascribing far more information and context than even semantic layers, let alone database schemas.

As James Rice – Protegrity’s Head of Product Marketing – said to me:

Our product is actually trying to make sense of how the data will be used in the real world, while cognizant that there are conflicting security, privacy, and business imperatives, compounded by different users with different roles and different goals, and further complicated by regulations, provenance, training models, and brand integrity. But most importantly, by being embedded into the data itself, Protegrity applies this characterization and context enterprise-wide, across systems like Oracle, Teradata, Microsoft SQL Server, Snowflake, and Databricks.

As such, it is, in my mind, an Uber Type.

Taking this Uber Type Deeper: Encrypted Data Type

In the world of security, there is always a degree of dissonance between the data you have, the different uses of that data, and the way you want to protect that data. There is no silver bullet. Think of it this way… Sauron was a powerful Maia and could do just about anything, except defy the constraints given to him by Eru Ilúvatar. So, whether you mask data or encrypt it, such measures will have some impact, because the data is no longer free, out in the open, or usable in the way typical optimizers treat data in the clear. Protection, in turn, means constraints. In other words, masked or encrypted data requires special handling and cannot be treated as a standard data type, even when format-preserving algorithms are used (e.g., FPE and tokenization). That’s the way it is. But what if there were another way? What if there were a way to move beyond the constraints and have the same power as the Valar?
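To see why even format-preserving protection constrains how data behaves, here is a toy illustration. This digit-substitution “tokenizer” is emphatically not real FPE or Protegrity tokenization, just a sketch of the shape of the idea: the output still looks like an SSN, but operations an optimizer relies on, like ordering, no longer work on the protected values.

```python
# Toy, NOT real FPE or tokenization: a fixed digit-substitution table
# that keeps the SSN format while scrambling the digits. The key point
# is that comparisons on tokens no longer reflect the original order.
SUBST = str.maketrans("0123456789", "3917804265")  # illustrative fixed "key"

def tokenize(ssn: str) -> str:
    return ssn.translate(SUBST)  # digits map to digits; dashes pass through

a, b = "123-45-6789", "223-45-6789"
print(tokenize(a))                # 917-80-4265, still SSN-shaped
print(a < b)                      # True: originals sort in one order
print(tokenize(a) < tokenize(b))  # False: tokens sort in another
```

That inversion of ordering is exactly why protected columns can’t be indexed, sorted, or range-scanned like cleartext, and why the Encrypted Data Type below is interesting.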

Not knowing the answer myself, I asked Yigal Rozenberg and Clyde Williamson, two key architects at Protegrity, whether there was a way to tag the data itself, almost like a “dynamic watermark,” whereby the security or policy of the data is inextricably part of the value itself, thus changing this battle between protected and usable data. What they came back with was a new way to think about the problem, and their solution was to extend the list of data types in a database, something they labeled an “Encrypted Data Type.”

The benefits of this approach would go well beyond the optimization of search and sort operations, and add an “embedded context” to the data, not only by adding the property of “encrypted” but also by embedding such attributes as:

  1. Identity of the data – e.g., date of birth vs dinner table reservation date
  2. Classification – e.g., sensitive, private, secret
  3. Origin – where the data is acquired, or data jurisdiction
  4. Trust – is this data to be trusted (especially for downstream purposes, like LLM training)
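The four attributes above can be sketched as a data structure. Everything here is my own illustration of the concept, not Protegrity’s implementation: the field names mirror the list above, and the single-byte XOR stands in for real encryption, which a production system would replace with a vetted cipher and a policy engine.

```python
# A sketch of what an "Encrypted Data Type" might carry: the ciphertext
# plus the embedded context listed above. Field names are illustrative;
# the XOR "cipher" is a toy stand-in for real encryption.
from dataclasses import dataclass

@dataclass(frozen=True)
class EncryptedValue:
    ciphertext: bytes
    identity: str        # what the value IS, e.g. "date_of_birth"
    classification: str  # e.g. "sensitive", "private", "secret"
    origin: str          # where acquired, or data jurisdiction
    trusted: bool        # fit for downstream use such as LLM training?

def seal(plaintext: str, *, identity, classification, origin, trusted=True):
    key = 0x5A  # toy single-byte XOR key, NOT real cryptography
    ct = bytes(b ^ key for b in plaintext.encode())
    return EncryptedValue(ct, identity, classification, origin, trusted)

dob = seal("1970-01-01", identity="date_of_birth",
           classification="sensitive", origin="EU", trusted=True)
print(dob.classification)  # the policy context travels with the value
```

The payoff is that any system handling the value, database, pipeline, or training job, can read its identity, classification, origin, and trust without consulting an external catalog.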

There will be more blogs describing the different aspects of this new data type, how it is intended to be implemented, and how it will be proposed as a standard in projects such as Iceberg and PostgreSQL so that it is broadly adopted. Let Sauron beware! The gathering of an army to fight the evil forces of Mordor is coming.

What does all of this have to do with quantum computing?

You would think that quantum technology would somehow evolve the notion of type systems, given its fundamental differences from classical computing.

So, I reached out to Emlyn Hughes, Professor of Physics at Columbia University, who’s been teaching quantum mechanics for the last 30 years, and like many academics, he felt more study was necessary.

That said, the first thing he explained was quantum computing dangers – the ones that best represent the Nazgul:

Shor’s algorithm takes a large number and factors it into its two prime factors. This is a great danger to the crypto world and can unravel RSA encryption.

Grover’s algorithm can search an unstructured space in roughly sqrt(N) attempts, where a classical computer needs N. If it takes a classical computer a million attempts to decode something, it will take a quantum computer about 1,000 attempts.

Emlyn is also worried about a potential Super Grover algorithm, which improves on Grover by another square root, as well as the BHT algorithm, which combines Grover’s algorithm with a classical one. That combination means 1,000 quantum-computer attempts could correspond to a trillion classical-computer attempts!
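The square-root speedup in Emlyn’s example is easy to spell out as arithmetic:

```python
# Grover's speedup, spelled out: an unstructured search costing a
# classical computer about N attempts costs Grover's algorithm about
# sqrt(N). Emlyn's example: a million classical attempts vs. a thousand.
import math

N = 1_000_000
grover_attempts = math.isqrt(N)  # integer square root of N
print(grover_attempts)           # 1000

# At a larger scale the gap widens: a trillion classical attempts
# collapses to about a million quantum attempts.
print(math.isqrt(1_000_000_000_000))  # 1000000
```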

“But,” he said, “all of this is not the point, at least not yet,” referring to data types, “because at first glance, nothing much has changed, except a slight name variation.” For example, an integer in quantum is a “qint,” and a floating point is a “qfloat.” Of course, there are the Qubit and Pauli types, but in the end, a character is a qchar and a string is a qstring.

In other words, quantum types are extensions of classical types, and none of these advances will do much to make systems smarter.

As Emlyn put it: “As quantum computing becomes a reality and brings forward new levels of sophistication in computing speed, power, and security, type systems will need to be revamped, particularly to take into account strong correlations in stored information.”

So, this story about a battle between good and evil, like that of The Lord of the Rings, can have a happy ending. It can lead to a better world. A safer world. But what’s the answer? The Uber Type – a Type for two – classical and quantum – an antidote to Sauron and an escape from Mordor!

-MH
