Decades ago, before smartphones, clouds, and connected devices became part of everyday life, data had a single home: the database. This central repository served as a starting point for any type of data analysis and business decision. Today, there are no boundaries. Data resides everywhere. It crosses company lines, streams in from connected devices, and even changes in shape and form as it travels across an ecosystem.
It’s no longer possible to protect an enterprise by locking down a single database. Today, data streams in from more sources than ever before, including IoT and mobile devices, and many traditional systems capture data more frequently and often in real time. As a result, companies must pull the data from these various sources into centralized data platforms, such as a data lake or data warehouse, that typically reside in the cloud. The goal is to drive business value across multiple business intelligence (BI), analytics, and artificial intelligence (AI) use cases.
Advances in data-centric technologies don’t mean security has gotten easier. Data scientists can’t develop machine-learning models without data, but a growing focus on data privacy and security can introduce a formidable bottleneck for putting that data to work and launching use cases that leverage confidential and sensitive data.
In addition, businesses are increasingly exploring ways to share and pool data among a group of organizations for specific use cases. This makes it possible to feed a common AI model and deliver benefits to every participant. This environment is forcing the business world to develop new AI methodologies that require privacy and security by design.
A New Era Emerges – AI on Protected Data
The introduction of far more advanced algorithms and computing frameworks has changed business in profound ways. It’s possible to gain insights that were once unimaginable. But with opportunity come challenges. As organizations adopt these algorithms and computation frameworks, conventional data-protection approaches prove inadequate. We have reached a point where it’s necessary to ensure that every person feels safe with AI.
There’s a growing belief that AI should be used in ethical and trusted ways, and that the technology should be transparent and drive improvements for the world. As a result, a business must guarantee that its data handling adheres to strict standards and regulatory requirements. Consumers must believe that their personal data isn’t being misused or abused.
At first glance, building a strict framework for data privacy might seem like a way to heap red tape and bureaucracy onto AI and data-science practitioners. Already overloaded security and IT teams could perceive that it creates more work. However, the opposite is true: the ability to secure data means that an organization can innovate and operate faster and better, freed from many onerous tasks.
For example, when an organization has access to protected data, it can skip costly and time-consuming verification processes that would otherwise be necessary. It can confidently move forward with projects that involve trade secrets and sensitive customer records, and it can begin a project without protracted discussions about security and privacy concerns.
Secure AI Takes Shape
Building a more secure framework for AI starts with an understanding that there are two components of secure AI: secure data and secure algorithms.
Secure data aims to address regulation, privacy, trust, and ethics related to the data behind the decisions made by AI models. Just as important, secure data should make data more accessible, something that today is a common bottleneck. Several techniques can be used in isolation or in combination to promote this approach: fine-grained tokenization, anonymization via differential privacy or k-anonymity, newer forms of cryptography such as homomorphic encryption, and the recent but rapidly evolving field of synthetic data. All of these techniques are important ingredients for keeping data secure while it’s being used for AI.
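To make the idea concrete, here is a minimal sketch of two of these techniques in Python. It is illustrative only: the identifiers, the salary column, the privacy budget (epsilon), and the in-memory token vault are assumptions standing in for real systems.

```python
# A minimal sketch, not a prescribed implementation: toy identifiers, a toy
# salary column, an assumed privacy budget, and a simple in-memory token vault.
import secrets

import numpy as np

salaries = np.array([52_000, 61_500, 48_200, 75_000, 58_300], dtype=float)
emails = ["a@example.com", "b@example.com", "c@example.com"]

# Fine-grained tokenization: replace each identifier with a random token and
# keep the mapping in a vault so only authorized systems can reverse it.
vault = {}

def tokenize(value: str) -> str:
    token = secrets.token_hex(8)
    vault[token] = value  # the vault never leaves the trusted zone
    return token

tokenized_emails = [tokenize(e) for e in emails]

# Differential privacy (Laplace mechanism): release an aggregate with
# calibrated noise instead of the raw values.
epsilon = 1.0  # privacy budget (assumed)
sensitivity = (salaries.max() - salaries.min()) / len(salaries)  # worst-case effect of one record on the mean
noisy_mean = salaries.mean() + np.random.laplace(0.0, sensitivity / epsilon)

print(tokenized_emails)
print(f"differentially private mean salary: {noisy_mean:,.0f}")
```

Homomorphic encryption and synthetic data follow the same spirit: raw values never leave the trusted zone, yet analytics and model training can still proceed.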
Secure algorithms aim to protect the intellectual property of a company that has used secure data to build a business use case, such as a machine-learning model. A trained model’s coefficients are mathematically connected to its inputs, including the secure data it was trained on. The same techniques applied to the data can be used to protect the coefficients, even though these patterns aren’t yet widely used by enterprises around the world.
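One way to picture this, assuming a partially homomorphic scheme such as Paillier (here via the open-source `phe` library), is to encrypt a model’s coefficients so that a partner can score records without ever seeing the weights. The weights and the feature vector below are toy assumptions, not values from any real model.

```python
# A hedged sketch of one option: protecting trained coefficients with
# partially homomorphic (Paillier) encryption via the open-source `phe`
# library, so a partner can score records without seeing the weights.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

weights = [0.8, -1.2, 0.05]  # a model's trained coefficients (toy values)
encrypted_weights = [public_key.encrypt(w) for w in weights]

# The partner holds plaintext features but only the encrypted weights.
features = [1.0, 0.3, 42.0]
encrypted_score = encrypted_weights[0] * features[0]
for w_enc, x in zip(encrypted_weights[1:], features[1:]):
    encrypted_score = encrypted_score + w_enc * x

# Only the model owner, who holds the private key, can read the linear score.
print(private_key.decrypt(encrypted_score))
```

The partner returns only encrypted results; the model owner alone can decrypt them, so the intellectual property embedded in the coefficients stays protected.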
In the last three years, major companies such as Google, Amazon, Microsoft, and Tesla have had their ML systems tricked, evaded, or misled. This trend is only set to grow. According to a Gartner report, by 2022, 30 percent of cyberattacks will involve the “poisoning” of AI training data, the theft of AI models, or adversarial examples.
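As a rough illustration of that last category, the toy sketch below (assumed weights and inputs, not a recreation of any real incident) shows how a small, signed perturbation of an input can flip a simple logistic model’s decision.

```python
# A toy adversarial example: a signed perturbation of the input flips a
# simple logistic model's decision (the "fast gradient sign" idea).
# All weights and inputs here are assumed, illustrative values.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.5, 0.5])  # a trained model's weights (toy values)
x = np.array([0.4, 0.1, 0.3])   # a legitimate input
y = 1                           # its true label

p = sigmoid(w @ x)                    # clean prediction (about 0.69 -> class 1)
grad_x = (p - y) * w                  # gradient of the log-loss w.r.t. the input
x_adv = x + 0.25 * np.sign(grad_x)    # signed perturbation of the input

print("clean prediction:", round(float(p), 3))
print("adversarial prediction:", round(float(sigmoid(w @ x_adv)), 3))  # drops below 0.5
```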
The security of data and AI is forcing the field to evolve significantly, revealing an emerging development: decentralized AI. Imagine a group of banks that want to share data in order to tackle a common problem, such as credit risk or online fraud. At the same time, the banks want to protect their underlying data and prevent others from seeing it. Or it could involve a group of manufacturers looking to expand their data pool to improve maintenance and predict component failures. Typically, these companies wouldn’t share confidential data unless they were certain no one else could see it.
Such an initiative is where decentralized AI makes sense. It exists in two forms. First, you can imagine a consortium fully encrypting its data before it’s moved to a central repository, where a machine-learning model is created and the output is shared with every participant. While this is a workable option, it still forces data to be moved to a central place.
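A small sketch of the “encrypt before it moves” step might look like the following, using symmetric encryption from the open-source `cryptography` package; the consortium key arrangement and the record format are assumptions, not prescriptions.

```python
# Sketch of encrypting a record before it travels to the consortium's
# central repository. Key handling and record format are assumed.
from cryptography.fernet import Fernet

consortium_key = Fernet.generate_key()  # shared under the consortium's agreement
cipher = Fernet(consortium_key)

record = b'{"customer_id": "tok_9f2a", "exposure": 125000, "days_past_due": 12}'
encrypted_record = cipher.encrypt(record)  # this is what travels to the repository

# Only the central repository holding the key (or a trusted enclave inside it)
# can recover the record to train the shared model.
assert cipher.decrypt(encrypted_record) == record
```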
A second form, the one I put more weight on, is federated learning. In this case, the model resides centrally but moves to where the data is, trains locally, and only returns updates to the central repository after learning occurs on each specific dataset. The data never leaves the source, which provides an extra level of comfort for each participant. Federated learning could become a way for data scientists and others to unlock use cases that are currently stuck, from the early detection of diseases through data sharing among multiple hospitals to the detection of financial crimes across several banks.
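A minimal federated-averaging sketch, under assumed data and a toy linear model, illustrates the mechanics: each participant runs gradient steps locally, and only the updated weights travel back to the coordinator.

```python
# A minimal federated-averaging sketch: toy linear model, synthetic data.
# Each participant trains locally; only weight updates return to the coordinator.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One participant: gradient steps on local data; the raw data never leaves."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Three participants (e.g., banks), each holding private data of the same schema.
datasets = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]

global_w = np.zeros(3)
for _ in range(10):
    # The model travels to the data; only weight updates are returned.
    local_ws = [local_update(global_w, X, y) for X, y in datasets]
    global_w = np.mean(local_ws, axis=0)  # federated averaging

print("global model after 10 rounds:", global_w)
```

In practice the updates themselves can be further protected, for example with secure aggregation or differential privacy, so that no single participant’s contribution is exposed.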
Putting Protection into Play
Ultimately, success involves more than simply ticking off boxes on a data-security and privacy checklist. It’s critical to develop a clear strategy along with a business plan for moving forward and implementing a secure AI framework. The first task is to define which techniques are most useful for secure data and secure algorithms and map them to different business use cases. This should add velocity both for starting valuable use cases and for streamlining their deployment. A second step is to determine whether decentralized AI needs to be part of the equation. Cross-border AI within a single company or group is often an obvious candidate for decentralized AI.
Both steps require a cross-functional partnership that involves the AI function, typically the chief AI or data officer, along with the chief security officer and privacy officer. This builds a foundation for secure AI: protecting data, using appropriate privacy techniques, and ensuring that models comply with regulations and align with the standards of an internal ethics function, which must help define what secure AI means to the organization. It also involves embedding key AI metrics and controls into AI workflows and aligning them with privacy and security requirements.
This AI framework delivers full transparency around what data was used, which privacy level was applied to it, and which techniques were used to enforce ethical and privacy standards. It also involves building AI systems that can be audited later, even when they reside outside an organization. Without these precautions in place, you will wind up debating who can and can’t access data, and you will repeat that debate for every project. This slows down an organization, increases the risk of breach or failure, and ultimately translates into lost value for the company.
Aiming for Good Values
In the end, a best practice for secure AI requires an organization to identify and define the end-to-end process for collecting data; build and deploy AI platforms that can work with protected, sensitive data; and develop an IT framework that ensures data in motion remains protected and anonymized when necessary. This need extends to websites, apps, devices, and other systems. Likewise, it’s vital to keep an eye on the various data sources to see how AI models change and how they affect one another.
Finally, there’s a need for tools that protect data across an ecosystem. This includes multi-cloud and hybrid-cloud environments (including containers and migrations that occur within clouds); AI protection solutions that anonymize, de-identify, or tokenize data and control access to it; encryption methods, such as homomorphic encryption, that can hide the actual data even while it’s being analyzed; policy enforcement frameworks that support regulations such as GDPR and the California Consumer Privacy Act; and robust privacy reporting and auditing tools to ensure that systems are performing as expected.
Many organizations are only beginning to grasp the possibilities of secure AI and are assembling teams to create a more secure framework. However, adding and adapting these tools, technologies, and processes to fit this rapidly evolving space is critical. Organizations that get things right construct a working framework that’s equipped for today’s business environment—but one that is flexible and agile enough to respond to rapidly changing requirements in the data-analytics space. They’re positioned to unleash innovation and, in a best case, achieve disruption.