Researchers apply privacy-preserving AI to large-scale genomic studies

Researchers have published a study applying a privacy-preserving AI technique, homomorphic encryption (HE), to large-scale genome-wide association studies (GWAS) of genetic and phenotype data. They claim their method is 30 times faster than state-of-the-art approaches while protecting the privacy of the people whose data is analyzed. The team, whose members are affiliated with startup Duality Technologies, Harvard Medical School, and the Dana-Farber Cancer Institute, published the findings this week in the Proceedings of the National Academy of Sciences (PNAS).

GWAS can yield insights into a range of diseases, including COVID-19, and the approach is currently being used to identify genetic variants associated with susceptibility or response to the novel coronavirus. But regulations like HIPAA restrict how identifiable medical data can be shared and require some of it to be de-identified to protect participants' identities, which is where techniques like HE come in.

HE isn’t new, but it has gained traction in recent years, coinciding with advances in compute power and efficiency. It's a form of cryptography that allows computations to be performed directly on encrypted data (ciphertexts), so that the result, once decrypted, matches what the same operations would have produced on the unencrypted data (plaintext). Using this technique, a “cryptonet” (i.e., a trained neural network applied to encrypted data) can perform computation on data and return the encrypted result to a client, which then uses its secret key, never shared with the party doing the computation, to decrypt the returned data and obtain the actual result.
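To make the "compute on ciphertexts, decrypt the answer" idea concrete, here is a minimal sketch using the Paillier cryptosystem, a classic additively homomorphic scheme. It is a swapped-in illustration, not the lattice-based schemes (such as BGV) used in Duality's platform or in the study, and it uses tiny hard-coded primes, so it is completely insecure. The helper names `keygen`, `encrypt`, `decrypt`, and `add_encrypted` are hypothetical and chosen only for this example.

```python
import math
import random

# Toy Paillier cryptosystem (hypothetical helpers, illustration only).
# Additively homomorphic: multiplying two ciphertexts yields a ciphertext
# of the SUM of the plaintexts. Tiny primes make this completely insecure.

def keygen(p=293, q=433):
    """Return (public_key, private_key) built from two small primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                          # standard simple choice of g
    n_sq = n * n
    # mu = L(g^lam mod n^2)^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n_sq) - 1) // n, -1, n)      # modular inverse (Python 3.8+)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r coprime to n."""
    n, g = pub
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(priv, c):
    """Recover the plaintext using the secret values lam and mu."""
    lam, mu, n = priv
    n_sq = n * n
    return (((pow(c, lam, n_sq) - 1) // n) * mu) % n

def add_encrypted(pub, c1, c2):
    """Homomorphic addition: the server never sees the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

if __name__ == "__main__":
    pub, priv = keygen()
    ca, cb = encrypt(pub, 17), encrypt(pub, 25)  # client encrypts its data
    c_sum = add_encrypted(pub, ca, cb)           # untrusted server adds ciphertexts
    print(decrypt(priv, c_sum))                  # client decrypts: prints 42
```

Only the key holder can turn `c_sum` back into 42; whoever performs the addition works purely on ciphertexts. Paillier supports only addition (and multiplication by a known constant), whereas schemes like BGV also support multiplying ciphertexts together, which is what makes richer computations such as statistical models over encrypted data possible.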

In the PNAS study, the coauthors developed a framework incorporating HE and demonstrated that it could perform a GWAS on a data set of more than 25,000 people. The framework kept all individual data encrypted and required no user interaction, and the researchers extrapolated that it could evaluate a GWAS of 100,000 people and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 hours on a single server. (SNPs are variations of a single nucleotide at a specific position in the genome; some of them are associated with disease, drug response, and other phenotypes.)

The team says the approach could be applied to other branches of medical research, such as clinical trials, drug repurposing, and rare disease studies. “[The results are in] contrast to the claim that HE is not viable for large-scale GWASes,” the coauthors wrote. “Our [work] is thus a significant advance both in methodology and application, empowering large-scale cross-institution collaboration, patient-driven research, and crowdsourced genomics.”

Duality was cofounded in 2016 by Alon Kaufman, chair Rina Shainski, Turing Award-winning professor Shafi Goldwasser, MIT professor Vinod Vaikuntanathan, and open source pioneer Dr. Kurt Rohloff. Vaikuntanathan is a co-inventor of the foundational BGV homomorphic encryption scheme, and Rohloff is the founder of PALISADE, the open-source HE library on which Duality’s platform is based.

The company keeps a low profile and primarily serves regulated industries such as banking. Its SecurePlus offering enables multiple parties to collaborate without exposing their data or analytics models. Data remains protected end-to-end, even when analyzed in untrusted cloud environments, thanks to “quantum-resistant” technologies that conform to the standard developed by the homomorphic encryption industry consortium.