Clinical data sharing using Generative Adversarial Networks

Seyed Mohammad Ayyoubzadeh; Seyed Mehdi Ayyoubzadeh; Marzieh Esmaeili

doi:10.20517/ch.2022.15

Editorial | Open Access | 28 Aug 2022

Conn Health Telemed 2022;1:98-100.

10.20517/ch.2022.15 | © The Author(s) 2022.

Abstract

Obtaining data is challenging for researchers, especially when it comes to medical data. Moreover, using medical data requires specific considerations because of privacy and confidentiality concerns. Generative models aim to learn the data distribution via various statistical learning approaches. Among generative models, a machine learning-based approach named Generative Adversarial Networks (GANs) has proved its potential in the implicit density estimation of high-dimensional data. Therefore, we suggest an approach in which each healthcare organization, especially hospitals, could create and share its own GAN model, termed a Hospital-Based GAN (H-GAN), instead of sharing patients’ raw data.

Keywords

Machine learning, Generative Adversarial Networks, data sharing, anonymous

MEDICAL DATA SHARING PROBLEM

Obtaining data is challenging for researchers, especially when it comes to medical data. Using medical data requires specific considerations because of privacy and confidentiality concerns. At the same time, sharing these data is necessary to verify experiments and to extract more knowledge from the data^[1]. One potential solution for sharing data while preserving privacy is de-identification. The main concern with this approach is that the process could be reversed, unveiling the real patients’ identities. Another solution is to encourage patient populations to share their data by rewarding them or benefiting their communities^[2]. While this can be feasible for small health ecosystems, its scalability is questionable: the many stakeholders, including each individual patient, could hold different viewpoints, so reaching a consensus might be challenging.

In this paper, we propose a new solution to the medical data sharing problem. The main idea can be demonstrated by a simple example: assume that we want to share the heights of individuals without disclosing their identities. In this case, we could share the distribution of the heights (for a normal distribution, the mean and standard deviation). Having the parameters of this distribution enables others to generate samples of heights and reuse them in place of the original data. The cornerstone of this approach is identifying the distribution of the data. Estimating the data distribution, however, becomes a very complicated task for high-dimensional data such as medical images. A well-studied branch of machine learning, generative models, has emerged to address this problem.
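The height example can be sketched in a few lines of Python. This is a minimal illustration rather than part of the proposal itself: the height values, the function names `fit_height_distribution` and `sample_heights`, and the assumption that heights follow a normal distribution are all introduced here for illustration only.

```python
import random
import statistics


def fit_height_distribution(heights_cm):
    """Summarise private heights as the two parameters of a normal distribution."""
    return {
        "mean": statistics.mean(heights_cm),
        "std": statistics.stdev(heights_cm),  # sample standard deviation
    }


def sample_heights(params, n, seed=None):
    """Draw n synthetic heights from the shared parameters only."""
    rng = random.Random(seed)
    return [rng.gauss(params["mean"], params["std"]) for _ in range(n)]


# Hypothetical private records held by the data owner (illustrative values).
private_heights_cm = [158.0, 162.5, 167.0, 170.2, 171.8, 175.5, 180.1, 183.4]

# The data owner shares only these two numbers, never the records themselves.
shared_params = fit_height_distribution(private_heights_cm)

# A recipient regenerates as many synthetic heights as needed.
synthetic_heights_cm = sample_heights(shared_params, n=1000, seed=0)
```

Only `shared_params` leaves the data owner. The recipient never sees `private_heights_cm`, yet can draw an arbitrary number of samples that follow the same (assumed normal) distribution.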

GENERATIVE MODELS AS A SAFE WAY TO SHARE PRIVATE DATA

The underlying assumption in most machine learning tasks is that data samples are drawn from a unique data-generating distribution^[3]. Generative models aim to learn this distribution via various statistical learning approaches. Once we have the data-generating distribution, we can generate new data samples that are not necessarily the same as the input data. Hence, generative models can be viewed as a secure tool for sharing new data while preserving patients’ privacy. Generative models fall into two categories: implicit density estimation and explicit density estimation^[4]. Here, we are interested in generating new samples from the data distribution rather than in obtaining its parametric form, so implicit density estimation is sufficient. Among generative models, Generative Adversarial Networks (GANs) have proved their potential in the implicit density estimation of high-dimensional data.

STATE OF THE ART IN GENERATIVE MODELS: GANS

Recently, deep learning has outperformed traditional methods in different areas, including computer vision, natural language processing, and image processing. Deep learning models are powerful at learning highly nonlinear mappings. GANs can be viewed as the marriage of deep learning and generative models. A GAN is composed of two neural networks: a generator and a discriminator^[5]. The generator tries to fool the discriminator by generating realistic data that are close to the distribution of the real data, and the discriminator tries to distinguish these so-called fake data from the real data. In other words, the training process is a minimax game. As a result of this training, the generator creates samples that come from the same distribution as the data. Note that once the GAN has been trained, only the generator network is required to generate new samples, and the discriminator can be discarded. GANs have been successfully used to generate samples by learning the data-generating distribution from a limited amount of data^[6].

Currently, GANs are widely used to generate new texts and images for different purposes. One important application of GANs is to enhance the performance of classifiers that are trained on imbalanced datasets. An imbalanced dataset can severely affect the performance of a classifier, and such datasets are prevalent in medical applications. For example, in breast cancer datasets, the number of mammography images showing malignancy is much smaller than the number of benign ones. This biases the classifier towards the benign class^[4]. To solve this problem, GANs can be used to balance such datasets: we can train a GAN on the malignant images and then use it to generate new samples of malignant cases.
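For reference, the minimax game between the generator and the discriminator described in this section is usually written as follows. The notation is taken from the original GAN paper^[5] and does not appear elsewhere in this editorial: G is the generator, D is the discriminator, p_data is the distribution of the real data, and p_z is the noise distribution from which the generator's input z is drawn.

```latex
\[
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
\]
```

Here D(x) is the discriminator's estimate of the probability that x is a real sample, and G(z) is a generated sample. The discriminator maximizes V by assigning high probability to real data and low probability to generated data, while the generator minimizes V by producing samples that the discriminator accepts as real.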

INTRODUCING HOSPITAL-BASED GANS

We suggest an approach in which each healthcare organization, especially hospitals, could create and share its own GAN, a Hospital-Based GAN (H-GAN), instead of sharing patients’ raw data. This solution provides a framework for sharing hospital data without violating patients’ privacy by providing a generator of data instead of the patients’ data records. In summary, this solution provides three major advantages. First and foremost, it preserves patients’ privacy. Second, it enables researchers to create an unlimited amount of data to train complex models that require huge amounts of data, such as deep learning classifiers, which also mitigates the imbalanced dataset issue. Third, it reduces the storage and bandwidth required for storing and transferring the data, because the model is shared instead of the whole set of images. For example, a dataset consisting of 5000 mammography images requires around 100 GB, while the GAN model created from this dataset is around 100 MB, a reduction in size of roughly 1000:1. At the next level, the H-GANs could theoretically be combined to create multi-hospital, national, regional, and even global GANs, and these models could include a comprehensive range of samples.
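The storage figures quoted above, and the number of synthetic images needed to balance a dataset, reduce to simple arithmetic. The sketch below works through both. It assumes decimal units (1 GB = 1000 MB), and the benign/malignant split of 4500/500 is a hypothetical example, not a figure from any real dataset. The function names are likewise illustrative.

```python
GB = 10**9  # decimal gigabyte, in bytes (assumption; binary units give 1024:1)
MB = 10**6  # decimal megabyte, in bytes


def compression_ratio(dataset_bytes, model_bytes):
    """Size of the raw dataset relative to the size of the shared model."""
    return dataset_bytes / model_bytes


def synthetic_samples_needed(majority_count, minority_count):
    """Number of generated minority samples required to match the majority class."""
    return max(0, majority_count - minority_count)


# Approximate sizes from the text: ~100 GB of images versus a ~100 MB model.
ratio = compression_ratio(100 * GB, 100 * MB)
print(f"dataset : model = {ratio:.0f} : 1")  # prints: dataset : model = 1000 : 1

# Hypothetical class split of a 5000-image dataset (illustrative only).
benign_count, malignant_count = 4500, 500
to_generate = synthetic_samples_needed(benign_count, malignant_count)
print(f"malignant images to generate: {to_generate}")
```

With the hypothetical split above, a generator trained on the malignant class would need to produce 4000 additional images to equalize the two classes.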

DECLARATIONS

Authors’ contributions

Made substantial contributions to the conception and design of the study, performed data acquisition, analysis, and interpretation, and provided administrative, technical, and material support: Ayyoubzadeh SM (Seyed Mohammad Ayyoubzadeh), Ayyoubzadeh SM (Seyed Mehdi Ayyoubzadeh), Esmaeili M

Availability of data and materials

Not applicable.

Financial support and sponsorship

None.

Conflicts of interest

All authors declared that there are no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

REFERENCES

1. Bauchner H, Golub RM, Fontanarosa PB. Data sharing: an ethical and scientific imperative. JAMA 2016;315:1237-9.

2. McCoy MS, Joffe S, Emanuel EJ. Sharing patient data without exploiting patients. JAMA 2020;323:505-6.

3. Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge, MA: MIT Press; 2016. Available from: https://books.google.com.hk/books?hl=zh-CN&lr=&id=omivDQAAQBAJ&oi=fnd&pg=PR5&dq=Goodfellow,+I.,+Y.+Bengio,+and+A.+Courville,+Deep+learning.+2016:+MIT+press.&ots=MNS-dvnBPZ&sig=NJdjTCQPqdh_9MNYzT7igJdFhfE&redir_esc=y#v=onepage&q=Goodfellow%2C%20I.%2C%20Y.%20Bengio%2C%20and%20A.%20Courville%2C%20Deep%20learning.%202016%3A%20MIT%20press.&f=false [Last accessed on 25 Aug 2022].

4. Goodfellow I. NIPS 2016 tutorial: Generative Adversarial Networks. arXiv 2017; doi: 10.48550/arXiv.1701.00160.

5. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Adv Neural Inf Process Syst 2014;27:2672-80. Available from: https://proceedings.neurips.cc/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3 [Last accessed on 25 Aug 2022]

6. Iqbal T, Ali H. Generative Adversarial Network for Medical Images (MI-GAN). J Med Syst 2018;42:231.

Copyright

© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
