Approximate distance correlation for selecting highly interrelated genes across datasets
Shen, Qunlun (1,2); Zhang, Shihua (1,2,3,4)
2021-11-01
Journal: PLOS COMPUTATIONAL BIOLOGY
ISSN: 1553-734X
Volume: 17, Issue: 11, Pages: 18
Abstract: With the rapid accumulation of biological omics datasets, decoding the underlying relationships of genes across datasets has become an important issue. Previous studies have attempted to identify differentially expressed genes across datasets, but such methods struggle to detect interrelated ones. Moreover, existing correlation-based algorithms can only measure the relationship between genes within a single dataset or between two multi-modal datasets from the same samples. It is still unclear how to quantify the strength of association of the same gene across two biological datasets with different samples. To this end, we propose Approximate Distance Correlation (ADC) to select interrelated genes with statistical significance across two different biological datasets. ADC first obtains the k most correlated genes for each target gene as its approximate observations, and then calculates the distance correlation (DC) for the target gene across the two datasets. ADC repeats this process for all genes and then performs the Benjamini-Hochberg adjustment to control the false discovery rate. We demonstrate the effectiveness of ADC with simulation data and four real applications to select highly interrelated genes across two datasets. These four applications include 21 cancer RNA-seq datasets of different tissues; six single-cell RNA-seq (scRNA-seq) datasets of mouse hematopoietic cells across six different cell types along the hematopoietic cell lineage; five scRNA-seq datasets of pancreatic islet cells across five different technologies; and coupled single-cell ATAC-seq (scATAC-seq) and scRNA-seq data of peripheral blood mononuclear cells (PBMC). Extensive results demonstrate that ADC is a powerful tool to uncover interrelated genes with strong biological implications and is scalable to large-scale datasets. Moreover, the number of such genes can serve as a metric to measure the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.
Author summary: The number and size of biological datasets (e.g., single-cell RNA-seq datasets) have been growing rapidly. How to mine the relationships of genes across datasets is becoming an important issue. Computational tools for identifying differentially expressed genes have been comprehensively studied, but interrelated genes across datasets have been largely neglected. Detecting highly interrelated genes across datasets is hindered because the datasets contain different samples and may differ in the number of samples. To solve this problem, we present a new algorithm that can identify interrelated genes across datasets based on distance correlation. Our proposed algorithm is very efficient and works well across different technologies, i.e., RNA-seq, single-cell RNA-seq and single-cell ATAC-seq. We also found that the number of such highly interrelated genes can serve as a metric to measure the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.
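The three steps named in the abstract (k most correlated genes as approximate observations, distance correlation across the two datasets, Benjamini-Hochberg adjustment) can be illustrated with a minimal NumPy sketch. This is one possible reading of the abstract, not the authors' implementation. The function names (`adc`, `distance_correlation`, `benjamini_hochberg`), the parameters `k`, `n_perm`, `alpha` and `seed`, the choice of Pearson correlation within the first dataset to pick neighbours, the inclusion of the target gene among its own k neighbours, and the use of a permutation test for p-values are all assumptions made for illustration. Both matrices are assumed to be genes x samples with the same gene order in their rows.

```python
import numpy as np


def _centered_distances(a):
    """Double-centered Euclidean distance matrix between the rows of `a`."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def _dcor(A, B):
    """Distance correlation from two double-centered distance matrices."""
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def distance_correlation(x, y):
    """Sample distance correlation of paired observations x[i], y[i]."""
    return _dcor(_centered_distances(x), _centered_distances(y))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def adc(X, Y, k=20, n_perm=200, alpha=0.05, seed=0):
    """Sketch of ADC for X (genes x samples_X) and Y (genes x samples_Y)."""
    rng = np.random.default_rng(seed)
    corr = np.corrcoef(X)  # assumption: neighbours are chosen within X
    n_genes = X.shape[0]
    dcor = np.empty(n_genes)
    pvals = np.empty(n_genes)
    for g in range(n_genes):
        # The k genes most correlated with g (g itself included) act as
        # paired observations: row i of X and row i of Y for the same gene.
        nbrs = np.argsort(-corr[g])[:k]
        A = _centered_distances(X[nbrs])
        B = _centered_distances(Y[nbrs])
        dcor[g] = _dcor(A, B)
        # Assumption: permutation null obtained by shuffling the pairing.
        null = np.empty(n_perm)
        for b in range(n_perm):
            perm = rng.permutation(len(nbrs))
            null[b] = _dcor(A, B[np.ix_(perm, perm)])
        pvals[g] = (1 + np.sum(null >= dcor[g])) / (1 + n_perm)
    qvals = benjamini_hochberg(pvals)
    return dcor, qvals, qvals < alpha
```

Under these assumptions, `selected.sum()` on the third return value gives the count of selected genes, which corresponds to the dataset-similarity measure mentioned in the abstract. The pairwise-distance step uses memory proportional to k x k x samples, so it is only meant for small `k`.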
DOI: 10.1371/journal.pcbi.1009548
Indexed by: SCI
Language: English
Funding: National Key Research and Development Program of China[2019YFA0709501] ; Strategic Priority Research Program of the Chinese Academy of Sciences (CAS)[XDPB17] ; Key-Area Research and Development of Guangdong Province[2020B1111190001] ; National Natural Science Foundation of China[61621003] ; National Ten Thousand Talent Program for Young Top-notch Talents ; CAS Frontier Science Research Key Project for Top Young Scientist[QYZDB-SSW-SYS008] ; Shanghai Municipal Science and Technology Major Project[2017SHZDZX01]
WOS research areas: Biochemistry & Molecular Biology ; Mathematical & Computational Biology
WOS categories: Biochemical Research Methods ; Mathematical & Computational Biology
WOS accession number: WOS:000721101000005
Publisher: PUBLIC LIBRARY SCIENCE
Document type: Journal article
Item identifier: http://ir.amss.ac.cn/handle/2S8OKBNM/59635
Collection: Institute of Applied Mathematics
Corresponding author: Zhang, Shihua
Author affiliations:
1. Chinese Acad Sci, Acad Math & Syst Sci, RCSDS, CEMS, NCMIS, Beijing, Peoples R China
2. Univ Chinese Acad Sci, Sch Math Sci, Beijing, Peoples R China
3. Chinese Acad Sci, Ctr Excellence Anim Evolut & Genet, Kunming, Yunnan, Peoples R China
4. Chinese Acad Sci, Univ Chinese Acad Sci, Hangzhou Inst Adv Study, Key Lab Syst Biol, Hangzhou, Peoples R China
Recommended citation
GB/T 7714
Shen, Qunlun,Zhang, Shihua. Approximate distance correlation for selecting highly interrelated genes across datasets[J]. PLOS COMPUTATIONAL BIOLOGY,2021,17(11):18.
APA Shen, Qunlun,&Zhang, Shihua.(2021).Approximate distance correlation for selecting highly interrelated genes across datasets.PLOS COMPUTATIONAL BIOLOGY,17(11),18.
MLA Shen, Qunlun,et al."Approximate distance correlation for selecting highly interrelated genes across datasets".PLOS COMPUTATIONAL BIOLOGY 17.11(2021):18.