Research Output


Representative Publications

	Multiscale evolution of the 3D genome Yizhuo Che, Stephen J. Bush, Kai Ye *Trends in Genetics*. 42(4), 345–356 (2026). [PDF] The spatial organization of the genome underpins how and when its encoded information is utilized. Nevertheless, a unified understanding of how genome architecture evolves across different life scales is still emerging. This review synthesizes current research on 3D genome evolution from both micro- and macro-perspectives. At cellular-level timescales exemplified by cancer, architectures can evolve rapidly due to frequent genomic mutations and plastic epigenetic marks. Between closely related species, architectural divergence is driven primarily by local genomic alterations, mainly cis-regulatory elements. Across larger phylogenetic distances, genome architectures display striking diversity, yet recurrent higher-order features have nevertheless emerged through convergent evolution, reflecting common functional requirements. The micro-to-macro comparative framework proposed here delineates how the diversity of genome architectures relates to the evolution of form and function.
	The evolution of high-order genome architecture revealed from 1,000 species Yizhuo Che, Stephen J. Bush, Hui Lin, Mingxuan Li, Xiaofei Yang, Qi Xie, Yuchun Liu, Deyu Meng, Kai Ye *Cell*. (2026). [PDF] Spatial genome organization plays a crucial regulatory role, but its evolutionary development remains unclear. Leveraging Hi-C data from 1,025 species, we trace the evolutionary trajectories of genome organization through 2 higher-order architectures, “global folding” (spatial organization of the karyotype) and “checkerboard” (spatial organization of chromatin compartments). Earlier unicellular life forms mostly displayed random genome configurations. Throughout the evolution of plants, global folding became and remained the prominent architecture. However, animals progressively developed more pronounced checkerboard architectures; these are also apparent during early embryogenesis, which suggests that they act as a conserved mechanism of gene regulation. In contrast, plants exhibit comparatively weaker checkerboard patterns and instead preferentially organize co-regulated genes into linear genomic clusters. Both strategies of gene arrangement reinforce the biological principle that “structure determines function”: divergent evolutionary paths converge on ......
	SpatialCOC: an integrative framework for spatial continuous mapping and cross-omics correction in spatial multi-omics data Mingxuan Li, Peisen Sun, Yisi Luo, Guancheng Zhou, Xiaofei Yang, Deyu Meng, Kai Ye *Nature Communications* . (2026) . [PDF] Integrating spatial multi-omics data presents significant challenges, particularly in uncovering the spatial patterns of cells and deciphering the real regulatory mechanisms among various omics. These insights are critical for harnessing the full potential of each modality while minimizing the impact of biotechnological biases that will lead to unstable results. Here, we introduce SpatialCOC, a framework that treats spatial information as prior knowledge to learn omics-specific spatial distributions, then discovering nonlinear correlations among modalities. The effectiveness and robustness of SpatialCOC are validated using real-world datasets, encompassing diverse tissue sections analyzed with multiple experimental techniques. Compared to existing methods, SpatialCOC excels in identifying region-specific continuous spatial domains and maintains batch-consistency across trajectory inferences. By providing a novel perspective on the interplay between spatial information and multi-omics modalities, SpatialCOC offers a flexible approach that can accommodate modality data of arbitrary dimensions.
	Highly accurate ab initio gene annotation with ANNEVO Pengyu Zhang, Tun Xu, Songbo Wang, Xiaofei Yang, Peisen Sun, Peng Jia, Jiadong Lin, Bo Wang, Yizhe Zhang, Deyu Meng, Stephen J. Bush, Zemin Ning, Kai Ye *Nature Methods*. 23, 740–748 (2026). [PDF] Accurate gene annotation is essential for deciphering the mapping from genomic sequences to their functional roles. However, current methods struggle to model complex gene transmission patterns, such as vertical inheritance and horizontal gene transfer. Here we introduce ANNEVO, a mixture of experts-based genomic language model that directly models distal sequence dependencies and joint evolutionary relationships from diverse genomes, enabling precise ab initio gene annotation. Through extensive benchmarking on 566 phylogenetically diverse species, we demonstrate that ANNEVO substantially outperforms existing ab initio methods and achieves performance comparable to state-of-the-art annotation pipelines. Furthermore, ANNEVO’s independence from external evidence allows it to deliver more complete annotations than reference annotations for a broad range of ......
	Population-level structural variant characterization using pangenome graphs Songbo Wang, Tun Xu, Pengyu Zhang, Kai Ye *Nature Genetics*. 58, 664–672 (2026). [PDF] Population-level structural variant (SV) profiling is crucial in the era of pangenomes. However, identifying SVs from genome assemblies and pangenome graphs remains a substantial challenge. Here we present Swave, a sequence-to-image, deep learning-based method that accurately resolves both simple and complex SVs, along with their population characteristics, from assembly-derived pangenome graphs. Swave introduces ‘projection waves’ to summarize the dotplot images that capture mapping patterns between reference and SV-indicating alleles in the pangenome. Then, a recurrent neural network distinguishes true SV signals from background noise introduced by genomic repeats. Swave demonstrates superior performance in both SV-type classification and genotyping compared with existing methods. When applied to healthy cohorts and rare-disease cohorts, Swave reveals ......
	Spatially resolved single-cell transcriptome analysis of murine Salmonella infection reveals the role of distal colonocytes in the inflammatory response Dan Xu, Ruifen Zhang, Shanshan Li, Can Guo, Chenglin Guan, Xiang Li, Mengyao Guo, Xin Xu, Yaxin Liu, Chenyi Mao, Peisen Sun, Xiaomin Dang, Diya Sun, Chengyao Wang, Stephen J. Bush, Kai Ye *Gut Microbes*. Dec 31;17(1):2579909 (2025). [PDF] The intestine is a highly compartmentalized organ, with distinct segments exhibiting both varying susceptibilities and responses to enteric pathogens, although the cellular and molecular bases of these responses remain elusive. Here, we used Salmonella Typhimurium (S. Tm), a prominent enteric pathogen that causes human colitis, to establish a murine model of Salmonella enterocolitis. By integrating bulk RNA-seq, single-cell RNA-seq, and spatial RNA-seq data, we present a comprehensive spatiotemporal single-cell transcriptomic landscape of the colon over a week-long time course of infection. We identified the distal colon as the intestinal segment where most of the host responses were initiated, with distal colonocytes (DCCs) being the most responsive epithelial cells upon the onset of infection. Furthermore, by correlating our findings with human intestinal single-cell transcriptome data, we identified a human colonocyte population that shares many characteristics with murine DCCs. Our study advances the understanding of the cellular and molecular basis of compartmentalized intestinal responses to pathogenic insults and may pave the way for novel preventive and therapeutic strategies to mitigate intestinal damage and combat intestinal infections.
	Deciphering Complex Interactions Between LTR Retrotransposons and Three Papaver Species Using LTR_Stream Tun Xu, Stephen J Bush, Yizhuo Che, Huanhuan Zhao, Tingjie Wang, Peng Jia, Songbo Wang, Peisen Sun, Pengyu Zhang, Shenghan Gao, Yu Xu, Chengyao Wang, Ningxin Dang, Yong E Zhang, Xiaofei Yang, Kai Ye *Genomics Proteomics Bioinformatics*. Sep 22;23(4):qzaf061 (2025) . [PDF] Long terminal repeat retrotransposons (LTR-RTs), a major type of class I transposable elements, are the most abundant repeat element in plants. The study of the interactions between LTR-RTs and the host genome relies on high-resolution characterization of LTR-RTs. However, for non-model species, this remains a challenge. To address this, we developed LTR_Stream for sublineage clustering of LTR-RTs in specific or closely related species, providing higher precision than current database-based lineage-level clustering. Using LTR_Stream, we analyzed Retand LTR-RTs in three Papaver species. Our findings show that high-resolution clustering reveals complex interactions between LTR-RTs and the host genome. For instance, we found that autonomous Retand elements could spread among the ancestors of different subgenomes, like retroviral pandemics, enriching genetic diversity.
	STMiner: gene-centric spatial transcriptomics for deciphering tumor tissues Peisen Sun, Stephen J Bush, Songbo Wang, Peng Jia, Mingxuan Li, Tun Xu, Pengyu Zhang, Xiaofei Yang, Chengyao Wang, Linfeng Xu, Tingjie Wang, Kai Ye *Cell Genomics*. Feb 12;5(2):100771 (2025) . [PDF] Analyzing spatial transcriptomics data from tumor tissues poses several challenges beyond those of healthy samples, including unclear boundaries between different regions, uneven cell densities, and relatively higher cellular heterogeneity. Collectively, these bias the background against which spatially variable genes are identified, which can result in misidentification of spatial structures and hinder potential insight into complex pathologies. To overcome this problem, STMiner leverages 2D Gaussian mixture models and optimal transport theory to directly characterize the spatial distribution of genes rather than the capture locations of the cells expressing them (spots). By effectively mitigating the impacts of both background bias and data sparsity, STMiner reveals key gene sets and spatial structures overlooked by spot-based analytic tools, facilitating novel biological discoveries. The core concept of directly analyzing overall gene expression patterns also allows for a broader application beyond spatial transcriptomics, positioning STMiner for continuous expansion as spatial omics technologies evolve.
	Shigella infection is facilitated by interaction of human enteric α-defensin 5 with colonic epithelial receptor P2Y11 Dan Xu, Mengyao Guo, Xin Xu, Gan Luo, Yaxin Liu, Stephen J. Bush, Chengyao Wang, Tun Xu,..., Yaming Jiu, Nathalie Sauvonnet, Wuyuan Lu, Philippe J. Sansonetti, Kai Ye *Nature Microbiology*. 10, 509–526 (2025) . [PDF] Human enteric α-defensin 5 (HD5) is an immune system peptide that acts as an important antimicrobial factor but is also known to promote pathogen infections by enhancing adhesion of the pathogens. The mechanistic basis of these conflicting functions is unknown. Here we show that HD5 induces abundant filopodial extensions in epithelial cells that capture Shigella, a major human enteroinvasive pathogen that is able to exploit these filopodia for invasion, revealing a mechanism for HD5-augmented bacterial invasion. Using multi-omics screening and in vitro, organoid, dynamic gut-on-chip and in vivo models, we identify the HD5 receptor as P2Y11, a purinergic receptor distributed apically on the luminal surface of the human colonic epithelium. Inhibitor screening identified cAMP-PKA signalling as the main pathway mediating the cytoskeleton-regulating activity of HD5. In illuminating this mechanism of Shigella invasion, our findings raise the possibility of alternative intervention strategies against HD5-augmented infections.
	The centromere landscapes of four karyotypically diverse Papaver species provide insights into chromosome evolution and speciation Shenghan Gao, Yanyan Jia, Hongtao Guo, Tun Xu, Bo Wang, Stephen J. Bush, Shijie Wan, Yimeng Zhang, Xiaofei Yang, Kai Ye *Cell Genomics*. 4(8):100626 (2024). [PDF] Understanding the roles played by centromeres in chromosome evolution and speciation is complicated by the fact that centromeres comprise large arrays of tandemly repeated satellite DNA, which hinders high-quality assembly. Here, we used long-read sequencing to generate nearly complete genome assemblies for four karyotypically diverse Papaver species, P. setigerum (2n = 44), P. somniferum (2n = 22), P. rhoeas (2n = 14), and P. bracteatum (2n = 14), collectively representing 45 gapless centromeres. We identified four centromere satellite (cenSat) families and experimentally validated two representatives. For the two allopolyploid genomes (P. somniferum and P. setigerum), we characterized the subgenomic distribution of each satellite and identified a “homogenizing” phase of centromere evolution in the aftermath of hybridization. An interspecies comparison of the peri-centromeric regions further revealed extensive centromere-mediated chromosome rearrangements......
	Near telomere-to-telomere genome assemblies of two Chlorella species unveil the composition and evolution of centromeres in green algae Bo Wang, Yanyan Jia, Ningxin Dang, Jie Yu, Stephen J. Bush, Shenghan Gao, Wenxi He, Sirui Wang, Hongtao Guo, Xiaofei Yang, Weimin Ma, Kai Ye *BMC Genomics* . 25, 356 (2024). [PDF] We constructed near telomere-to-telomere (T2T) assemblies for two Trebouxiophyceae species, Chlorella sorokiniana NS4-2 and Chlorella pyrenoidosa DBH, with chromosome numbers of 12 and 13, and genome sizes of 58.11 Mb and 53.41 Mb, respectively. We identified and validated their centromere sequences using CENH3 ChIP-seq and found that, similar to humans and higher plants, the centromeric CENH3 signals of green algae display a pattern of hypomethylation. Interestingly, the centromeres of both species largely comprised transposable elements, although they differed significantly in their composition. Species within the Chlorella genus display a more diverse centromere composition, with major constituents including members of the LTR/Copia, LINE/L1, and LINE/RTEX families. This is in contrast to green algae including Chlamydomonas reinhardtii, Coccomyxa subellipsoidea, and Chromochloris zofingiensis, in which centromere composition instead has a pronounced single-element composition. Moreover, we observed significant differences in ......
	De novo and somatic structural variant discovery with SVision-pro Songbo Wang, Jiadong Lin, Peng Jia, Tun Xu, Xiujuan Li, Yuezhuangnan Liu, Dan Xu, Stephen J. Bush, Deyu Meng, Kai Ye *Nature Biotechnology*. 43, 181–185 (2025). [PDF] Long-read-based de novo and somatic structural variant (SV) discovery remains challenging, necessitating genomic comparison between samples. We developed SVision-pro, a neural-network-based instance segmentation framework that represents genome-to-genome-level sequencing differences visually and discovers SV comparatively between genomes without any prerequisite for inference models. SVision-pro outperforms state-of-the-art approaches, in particular, the resolving of complex SVs is improved, with low Mendelian error rates, high sensitivity of low-frequency SVs and reduced false-positive rates compared with SV merging approaches. Here we propose SVision-pro, comprising two key modules: a sequence-to-image representation module encoding genomic features from two samples in a single image, from which a neural-network recognition module comparatively recognizes SVs as well as their intergenome differences. SVision-pro integrates SV detection and genotyping between ......
	Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet Peng Jia, Lianhua Dong, Xiaofei Yang, Bo Wang, Stephen J. Bush, Tingjie Wang, Jiadong Lin, Songbo Wang, ..., Han Xia, Yuanting Zheng, Leming Shi, Yi Lv, Jing Wang, Kai Ye *Genome Biology*. 24, 277 (2023). [PDF] The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent–child genetic map and for each haplotype. We also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding that of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9,726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations which are shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity—including those located at long repeat regions, complex structural variants, and de novo mutations—are systematically examined in this study. Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet ......
	A pangenome reference of 36 Chinese populations Yang Gao, Xiaofei Yang, Hao Chen, Xinjiang Tan, Zhaoqing Yang, Lian Deng, Baonan Wang, Shuang Kong, Songyang Li, Yuhang Cui,..., Chinese Pangenome Consortium (CPC), Yan Lu, Jiayou Chu, Kai Ye, Shuhua Xu *Nature*. 619, 112–121 (2023). [PDF] Human genomics is witnessing an ongoing paradigm shift from a single reference sequence to a pangenome form, but populations of Asian ancestry are underrepresented. Here we present data from the first phase of the Chinese Pangenome Consortium, including a collection of 116 high-quality and haplotype-phased de novo assemblies based on 58 core samples representing 36 minority Chinese ethnic groups. With an average 30.65× high-fidelity long-read sequence coverage, an average contiguity N50 of more than 35.63 megabases and an average total size of 3.01 gigabases, the CPC core assemblies add 189 million base pairs of euchromatic polymorphic sequences and 1,367 protein-coding gene duplications to GRCh38. We identified 15.9 million small variants and 78,072 structural variants, of which 5.9 million small variants and 34,223 structural variants were not reported in a recently released pangenome reference. The Chinese Pangenome Consortium data demonstrate a remarkable increase in the discovery of novel and missing sequences when individuals are included from underrepresented minority ethnic groups......
	HiCAT: a tool for automatic annotation of centromere structure Shenghan Gao, Xiaofei Yang, Hongtao Guo, Xixi Zhao, Bo Wang, Kai Ye *Genome Biology* . 24, 58 (2023) . [PDF] Significant improvements in long-read sequencing technologies have unlocked complex genomic areas, such as centromeres, in the genome and introduced the centromere annotation problem. Currently, centromeres are annotated in a semi-manual way. Here, we propose HiCAT, a generalizable automatic centromere annotation tool, based on hierarchical tandem repeat mining to facilitate decoding of centromere architecture. We apply HiCAT to simulated datasets, human CHM13-T2T and gapless Arabidopsis thaliana genomes. Our results are generally consistent with previous inferences but also greatly improve annotation continuity and reveal additional fine structures, demonstrating HiCAT’s performance and general applicability. HiCAT takes a monomer template and a centromere DNA sequence as inputs. There are two steps in HiCAT: generation of a block list and similarity matrix and mining of HORs.
	Cellular heterogeneity and transcriptomic profiles during intrahepatic cholangiocarcinoma initiation and progression Tingjie Wang, Chuanrui Xu, Zhijing Zhang, Hua Wu, Xiujuan Li, Yu Zhang, Nan Deng, Ningxin Dang, Guangbo Tang, Xiaofei Yang, Bingyin Shi, Zihang Li, Lei Li, Kai Ye *Hepatology* . 76:1302–1317 (2022) . [PDF] We performed single-cell RNA sequencing (scRNA-seq) using AKT/Notch intracellular domain–induced mouse ICC tissues at early, middle, and late stages. We analyzed the transcriptomic landscape, cellular classification and evolution, and intercellular communication during ICC initiation/progression. We confirmed the findings using quantitative real-time PCR, western blotting, immunohistochemistry or immunofluorescence, and gene knockout/knockdown analysis. We identified stress-responding and proliferating subpopulations in late-stage mouse ICC tissues and validated them using human scRNA-seq data sets. By integrating weighted correlation network analysis and protein–protein interaction through least absolute shrinkage and selection operator regression, we identified zinc finger, MIZ-type containing 1 (Zmiz1) and Y box protein 1 (Ybx1) as core transcription factors required by stress-responding and proliferating ICC cells, respectively. Knockout of either one led to the blockade of ICC initiation/progression. Using two other ICC mouse models (YAP/AKT, KRAS/p19) and human ICC scRNA-seq data sets, we confirmed the orchestrating roles of Zmiz1 and Ybx1 in ICC occurrence and development. In addition, hes family bHLH transcription factor 1, cofilin 1, and ......
	SVision: a deep learning approach to resolve complex structural variants Jiadong Lin, Songbo Wang, Peter A. Audano, Deyu Meng, Jacob I. Flores, Walter Kosters, Xiaofei Yang, Peng Jia, Tobias Marschall, Christine R. Beck, Kai Ye *Nature Methods* . 19, 1230–1233 (2022). [PDF] Complex structural variants (CSVs) encompass multiple breakpoints and are often missed or misinterpreted. We developed SVision, a deep-learning-based multi-object-recognition framework, to automatically detect and characterize CSVs from long-read sequencing data. SVision outperforms current callers at identifying the internal structure of complex events and has revealed 80 high-quality CSVs with 25 distinct structures from an individual genome. SVision directly detects CSVs without matching known structures, allowing sensitive detection of both common and previously uncharacterized complex rearrangements. We developed an automated CSV detection and characterization method: SVision. It introduces a sequence-to-image coding schema, adapting variant detection to a problem that is amenable to deep-learning frameworks. SVision contains three core components: an encoder that represents the differences and similarities between a variant-supporting read and its corresponding segment on the reference genome as a denoised image, a targeted multi-object recognition (tMOR) framework......
	Three chromosome-scale Papaver genomes reveal punctuated patchwork evolution of the morphinan and noscapine biosynthesis pathway Xiaofei Yang, Shenghan Gao, Li Guo, Bo Wang, Yanyan Jia, Jian Zhou, Yizhuo Che, Peng Jia, Jiadong Lin, Tun Xu, Jianyong Sun, Kai Ye *Nature Communications* . 12, 6030 (2021). [PDF] For millions of years, plants evolve plenty of structurally diverse secondary metabolites (SM) to support their sessile lifestyles through continuous biochemical pathway innovation. While new genes commonly drive the evolution of plant SM pathway, how a full biosynthetic pathway evolves remains poorly understood. The evolution of pathway involves recruiting new genes along the reaction cascade forwardly, backwardly, or in a patchwork manner. With three chromosome-scale Papaver genome assemblies, we here reveal whole-genome duplications (WGDs) apparently accelerate chromosomal rearrangements with a nonrandom distribution towards SM optimization. A burst of structural variants involving fusions, translocations and duplications within 7.7 million years have assembled nine genes into the benzylisoquinoline alkaloids gene cluster, following a punctuated patchwork model. Biosynthetic gene copies and their total expression matter to morphinan production. Our results demonstrate how new genes have been recruited from ......
	The opium poppy genome and morphinan production Li Guo, Thilo Winzer, Xiaofei Yang, Yi Li, Zemin Ning, Zhesi He, Roxana Teodor, Ying Lu, Tim A Bowser, Ian A Graham, Kai Ye *Science*. 362, 343–347 (2018). [PDF] Morphinan-based painkillers are derived from opium poppy (Papaver somniferum L.). We report a draft of the opium poppy genome, with 2.72 gigabases assembled into 11 chromosomes with contig N50 and scaffold N50 of 1.77 and 204 megabases, respectively. Synteny analysis suggests a whole-genome duplication at ~7.8 million years ago and ancient segmental or whole-genome duplication(s) that occurred before the Papaveraceae-Ranunculaceae divergence 110 million years ago. Syntenic blocks representative of phthalideisoquinoline and morphinan components of a benzylisoquinoline alkaloid cluster of 15 genes provide insight into how this cluster evolved. Paralog analysis identified P450 and oxidoreductase genes that combined to form the STORR gene fusion essential for morphinan biosynthesis in opium poppy. Thus, gene duplication, rearrangement, and fusion events have led to evolution of specialized metabolic products in opium poppy.

Research Projects

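Ongoing projects:

2025-2029, Construction of a computational framework for genome sequences based on a sequence-to-image conversion strategy, National Natural Science Foundation of China Key Program, Principal Investigator
2022-2027, Fine-scale genomic research on key model organisms and its demonstration applications, Ministry of Science and Technology Key R&D Program, Principal Investigator
2022-2026, National Science Fund for Distinguished Young Scholars (Bioinformatics), National Natural Science Foundation of China, Principal Investigator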

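Completed projects:

2021-2024, Computational methods for tumor microsatellite instability and their applications, National Natural Science Foundation of China General Program, Principal Investigator
2018-2020, Effective mining of precision medicine big data and development of key information technologies, National Key R&D Program of China, Sub-project Leader
2017-2020, Construction of a multi-omics reference database and analysis system for the Chinese population, National Key R&D Program of China, Key Participant
2017-2020, Detection algorithms for complex genomic variants and their applications, National Natural Science Foundation of China General Program, Principal Investigator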

Publication List

Research articles:

Yizhuo Che, Stephen J. Bush, Kai Ye. (2026). Multiscale evolution of the 3D genome. Trends in Genetics, 42(4), 345–356.
Yizhuo Che, Stephen J. Bush, Hui Lin, Mingxuan Li, Xiaofei Yang, Qi Xie, Yuchun Liu, Deyu Meng, Kai Ye. (2026). The evolution of high-order genome architecture revealed from 1,000 species. Cell.
Mingxuan Li, Peisen Sun, Yisi Luo, Guancheng Zhou, Xiaofei Yang, Deyu Meng, Kai Ye. (2026). SpatialCOC: an integrative framework for spatial continuous mapping and cross-omics correction in spatial multi-omics data. Nature Communications, April 16.
Pengyu Zhang, Tun Xu, Songbo Wang, Xiaofei Yang, Peisen Sun, Peng Jia, Jiadong Lin, Bo Wang, Yizhe Zhang, Deyu Meng, Stephen J. Bush, Zemin Ning, Kai Ye. (2026). Highly accurate ab initio gene annotation with ANNEVO. Nature Methods, 23(4), 740–748.
Songbo Wang, Tun Xu, Pengyu Zhang, Kai Ye. (2026). Population-level structural variant characterization using pangenome graphs. Nature Genetics, 58(3), 664–672.
Dan Xu, Ruifen Zhang, Shanshan Li, Can Guo, Chenglin Guan, Xiang Li, Mengyao Guo, Xin Xu, Yaxin Liu, Chenyi Mao, Peisen Sun, Xiaomin Dang, Diya Sun, Chengyao Wang, Stephen J Bush, Kai Ye. (2025). Spatially resolved single-cell transcriptome analysis of murine Salmonella infection reveals the role of distal colonocytes in the inflammatory response. Gut Microbes, 17(1), 2579909.
Bo Wang, Peng Jia, Stephen J Bush, Xia Wang, Yi Yang, Yu Zhang, Shijie Wan, Xiaofei Yang, Pengyu Zhang, Yuanting Zheng, Leming Shi, Lianhua Dong, Kai Ye. (2025). A Telomere-to-Telomere Diploid Reference Genome and Centromere Structure of the Chinese Quartet. Genomics, Proteomics & Bioinformatics, Nov 26:qzaf118.
Tun Xu, Stephen J Bush, Yizhuo Che, Huanhuan Zhao, Tingjie Wang, Peng Jia, Songbo Wang, Peisen Sun, Pengyu Zhang, Shenghan Gao, Yu Xu, Chengyao Wang, Ningxin Dang, Yong E Zhang, Xiaofei Yang, Kai Ye. (2025). Deciphering Complex Interactions Between LTR Retrotransposons and Three Papaver Species Using LTR_Stream. Genomics, Proteomics & Bioinformatics, Sep 22;23(4):qzaf061.
Peisen Sun, Mingxuan Li, Kai Ye. (2025). Protocol to decipher complex spatial transcriptomics data using STMiner. STAR Protocols, Jun 20;6(2):103838.
Peisen Sun, Stephen J Bush, Songbo Wang, Peng Jia, Mingxuan Li, Tun Xu, Pengyu Zhang, Xiaofei Yang, Chengyao Wang, Linfeng Xu, Tingjie Wang, Kai Ye. (2025). STMiner: gene-centric spatial transcriptomics for deciphering tumor tissues. Cell Genomics, Feb 12;5(2):100771.
Dan Xu, Mengyao Guo, Xin Xu, Gan Luo, Yaxin Liu, Stephen J Bush, Chengyao Wang, Tun Xu, Wenxin Zeng, Chongbing Liao, Qingxia Wang, Wei Zhao, Wenying Zhao, Yuezhuangnan Liu, Shanshan Li, Shuangshuang Zhao, Yaming Jiu, Nathalie Sauvonnet, Wuyuan Lu, Philippe J Sansonetti, Kai Ye. (2025). Shigella infection is facilitated by interaction of human enteric α-defensin 5 with colonic epithelial receptor P2Y11. Nature Microbiology, 10(2):509-526.
Bo Wang, Peng Jia, Shenghan Gao, Huanhuan Zhao, Gaoyang Zheng, Linfeng Xu, Kai Ye. (2025). Long and Accurate: How HiFi Sequencing is Transforming Genomics. Genomics, Proteomics & Bioinformatics, qzaf003.
Shenghan Gao, Yimeng Zhang, Stephen J Bush, Bo Wang, Xiaofei Yang, Kai Ye. (2024). Centromere Landscapes Resolved from Hundreds of Human Genomes. Genomics, Proteomics & Bioinformatics, Oct 18:qzae071.
Wang, S., Ye, K. (2024). Deep-learning based representation and recognition for genome variants – From SNVs to structural variants. National Science Review, nwae335.
Gao, S., Jia, Y., Guo, H., Xu, T., Wang, B., Bush, S.J., Wan, S., Zhang, Y., Yang, X., Ye, K. (2024). The centromere landscapes of four karyotypically diverse Papaver species provide insights into chromosome evolution and speciation. Cell Genomics, 100626.
Wang, B., Jia, Y., Dang, N., Yu, J., …, Ma, W., Ye, K. (2024). Near telomere-to-telomere genome assemblies of two Chlorella species unveil the composition and evolution of centromeres in green algae. BMC Genomics, 25, 356.
Wang, S., Lin, J., Jia, P., Xu, T., Li, X., Liu, Y., Xu, D., Bush, S.J., Meng, D., Ye, K. (2024). De novo and somatic structural variant discovery with SVision-pro. Nature Biotechnology, Mar 22.
Jia, P., Yang, X., Yang, X., Wang, T., Xu, Y., Ye, K. (2024). MSIsensor-RNA: Microsatellite Instability Detection for Bulk and Single-cell Gene Expression Data. Genomics, Proteomics & Bioinformatics, qzae004.
Jia, P., Dong, L., Yang, X., Wang, B., Bush, S.J., Wang, T., …, Lv, Y., Wang, J., Ye, K. (2023). Haplotype-resolved assemblies and variant benchmark of a Chinese Quartet. Genome Biology, 24(1):277.
Xu, Y., Bush, S.J., Yang, X., Xu, L., Wang, B., Ye, K. (2023). Evolutionary analysis of conserved non-coding elements subsequent to whole-genome duplication in opium poppy. The Plant Journal, 116(6):1804-1824.
Wang, B., Dang, N., Yang, X., Xu, S., & Ye, K. (2023). The human pangenome reference: the beginning of a new era for genomics. Science Bulletin, 68(14), 1484-1487.
Lin, J., Jia, P., Wang, S., Kosters, W., & Ye, K. (2023). Comparison and benchmark of structural variants detected from long read and long-read assembly. Briefings in Bioinformatics, 24(4), bbad188.
Jia, Y., Xu, Y., Wang, B., Guo, L., Guo, M., Che, X., & Ye, K. (2023). The tissue-specific chromatin accessibility landscape of Papaver somniferum. Frontiers in Genetics, 14.
How-Kit, A., & Ye, K. (2023). Editorial: Microsatellite and microsatellite instability. Frontiers in Genetics, 14.
Gao, Y., Yang, X., Chen, H., Tan, X., Yang, Z., Deng, L., . . . Ye, K., Xu, S. (2023). A pangenome reference of 36 Chinese populations. Nature, 619(7968), 112-121.
Gao, S., Yang, X., Guo, H., Zhao, X., Wang, B., & Ye, K. (2023). HiCAT: a tool for automatic annotation of centromere structure. Genome Biology, 24(1), 58.
Ye, K. (2022). A commentary of “the intersection of archaeology and genomics: Sparking the advances in cognitive human society”: 10 remarkable discoveries from 2020 in Nature. Fundamental Research, 2(2), 339-340.
Yang, X., Zhao, X., Qu, S., Jia, P., Wang, B., Gao, S., . . . Ye, K. (2022). Haplotype-resolved Chinese male genome assembly based on high-fidelity sequencing. Fundamental Research, 2(6), 946-953.
Xu, T., Yang, X., Jia, Y., Li, Z., Tang, G., Li, X., . . . Ye, K. (2022). A global survey of the transcriptome of the opium poppy (Papaver somniferum) based on single-molecule long-read isoform sequencing. The Plant Journal, 110(2), 607-620.
Wang, T., Xu, C., Zhang, Z., Wu, H., Li, X., Zhang, Y., . . . Ye, K. (2022). Cellular heterogeneity and transcriptomic profiles during intrahepatic cholangiocarcinoma initiation and progression. Hepatology, 76(5).
Wang, T., Xu, C., Xu, D., Yang, X., Liu, Y., Li, X., . . . Ye, K. (2022). Integrating cell interaction with transcription factors to obtain a robust gene panel for prognostic prediction and therapies in cholangiocarcinoma. Frontiers in Genetics, 13.
Wang, T., Dang, N., Tang, G., Li, Z., Li, X., Shi, B., . . . Ye, K. (2022). Integrating bulk and single-cell RNA sequencing reveals cellular heterogeneity and immune infiltration in hepatocellular carcinoma. Molecular Oncology, 16(11), 2195-2213.
Wang, B., Yang, X., Jia, Y., Xu, Y., Jia, P., Dang, N., . . . Ye, K. (2022). High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads. Genomics, Proteomics & Bioinformatics, 20(1), 4-13.
Lin, J., Yang, X., Kosters, W., Xu, T., Jia, Y., Wang, S., . . . Ye, K. (2022). Mako: A Graph-based Pattern Growth Approach to Detect Complex Structural Variants. Genomics, Proteomics & Bioinformatics, 20(1), 205-218.
Lin, J., Wang, S., Audano, P. A., Meng, D., Flores, J. I., Kosters, W., . . . Ye, K. (2022). SVision: a deep learning approach to resolve complex structural variants. Nature Methods, 19(10), 1230-1233.
Gao, S., Yang, X., Sun, J., Zhao, X., Wang, B., & Ye, K. (2022). IAGS: Inferring Ancestor Genome Structure under a Wide Range of Evolutionary Scenarios. Molecular Biology and Evolution, 39(3), msac041.
Dang, X., Kang, Y., Wang, X., Cao, W., Li, M., He, Y., . . . Ye, K., Xu, D. (2022). Frequent exacerbators of chronic obstructive pulmonary disease have distinguishable sputum microbiome signatures during clinical stability. Frontiers in Microbiology, 13.
Che, Y., Yang, X., Jia, P., Wang, T., Xu, D., Guo, T., & Ye, K. (2022). D2 Plot, a Matrix of DNA Density and Distance to Periphery, Reveals Functional Genome Regions. Advanced Science, 9(30), 2202149.
Yang, X., Gao, S., Guo, L., Wang, B., Jia, Y., Zhou, J., . . . Ye, K. (2021). Three chromosome-scale Papaver genomes reveal punctuated patchwork evolution of the morphinan and noscapine biosynthesis pathway. Nature Communications, 12(1), 6030.
Wang, Y., Chen, Z., Wang, T., Guo, H., Liu, Y., Dang, N., . . . Ye, K., Shi, B. (2021). A novel CD4+ CTL subtype characterized by chemotaxis and inflammation is involved in the pathogenesis of Graves’ orbitopathy. Cellular & Molecular Immunology, 18(3), 735-745.
Kang, Y., Ji, X., Guo, L., Xia, H., Yang, X., Xie, Z., . . . Ye, K., Zhao, G. (2021). Cerebrospinal Fluid from Healthy Pregnant Women Does Not Harbor a Detectable Microbial Community. Microbiology Spectrum, 9(3), e00769-21.
Ebert, P., Audano, P. A., Zhu, Q., Rodriguez-Martin, B., Porubsky, D., Bonder, M. J., . . . Eichler, E. E. (2021). Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science, 372(6537), eabf7117.
Yang, X., Gao, S., Wang, T., Yang, B., Dang, N., & Ye, K. (2020). gCAnno: a graph-based single cell type annotation method. BMC Genomics, 21(1), 823.
Xie, F., Zhang, J., Wang, J., Reuben, A., Xu, W., Yi, X., . . . Ye, K., Cheng, C., Xia, T. (2020). Multifactorial Deep Learning Reveals Pan-Cancer Genomic Tumor Clusters with Distinct Immunogenomic Landscape and Response to Immunotherapy. Clinical Cancer Research, 26(12), 2908-2920.
Wang, Y., Liu, Y., Yang, X., Guo, H., Lin, J., Yang, J., . . . Ye, K., Shi, B. (2020). Predicting the early risk of ophthalmopathy in Graves’ disease patients using TCR repertoire. Clinical and Translational Medicine, 10(7), e218.
Wang, B., Yu, H., Jia, Y., Dong, Q., Steinberg, C., Alabouvette, C., . . . Ye, K., Ma, J., Guo, L. (2020). Chromosome-Scale Genome Assembly of Fusarium oxysporum Strain Fo47, a Fungal Endophyte and Biocontrol Agent. Molecular Plant-Microbe Interactions®, 33(9), 1108-1111.
Guo, L., Ye, K., & Wang, L. (2020). Chromosome-Scale Genome Assembly of Talaromyces rugulosus W13939, a Mycoparasitic Fungus and Promising Biocontrol Agent. Molecular Plant-Microbe Interactions®, 33(12), 1446-1450.
Jia, P., Yang, X., Guo, L., Liu, B., Lin, J., Liang, H., . . . Ye, K. (2020). MSIsensor-pro: Fast, Accurate, and Matched-normal-sample-free Detection of Microsatellite Instability. Genomics, Proteomics & Bioinformatics, 18(1), 65-71.
Guo, L., Ji, M., & Ye, K. (2020). Dynamic network inference and association computation discover gene modules regulating virulence, mycotoxin and sexual reproduction in Fusarium graminearum. BMC Genomics, 21(1), 179.
Fokkens, L., Guo, L., Dora, S., Wang, B., Ye, K., Sánchez-Rodríguez, C., & Croll, D. (2020). A Chromosome-Scale Genome Assembly for the Fusarium oxysporum Strain Fo5176 To Establish a Model Arabidopsis-Fungal Pathosystem. G3 Genes|Genomes|Genetics, 10(10), 3549-3555.
Yang, X., Lee, W.-P., Ye, K., & Lee, C. (2019). One reference genome is not enough. Genome Biology, 20(1), 104.
Liu, B., Yang, X., Wang, T., Lin, J., Kang, Y., Jia, P., & Ye, K. (2019). MEpurity: estimating tumor purity using DNA methylation data. Bioinformatics, 35(24), 5298-5300.
Kang, Y., Yang, X., Lin, J., & Ye, K. (2019). PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction. Genes, 10(2).
Guo, L., & Ye, K. (2019). Mapping Genome Variants Sheds Light on Genetic and Phenotypic Differentiation in Chinese. Genomics, Proteomics & Bioinformatics, 17(3), 226-228.
Guo, L., Winzer, T., Yang, X., Li, Y., Ning, Z., He, Z., . . . Ye, K. (2018). The opium poppy genome and morphinan production. Science, 362(6412), 343-347.
Ye, K., Wang, J., Jayasinghe, R., Lameijer, E.-W., McMichael, J. F., Ning, J., . . . Ding, L. (2016). Systematic discovery of complex insertions and deletions in human cancers. Nature Medicine, 22(1), 97-104.
Kroon, M., Lameijer, E. W., Lakenberg, N., Hehir-Kwa, J. Y., Thung, D. T., Slagboom, P. E., . . . Ye, K. (2016). Detecting dispersed duplications in high-throughput sequencing data using a database-free approach. Bioinformatics, 32(4), 505-510.
Hehir-Kwa, J. Y., Marschall, T., Kloosterman, W. P., Francioli, L. C., Baaijens, J. A., Dijkstra, L. J., . . . Ye, K., Guryev, V. (2016). A high-quality human reference panel reveals the complexity and distribution of genomic structural variants. Nature Communications, 7(1), 12989.
Sudmant, P. H., Rausch, T., Gardner, E. J., Handsaker, R. E., Abyzov, A., Huddleston, J., . . . The 1000 Genomes Project Consortium. (2015). An integrated map of structural variation in 2,504 human genomes. Nature, 526(7571), 75-81.
Kloosterman, W. P., Francioli, L. C., Hormozdiari, F., Marschall, T., Hehir-Kwa, J. Y., Abdellaoui, A., . . . Ye, K., Guryev, V. (2015). Characteristics of de novo structural changes in the human genome. Genome Research, 25(6), 792-801.
Auton, A., Abecasis, G. R., Altshuler, D. M., Durbin, R. M., Abecasis, G. R., Bentley, D. R., . . . National Eye Institute, N. I. H. (2015). A global reference for human genetic variation. Nature, 526(7571), 68-74.
Niu, B., Ye, K., Zhang, Q., Lu, C., Xie, M., McLellan, M. D., . . . Ding, L. (2014). MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics, 30(7), 1015-1016.
Ye, K., Beekman, M., Lameijer, E.-W., Zhang, Y., Moed, M. H., van den Akker, E. B., . . . Slagboom, P. E. (2013). Aging as Accelerated Accumulation of Somatic Variants: Whole-Genome Sequencing of Centenarian and Middle-Aged Monozygotic Twin Pairs. Twin Research and Human Genetics, 16(6), 1026-1032.
Zhang, Y., Lameijer, E.-W., 't Hoen, P. A. C., Ning, Z., Slagboom, P. E., & Ye, K. (2012). PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data. Bioinformatics, 28(4), 479-486.
McVean, G. A., Altshuler, D. M., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., . . . University of, G. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422), 56-65.
Mills, R. E., Walter, K., Stewart, C., Handsaker, R. E., Chen, K., Alkan, C., . . . 1000 Genomes Project. (2011). Mapping copy number variation by population-scale genome sequencing. Nature, 470(7332), 59-65.
Ye, K., Jia, Z., Wang, Y., Flicek, P., & Apweiler, R. (2010). Mining Unique-m Substrings from Genomes. J Proteomics Bioinform (0974-276X (Print)).
Durbin, R. M., Altshuler, D., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., . . . The Translational Genomics Research, I. (2010). A map of human genome variation from population-scale sequencing. Nature, 467(7319), 1061-1073.
Ye, K., Schulz, M. H., Long, Q., Apweiler, R., & Ning, Z. (2009). Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics, 25(21), 2865-2871.
Ye, K., Vriend, G., & Ijzerman, A. P. (2008). Tracing evolutionary pressure. Bioinformatics, 24(7), 908-915.
Ye, K., Anton Feenstra, K., Heringa, J., Ijzerman, A. P., & Marchiori, E. (2008). Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a Machine-Learning approach for feature weighting. Bioinformatics, 24(1), 18-25.
Ye, K., Kosters, W. A., & Ijzerman, A. P. (2007). An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences. Bioinformatics, 23(6), 687-693.
Ye, K., Lameijer, E.-W. M., Beukers, M. W., & Ijzerman, A. P. (2006). A two-entropies analysis to identify functional positions in the transmembrane region of class A G protein-coupled receptors. Proteins: Structure, Function, and Bioinformatics, 63(4), 1018-1030.

Granted patents:

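Bingyin Shi, Yue Wang, Yufeng Liu, Kai Ye, Xiaofei Yang, Jiadong Lin. T cell data processing method and device. Chinese invention patent, 2020-10-09, ZL201810813090.4 (patentee: First Affiliated Hospital)
Kai Ye, Yongyong Kang, Xiaofei Yang, Peng Jia, Jiadong Lin, Li Guo. A phylogenetic tree construction method based on a sequential pattern mining algorithm. Chinese invention patent, 2020-11-10, ZL201811408606.2 (patentee: Xi'an Jiaotong University)
Kai Ye, Peng Jia, Xiaofei Yang, Bowen Liu, Yongyong Kang, Hao Liang. A system and method for detecting microsatellite instability based on genome sequencing. Chinese invention patent, 2020-06-19, ZL201811641480.4 (patentee: Xi'an Jiaotong University)
Xiaofei Yang, Hong Wei, Kai Ye. Method, system, device, and storage medium for identifying splice sites by multi-omics fusion. Chinese invention patent, 2021-04-30, ZL202110485740.9
Xiaofei Yang, Nan Bu, Kai Ye, Jiadong Lin, Hao Liang, Li Guo. A method for detecting inversion-associated complex variants based on next-generation sequencing data. 2020-02-06, ZL202010081979.5
Kai Ye, Jiadong Lin, Songbo Wang, Xiaofei Yang. A deep learning-based system and method for detecting genomic structural variants. Chinese invention patent, 2021-09-29, ZL202111156180.9
Kai Ye, Fan Yang, Xiaofei Yang, Jiadong Lin, Hao Liang, Li Guo. A genetic variant detection method based on a pattern growth algorithm. Chinese invention patent, 2020-02-26, ZL202010121579.2
Kai Ye, Hao Liang, Xiaofei Yang, Fan Yang, Peng Jia, Li Guo. Method, system, terminal, and storage medium for detecting copy number variation based on exome sequencing data. Chinese invention patent, 2020-01-14, ZL202010038141.8
Xiaofei Yang, Yu Sun, Kai Ye, Jiadong Lin, Mingzhe Duan, Li Guo. System and method for streaming transmission and real-time variant mining for biological big data. Chinese invention patent, 2019-12-24, ZL201911347153.2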

Academic monographs:

Ye, Kai; Guo, Li; Yang, Xiaofei; Lameijer, Eric-Wubbo; Raine, Keiran; Ning, Zemin; Split-Read Indel and Structural Variant Calling Using PINDEL, Humana Press, New York, NY, 2018.
Ye, Kai; Ning, Zemin; Detecting Break Points of Insertions and Deletions from Paired-end Short Reads, Caister Academic Press, 2014.
