
Bulletin of Botanical Research (植物研究), 2017, Vol. 37, Issue (6): 825-834. doi: 10.7525/j.issn.1673-5102.2017.06.004

• Research Report •

Transcriptome Analysis of Rhododendron longipedicellatum, a Plant Species with Extremely Small Populations, Based on High-Throughput Sequencing

LI Tai-Qiang, LIU Xiong-Fang, WAN You-Ming, LI Zheng-Hong, QI Guo-Hai, LI Yu-Ying, LIU Xiu-Xian, HE Rui, MA Yan, MA Hong

  1. The Research Institute of Resource Insects, Chinese Academy of Forestry, Kunming 650233
  • Received: 2017-03-29 Online: 2017-11-15 Published: 2017-11-25
  • Corresponding author: MA Hong E-mail: hortscience@163.com
  • About the author: LI Tai-Qiang (born 1993), male, master's student, mainly engaged in research on the conservation biology of Rhododendron.
  • Supported by:
    The Cultivation Project of Technology Innovation Talent of Yunnan Province (2016HB007)

Abstract: To support the evaluation, conservation and identification of the endemic and endangered species Rhododendron longipedicellatum, and to provide a reference for its genetic breeding and the improvement of its agronomic traits, its transcriptome was sequenced on the Illumina HiSeq 4000 platform. After filtering, the data were assembled de novo and clustered to remove redundancy, yielding 74 092 Unigenes with an average length of 938 nt, an N50 of 1 616 nt, a Q20 of 98.22%, a Q30 of 95.20% and a GC content of 43.24%; 23 879 of the Unigenes were longer than 1 kb. Comparison against seven functional databases annotated 39 876 (NR: 53.82%), 38 065 (NT: 51.38%), 27 384 (Swissprot: 36.96%), 16 099 (COG: 21.73%), 30 401 (KEGG: 41.03%), 17 518 (GO: 23.64%) and 29 676 (Interpro: 40.05%) Unigenes, respectively. By GO function, the Unigenes were roughly divided into three main categories (biological process, cellular component and molecular function) and 56 sub-categories, with biological process the largest category (41.53%). Comparison with the COG database grouped the Unigenes into roughly 25 functional classes. With the KEGG database as reference, the Unigenes were assigned to 6 major pathway categories and 32 pathways, and 176 Unigenes related to human diseases were identified, including endocrine and metabolic diseases (167) and antimicrobial resistance (9). Based on the annotation results, 39 418 CDS were detected, and ESTScan predicted a further 3 194 CDS from the Unigenes that had not been annotated. In addition, 1 488 Unigenes encoding transcription factors (TFs) were predicted and 57 927 SNP loci were detected. This transcriptome analysis provides baseline data and a reference for future work on functional gene discovery and utilization, gene cloning, resistance mechanism analysis, classification and evolution of genetic resources, molecular marker development and molecular-assisted breeding in R. longipedicellatum and other Rhododendron species.

Key words: Rhododendron longipedicellatum, transcriptome, Unigene, functional annotation, coding sequence (CDS), transcription factor (TF), single nucleotide polymorphism (SNP)
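
As a reading aid for the assembly statistics in the abstract, the following minimal Python sketch shows how an N50 value and per-database annotation rates are conventionally computed. It is an illustration only, not the authors' pipeline: the function names `n50` and `annotation_rate` and the toy length list are hypothetical, and only the Unigene counts are taken from the abstract.

```python
from typing import Dict, List

# Counts reported in the abstract (total Unigenes and annotated Unigenes per database).
TOTAL_UNIGENES = 74092
ANNOTATED: Dict[str, int] = {
    "NR": 39876,
    "NT": 38065,
    "Swissprot": 27384,
    "COG": 16099,
    "KEGG": 30401,
    "GO": 17518,
    "Interpro": 29676,
}


def annotation_rate(count: int, total: int) -> float:
    """Percentage of sequences annotated, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def n50(lengths: List[int]) -> int:
    """Conventional N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise RuntimeError("unreachable for non-empty input")


if __name__ == "__main__":
    for database, count in ANNOTATED.items():
        print(f"{database}: {annotation_rate(count, TOTAL_UNIGENES)}%")
    # Toy lengths (hypothetical, not data from the study).
    print("toy N50:", n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Recomputing the rates from the reported counts is a quick consistency check on the percentages quoted in the abstract; the N50 of 1 616 nt itself cannot be reproduced here because the Unigene length distribution is not given.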

CLC number: