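Construction of a Date Palm Molecular Marker Database (椰枣分子标记数据库的构建)
Alternative title: Construction of date palm molecular biomarker database
Zhang Chengwei (张成威)
Publication year: 2018
Degree type: Master's
Supervisor: Hu Songnian (胡松年)
Degree-granting institution: University of Chinese Academy of Sciences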
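Chinese abstract: The date palm (Phoenix dactylifera L.) is one of the four fruit trees earliest domesticated by humans. It is a perennial woody flowering plant of the genus Phoenix in the palm family, with an arborescent habit, a crown up to 10 m in diameter, leaves up to 6 m long, and a height of up to 23 m. It is native to the desert oases of the Middle East and the Persian Gulf region and was later gradually transplanted to tropical and subtropical regions. The date palm was introduced to China during the Tang Dynasty and is now found mainly in Fujian, Guangzhou, and Xinjiang. Its fruit, the date, is a major cash crop and food crop in the Middle East and North Africa; according to statistics from the Food and Agriculture Organization of the United Nations, global date production exceeded 80 million tons in 2016. Dates serve as raw material for beverages and fruit products, and the trunk and leaves of the tree are used for decoration, construction timber, and industrial materials. The date palm is strictly dioecious, has 18 pairs of chromosomes in the nucleus, and has an XY sex-determination system. Only female plants produce commercially valuable dates, while male plants serve solely as pollen donors, so large commercial farms usually keep only female plants and pollinate them artificially to maximize annual yield per unit area. Date palms can live for a hundred years and take as long as 10 years to mature. There are many cultivars with a wide geographic distribution, and fruit quality, shape, and size differ markedly among them. In this study, we reassembled the date palm reference genome using next-generation sequencing data. Using the reassembled genome as the reference, we identified 246,445 SSRs (simple sequence repeats). The proportion of SSRs decreases as the number of bases in the repeat unit increases: mononucleotide SSRs account for 58.92%, dinucleotide SSRs for 29.92%, and trinucleotide SSRs for 8.14%. For a subset of the identified SSRs, we designed high-quality PCR primers. We also annotated 5,572,650 high-quality SNPs (single nucleotide polymorphisms) and used them to design 4,177,778 high-quality SNP PCR primers. Based on the identified SNPs, we built a molecular phylogenetic tree of 62 cultivated date palm varieties. The cultivars cluster into three groups that are strongly associated with geographic origin, named North Africa, Egypt-Sudan, and Middle East-South Asia. Finally, we built a modern, responsive, user-friendly web-based database, DRDB (Date Palm Resequence Database, http://drdb.big.ac.cn), so that researchers and breeders can use the results of this study. In this study, we used public data and data generated in our laboratory to improve the date palm reference genome, and we integrated the identified high-quality SSRs and SNPs, together with the genome-wide SNP-based phylogenetic tree, into a date palm molecular marker database. This will provide a research basis and data support for studies of date palm molecular breeding, cultivar identification, biodiversity, and sex determination.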
English abstract: Date palm (Phoenix dactylifera L.), one of the four earliest domesticated fruit trees, is a perennial woody flowering plant species of the genus Phoenix in the palm family. The tree is arborescent, and the full span of the crown reaches 10 m in diameter. The trunk reaches about 23 m in height and the leaves about 6 m in length. Date palm is thought to have originated in the region of the Euphrates and Nile Rivers and was later naturalized in tropical and subtropical regions. It was brought to China in the Tang Dynasty and is now planted mainly in Fujian, Guangzhou, and Xinjiang. Dates, the fruit of the date palm, are important in agriculture and the economy. According to data from the UN Food and Agriculture Organization, worldwide production of dates exceeded 80 million tons in 2016. Dates are used to make drinks and snack foods. The stem and leaves of the date palm are also useful in ornamentation, architecture, and industry. The date palm is dioecious, with 18 pairs of chromosomes in the nucleus and an XX/XY sex-determination system. Since the males are of value only as pollinators, modern commercial orchards often keep females only and pollinate manually. The juvenile period of the date palm lasts about 10 years, and the trees can live for more than 100 years. The quality, color, shape, and other traits of dates vary among the many cultivars grown around the world. Here, we first improved the date palm genome assembly using 130X of HiSeq data generated in our lab. Then 246,445 SSRs (214,901 SSRs and 31,544 compound SSRs) were annotated in this genome assembly. Among the SSRs, the proportion decreases as the number of bases in the repeat unit increases: mononucleotide SSRs account for 58.92%, dinucleotide SSRs for 29.92%, and trinucleotide SSRs for 8.14%. High-quality PCR primer pairs were designed for most SSRs (174,497; 70.81% of the total). We also annotated 5,572,650 SNPs with high confidence.
High-quality PCR primer pairs were also obtained for 4,177,778 (65.53%) SNPs. We reconstructed the phylogenetic relationships among the 62 cultivars using these variants and found that they can be divided into three clusters, namely North Africa, Egypt-Sudan, and Middle East-South Asia. The clusters correlate strongly with the origins of the 62 cultivars. All these SSRs, SNPs, and their classification can be used for cultivar identification and comparison, molecular breeding, genetic diversity studies, and related work. In recent years, fast-developing network technology has changed the way people communicate and has had a revolutionary effect on bioinformatics. As an ideal carrier for information exchange and sharing, the network has become an efficient tool for biological researchers to display, share, and communicate research results. To facilitate the use of these data, we developed an intuitive, responsive, web-based date palm genome database (DRDB). In summary, we reassembled the date palm genome based on data from our lab and public databases. We identified 5,572,650 SNPs and 246,445 SSRs and designed PCR primers for these SNPs and SSRs based on the updated genome. We also reconstructed an SNP-based phylogenetic tree of 62 cultivars. We built a web site named DRDB, which contains all of our analytic results and can be freely visited at drdb.big.ac.cn, to help researchers and breeders make use of our results.
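The SSR classification by repeat-unit length described in the abstract can be illustrated with a small Python sketch. This is not the thesis pipeline: the record does not name the tool or thresholds used, so the minimum repeat counts in `MIN_REPEATS` and the function names `find_ssrs`, `is_primitive`, and `count_by_unit_length` are assumptions made for illustration. The sketch finds only perfect SSRs and does not merge neighbouring repeats into compound SSRs.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per repeat-unit length (not from the thesis).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A unit of unit_len bases followed by at least min_rep - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as 'AA' that are really a shorter unit repeated.
            if is_primitive(motif):
                repeats = (match.end() - match.start()) // unit_len
                hits.append((match.start(), match.end(), motif, repeats))
    return sorted(hits)


def count_by_unit_length(hits):
    """Count SSRs per repeat-unit length (1 = mononucleotide, 2 = dinucleotide, ...)."""
    return Counter(len(motif) for _, _, motif, _ in hits)
```

Dividing each count from `count_by_unit_length` by the total number of hits gives per-class proportions of the kind reported in the abstract (mononucleotide, dinucleotide, trinucleotide, and so on).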
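Chinese keywords: Date palm; short tandem repeats; single nucleotide polymorphism; molecular phylogenetic tree; molecular marker database
English keywords: Date Palm; Short Sequence Repeats; Single Nucleotide Polymorphism; Phylogenetic Tree; Molecular Biomarker Database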
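Language: Chinese
Country: China
Source subject classification: Genomics
Source institution: Beijing Institute of Genomics, Chinese Academy of Sciences
Resource type: Thesis
Item identifier: http://119.78.100.177/qdio/handle/2XILL650/288058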
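Recommended citation
GB/T 7714
Zhang Chengwei. Construction of a date palm molecular marker database [D]. University of Chinese Academy of Sciences, 2018.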