DOI: 10.1186/s13015-017-0091-2
Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads
Lima, Leandro (1,2); Sinaimeri, Blerina (1,2); Sacomoto, Gustavo (1,2); Lopez-Maestre, Helene (1,2); Marchet, Camille (3,4); Miele, Vincent (2); Sagot, Marie-France (1,2); Lacroix, Vincent (1,2)
Corresponding author: Lima, Leandro
Source journal: ALGORITHMS FOR MOLECULAR BIOLOGY
ISSN: 1748-7188
Publication year: 2017
Volume: 12
Abstract

Background: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them.


Results: The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7): 644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8): 1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and not read nor coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12): 553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8): 1134-1144, 5) on both real and simulated datasets for detecting chimeras, and therefore is able to capture assembly errors missed by these methods.
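As a rough illustration of why repeats complicate a de Bruijn graph, the sketch below builds a toy DBG from two made-up sequences that share a repeated segment and lists the nodes where the graph branches. It is not the authors' formal model, nor the KisSplice algorithm; the function names `build_dbg` and `branching_nodes`, the sequences, and the choice of k = 4 are all assumptions made for this example.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Build a toy de Bruijn graph: nodes are k-mers, and an arc u -> v
    exists when u and v are consecutive k-mers in some read."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for u, v in zip(kmers, kmers[1:]):
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred


def branching_nodes(succ, pred):
    """Return, sorted, the k-mers with more than one successor or predecessor."""
    nodes = set(succ) | set(pred)
    return sorted(
        n for n in nodes
        if len(succ.get(n, ())) > 1 or len(pred.get(n, ())) > 1
    )


# Two hypothetical transcripts sharing the repeated segment "GATTACA".
t1 = "AAC" + "GATTACA" + "CCT"
t2 = "TTG" + "GATTACA" + "GGA"

succ, pred = build_dbg([t1, t2], k=4)
print(branching_nodes(succ, pred))  # ['GATT', 'TACA']
```

The two branching nodes mark the entry and exit of the shared segment: an assembler that walks through it cannot tell from the topology alone which left flank pairs with which right flank, which is the kind of ambiguity that can yield chimeric transcripts.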


Keywords: Transcriptome assembly; RNA-seq; Repeats; Alternative splicing; Formal model for representing repeats; Enumeration algorithm; De Bruijn graph topology; Assembly evaluation
Type: Article
Language: English
Country: France
Indexed in: SCI-E
WOS accession number: WOS:000396060700001
WOS keywords: GENOME; ELEMENTS
WOS categories: Biochemical Research Methods; Biotechnology & Applied Microbiology; Mathematical & Computational Biology
WOS research areas: Biochemistry & Molecular Biology; Biotechnology & Applied Microbiology; Mathematical & Computational Biology
Resource type: Journal article
Item identifier: http://119.78.100.177/qdio/handle/2XILL650/197220
Author affiliations:
1. Inria Grenoble, 655 Ave Europe, F-38334 Montbonnot St Martin, France
2. Univ Claude Bernard Lyon 1, UMR5558, CNRS, 43, Blvd 11 Novembre 1918, F-69622 Villeurbanne, France
3. Univ Rennes 1, IRISA Inria Rennes Bretagne Atlantique, 263 Ave Gen Leclerc, F-35042 Rennes, France
4. Univ Rennes 1, GenScale Team, 263 Ave Gen Leclerc, F-35042 Rennes, France
Recommended citation
GB/T 7714: Lima, Leandro, Sinaimeri, Blerina, Sacomoto, Gustavo, et al. Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads[J], 2017, 12.
APA: Lima, Leandro., Sinaimeri, Blerina., Sacomoto, Gustavo., Lopez-Maestre, Helene., Marchet, Camille., ... & Lacroix, Vincent. (2017). Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads. ALGORITHMS FOR MOLECULAR BIOLOGY, 12.
MLA: Lima, Leandro, et al. "Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads". ALGORITHMS FOR MOLECULAR BIOLOGY 12 (2017).