There is emerging evidence that lncRNAs are involved in various critical biological processes. However, our understanding of lncRNAs is still at a rudimentary stage. Zebrafish is a well-established model system used in a wide range of basic research and biomedical studies, which makes it a suitable model for studying the roles of lncRNAs. Here, we constructed ZFLNC, a comprehensive database of zebrafish lncRNAs dedicated to offering the research community a zebrafish-based platform for in-depth exploration of lncRNA function and mechanism.
The principal sources of lncRNA data in this database are NCBI, Ensembl, NONCODE, zflncRNApedia and the literature. We supplemented these with lncRNAs obtained by analyzing RNA-Seq datasets from the SRA database. For these zebrafish lncRNAs, we carried out expression profiling, GO annotation, KEGG pathway annotation, conservation analysis and OMIM annotation. The current version of ZFLNC contains 13,604 lncRNA genes and 21,128 lncRNA transcripts. To the best of our knowledge, ZFLNC is the most comprehensive and well-annotated database of zebrafish lncRNAs.
We obtained 7,394 zebrafish lncRNA genes (13,166 transcripts) from RNA-Seq data and integrated them with those from Ensembl, NONCODE, NCBI, zflncRNApedia and the literature. The final zebrafish lncRNA set contains 13,604 lncRNA genes (21,128 transcripts).
RNA-Seq data were downloaded from the NCBI SRA database. SRA files were converted to FASTQ format with the SRA Toolkit. Low-quality reads were trimmed with Trimmomatic. Reads were mapped to the Zv9 genome with TopHat2, and transcripts were then reconstructed with Cufflinks. Transcripts shorter than 200 nt were filtered out. LncRNAs were then identified with CPC and CNCI.
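The length filter in this pipeline can be illustrated with a minimal Python sketch. It assumes the reconstructed transcripts are available as GTF lines with `exon` features and a `transcript_id` attribute, and that transcript length is the sum of exon lengths; the helper names (`gtf_exon`, `transcript_lengths`, `keep_long_transcripts`) and the toy records are hypothetical and are not part of the ZFLNC pipeline itself.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_lengths(gtf_lines):
    """Sum exon lengths per transcript from GTF lines (1-based, inclusive coordinates)."""
    lengths = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            lengths[match.group(1)] += int(fields[4]) - int(fields[3]) + 1
    return dict(lengths)


def keep_long_transcripts(gtf_lines, min_length=200):
    """Return the IDs of transcripts whose spliced length is at least min_length nt."""
    lengths = transcript_lengths(gtf_lines)
    return {tid for tid, length in lengths.items() if length >= min_length}


def gtf_exon(chrom, start, end, transcript_id):
    """Build one toy GTF exon line (hypothetical helper for the example below)."""
    attrs = f'gene_id "G"; transcript_id "{transcript_id}";'
    return "\t".join([chrom, "Cufflinks", "exon", str(start), str(end), ".", "+", ".", attrs])


# Toy input: T1 has two exons (150 nt + 100 nt), T2 has one exon of 151 nt.
example_gtf = [
    gtf_exon("chr1", 100, 249, "T1"),
    gtf_exon("chr1", 400, 499, "T1"),
    gtf_exon("chr1", 1000, 1150, "T2"),
]
kept = keep_long_transcripts(example_gtf)
```

In this toy input only T1 reaches the 200 nt threshold, so T2 would be discarded before the coding-potential step.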
All sources of zebrafish lncRNAs were integrated with the Cuffcompare program in the Cufflinks suite. Zebrafish lncRNAs and coding genes were quantified with the Cuffnorm program in the same suite. Using gene expression values scaled by the upper quartile, we calculated the Spearman correlation coefficient between gene pairs with an in-house Perl script.
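The in-house Perl script is not shown here, so the following Python sketch only illustrates the Spearman step under simple assumptions: each gene is represented by a list of already-normalized expression values across the same samples, ties receive average ranks, and the coefficient is the Pearson correlation of the ranks. The function names and the toy expression values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy expression profiles across four samples (made-up values).
lnc_expr = [0.1, 2.0, 5.5, 9.0]
coding_expr = [1.0, 3.0, 20.0, 100.0]
rho = spearman(lnc_expr, coding_expr)
```

Because the coefficient depends only on ranks, a monotonic but non-linear relationship such as the toy pair above still yields a perfect correlation, which is one reason rank-based measures are commonly used for co-expression.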
GO and KEGG annotation:
GO annotation of zebrafish lncRNAs was performed with goatools. KEGG pathway annotation was performed with an in-house Python script.
We used three methods to find the counterparts of zebrafish lncRNAs in human or mouse: direct BLASTN, collinearity with conserved coding genes, and overlap with multi-species ultraconserved non-coding elements. In total, we obtained 676 zebrafish lncRNA genes that have counterparts in human or mouse. Their phastCons scores are above the average of all zebrafish lncRNAs.
We used the RWRH (random walk with restart on heterogeneous network) algorithm, implemented in MATLAB, to analyze the relationship between lncRNAs and OMIM. The upper subnetwork is the coding-lncRNA gene co-expression network, and the lower subnetwork is the OMIM similarity network. The OMIM similarity matrix is from Disimweband, and the gene-OMIM relationships are from InterMine.
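The general idea of a random walk with restart on a two-layer network can be sketched in Python as follows. This is not the MATLAB implementation used for ZFLNC: the parameter names and values (`jump`, `restart`, `eta`), the function names and the toy matrices are all assumptions made for illustration. The sketch builds a row-stochastic block transition matrix from a gene network, an OMIM network and gene-OMIM links, then iterates until the scores converge.

```python
def build_transition(a_gene, a_omim, links, jump=0.5):
    """Block transition matrix over genes (first n rows) and OMIM entries (last m rows).

    links[i][j] > 0 means gene i is associated with OMIM entry j.
    jump is the (assumed) probability of crossing between the two layers.
    """
    n, m = len(a_gene), len(a_omim)
    links_t = [[links[i][j] for i in range(n)] for j in range(m)]
    trans = [[0.0] * (n + m) for _ in range(n + m)]

    def fill(within, cross, row_off, col_within, col_cross):
        for i, (w_row, c_row) in enumerate(zip(within, cross)):
            sw, sc = sum(w_row), sum(c_row)
            if sc == 0:
                lam = 0.0   # no cross-layer links: stay in this layer
            elif sw == 0:
                lam = 1.0   # only cross-layer links: always cross
            else:
                lam = jump
            for j, w in enumerate(w_row):
                if sw > 0:
                    trans[row_off + i][col_within + j] = (1 - lam) * w / sw
            for j, c in enumerate(c_row):
                if sc > 0:
                    trans[row_off + i][col_cross + j] = lam * c / sc

    fill(a_gene, links, 0, 0, n)
    fill(a_omim, links_t, n, n, 0)
    return trans


def rwrh(trans, n_genes, seed_genes, seed_omim, restart=0.7, eta=0.5,
         tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * trans^T p until convergence."""
    size = len(trans)
    p0 = [0.0] * size
    for g in seed_genes:
        p0[g] = (1 - eta) / len(seed_genes)
    for d in seed_omim:
        p0[n_genes + d] = eta / len(seed_omim)
    total = sum(p0)
    if total == 0:
        raise ValueError("at least one seed node is required")
    p0 = [x / total for x in p0]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [restart * p0[j]
               + (1 - restart) * sum(trans[i][j] * p[i] for i in range(size))
               for j in range(size)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy network: genes 0 and 1 are coding, gene 2 is a lncRNA with no known OMIM link.
a_gene = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]   # co-expression edges
a_omim = [[0, 0.5], [0.5, 0]]                # OMIM similarity
links = [[1, 0], [0, 1], [0, 0]]             # gene-OMIM associations
trans = build_transition(a_gene, a_omim, links)
scores = rwrh(trans, n_genes=3, seed_genes=[0], seed_omim=[0])
```

In the toy example the lncRNA (index 2) has no direct OMIM link but still receives a non-zero score through its co-expression edge with a seed coding gene, which is the behaviour that makes this family of methods useful for annotating uncharacterized genes.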
In Browse, you can browse all lncRNA genes or transcripts. LncRNAs are sorted by the richness of their annotation.
In GBrowser, you can view lncRNA-related genomic annotation, such as mRNA, conserved non-genic elements, genome variation, or miRNA.
In Quick Search, you can search by lncRNA ID, which can be either a ZFLNC ID or an ID from another database.
In Advanced Search, you can find lncRNAs of interest according to their expression tissue, co-expressed coding genes, and biological function annotation (GO, KEGG, or OMIM).
Before ZFLNC, zebrafish lncRNAs were dispersed across different databases. We therefore provide ID conversion to facilitate use across these databases. You can convert lncRNAs from other sources to ZFLNC entries either by sequence similarity with BLASTN or by genomic position with GBrowser.