Predicting Zika virus RNA structure based on novel RNA-RNA interactomic data

The structural flexibility of RNA underlies fundamental biological processes, but there were no methods to explore the multiple conformations adopted by RNAs in vivo. We developed cross-linking of matched RNAs and deep sequencing (COMRADES) for in-depth RNA conformation capture, and a pipeline for the retrieval of RNA structural ensembles.

Using COMRADES, we determined the architecture of the Zika virus RNA genome inside cells and identified multiple site-specific interactions with human noncoding RNAs.

Hi, my name is Dr. Marta Gabryelska and I’m a postdoctoral researcher at Flinders University in Australia. This work was done at the University of Edinburgh in collaboration with the University of Cambridge and published in 2018. We were really interested in predicting what the Zika virus RNA genome looks like based on RNA-RNA interactomic data.

Zika virus belongs to the family of flaviviruses. The name of the family comes from Latin, where flavus means yellow, and yellow fever virus belongs to this family. These viruses are transmitted through mosquito bites, and the family includes dengue virus, yellow fever virus, Zika virus and West Nile virus. Zika virus was originally isolated in the 1940s in Uganda, in the Zika Forest, which is where the name comes from. Transmitted by mosquitoes, it travelled between continents and became infamous around 2015, when it was found that Zika infection causes microcephaly in fetuses, so it is very dangerous for pregnant women.

What was known about the Zika virus genome is that it is one long, capped piece of single-stranded RNA; that it is processed into subgenomic RNAs; that it is packaged into capsids; that it is used as a template for the complementary RNA strand; and that it is used for translation. So its structure must be very dynamic, as it performs many different functions. It was also known that the Zika virus genome needs to cyclize in order to replicate. In cyclization, complementary regions (shown in red, green and blue) at the two ends of the genome base-pair with each other, allowing this process to happen. That is why Omer Ziv from the University of Cambridge, who developed COMRADES, became very interested in the Zika virus. COMRADES stands for cross-linking of matched RNAs and deep sequencing. The method uses psoralen, a planar molecule that intercalates into double-stranded nucleic acids.
Omer also introduced a double selection for RNA. After cells are infected with the virus and exposed to psoralen, and crosslinking is induced by UV exposure, RNA is isolated and goes through two pulldowns. The first pulldown, with biotinylated probes complementary to the viral RNA, enriches for viral RNA. Next, the psoralen molecule is biotinylated, which allows a second pulldown that enriches for crosslinked molecules only. The next steps are digestion, ligation of the crosslinked strands and, at the end, crosslink reversal. We also used standard controls in which the crosslinks are reversed before ligation. The RNA was then sequenced, and the sequencing data were analyzed with the hyb pipeline, which recovers a list of RNA-RNA interactions through a series of steps.

We saw a high level of chimeras in our samples and a low level in our controls. We also noticed a good proportion of ribosomal-ribosomal and mitochondrial-mitochondrial RNA chimeras, but a very low level of mitochondrial-ribosomal chimeras. Because mitochondrial and ribosomal RNAs reside in different cellular compartments, chimeras between them would point to artifacts, so their rarity suggests a low level of artifacts in our data. The chimeras can be represented as in this plot, where each dot is one interaction along the Zika virus genome, which is almost 11,000 nucleotides long. The replicates were very similar to each other, and the interactions detected were not present in our controls. When analyzing the data, we observed that most interactions occur between sites fewer than 1,000 nucleotides apart. Because long RNAs are difficult to fold with good accuracy, we divided the Zika virus genome into 10 pieces of around 1,000 nucleotides each, which allowed us to retain most of the valuable information. Data analysis starts with the sequencing data being processed by the hyb pipeline, which outputs a list of chimeras.
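The windowing step described above can be sketched in Python. This is a minimal illustration, not the published pipeline: the `Interaction` record, the 0-based coordinates, the helper names, and the rule that an interaction is kept for a window only when both of its arms fall inside that window are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    # Hypothetical minimal record: 0-based genome positions of the two
    # interacting arms and the number of chimeric reads supporting them.
    left: int
    right: int
    reads: int = 1


def make_windows(genome_length: int, n_windows: int = 10) -> list[tuple[int, int]]:
    """Split [0, genome_length) into n_windows contiguous, near-equal half-open windows."""
    bounds = [k * genome_length // n_windows for k in range(n_windows + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def fraction_short(interactions, max_span: int = 1000) -> float:
    """Read-weighted fraction of interactions whose arms are fewer than max_span nt apart."""
    total = sum(ix.reads for ix in interactions)
    short = sum(ix.reads for ix in interactions if abs(ix.right - ix.left) < max_span)
    return short / total if total else 0.0


def assign_to_windows(interactions, windows):
    """Keep an interaction for a window only if both arms fall inside it;
    everything else is set aside as long-range."""
    local = {w: [] for w in windows}
    long_range = []
    for ix in interactions:
        for start, end in windows:
            if start <= ix.left < end and start <= ix.right < end:
                local[(start, end)].append(ix)
                break
        else:
            long_range.append(ix)
    return local, long_range


# Toy example with made-up coordinates (not real Zika virus data).
windows = make_windows(10_000)
interactions = [
    Interaction(100, 400, reads=5),
    Interaction(1_200, 1_900, reads=3),
    Interaction(50, 9_950, reads=2),  # a 5'-3' contact, like cyclization
]
local, long_range = assign_to_windows(interactions, windows)
print(len(windows), fraction_short(interactions), len(long_range))
# prints 10 0.8 1
```

Splitting by integer boundaries guarantees exactly `n_windows` windows even when the genome length is not a multiple of the window count. The interactions set aside as long-range would presumably be the ones that are reconsidered once the local structures are assembled.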
These chimeras can be shown, as you’ve seen before, as a heat map in which the intensity of each dot represents the number of chimeras detected in the sequencing data. Here you can see interactions along the Zika virus genome. Most interactions are between regions close to each other, as on the previous slide, but we also detect interactions between the 5′ and 3′ ends, which confirms the known cyclization of the Zika virus genome.

We made use of the fact that hyb gives us a lot of information about the chimeras: it folds each chimera to detect the base pairs that are most likely to form within it. So we performed minimum free energy folding of these short pieces of RNA and recovered base-pair frequencies, which means that we could find out how many reads support each of the base pairs in our data. These base pairs could then be turned into constraints for the folding program. A constraint is a restriction that forces the folding program to, for example, base-pair two particular regions (here, with four base pairs).

We wanted to predict the structure of the Zika virus genome, and we had a list of constraints that needed to be considered. So we wrote a program that reads one interaction at a time and adds it to the folding constraints. If folding was still possible, the constraint was kept and the next one was taken. However, we observed that some constraints are mutually exclusive: they cannot exist at the same time. In that case, the program had to discard the conflicting constraint and move on to the next one. But this way, we would lose quite a lot of information. So instead, we folded the Zika virus RNA 1,000 times, each time with the constraints in a different, shuffled order. This gave each constraint the chance to be first on the list and guide how the folding occurs.
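The shuffled, greedy constraint strategy can be sketched as follows. This is a simplified stand-in, not the authors’ program: where the talk says a folding program decides whether folding is still possible, this sketch only checks that a new helix reuses no nucleotide and crosses no accepted pair (no pseudoknots). Treating each constraint as a whole helix, the input format of per-read base pairs, and all function names are assumptions for illustration. The data score is computed as the sum of reads supporting every base pair in a structure.

```python
import random
from collections import Counter

# A base pair is (i, j) with i < j (0-based positions within one piece).
# A "constraint" here is one helix: a list of base pairs from one interaction.


def count_support(per_read_pairs):
    """Base-pair frequency: how many reads support each base pair.
    Input (one list of predicted base pairs per chimeric read) is assumed."""
    support = Counter()
    for pairs in per_read_pairs:
        support.update(set(pairs))
    return support


def compatible(accepted, helix):
    """Simplified stand-in for "is folding still possible?": the helix must
    not reuse a nucleotide and must not cross an accepted pair."""
    used = {n for pair in accepted for n in pair}
    for i, j in helix:
        if i in used or j in used:
            return False
        for p, q in accepted:
            if i < p < j < q or p < i < q < j:  # crossing pairs = pseudoknot
                return False
    return True


def fold_once(constraints, rng):
    """Add helices greedily in a shuffled order, discarding conflicting ones."""
    order = list(constraints)
    rng.shuffle(order)
    accepted = []
    for helix in order:
        if compatible(accepted, helix):
            accepted.extend(helix)
    return frozenset(accepted)


def data_score(structure, support):
    """Sum of reads supporting every base pair in the structure."""
    return sum(support[pair] for pair in structure)


def sample_structures(constraints, n=1000, seed=0):
    """Fold n times with a reshuffled constraint order; count each distinct structure."""
    rng = random.Random(seed)
    return Counter(fold_once(constraints, rng) for _ in range(n))


# Toy example: helices A and B cross each other, so they are mutually exclusive.
A = [(0, 20), (1, 19), (2, 18)]
B = [(5, 30), (6, 29)]
C = [(40, 60), (41, 59)]
support = count_support([A, A, A, B, C, C])
ensemble = sample_structures([A, B, C])
for structure, times in ensemble.items():
    print(sorted(structure), "score", data_score(structure, support), "seen", times)
```

In this toy run every fold contains C, and A or B wins depending on which comes first in the shuffled order, so both alternatives survive in the ensemble instead of B always being discarded. The A+C structure has the higher data score (13 versus 6).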
By folding with shuffled constraint orders, we could retain all the information needed. In the end, we could see what each piece of the Zika virus genome can look like, and we could also check the support for each base pair of that structure. Each structure was characterized by a data score, which is the sum of the reads supporting all of the base pairs in that structure. We folded each piece of the Zika virus genome 1,000 times and plotted the structures in a two-dimensional space, so that we could compare them to each other. We clearly saw clusters of similar structures that are highly supported by our data. Each structure is represented by a circle: the redder the circle, the higher the data support, and the bigger the circle, the more stable the structure. Looking at different pieces of the Zika virus genome, we observed the same kind of clustering, and when we looked at examples of these structures, we saw that the same piece of RNA can fold differently and still be highly supported by our data. This clearly confirmed the dynamic nature of the Zika virus genome within cells.

We could also see that the minimum free energy (MFE) structures of these pieces of RNA differ from the structures based on our data. In this example, the structure based on constraints looks very different from the MFE structure of the same piece of the Zika virus genome, and the MFE structure often has much less support in the data, meaning that it does not exist, or is very unlikely to exist, in cells. We compared the data support and minimum free energy of pieces of the Zika virus genome folded with constraints and folded based on minimum free energy alone. The MFE structures were always more stable, but the structures folded with constraints always had higher data support. We also characterized each nucleotide of the Zika virus genome by its entropy.
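The talk does not give the exact entropy formula. One common choice, assumed in this sketch, is the Shannon entropy of each nucleotide’s pairing state (its partner, or unpaired) across the ensemble of folded structures: a nucleotide that always pairs the same way scores 0, while one that switches between partners scores higher. The function names and the toy ensemble are hypothetical.

```python
import math
from collections import Counter


def partners(structure, length):
    """Partner index for each position, or None if the position is unpaired."""
    partner = [None] * length
    for i, j in structure:
        partner[i] = j
        partner[j] = i
    return partner


def nucleotide_entropy(structures, length):
    """Shannon entropy (bits) of each nucleotide's pairing state across structures."""
    counts = [Counter() for _ in range(length)]
    for structure in structures:
        for pos, partner in enumerate(partners(structure, length)):
            counts[pos][partner] += 1
    n = len(structures)
    # Sum of p * log2(1/p) over observed states, with p = c / n.
    return [sum((c / n) * math.log2(n / c) for c in state.values()) for state in counts]


# Toy ensemble: position 0 pairs with 5 in two structures and with 4 in one.
ensemble = [{(0, 5)}, {(0, 5)}, {(0, 4)}]
print([round(h, 3) for h in nucleotide_entropy(ensemble, 6)])
# prints [0.918, 0.0, 0.0, 0.0, 0.918, 0.918]
```

Positions 1 to 3 are unpaired in every structure and get zero entropy, whereas positions 0, 4 and 5 change state between structures.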
This entropy describes the probability of a nucleotide being paired with other nucleotides across the structures. Some nucleotides had high entropy, meaning they were involved in many different interactions, and some had low entropy. When we examined the nucleotides with low entropy, they had the highest support in our data, so we could identify which pieces of the Zika virus genome are the most stable and less involved in alternative interactions.

In the end, we could present this structure of the Zika virus genome. It is worth mentioning that it is not the one and only structure; it is just one of many possible conformations of the Zika virus genome. But this one is special because it has the highest support in our data, as you can see from the large number of red stems. We also added the long-range interactions that initially had to be excluded, and it was good to see that they fall within stem-loops or regions not supported by our data.

The same structure of the Zika virus genome can be presented as an arc plot. At the top, you can see the interactions predicted with this methodology, and at the bottom, the interactions from minimum free energy folding. Some of them are similar, but some are quite different. Also relevant is that we detected few interactions within the 5′ and 3′ UTRs. But when we look closer at the interactions that we could recover there, at a somewhat lower level, they are clearly different from the predicted interactions. We considered why we do not have such coverage for the UTRs: it could be due either to cyclization, which is most likely the main cause, or to their involvement in translation. Nevertheless, we could recover and find support for pseudoknots, which are known to occur in the UTRs of the Zika virus genome.
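The top-versus-bottom comparison in the arc plot can be made quantitative by counting shared and unique base pairs. The sketch below assumes both structures are given in plain dot-bracket notation and uses the base-pair distance (size of the symmetric difference of the two pair sets) as the summary; both choices are illustrative, not necessarily the measure used in the study. Plain brackets cannot encode pseudoknots such as those in the UTRs, so this parser rejects any character other than `(`, `)` and `.`.

```python
def dot_bracket_to_pairs(db: str) -> set[tuple[int, int]]:
    """Parse nested dot-bracket notation into 0-based (i, j) base pairs."""
    stack, pairs = [], set()
    for pos, ch in enumerate(db):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def compare_structures(first: str, second: str) -> dict[str, int]:
    """Count base pairs shared by two structures and unique to each one."""
    if len(first) != len(second):
        raise ValueError("structures must have the same length")
    a, b = dot_bracket_to_pairs(first), dot_bracket_to_pairs(second)
    return {
        "shared": len(a & b),
        "only_first": len(a - b),
        "only_second": len(b - a),
        "bp_distance": len(a ^ b),
    }


# Toy structures for one 20-nt piece (made up for illustration).
data_based = "((((....))))..((..))"
mfe = "(((......)))..(....)"
print(compare_structures(data_based, mfe))
# prints {'shared': 4, 'only_first': 2, 'only_second': 0, 'bp_distance': 2}
```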
With our data, we could see that the three regions known to be involved in cyclization, shown here, can be extended, and we defined new regions involved in this process. We also detected a specific, previously unknown interaction, this little black dot, which involves the 5′ UTR of the Zika virus genome and might interfere with formation of the cyclization structure. This is where the interaction occurs, so we could predict it and show it to you.

Finally, we also looked at the interactions between the Zika virus genome and the host, meaning human, RNAs. We saw a high level of interactions with microRNAs, tRNAs and snoRNAs, and the most dominant is the interaction with miR-21, which does not occur in our controls at all. We found that the region miR-21 interacts with lies within the cyclization region of the Zika virus genome. When we knocked out miR-21 or used an inhibitor, replication of the virus decreased. So, for the first time, we could define a region of viral RNA with a triple function: the cyclization sequence is involved in translation, in cyclization of the viral genome, and in the interaction with human microRNA-21.

So, to summarize: COMRADES data are highly reproducible, we can clearly show the difference between the data and the controls, and the method allows high enrichment of the target RNA. Our computational methods allow prediction of the structure of very large RNAs and detection of alternative RNA conformations, and our scoring system allowed us to find support for any structure of interest. As for RNA-RNA interactions during viral infection, there was a high level of viral-viral interactions, but also interactions with human microRNAs, especially miR-21. I hope you liked this talk.
And if you’re still interested, please check out our publication in Nature Methods.

This video consists of the following chapters:
0:00 Introduction
2:20 Methodology and data analysis
5:20 RNA structure prediction strategy
9:30 RNA structure clustering
11:40 The structure of the Zika virus genome
14:30 Zika virus – host RNA interactions
15:33 Conclusions