Welcome! Orca-Fly is a deep learning sequence model framework for multiscale, ultra-resolution genome structure prediction in Drosophila embryonic cells. Building upon the high-resolution, temporally resolved 3D genome maps generated using Pico-C, Orca-Fly can predict genome interactions at 125bp, 250bp, 500bp, and 1kb resolution using only genomic sequence as input. Orca-Fly allows predicting the genome structural impacts of any genomic variant, including very large structural variants, and designing virtual genetic screens to probe the sequence basis of genome 3D organization. You can find the GitHub repo here and the publication here.
Predict the genome structural impacts of deletions, duplications, and inversions, including large structural variants.
Analyze sequence dependencies of genome 3D structure by performing virtual genetic screens. Orca-Fly sequence models can serve as an “in silico genome observatory” that allows designing and performing virtual genetic screens to probe the sequence basis of genome 3D organization.
Orca-Fly is a deep learning sequence modeling framework for multiscale genome interaction prediction. Orca-Fly models are trained on high-resolution pico-C datasets for Drosophila embryonic cells at stages NC12 and NC14. If you have sufficient computational resources, including GPUs, you can also train your own models on 3D genome data from any input in cooler format by following our examples (see the training section of the code repository).
This webserver provides a user-friendly interface to many of Orca-Fly's prediction capabilities, including predicting the effects of structural variants on multiscale genome 3D organization. You can also run Orca-Fly with the code in our GitHub repository, which provides the full functionality, such as support for more complex variants or any input sequence. The repository also contains more information and resources about Orca-Fly.
On the Orca-Fly home page, you can select a prediction mode, provide the corresponding input information, and then submit the job to our job queue. For each prediction mode you select, an example input is provided as a reference for the input format. Below we list the required input for all prediction modes that the webserver currently supports. All coordinates should be in dm6 and 0-based, with an inclusive start coordinate and an exclusive end coordinate.
Genomic Region - Predict multiscale genome interactions centered at the specified genomic region from sequence and compare with experimental observations (pico-C). An example input is chr3R:6740000-7040000.
Structural Variant - Deletion - Predict the genome structural effects of the deletion of a genomic interval. The genomic interval must be specified in dm6. An example input is chr3R:6875250-6906500.
Structural Variant - Duplication - Predict the genome structural effects of the duplication of a genomic interval. An example input is chr3R:6875250-6906500.
Structural Variant - Inversion - Predict the genome structural effects of the inversion of a genomic interval. An example input is chr3R:6875250-6906500.
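To make the coordinate convention concrete, here is a minimal Python sketch that parses an input string such as chr3R:6875250-6906500 as a 0-based, half-open interval and applies each variant type to a toy sequence. The names parse_region and apply_variant are hypothetical helpers written for this illustration and are not part of the Orca-Fly code. The sketch also assumes that a duplication is inserted in tandem and that an inversion replaces the interval with its reverse complement; the webserver may construct variant sequences differently.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


# Accepts strings such as "chr3R:6875250-6906500" (hypothetical parser,
# not the webserver's own input validation).
_REGION_RE = re.compile(r"^([\w.]+):(\d+)-(\d+)$")
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_region(text: str) -> Region:
    """Parse 'chrom:start-end' into a 0-based, half-open Region."""
    match = _REGION_RE.match(text.strip().replace(",", ""))
    if match is None:
        raise ValueError(f"cannot parse region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start >= end:
        raise ValueError("start must be smaller than end")
    return Region(chrom, start, end)


def apply_variant(seq: str, start: int, end: int, kind: str) -> str:
    """Apply a variant to seq[start:end] (half-open, relative to seq)."""
    if not 0 <= start < end <= len(seq):
        raise ValueError("interval lies outside the sequence")
    left, mid, right = seq[:start], seq[start:end], seq[end:]
    if kind == "deletion":
        return left + right
    if kind == "duplication":
        # Assumption: tandem duplication, copy placed next to the original.
        return left + mid + mid + right
    if kind == "inversion":
        # Assumption: the interval is replaced by its reverse complement.
        return left + mid.translate(_COMPLEMENT)[::-1] + right
    raise ValueError(f"unknown variant type: {kind!r}")


region = parse_region("chr3R:6875250-6906500")
interval_length = region.end - region.start  # half-open: no +1 needed
```

Because the end coordinate is exclusive, the interval length is simply end minus start, which gives 31,250 bp for the example interval.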
As an example output, here we show the visualizations generated for the prediction of a duplication variant. For each job, Orca-Fly generates one (genomic region prediction) or multiple (structural variant prediction) files, each containing a series of multi-level predictions zooming into a breakpoint of the variant, or the corresponding position(s) of the breakpoint in the reference sequence.
For all prediction modes, predicted interaction matrices at multiple scales (125bp, 250bp, 500bp, and 1kb) are visualized as heatmaps, where each pixel represents the interaction between a pair of genomic positions. The interaction scores are expressed as the natural logarithm of the fold change over the distance-based background score. The distance-based background is the expected contact score at a given genomic distance (available from our code repository). We also visualize the observed pico-C data side by side for comparison whenever appropriate.
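As an illustration of this score, the following Python sketch computes the natural-log fold over a distance-based background for a small toy contact matrix. The function name, the toy values, and the small eps term that guards against division by zero are assumptions made for this example; the exact normalization used by Orca-Fly is defined in the code repository.

```python
import math
from typing import List, Sequence


def log_fold_over_background(
    contacts: Sequence[Sequence[float]],
    background_by_distance: Sequence[float],
    eps: float = 1e-9,  # assumption: small constant to avoid log(0)
) -> List[List[float]]:
    """Return ln(contact / expected), where expected depends only on |i - j|."""
    n = len(contacts)
    scores = []
    for i in range(n):
        row = []
        for j in range(n):
            # Expected contact score for bins separated by |i - j| bins.
            expected = background_by_distance[abs(i - j)]
            row.append(math.log((contacts[i][j] + eps) / (expected + eps)))
        scores.append(row)
    return scores


# Toy 2x2 matrix: the off-diagonal contact is twice the expected value.
toy_contacts = [[4.0, 2.0], [2.0, 4.0]]
toy_background = [4.0, 1.0]  # expected score at distance 0 and distance 1
toy_scores = log_fold_over_background(toy_contacts, toy_background)
```

A score of 0 means the contact matches the distance-based expectation, positive values mean more contact than expected, and negative values mean less.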
In addition to the visualizations in PDF format, the results page also allows downloading the numerical predictions in PyTorch serialization format with the extension '.pth'. The .pth file can be loaded with torch.load. Each file contains a Python dictionary. If the prediction mode is one of the structural variant prediction modes, the dictionary stores multiple dictionaries, each corresponding to an output file as described above. Each dictionary includes:
predictions - Multi-level predictions for NC12 and NC14 cell types.
experiments - Observations for NC12 and NC14 cell types that match the predictions (only available for the reference allele).
chr - The chromosome name.
start_coords - Start coordinates for the prediction at each level.
end_coords - End coordinates for the prediction at each level.
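A downloaded file can be inspected along the following lines. This is a sketch under stated assumptions: load_result, iter_results, and level_spans are hypothetical helpers written for this example, the file name in the usage note is a placeholder, and the keys of the outer dictionary in structural variant outputs are not documented here, so the code simply searches nested dictionaries for any that contain a predictions entry.

```python
from typing import Any, Dict, Iterator, List, Tuple


def load_result(path: str) -> Any:
    """Load a downloaded .pth file onto the CPU."""
    import torch  # imported lazily so the helpers below work without PyTorch

    # Note: recent PyTorch releases restrict unpickling by default; if loading
    # fails, weights_only=False may be needed (only for files you trust).
    return torch.load(path, map_location="cpu")


def iter_results(obj: Any, name: str = "result") -> Iterator[Tuple[str, Dict]]:
    """Yield (name, dict) for every dictionary holding a 'predictions' entry.

    Assumption: structural variant outputs nest such dictionaries inside an
    outer dictionary whose keys are not documented on this page.
    """
    if isinstance(obj, dict):
        if "predictions" in obj:
            yield name, obj
        else:
            for key, value in obj.items():
                yield from iter_results(value, f"{name}/{key}")


def level_spans(result: Dict) -> List[int]:
    """Width in bp covered by the prediction at each level."""
    return [
        int(end) - int(start)
        for start, end in zip(result["start_coords"], result["end_coords"])
    ]


def describe(obj: Any) -> List[str]:
    """One summary line per prediction dictionary found in obj."""
    return [
        f"{name}: chr={result['chr']}, spans={level_spans(result)}"
        for name, result in iter_results(obj)
    ]
```

For example, describe(load_result("orca_fly_result.pth")) returns one line per prediction dictionary, listing the chromosome and the width of the window predicted at each level.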
Thank you for using Orca-Fly. If you have any questions or feedback, you can let us know through our user email group [email protected].