In recent years, as linear genome sequences have become available, researchers have used fluorescence in situ hybridization (FISH) and chromosome conformation capture (3C) to gain a deeper understanding of the three-dimensional (3D) conformation and organization of chromatin. Researchers subsequently combined 3C with microarray (gene chip) technology to produce 4C (chromosome conformation capture-on-chip) and 5C (chromosome conformation capture carbon copy).
Hi-C is a derivative of 3C. Among the 3C-based methods, which include but are not limited to 3C, 4C and 5C, it is the most comprehensive and highest-throughput tool for measuring how all genomic sequences are organized in three-dimensional space. The successful development of Hi-C has greatly facilitated the study of higher-order chromatin conformation, and the 3D genome structures of species such as human, mouse, yeast, Arabidopsis, Drosophila, E. coli, and rice have been reported one after another. The technology is used not only to study the global genome structure of a species, but also to compare the 3D conformation of multiple samples.
The development of 3C, 4C, and 5C led researchers to gradually recognize the importance of higher-order conformation for the regulation of genome expression, which greatly promoted the field. How to develop a technology that maps interactions across the whole genome then became a hot topic.
With the widespread application of Hi-C technology, a large amount of global interaction information has been mined, but studies of specific sites using global methods have highlighted the problems of poor data utilization and low resolution. Handling large-scale Hi-C datasets effectively is therefore important.
The Workflow of Hi-C Sequencing
- Fix the cells with formaldehyde to cross-link DNA to protein
- Digest with a restriction endonuclease (e.g., HindIII) to create sticky ends on both sides of the cross-links
- End repair, biotinylation and ligation
- Reverse the cross-linking, extract the DNA, fragment it, capture the biotin-labeled fragments, and construct the library
- Hi-C data analysis, which consists of the following steps:
- Data filtering and screening
- Alignment. There are two main alignment strategies. In the first, each read is checked for an enzyme cutting site; if one is present, the cutting site is removed and the remaining sequence is aligned with Bowtie2. The second is iterative single-end alignment: starting from the first 25 bp of the read, the aligned length is increased by 5 bp at a time until the read maps uniquely to the genome.
- Assignment of valid pairs to genomic positions. The interaction information from the valid pairs is mapped to genomic bins and converted into an interaction strength for each pair of bins.
- Interaction matrix normalization. Normalization methods fall into two main categories: matrix-based methods, which normalize the matrix mathematically (e.g., iterative correction), and methods that explicitly correct for known sources of bias (e.g., mappability).
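
The last two analysis steps can be illustrated with a minimal Python sketch. It is a toy illustration under stated assumptions, not the code of any particular Hi-C pipeline: the function names `bin_contacts` and `iterative_correction`, the single-chromosome coordinate system, and the fixed iteration count are all hypothetical choices made for this example. The first function turns valid pairs into a symmetric bin-by-bin interaction matrix; the second applies a simple matrix-based iterative correction that repeatedly rescales rows and columns until every non-empty bin has roughly the same total coverage.

```python
def bin_contacts(pairs, genome_length, bin_size):
    """Count valid pairs per pair of bins (single chromosome assumed).

    pairs: iterable of (pos_a, pos_b) genomic coordinates, 0-based.
    Returns a symmetric n_bins x n_bins matrix as a list of lists.
    """
    n_bins = (genome_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:  # keep the matrix symmetric without double-counting the diagonal
            matrix[j][i] += 1
    return matrix


def iterative_correction(matrix, n_iter=50):
    """Matrix-based normalization by iterative row/column rescaling.

    In each round, every bin's row sum is compared with the mean row sum
    of the non-empty bins, and each entry is divided by the product of
    the two bins' relative coverages. Returns (normalized_matrix, bias),
    where bias[i] is the accumulated scaling factor of bin i.
    """
    n = len(matrix)
    norm = [row[:] for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(n_iter):
        sums = [sum(row) for row in norm]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # empty matrix: nothing to normalize
        mean = sum(nonzero) / len(nonzero)
        # Empty bins get a neutral factor of 1 so they stay untouched.
        scale = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                norm[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
    return norm, bias
```

For example, calling `bin_contacts(pairs, genome_length=300, bin_size=100)` on a handful of coordinate pairs yields a 3 x 3 matrix, and passing that matrix to `iterative_correction` returns a version in which the row sums are approximately equal. Real datasets are usually handled with sparse matrices and per-chromosome offsets, and bins with very low coverage are typically filtered out before normalization.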