28  Refactorization of Legacy Code

Author: Dr Randy Johnson

Affiliation: Hood College

Published: April 2, 2026

Introduction

Most DNA adopts the standard right-handed double helix (B-DNA) we are used to working with, but a number of alternate conformations don’t follow this paradigm. These are collectively referred to as non-B DNA.

Z-DNA is more likely to form in regions with alternating purine-pyrimidine sequences (De Rosa et al. 2010). Image courtesy of Gemini.
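To make "alternating purine-pyrimidine" concrete, here is a minimal Python sketch that scans for runs in which purines (A, G) and pyrimidines (C, T) strictly alternate. The function name and the `min_len` threshold are illustrative assumptions; this is not the rule non-B_gfa applies.

```python
PURINES = "AG"
PYRIMIDINES = "CT"


def _alternates(a, b):
    """True when one base is a purine and the other a pyrimidine."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def alternating_pur_pyr_runs(seq, min_len=10):
    """Return (start, end, subsequence) for each alternating run of at least min_len bases.

    min_len is an arbitrary illustrative threshold, not a value taken from non-B_gfa.
    Coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    runs = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the end of the sequence or when two neighbours share a class
        if i == len(seq) or not _alternates(seq[i - 1], seq[i]):
            if i - start >= min_len:
                runs.append((start, i, seq[start:i]))
            start = i
    return runs
```

For example, a stretch of twelve alternating C and G bases flanked by TTT and AAA would be reported as a single run, while the flanks are ignored.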

G-Quadruplexes form in sequences rich in guanine. Four guanine bases can bond to form a square plate; several of these can stack to form a quadruplex (Varshney et al. 2020). Image courtesy of Gemini.
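As a rough illustration of what a guanine-rich candidate looks like in code, the sketch below uses a widely used rule-of-thumb pattern: four runs of three or more G, separated by loops of 1 to 7 bases. Both the pattern and the function name are assumptions made for this example and may differ from the criteria used by non-B_gfa.

```python
import re

# Rule-of-thumb pattern (an assumption, not non-B_gfa's definition):
# a G-run of 3+ bases, then three repeats of (1-7 base loop + G-run of 3+ bases).
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_candidates(seq):
    """Return (start, end, match) for non-overlapping candidates on the given strand.

    Coordinates are 0-based and end-exclusive. Only the supplied strand is scanned.
    """
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]
```

Scanning only one strand keeps the sketch short; a fuller search would also look for the complementary C-rich pattern.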

Cruciform complexes form in palindromic sequences. Complementary strands pull apart and fold back on themselves, creating a characteristic cross-like shape (Brázda et al. 2011). Image courtesy of Wikipedia.
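In sequence terms, a palindromic region of this kind is an inverted repeat: an arm followed, after a short spacer, by its own reverse complement. The following brute-force sketch searches for such pairs. The names, the fixed `arm_len`, and the `max_spacer` default are hypothetical choices for illustration, not parameters taken from non-B_gfa.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm_len=6, max_spacer=4):
    """Return (start, arm, spacer_len) for arms followed by their reverse complement.

    arm_len and max_spacer are illustrative defaults. For each start position
    only the shortest qualifying spacer is reported.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm_len + 1):
        arm = seq[start:start + arm_len]
        target = reverse_complement(arm)
        for spacer in range(max_spacer + 1):
            second = start + arm_len + spacer
            if seq[second:second + arm_len] == target:
                hits.append((start, arm, spacer))
                break
    return hits
```

The nested loop is quadratic in the worst case, which is fine for short test sequences but would need a smarter strategy for whole genomes.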

Slipped-Strand DNA structures occur most commonly at repetitive sequences. The strands may misalign during replication and cause bulging loops (Pearson 1998). Image courtesy of Gemini.
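Repetitive sequences such as the (CTG)n repeats studied by Pearson (1998) can be located with a back-referencing regular expression. This is a minimal sketch; the function name and the `max_unit` and `min_copies` defaults are assumptions for illustration rather than settings used by non-B_gfa.

```python
import re


def find_tandem_repeats(seq, max_unit=6, min_copies=4):
    """Return (start, unit, copies) for tandem repeats of a short unit.

    max_unit and min_copies are illustrative defaults. The unit is matched
    lazily so the shortest repeating unit is preferred at each position.
    """
    # ([ACGT]{1,max_unit}?) captures the unit; \1{min_copies-1,} requires the extra copies
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [
        (m.start(), m.group(1), len(m.group()) // len(m.group(1)))
        for m in pattern.finditer(seq.upper())
    ]
```

Because matches are non-overlapping and anchored at the first position that works, a repeat is reported in the phase in which it is first encountered (CTG rather than TGC or GCT, depending on the flanking sequence).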

non-B_gfa

The non-B_gfa database and accompanying package were developed to find sequences associated with non-B DNA-forming motifs. The work was originally published by Cer et al. (2012), and the current version is written in C.

The code hasn’t received much attention in the past decade. We want to modernize it and make it easier to incorporate into R and Python workflows.

Instructions

  • Fork a copy of the non-B_gfa repository using this GitHub Classroom link
  • Clone your fork to your local machine
  • Refactor the code into either an R or Python package, including an updated README, tests and documentation
  • Push your changes to GitHub and submit a link to your fork on Blackboard
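One way to approach the tests in a refactoring project is a characterization test: save the output of the legacy C program for a known input, then check that the new implementation reproduces it. The sketch below assumes the legacy results have been saved as tab-separated `start`, `end`, `motif` rows. That file layout and both helper names are hypothetical and would need to be adapted to whatever non-B_gfa actually writes.

```python
import csv


def load_expected_hits(handle):
    """Parse tab-separated 'start<TAB>end<TAB>motif' rows from an open text handle.

    The three-column layout is an assumption for illustration, not the
    documented output format of non-B_gfa.
    """
    return [(int(r[0]), int(r[1]), r[2]) for r in csv.reader(handle, delimiter="\t") if r]


def assert_matches_legacy(new_hits, expected_hits):
    """Raise AssertionError listing hits that differ between the two implementations."""
    missing = sorted(set(expected_hits) - set(new_hits))
    extra = sorted(set(new_hits) - set(expected_hits))
    assert not missing and not extra, f"missing={missing} extra={extra}"
```

Comparing sets makes the check independent of output order, and the error message shows exactly which hits the refactored code lost or gained.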

References

Brázda, Václav, Rob C. Laister, Eva B. Jagelská, and Cheryl Arrowsmith. 2011. “Cruciform Structures Are a Common DNA Feature Important for Regulating Biological Processes.” BMC Molecular Biology 12 (August): 33. https://doi.org/10.1186/1471-2199-12-33.
Cer, Regina Z., Duncan E. Donohue, Uma S. Mudunuri, Nuri A. Temiz, Michael A. Loss, Nathan J. Starner, Goran N. Halusa, et al. 2012. “Non-B DB V2.0: A Database of Predicted Non-B DNA-Forming Motifs and Its Associated Tools.” Nucleic Acids Research 41 (D1): D94–100. https://doi.org/10.1093/nar/gks955.
De Rosa, Matteo, Daniele De Sanctis, Ana Lucia Rosario, Margarida Archer, Alexander Rich, Alekos Athanasiadis, and Maria Armenia Carrondo. 2010. “Crystal Structure of a Junction Between Two Z-DNA Helices.” Proceedings of the National Academy of Sciences 107 (20): 9088–92. https://doi.org/10.1073/pnas.1003182107.
Pearson, C. 1998. “Structural Analysis of Slipped-Strand DNA (S-DNA) Formed in (CTG)n·(CAG)n Repeats from the Myotonic Dystrophy Locus.” Nucleic Acids Research 26 (3): 816–23. https://doi.org/10.1093/nar/26.3.816.
Varshney, Dhaval, Jochen Spiegel, Katherine Zyner, David Tannahill, and Shankar Balasubramanian. 2020. “The Regulation and Functions of DNA and RNA G-Quadruplexes.” Nature Reviews Molecular Cell Biology 21 (8): 459–74. https://doi.org/10.1038/s41580-020-0236-x.