SCREENING AND COMPARING GENES OF INTEREST IN MICROBIAL SPECIES USING WHOLE GENOME SEQUENCING
BUTLER, ROBERT RAYMOND III
This study used recent advances in whole genome sequencing, assembly and annotation to develop high-quality curated genomes, which were compared to related organisms with differing traits to identify or characterize the trait-associated genes. Additionally, we were able to infer potential origins of these traits and present gene targets for further study. Here we examined two biological phenomena: the desulfurization capability of a Paenibacillus species, and the exceptionally high spore heat resistance of Clostridium sporogenes PA 3679.

Microorganisms capable of desulfurizing petroleum are in high demand as restrictions on fuel purity escalate. Thermophilic desulfurizers are particularly valuable in high-temperature industrial applications. A culture containing Paenibacillus naphthalenovorans 32O-Y and Paenibacillus sp. 32O-W was isolated by repeated passage of a soil sample at up to 55°C in medium containing dibenzothiophene (DBT) as the sulfur source. Only 32O-Y metabolized DBT, apparently via the 4S pathway; however, 32O-W enhanced DBT metabolism by 32O-Y in a mixed culture. Genome sequencing identified desulfurization gene homologs in the strains consistent with their desulfurization properties, with 32O-W lacking homologs for two necessary components of the 4S pathway. Both 32O-Y alone and the 32O-Y/32O-W mixed culture may be useful in the development of an improved thermophilic petroleum biodesulfurization process.

Clostridium sporogenes PA 3679 is a nonpathogenic, nontoxic model organism for proteolytic Clostridium botulinum, used in the validation of conventional thermal food processes because it produces highly heat-resistant endospores. Because of its importance to public safety, the uncertain taxonomic classification and genetic diversity of PA 3679 are concerns. Therefore, isolates of C. sporogenes PA 3679 were obtained from various sources and characterized using pulsed-field gel electrophoresis (PFGE) and whole-genome sequencing. Phylogenetic relatedness and genetic variability were assessed based on 16S rRNA sequence and whole-genome single nucleotide polymorphism (SNP) analysis. All C. sporogenes PA 3679 isolates fell into two clades. Clade I isolates were genetically distinct from clade II isolates, and thermal destruction studies revealed that clade I isolates were more sensitive to high temperature than clade II isolates; clade II isolates demonstrated the typical phenotype of PA 3679. A pan-genomic analysis of clade I and clade II isolates identified genes associated with PA 3679’s exceptional heat resistance. The most significant difference was the acquisition of a second spoVA operon, spoVA2, whose products are responsible for dipicolinic acid transport into the spore core during sporulation. The small acid-soluble spore protein ssp4 potentially plays a role in spore heat resistance, though further exploration is needed. spoVA2 was also found in some C. botulinum strains that cluster phylogenetically with PA 3679. Most other C. sporogenes strains examined both lack the spoVA2 locus and are phylogenetically distant within the group I Clostridia, adding to the understanding that C. sporogenes are dispersed C. botulinum strains lacking toxin genes. C. sporogenes strains are thus a very eclectic group, and few strains possess the characteristic heat resistance of PA 3679.

Analysis of both the Paenibacillus and Clostridium models yielded insights into genomic analysis that extend to other projects. Each of the four generations of sequencing technology has remained a necessary component of genomics. Which sequencing tool to use depends heavily on the intended application.
New software for assembly and annotation is developed and released daily, and the challenge has become deciding which tools actually improve on existing methodology. To handle the large amount of genomic data in need of analysis, consistent and comprehensive pipelines are of greater value. Our studies identified many useful tools for future comparative analysis and explored some novel ways to represent data visually. As these tools and new ones continue to be developed, the value of genomics will increase with the new insights it provides.