The application of metagenomics in hydrocarbon resource management

The last 5-10 years have witnessed the emergence of a proven field of research that provides insight into non-cultured microbes. These uncultured microorganisms form the major group of organisms found in most environments on Earth. The science of metagenomics makes it possible to investigate resources that can be used to develop new enzymes, genes and chemical compounds for use in biotechnology. Studies of microorganisms in pure laboratory culture for over a century have led to significant advances in microbial genetics and physiology, biotechnology and molecular biology. Rapid advances in sequencing technology have drastically reduced the cost of sequencing, leading to an increasing number of sequencing projects being undertaken. These advances have made it possible to use sequencing routinely to monitor environmental microbes that were previously inaccessible. While metagenomics has consistently been used to gain a better understanding of ecology and microbial diversity, its use in environmental monitoring is steadily increasing and has become a focus of research. To this end, this article seeks to provide a general overview of what metagenomics is, its principle and its application in hydrocarbon resource management.


Introduction
Advances in microbial physiology and genetics, molecular biology and biotechnology over the past century have resulted from the continual study of microorganisms in pure laboratory culture. In spite of these advances, the vast majority of bacteria have not been, or perhaps cannot be, cultured under laboratory conditions. The habitats of microorganisms and the associations among them are complex (e.g. soil, sewage, contaminated sites), and this complexity makes it extremely difficult to replicate such conditions on a Petri dish. Over the last decade, this limitation has led to a new approach that circumvents the problem: studying the collective DNA contained in microbial communities. This technology is termed "metagenomics".
Metagenomics is the functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample. Simply put, it is the analysis of the genome of a microbial population, in which modern genomic techniques are applied without the need for isolation and laboratory cultivation of individual species. The term metagenomics was coined by Handelsman et al. (1998) with the intention to analyze a collection of similar but non-identical items, as in the statistical concept of meta-analysis (Riesenfeld et al., 2004; Baquiran, 2010; Simon et al., 2011; Thomas et al., 2012).

Principle of metagenomics
In metagenomics, the entire microbial community is subjected to genomic analysis, without the routine isolation and culturing of individual bacteria within the community. The main principle of metagenomics is the direct extraction of bulk DNA from environmental samples and its sequencing, which can then be used to discover new enzymes and genes or to assemble catabolic pathways or even entire genomes. Several different strategies can be adopted when analyzing an environment using metagenomic approaches. However, every analysis must begin with the extraction of DNA from the environment. Depending on the strategy, the DNA fragments can range in size from as small as 1 kb to over 100 kb. These fragments are ligated into suitable vectors (e.g. cosmids, fosmids, BACs) and then cloned into appropriate hosts (e.g. E. coli), yielding a library of DNA sequences that can be amplified by culturing the cells containing the DNA inserts. Once the library has been created, the DNA inserts can be randomly sequenced, screened using Polymerase Chain Reaction (PCR) primers, or screened by observing the expression of functional genes with an appropriate assay for selected functions, such as utilization of a particular substrate (Baquiran, 2010).
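The primer-based screening of a library mentioned above can be illustrated with a minimal in silico sketch. It assumes exact primer matches on the forward strand only. The function names, clone identifiers, insert sequences and primers are all hypothetical and serve only as an illustration. Real PCR screening tolerates mismatches and depends on annealing conditions that are not modeled here.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def screen_library(inserts: dict, fwd_primer: str, rev_primer: str) -> dict:
    """Return {clone_id: amplicon_length} for inserts that contain the forward
    primer followed downstream by the binding site of the reverse primer.
    Simplifying assumption: exact matches, forward strand only."""
    fwd = fwd_primer.upper()
    rev_site = reverse_complement(rev_primer)
    hits = {}
    for clone_id, sequence in inserts.items():
        sequence = sequence.upper()
        start = sequence.find(fwd)
        if start == -1:
            continue  # forward primer does not anneal to this insert
        end = sequence.find(rev_site, start + len(fwd))
        if end == -1:
            continue  # no reverse primer site downstream of the forward primer
        hits[clone_id] = end + len(rev_site) - start
    return hits


# Hypothetical toy library: only clone_1 carries both primer sites.
library = {
    "clone_1": "TTGACGATCGGAAACCCTTTGGGCATGCAATT",
    "clone_2": "GGGGCCCCAAAATTTTGGGGCCCC",
}
positive_clones = screen_library(library, "ACGATCGG", "TTGCATGC")
print(positive_clones)
```

In this toy library only the first clone would be flagged for follow-up sequencing or functional assays; the second contains neither primer site.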

Approaches in metagenomics
There are two main approaches to metagenomics, namely targeted metagenomics and shotgun metagenomics. Targeted metagenomics, or microbiomics, probes a single gene of interest in order to identify the full complement of sequences of that gene present within an environment. It is frequently adopted to investigate both how phylogenetically diverse a particular gene is and its relative abundance within a sample. The targeted approach is also frequently employed to determine how diverse the small subunit rRNA sequences (16S/18S rRNA) are within a sample. Small subunit rRNA sequencing is routinely used by microbial ecologists to identify the different taxa and their diversity within an environment (Vieites et al., 2008; Techtmann and Hazen, 2016). To carry out targeted metagenomics, environmental DNA is extracted and the gene of interest is amplified by PCR using primers designed to amplify the greatest diversity of sequences for that gene. The amplified genes are then sequenced using next-generation sequencing (NGS), a high-throughput approach to DNA sequencing. The result is thousands of small subunit rRNA reads for each sample, and hundreds of samples can be probed at the same time (Xiong et al., 2010; Behjati and Tarpey, 2013). The main limitation of targeted metagenomics is the universality of the primers chosen for PCR, which determines how much of the diversity of the gene of interest is captured. Moreover, various bioinformatic analyses have demonstrated that the overall diversity observed can be skewed. The strength of targeted metagenomics is that it provides a comprehensive catalog of the microbial taxa found within a sample, which allows for thorough comparison of shifts in microbial diversity before and after a perturbation.
In shotgun metagenomics, the whole genomic content of an environmental sample is probed through sequencing. In this approach, DNA is extracted from the environment and then fragmented in order to prepare sequencing libraries. These libraries are then sequenced to determine the total genomic content of the sample. Shotgun metagenomics is a powerful tool with which the functional potential of a community of microorganisms can be identified (Techtmann and Hazen, 2016).
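Comparing the diversity of a community before and after a perturbation, as described above for targeted surveys, is commonly done by computing a diversity index from per-taxon read counts. The sketch below uses the Shannon index, which is an assumption on our part since the text does not name a specific index. The read counts are invented for illustration and do not come from any cited study.

```python
import math


def shannon_index(read_counts: dict) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in read_counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


# Hypothetical small subunit rRNA read counts per genus (illustrative only).
before_spill = {"Pseudomonas": 25, "Sphingomonas": 25,
                "Burkholderia": 25, "Staphylococcus": 25}
after_spill = {"Pseudomonas": 85, "Sphingomonas": 5,
               "Burkholderia": 5, "Staphylococcus": 5}

h_before = shannon_index(before_spill)
h_after = shannon_index(after_spill)
print(f"H before: {h_before:.3f}, H after: {h_after:.3f}")
```

An evenly distributed community gives the maximum value for its number of taxa, whereas a community dominated by one genus gives a lower value, so a drop in the index after a perturbation indicates a loss of evenness or richness.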
Aside from the initial study design, a shotgun metagenomics study involves five basic steps: "the collection, processing and sequencing of the samples; preprocessing of the sequencing reads; sequence analysis to profile taxonomic, functional and genomic features of the microbiome; statistical and biological post-processing analysis, and finally validation" (Quince et al., 2017).
One of the limitations of metagenomics is the depth of sequencing. Gaining access to the complete inventory of genes within an environmental sample requires extremely deep sequencing. Good coverage of the total genome content of every organism in the community is required for a comprehensive analysis of the functional potential of that community. The dominant microbes within a community or environment are the most heavily sampled in shotgun metagenomics, while the low-abundance members are sparsely captured. Furthermore, analysing metagenomic sequence data can be complex and demanding, as it involves accurately interpreting diverse gene sequences, many of which have no homologs in current sequence databases (Delmont et al., 2012). Linking a functional gene to a taxonomic classification using a phylogenetic anchor is the objective of several studies. This is often difficult with metagenomic sequencing unless sufficient sequencing depth is achieved and the reads can be accurately assembled into sufficiently long contigs. Many computational approaches have sought to assemble metagenomic sequences into complete genomes in order to better understand the functional potential of particular species within a community. Recent reviews summarize the important steps and highlight the drawbacks of these techniques (Techtmann and Hazen, 2016).
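The effect of sequencing depth on rare community members can be made concrete with a back-of-the-envelope estimate. The sketch below assumes that reads are sampled in proportion to each organism's share of the community DNA, and it uses the standard Lander-Waterman approximation for the fraction of a genome covered at a given mean coverage. The sequencing yield, DNA fractions and genome sizes are invented for illustration.

```python
import math


def expected_coverage(total_bases: float, dna_fraction: float,
                      genome_size: float) -> float:
    """Mean fold coverage of one genome, assuming reads are drawn in
    proportion to that organism's share of the community DNA."""
    return total_bases * dna_fraction / genome_size


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman approximation: expected fraction of the genome hit by
    at least one read at the given mean coverage (random sampling assumed)."""
    return 1.0 - math.exp(-coverage)


# Hypothetical run: 10 Gb of sequence, two organisms with 5 Mb genomes.
run_yield = 10e9
dominant = expected_coverage(run_yield, dna_fraction=0.5, genome_size=5e6)
rare = expected_coverage(run_yield, dna_fraction=1e-4, genome_size=5e6)

print(f"dominant member: {dominant:.0f}x coverage")
print(f"rare member: {rare:.1f}x coverage, "
      f"about {fraction_covered(rare):.0%} of its genome sampled")
```

Under these assumed numbers the dominant organism is sequenced far more deeply than needed, while most of the rare organism's genome is never sampled, which is why assembling and functionally profiling low-abundance members requires much deeper sequencing.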

Metagenomics in hydrocarbon resource environment
Metagenomics has led to several advances in microbiology and biotechnology by examining unculturable microbes, and several success stories have been recorded. Genes encoding antibiotics have been isolated, previously unknown genes and new gene functions have been discovered, novel biocatalysts have been expressed (Wexler et al., 2005), and near-complete genomes of so far unculturable microbes have been sequenced (García Martín et al., 2006). Metagenomic approaches have also led to the discovery of petroleum-degrading genes. Suenaga et al. (2007) found novel genes with the capacity to degrade aromatic compounds in coke plant wastewater, and Ono et al. (2007) characterized naphthalene catabolic genes isolated from oil-contaminated soil.
The role of microorganisms in petroleum environments has been an area of interest for many biotechnology-based applications. Petroleum is a complex mixture of heavy to light hydrocarbons and many other organic compounds, including organometallic constituents.
Since petroleum hydrocarbons have been identified as substrates that support the growth of microbes, they serve as both a target and a product of microbial metabolism (Van Hamme et al., 2003). Biodegraded oils represent a significant fraction of the petroleum in conventional oil reserves. The exploitation of petroleum-degrading microbes for environmental cleanup has spurred considerable interest and has become central to petroleum microbiology (Van Hamme et al., 2003).
Microorganisms isolated from hydrocarbon-polluted environments have been extensively tested for bioremediation applications, as they are capable of degrading petroleum hydrocarbons both in the laboratory and in the environment (Ollivier and Magot, 2005). Initial reports on underground reservoirs using molecular approaches suggest that the majority of microorganisms inhabiting these environments are new species that represent a rich pool of novel genetic diversity with potential importance for industrial and petroleum microbiology (Van Hamme et al., 2003).
To date, studies of petroleum reservoirs and petroleum-contaminated sites using culture-based techniques have revealed physiologically diverse assemblages of thermophilic and hyperthermophilic anaerobic microorganisms, along with bacteria that are able to live on complex petroleum hydrocarbons.
Previously isolated anaerobic microbes include sulphate-reducing bacteria, sulfidogens, fermentative bacteria, iron and manganese reducers, acetogens and methanogens. Classes of hydrocarbons that bacteria isolated from petroleum sites have been shown to metabolize include n-alkanes (Kato et al., 2001), BTEX compounds (Lu et al., 2006) and polycyclic aromatic hydrocarbons (PAHs) (Alquati et al., 2005; Coral and Karagoz, 2005). Although these culture-based approaches are useful for understanding the physiological properties of the isolated organisms, they do not provide detailed information on the composition of microbial communities, because less than 1% of bacteria and archaea can be cultured in the laboratory. It is therefore difficult to assess the importance of these isolated organisms within their natural environment.
Because of the disparity in diversity between isolated bacteria and in situ microbial communities, culture-independent methods have been established to circumvent the culturing bias. Diversity studies on petroleum-impacted sites have focused on PCR-based methods that examine the extracted DNA of the microbial communities. Community fingerprint analyses using PCR coupled with denaturing gradient gel electrophoresis (DGGE) have identified the predominant bacteria in petroleum samples. Many phylogenetic groups have been represented in these studies, including Pseudomonas sp., Ochrobactrum sp., Staphylococcus sp., Sphingomonas sp. and Burkholderia sp. (Yoshida et al., 2005). In addition, 16S rDNA clone libraries have been constructed to identify individual members of the microbial community present in petroleum-based sites, which include Arcobacter nitrofigilis, Clostridium ramosum, Desulfobacula toluolica, Pandoraea pnomenusa and Pseudomonas stutzeri (Grabowski et al., 2005; Li et al., 2006). These methods have also revealed that diverse communities consisting of many novel and previously uncultured microorganisms exist in both petroleum reservoirs and petroleum-contaminated sites. However, because only 16S rRNA gene sequences are examined, the actual functions of the community members within their environment remain unknown. The inability to link function to species represents a major limitation of these studies.
Alternative culture-independent methods include phospholipid fatty acid analysis, fluorescence in situ hybridization and a variety of PCR-based approaches such as DGGE and 16S rRNA gene microbial surveys.

Application of metagenomics in hydrocarbon resource management
In microbiology, metagenomics has proven to be a rapidly growing tool and has changed the way microbiologists approach many problems.
Amongst the methods developed to gain access to the genetics and physiology of uncultured microorganisms, metagenomics, which is the analysis of the genome of a microbial population, has emerged as a powerful centerpiece.

Application of metagenomics in bioremediation
Metagenomics has great potential for both fundamental and industrial applications that range from the understanding of microbial adaptation and evolution to the discovery of new enzymes for direct use (Deutschbauer et al., 2006; Ferrer et al., 2007). Bioremediation is defined as "the use of living organisms to reduce or eliminate environmental hazards resulting from accumulations of toxic chemicals or other hazardous wastes" (Rahman, 2011). Sites polluted with toxic chemicals and industrial wastes have transformed environmental biotechnology because these habitats include niches for microorganisms that have the necessary enzymes to use these compounds as their carbon and energy source. The genetic diversity found in these environments includes genes encoding degradative enzymes and pathways for recalcitrant chemicals, which are potentially useful for the remediation of environmental pollution and as sources of novel catalytic activities with applications in green chemistry and biotechnology (Galvão et al., 2005). Several studies have examined the biodegradation, bioremediation and biotransformation of petroleum hydrocarbons. However, surveys of the enzymes that are available in nature have only begun.
The discovery of novel enzymes and pathways has opened the door to the search for new bacteria, genes and catabolic pathways that can be used in biotechnology.
Selection of bacterial communities capable of degrading petroleum substances occurs rapidly, even after short-term exposure of soil to petroleum hydrocarbons following oil spills. Over time spans encompassing millennia, bacteria that can tolerate this environment would be expected to undergo genetic adaptations that may lead to the evolution of new ecotypes, species and enzymes for growth on petroleum hydrocarbons. During the adaptation of communities, genes for petroleum hydrocarbon-degrading enzymes that are carried on plasmids or transposons may be exchanged between species. In turn, new catabolic pathways may eventually be assembled and modified for efficient regulation (Rabus et al., 2005). Some of these genes and pathways may therefore be used to construct genetically modified organisms (GMOs). Combining enzymes and pathways from different organisms in one recipient strain is a useful strategy for designing bacteria with enhanced degradation abilities. Several GMOs have already been successfully constructed with increased degradation ability and effectiveness in bioremediation (Paul et al., 2005). Natural asphalts that originated 40,000 years ago likely contain efficient enzymes and catabolic pathways, which could improve these designer strains for the biodegradation of petroleum hydrocarbons and other organic compounds. The initial step toward this goal would be to find and characterize the genes that encode these biocatalysts.
One of the most practical applications of microorganisms and their hydrocarbon-degrading enzymes is bioremediation, owing to the ability of these microbes to utilize petroleum as a carbon and energy source. During bioremediation, microbes degrade or transform hazardous organic compounds to a relatively nontoxic state (Dua et al., 2002; Paul et al., 2005). Bioremediation is often considered a cost-effective and environmentally friendly alternative to conventional methods of remediation, such as excavation and incineration, which are very costly and can generate toxic air emissions (Kuiper et al., 2004; Kure et al., 2018). In addition, bioremediation techniques can be applied in situ without removing the contaminated soil, and thus without disturbing the environment. The use of bioremediation for environmental decontamination has been growing and has attracted public interest. Currently, microorganisms with the ability to degrade various pollutants (e.g. polycyclic aromatic hydrocarbons, nitroaromatics, polychlorinated biphenyls and oil components) have been isolated in the hope of exploiting their metabolic potential for the remediation of contaminated sites (Dua et al., 2002).
Contamination caused by petroleum hydrocarbons has attracted much attention because of concerns stemming from large-scale releases into the environment. Polycyclic aromatic hydrocarbons (PAHs) and BTEX (benzene, toluene, ethylbenzene and xylene) compounds are environmental pollutants commonly found in petroleum-contaminated sites. PAHs have carcinogenic and mutagenic properties, and the heavier fractions are the most recalcitrant in the long term (Bisht et al., 2015). The lighter compounds (i.e. BTEX) are able to leach into groundwater basins and contaminate freshwater supplies. Each year, an average of approximately 1,680,000 gallons of crude oil is spilled on land due to pipeline failures, and more than 200,000 underground storage tanks in the US have leaked gasoline and other fuels into soil, sediments and groundwater aquifers, making petroleum and its derived fuels the most ubiquitous organic pollutants around the world (Salanitro, 2001). Given the amount of petroleum hydrocarbons contaminating the environment, improved bioremediation technologies are necessary.

Metagenomics and biodegradation
Environmental contamination alters and modifies microbial community structures, and its impact can be examined using metagenomics. Large loads of different kinds of waste produced by industries, such as those from petroleum spills and the incomplete combustion of fossil fuels, have caused an accumulation of petroleum hydrocarbons in the environment. The generation of these anthropogenic compounds through oil-related production introduces large amounts of aromatic hydrocarbons into the environment each year, resulting in the contamination of ecosystems (Jacques et al., 2007). Microorganisms are directly involved in biogeochemical cycles as key drivers of the degradation of many carbon sources, including petroleum hydrocarbons; they can break down aromatic rings, such as those of benzene, toluene and xylene, and mineralize their carbon skeletons. Metagenomics is a tool that eliminates cultivation steps, as it consists of the direct extraction of environmental DNA and its cloning in an appropriate vector. Sierra-García et al. (2014) reported the potential of the metagenomic approach to identify and elucidate new genes and pathways in poorly studied environments and to contribute to a broader perspective on hydrocarbon degradation processes in petroleum reservoirs. Using a function-driven metagenomic approach, they reported novel functional metabolic pathways involved in the biodegradation of aromatic compounds in a metagenomic library obtained from an oil reservoir. More recently, researchers identified genes and metabolic pathways related to the degradation of phenol and other aromatic compounds in sludge samples from a petroleum refinery wastewater treatment system, using a metagenomic approach to capture a broader range of the extant functional diversity.
Marcos et al. (2009) reported bacterial populations in cold marine ecosystems that can degrade polycyclic aromatic hydrocarbons (PAHs), which they detected by identifying functional gene targets. They identified 14 distinct groups of genes, most of them showing significant relatedness to dioxygenases from Gram-positive bacteria of the genera Rhodococcus, Mycobacterium, Nocardioides, Terrabacter and Bacillus. These results indicate the presence of a high diversity of hitherto unidentified dioxygenase genes in this cold polluted environment. The information obtained could be used as a starting point for the design of quantitative molecular tools to analyse the abundance and dynamics of these aromatic hydrocarbon-degrading bacterial populations in the marine environment.

Metagenomics and biosurfactants
Several microorganisms (bacteria, fungi and yeast) produce biosurfactants, which are surface-active molecules. Because of their lower toxicity, environmental friendliness and biodegradability, biosurfactants have gained considerable interest and attention from researchers compared with chemical surfactants. Environmental concerns, advances in biotechnology and the emergence of more stringent laws have made biosurfactants a potential alternative to the chemical surfactants available on the market. Biosurfactants are potential replacements for synthetic surfactants in several industrial processes, such as lubrication, wetting, softening, fixing dyes, making emulsions, stabilizing dispersions, foaming and preventing foaming, as well as in the food, biomedical and pharmaceutical industries and in the bioremediation of sites contaminated with organic or inorganic compounds (Henkel et al., 2012).
Most of the biosurfactants described are of microbial origin and were isolated through traditional culture methods. However, given that the vast majority of bacteria are yet to be cultured, metagenomics provides the potential to explore novel biosurfactants from bacteria that are recalcitrant to culturing and from exotic and unexplored environments (Williams and Trindade, 2017). Metagenomics has been used to make DNA libraries from petroleum-contaminated samples (soil, water, etc.), followed by screening for biosurfactant-producing clones. Morikawa et al. (1992) reported the isolation of two bacteria (A-1 and B-1) that exhibited large emulsified halos around their colonies on oil-L-agar plates. These bacteria produced the same biosurfactant, surfactin. A number of screening methods for biosurfactants could be applied to metagenomic libraries; function-based approaches that have been developed include substrate-induced gene expression (SIGEX) screening and high-throughput (HTP) screening (Satpute et al., 2010). Kennedy et al. (2011) described some of the functional screens for the isolation of biosurfactants, together with approaches that can be employed to help overcome some of the typical problems encountered with functional metagenomic-based screens.

Conclusion
Metagenomics has great potential for both fundamental and industrial applications that range from the understanding of microbial adaptation and evolution to the discovery of new enzymes for direct use. Amongst the methods developed to gain access to the genetics and physiology of uncultured microorganisms, metagenomics, which is the analysis of the genome of a microbial population, has emerged as a focal point.