What is the difference between a cDNA microarray and an oligonucleotide array?




















In Saccharomyces yeasts, the Ty families of retrotransposons show substantial variation across strain backgrounds and species [74,75], and novel transposon insertions can cause both adaptive [76,77] and detrimental [78] mutations. Retrotransposon proximity to genes can also modify gene expression and regulation. Moreover, transposons are potentially a source of CNV, as they are correlated with the breakpoints of genomic rearrangements in organisms from yeast [41,80] to humans. Although these repetitive sequences are biologically important, they are poorly covered by all but the highest-quality sequencing approaches.

For example, many of the several hundred gaps remaining in the Saccharomyces bayanus sequence correlate with retrotransposon sequences in S. Cherry, personal communication. Because of the technical issues involved in sequencing and assembling these repetitive elements, they are often excluded from sequencing projects. Methods for mapping these elements have therefore been developed that rely on identifying the unique sequences abutting common repetitive sequences.

Mapping insertion sites. Global mapping of insertion sites is generally performed by isolating the insertion element together with its immediately neighbouring DNA.

The DNA is then hybridized to a whole-genome array to identify its genomic location (see figure). Sequence-specific isolation of insertion sites has been successfully demonstrated using oligonucleotide capture probes [75], or by annealing linkers to fragmented DNA and using PCR [81], adapting a method that was developed to sequence the termini of genomic clones. Thus far, analytical approaches to these data have simply entailed the identification of contiguous probes above some threshold value, although these methods are certain to evolve as the approaches mature.
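The threshold-based analysis mentioned above can be illustrated with a minimal sketch. The function name, the threshold of 1.5, the minimum run length of 3 probes and the signal values are all assumptions chosen for illustration; the source does not specify them.

```python
from typing import List, Tuple


def contiguous_runs(signal: List[float], threshold: float,
                    min_probes: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs for every run of at least
    `min_probes` consecutive probes whose signal exceeds `threshold`.

    Probes are assumed to be ordered by genomic position.
    """
    runs = []
    start = None  # index where the current above-threshold run began
    for i, value in enumerate(signal):
        if value > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_probes:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last probe.
    if start is not None and len(signal) - start >= min_probes:
        runs.append((start, len(signal) - 1))
    return runs


# Hypothetical log2 enrichment values for 12 probes along a chromosome.
probe_signal = [0.1, 0.2, 1.8, 2.1, 1.9, 0.3, 1.7, 0.2, 2.2, 2.0, 2.4, 2.3]
candidate_sites = contiguous_runs(probe_signal, threshold=1.5, min_probes=3)
print(candidate_sites)
```

Requiring several adjacent probes, rather than a single probe, is one simple way to reduce calls caused by isolated noisy probes; the single high probe at index 6 is ignored here for that reason.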

One way of mapping insertion sequence variation is to isolate the insertion element and its immediately neighbouring DNA. Specific regions of the genome are isolated using either a capture probe method (as illustrated in the diagram) or a PCR-based method. This approach is suited to mapping the location in the genome of mobile genetic elements, which are notoriously difficult to characterize using whole-genome sequencing approaches.

As shown in the figure, DNA is fragmented in the first step (a). The site of transition corresponds to the genomic location of the insertion sequence.

Transposons are also used extensively as large-scale mutagenesis tools in model organisms ranging from bacteria [83] and yeast [84,85] to zebrafish [86] and mice. Identifying the locations of experimentally mobilized transposons poses a similar challenge to identifying endogenous transposons.

Isolation of the insertion sites of these artificial transposons in E. This method is sufficiently quantitative that the relative abundance of mutants in a complex mixture can be followed over several rounds of genetic selection [88], allowing the simultaneous identification of enriched and depleted mutants. Most genome sequencing methods are not well suited to the problems posed by repetitive sequences. The shorter read lengths of next-generation sequencing methods exacerbate the coverage and assembly problems of standard sequencing approaches.

Combining an enrichment method with sequencing, rather than using microarrays, presents one possibility. Conceptually, this approach is analogous to chromatin immunoprecipitation (ChIP), but rather than enriching loci using an antibody that is targeted to a DNA-bound protein of interest, an oligonucleotide would be used to enrich specific loci in the genome. As global ChIP has already been adapted to new sequencing platforms [90,91], transposon mapping could in principle also be done this way.

Experimental issues: the advantages of an internal control. When dealing with millions of data points, it is important to ensure that data quality and processing are managed effectively. To obtain robust data, experimental and analytical considerations need to be taken into account at all stages of the protocol. A fundamental difference between experimental platforms is whether there is an internal control for each probe, which is achieved by co-hybridizing differentially labelled sample and reference DNA (a two-colour microarray), or whether only a single sample is analysed (a one-colour microarray; Box 2).

Recent comparative studies have indicated that two-colour and one-colour microarray experiments produce concordant results for gene expression analysis. However, in contrast with gene expression experiments, in which data from multiple probes can be used to determine a gene expression value, comparative genomic analyses are more sensitive to spurious probe values.

There are two arguments for the use of a co-hybridized reference in order to maximize the sensitivity of individual probe data. The first reason is the need for an internal control for probe quality. Microarray probes can be created in several different ways and the manufacturing of microarrays can result in significant variation in probe quality and quantity. This is particularly acute for the 'homemade' variety of microarrays, such as the BAC arrays used for aCGH, in which probe quality can vary greatly between microarrays.

However, even leading microarray manufacturers frequently provide microarrays with variable probe quality. The presence of an internal reference provides a means of controlling the variation that is due to probe quality, because the measurement of interest is the relative binding efficiency between the sample and the reference (often a standard reference used over and over again). Therefore, this experimental design enables a ratiometric approach to data analysis rather than a reliance on absolute measurements between microarray experiments.
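A small numerical sketch may help show why a ratio is insensitive to probe quality. It assumes a deliberately simple model in which probe quality scales the sample and reference signals by the same factor; the quality factors, the base signal of 1000 and the copy-number ratios are invented for illustration.

```python
import math

# Hypothetical per-probe quality factors: a poorly made probe binds less DNA
# overall, which (in this simplified model) scales both channels alike.
probe_quality = [1.0, 0.4, 2.5, 0.8]

# Assumed true abundance of the sample relative to the reference per probe.
true_ratio = [1.0, 1.0, 2.0, 0.5]

reference_signal = [1000.0 * q for q in probe_quality]
sample_signal = [1000.0 * q * r for q, r in zip(probe_quality, true_ratio)]

# Absolute sample signals differ between probes 0 and 1 even though both
# loci are unchanged; the log2 ratio removes the shared quality factor.
log_ratios = [math.log2(s / r) for s, r in zip(sample_signal, reference_signal)]

for s, lr in zip(sample_signal, log_ratios):
    print(f"sample signal {s:7.1f}  log2(sample/reference) {lr:+.2f}")
```

In this toy model the two unchanged probes have absolute signals of 1000 and 400 but identical log ratios of 0, which is the behaviour the ratiometric design relies on. Real arrays also carry additive background and dye effects that this sketch ignores.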

The second experimental concern addressed by a ratiometric approach is geographic variation across the microarray due to the conditions in which the hybridization occurred. Most microarrays are mixed by placing the array in a rotating hybridization oven or some device designed to provide agitation of the reactants.

Variation in mixing and the concomitant differential time of exposure to reagents across the microarray can result in variable hybridization efficiency across the array. This too is readily controlled by the presence of an internal reference. An additional requirement for minimizing geographic artefacts in microarray data is the randomization of probe locations on the microarray with respect to genome location.
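Randomizing probe placement can be sketched in a few lines. The probe names, the 3 x 4 array size and the seed below are hypothetical; real array design tools impose further constraints that are not modelled here.

```python
import random


def randomized_layout(probe_ids, n_rows, n_cols, seed=0):
    """Assign each probe to a distinct, randomly chosen (row, col) feature.

    Because the positions are shuffled, probes that are neighbours in the
    genome are unlikely to be neighbours on the array.
    """
    if len(probe_ids) > n_rows * n_cols:
        raise ValueError("more probes than features on the array")
    positions = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    rng = random.Random(seed)  # seeded so the layout is reproducible
    rng.shuffle(positions)
    return dict(zip(probe_ids, positions))


# Hypothetical probes listed in genomic order.
probes = [f"chr1_probe_{i:03d}" for i in range(12)]
layout = randomized_layout(probes, n_rows=3, n_cols=4, seed=42)
for probe in probes[:3]:
    print(probe, layout[probe])
```

With such a layout, a hybridization artefact confined to one corner of the array affects probes scattered across the genome rather than a contiguous region that could be mistaken for a real variant.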

This simple adjustment in array design reduces the chance that experimental artefacts are conflated with biological significance. Several normalization methods that attempt to address inter- and intra-array variation for one-colour microarrays have been developed. The use of a ratiometric approach greatly simplifies the task of extracting signal from noise because each measurement at every probe is internally controlled (Box 2).

Nonetheless, two-colour experiments also have a potential limitation: they require the use of two different dyes, which can introduce a dye-specific bias. This effect can be mitigated by repeating experiments with the dyes interchanged, by performing control experiments that explicitly study the effect of dye bias [95] or by addressing the effect using statistical approaches.

Data analysis considerations. A first step in data processing is normalizing data for comparison within or between microarrays. The methods that are used in both one-colour and two-colour microarray experiments usually assume that there is an equal quantity of DNA in all samples. However, this assumption is incorrect in the case of aneuploid samples and, to a lesser extent, samples with differing CNV content. Simple linear corrections have been applied in the case of yeast that carry extra chromosomes [97], but most methods for normalization do not account for this possibility, thereby underestimating values for regions that differ between the two genomes.
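The underestimation described above can be made concrete with a toy example. The sketch assumes log2 ratios that are centred by subtracting a mean, and it assumes that a set of probes known to be unchanged is available; neither detail comes from the source, and this is not the linear correction of ref. 97.

```python
from statistics import mean


def center(log_ratios, reference_idx=None):
    """Subtract a centring constant from every log2 ratio.

    With reference_idx=None the mean over ALL probes is used, which amounts
    to assuming equal total DNA in both samples. Otherwise only the probes
    listed in reference_idx (assumed unchanged) define the constant.
    """
    if reference_idx is None:
        offset = mean(log_ratios)
    else:
        offset = mean(log_ratios[i] for i in reference_idx)
    return [x - offset for x in log_ratios]


# Hypothetical sample: 6 probes on unchanged chromosomes (log2 ratio 0)
# and 4 probes on a duplicated chromosome (log2 ratio 1).
raw = [0.0] * 6 + [1.0] * 4

global_centered = center(raw)                           # equal-DNA assumption
subset_centered = center(raw, reference_idx=range(6))   # unchanged probes only

print("global :", [round(x, 2) for x in global_centered])
print("subset :", [round(x, 2) for x in subset_centered])
```

Under global centring the duplicated region is reported at 0.6 instead of 1.0 and the unchanged region appears slightly depleted at -0.4, whereas centring on the unchanged probes preserves both values. In practice the unchanged set is not known in advance and has to be estimated.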

Similarly, normalization methods for the sequence-level comparison of genomes using short-oligonucleotide microarrays require approaches that do not assume that total hybridization across the array is always equal between samples.

A key to validating any microarray approach is an assessment of false positive and false negative rates. This is challenging insofar as it requires knowledge of variation in another genome from the same species that can be used as a test case.
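As a minimal sketch of such an assessment, the following compares a set of array-based calls with a set of known variant positions. The positions and the function name are hypothetical, and the sketch assumes every interrogated site is either variant or not in the gold standard.

```python
def error_rates(called, truth, all_sites):
    """Return (false_positive_rate, false_negative_rate).

    called    -- sites reported as variant by the array method
    truth     -- sites known to be variant (the gold standard)
    all_sites -- every site that was interrogated
    """
    called, truth, all_sites = set(called), set(truth), set(all_sites)
    tp = len(called & truth)              # variant and called
    fp = len(called - truth)              # called but not variant
    fn = len(truth - called)              # variant but missed
    tn = len(all_sites - called - truth)  # neither variant nor called
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Hypothetical example: 20 interrogated positions, 4 true variants.
all_sites = range(1, 21)
truth = {3, 7, 12, 18}
called = {3, 7, 9, 18}

fpr, fnr = error_rates(called, truth, all_sites)
print(f"false positive rate: {fpr:.4f}")
print(f"false negative rate: {fnr:.4f}")
```

Here one of 16 non-variant sites is called in error and one of 4 true variants is missed.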

For a small number of organisms, multiple sequenced genomes are available, which facilitates testing of a method against a gold standard. Where this is not possible, methods must rely on comparisons between datasets. The value of visualizing and exploring genome-scale data cannot be overstated. A number of software options exist for visualizing data (Table 1). Modules in the Bioconductor package can also be used for visualizing aCGH data. In our experience, the freeware application Integrated Genome Browser provides a great deal of flexibility in terms of the data types that can be imported and viewed, and is versatile for viewing multiple diverse datasets simultaneously.

Inherent limitations. Although microarray-based methods are a powerful and simple approach to characterizing genomic variation, there are a number of limitations. The greatest limitation in comparison with de novo sequencing methods is the fact that only known sequence is interrogated.

Most microarrays have been made using a reference sequence that was obtained using whole-genome Sanger sequencing of one or a few members of a species. Therefore, if additional individuals carrying genomic loci that were not detected in the initial sequencing are analysed using these arrays, these genomic regions will be completely missed in the analysis. This is likely to represent a small proportion of any genome, but in yeast there are a number of cases of genes that are found in some strains but that are not present in the reference sequence. It seems likely that this will be a general case for many species.

Once such genes have been found in other individuals it makes sense to incorporate them into subsequent microarray designs. An additional limitation for microarrays is the analysis of highly repetitive regions. This includes regions or features of the genome that are present in multiple copies, such as transposons and telomeres as well as low-complexity DNA that contains repetitive motifs.

Low-complexity DNA poses a particular problem for detecting sequence variation because hybridization efficiency seems to be much more variable in these regions and is a common source of false positives. As more attention moves from determining genomic sequences de novo to comparing large numbers of individual variants of the same sequences, the need to simplify and reduce the cost and effort will increase. In most of the main areas of investigation (for example, disease and evolution), the changes will involve a minuscule fraction of the total genomic sequence (for the human genome sequence, as little as 1 base in 3 billion); here, the prospect of truly efficient DNA-microarray-based surveys of variant sequences, followed by local sequencing, promises reductions in cost that make it entirely possible to study thousands of samples in a single laboratory.

We expect that even though sequencing will continue to be made cheaper and more effective, the evolution of DNA-microarray technology will keep pace or better, resulting in a situation in which the 'detect, map and locally sequence' strategy, which is essentially a combination of approaches, will continue to outperform complete resequencing for some time to come.

An explicit example of an innovative combination of a microarray-based approach coupled with sequencing was the capture of the SARS coronavirus using a microarray of 70mer oligonucleotides followed by sequence confirmation. This approach has now been extended to selective enrichment of the entire coding fraction of the human genome, enabling targeted resequencing using high-throughput methods. Clearly, these applications suggest the feasibility of a generic approach in which regions of interest that are identified using microarrays can be directly isolated and further investigated.

The use of microarrays to resequence the small genomes of pathogens has been one of the most productive uses of array-based approaches to sequence-level comparisons. With increasing flexibility in array manufacturing and improved methods for detecting variation this approach should be readily applicable to various organisms.

With a growing interest in the human microbiome and its role in normal and disease states, studying the role of genomic variation in these microorganisms will become increasingly important. Surprisingly, the ability to localize sequence changes cheaply has generated considerable demand in traditional experimental settings, such as the identification of suppressor mutations and as an adjunct to positional cloning.

In practice, a single microarray experiment can save many years of laborious work. Moreover, the ability to assess global variation in an unbiased manner allows questions regarding the genetic consequences of experimental techniques, such as the mutagenic cost on the host genome of genetic engineering manipulations, and of natural phenomena, such as ageing, to be effectively addressed on a global scale.

The application of these methods will accelerate current areas of research and allow new questions to be asked in all organisms, including humans (Box 3). For example, studying the role of intra-individual somatic cell genomic variation might prove insightful for investigating the genomic basis of somatic tissue disease.

This is clearly relevant to known genetic diseases such as cancer, in which the current state of the art is Sanger sequencing. However, it is also likely to be of great importance to the study of other diseases of somatic origin, in which the identification of causative mutations is refractory to typical genetic approaches. Similarly, identifying global sites of viral integration (for example, with human immunodeficiency virus) and identifying natural variation in insertion sequences in species are comparatively under-explored areas of biology in which microarray-based approaches have much to offer.

Although the advent of generic, 'gold standard' genomic sequences has indeed produced a radical change in biological research, only a fraction of the potential biological insight of genomic sequences is available from this source: the remainder will require genomic comparisons of many types. All forms of genomic diversity (structural, sequence and insertional) can be detected using microarrays. In contrast with the cost, labour and time that are involved in whole-genome sequencing, microarray-based approaches are fast, flexible and inexpensive.

It seems likely that the co-evolution of DNA microarray and direct-sequencing strategies will make the power of genomic comparison accessible to any and all researchers who might benefit from it.

As with all intermolecular reactions, the rate of formation of the DNA duplex that is formed between the probe and the sample is a function of both the concentration of reactants and temperature.

To use hybridization to compare genomes at the sequence level it is necessary to maximize the difference between the Tm of the perfectly matched DNA and the Tm of the mismatched DNA. This difference is highly dependent on the length of the oligonucleotide and in practice is likely to only be within the range of detection for oligonucleotides that are shorter than 50 bp. Therefore, short probes are required to interrogate sequence differences between genomes.
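One rough way to build intuition for this length dependence is to ask what fraction of the duplex a single mismatch disrupts. The sketch below is simple arithmetic only; it is not a Tm calculation, and the probe lengths are chosen purely for illustration.

```python
def mismatch_fraction(probe_length: int, n_mismatches: int = 1) -> float:
    """Fraction of probe positions that are mismatched.

    This is a crude proxy for how strongly a mismatch perturbs the duplex;
    it is an illustrative assumption, not a thermodynamic model.
    """
    if probe_length <= 0:
        raise ValueError("probe_length must be positive")
    return n_mismatches / probe_length


# Illustrative lengths: short oligonucleotides up to a long PCR product.
for length in (25, 50, 70, 1000):
    print(f"{length:5d}-base probe: one mismatch affects "
          f"{100 * mismatch_fraction(length):.2f}% of positions")
```

A single mismatch alters 4% of a 25-base probe but only 0.1% of a 1000-base probe, which is consistent with the statement that only short probes discriminate single-base differences while long probes tolerate them.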

Longer probes (such as those provided by BAC clones, cDNA clones or PCR products) provide greater coverage of the genome and allow detection of structural variation, even in the presence of a small number of sequence differences (see table; IV, insertional variation; SV, structural variation).

In a two-colour experiment (panel a in the figure), DNA from individuals of the same species, or from different tissues of a single individual (for example, normal and diseased cells), is extracted and differentially labelled with compatible fluorophores (for example, Cy3 and Cy5). At most probes, equal amounts of the two samples will hybridize (yellow features on the array), reflecting the fact that most loci in the two genomes are present in equal amounts (for example, region 3). Regions that are deleted in the sample genome (region 1 of sample A) will result in probes with increased relative Cy3 signal (green features).

Alternatively, amplified regions in the sample (region 2 of sample A) will result in features with an increased relative Cy5 signal (red features). Over the entire microarray, the signal ratios at each feature follow a Gaussian distribution, and candidate copy number variations are identified on the basis of the deviation of a particular probe ratio, using statistical cut-offs.
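A statistical cut-off of this kind can be sketched as a z-score test on log2 ratios. The cut-off of 2.5 standard deviations, the function name and the data are assumptions made for illustration; the source does not state which cut-offs are used, and a positive log2 ratio is taken here to mean more sample than reference.

```python
from statistics import mean, stdev


def call_outlier_probes(log_ratios, z_cutoff=2.5):
    """Flag probes whose log2(sample/reference) ratio deviates from the
    array-wide mean by more than `z_cutoff` standard deviations.

    Returns a list of (probe_index, "gain" | "loss") tuples.
    """
    mu = mean(log_ratios)
    sigma = stdev(log_ratios)  # sample standard deviation
    calls = []
    for i, value in enumerate(log_ratios):
        z = (value - mu) / sigma
        if z > z_cutoff:
            calls.append((i, "gain"))
        elif z < -z_cutoff:
            calls.append((i, "loss"))
    return calls


# Hypothetical array: 18 probes scattered around zero, one amplified probe
# (index 5) and one deleted probe (index 14).
log_ratios = [0.1, -0.1, 0.1, -0.1, 0.1, 2.0, -0.1, 0.1, -0.1, 0.1,
              -0.1, 0.1, -0.1, 0.1, -2.0, -0.1, 0.1, -0.1, 0.1, -0.1]

calls = call_outlier_probes(log_ratios)
print(calls)
```

Because the outliers themselves inflate the standard deviation, a robust spread estimate (for example, one based on the median) may be preferable on real data; the plain standard deviation is used here only to keep the sketch short.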

Although the diagram illustrates the protocol for array comparative genome hybridization, all array procedures, including SNP discovery and insertion site mapping, are carried out in this way. One-colour experiments (panel b in the figure) are performed in a similar manner, except that the DNA is labelled with a single colour and hybridized to a microarray without a reference sample.

The difference between two-colour and one-colour experiments is that in the former case two samples are compared within an experiment, whereas in the latter case two separate experiments are required to compare the samples. For Affymetrix-manufactured microarrays, the method entails labelling DNA with biotin, then adding streptavidin conjugated to phycoerythrin after hybridization (represented by yellow circles).

Rather than a ratio, an absolute value of hybridization is determined; following normalization, this value is compared with other experiments to detect genomic variation. A single two-colour hybridization gives less variation at each probe than two independent one-colour hybridizations because the detailed conditions at every probe, such as salt concentration and temperature, are identical in the two-colour experiment but are not necessarily identical in the two independent one-colour experiments.

Part a of the figure is modified, with permission, from Nature Reviews Genetics Ref.

Microarrays are being applied to a wide range of questions regarding genomic diversity in humans. Whereas microarrays have so far been used predominantly for SNP genotyping [60], we believe that microarrays will continue to provide a powerful means of discovering new genomic variation and assaying its frequency in the human genome.

Below is a list of some of the applications of microarray-based approaches to studying human genome diversity that have already commenced, along with some questions that should be tractable using the approaches discussed in this Review.

Studying the nature and extent of structural variation in the human genome. The recent discovery of widespread copy number variation (CNV) in the human genome [44,57] has ignited a new interest in this class of genomic variation, which is amenable to discovery using array comparative genome hybridization.

The relationship between structural variation and human disease. It has long been known that human disease can be caused by gene amplification or deletion [37], but it is only recently that genome-scale approaches have revealed the high frequency of de novo CNV and its potential association with autism.

The role of gene amplification in human evolution. Recently, it has been discovered that CNV at the AMY1 locus, which encodes the salivary enzyme amylase, has been under selective pressure through human history. It is likely that selection for or against CNV of particular loci will be a general theme in the human genome, as it is in other organisms.

Genomic changes associated with cancers. Typically, genome-wide studies of somatic point mutations that are associated with cancers in humans have used Sanger sequencing approaches. Microarray-based approaches have the potential to address the various genomic events that are associated with cancers, including base pair changes, structural variation and possibly insertion variation, far more rapidly and efficiently.

The role of insertion sequences (for example, long interspersed nuclear elements, or LINEs) in generating cellular diversity and disease. A study has reported the potential role of LINE1 elements in generating neuronal diversity. This presents the tantalizing possibility that somatic mosaicism might be facilitated by mobile elements in the genome, a question that is readily amenable to microarray-based approaches.

The extent of insertional variation in humans. Mobile elements constitute most of the human genome. However, little is known about their variation throughout the human population. It is reasonable to expect that this variation is at least comparable to that observed for CNV.

Sanger, F. DNA sequencing with chain-terminating inhibitors. Natl Acad. USA 74.
Lander, E. Initial sequencing and analysis of the human genome.
Venter, J. The sequence of the human genome. Science.
Massive parallelism, randomness and genomic advances. Nature Genet.
DeRisi, J. Exploring the metabolic and genetic control of gene expression on a genomic scale.
Velculescu, V. Characterization of the yeast transcriptome. Cell 88.
Winzeler, E. Functional characterization of the S.
Kamath, R. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi.
Ghaemmaghami, S. Global analysis of protein expression in yeast.
Reboul, J.
Shendure, J. Accurate multiplex polony sequencing of an evolved bacterial genome.
Margulies, M. Genome sequencing in microfabricated high-density picolitre reactors.
Bentley, D. Whole-genome re-sequencing.
Marmur, J. Thermal renaturation of deoxyribonucleic acids.
Davis, R. Electron-microscopic visualization of deletion mutations. USA 60. This paper is one of the first examples of whole-genome comparison using hybridization. The authors denatured bacteriophage DNA and visualized the renatured DNA using electron microscopy to identify genome deletions.
Southern, E. Detection of specific sequences among DNA fragments separated by gel electrophoresis. This reference is the original paper describing the Southern blot method of analysis.
Kafatos, F. Determination of nucleic acid sequence homologies and relative concentrations by a dot hybridization procedure. Nucleic Acids Res.
Wallace, R.
Conner, B. USA 80.
Maskos, U. Oligonucleotide hybridizations on glass supports: a novel linker for oligonucleotide synthesis and hybridization properties of oligonucleotides synthesised in situ.
Pease, A. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. USA 91.
Hughes, T. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nature Biotechnol.
Shalon, D. Genome Res.
Wong, C. Tracking the evolution of the SARS coronavirus using high-throughput, high-density resequencing arrays.
Chee, M. Accessing genetic information with high-density DNA arrays. This paper describes a large advance in microarray manufacture and analysis: over , probes were synthesized on an array, which was used to probe sequence diversity in the human mitochondrial genome.
Maitra, A. The human MitoChip: a high-throughput sequencing microarray for mitochondrial mutation detection.
Ishkanian, A. A tiling resolution DNA microarray with complete coverage of the human genome. This paper describes the first complete coverage of the human genome using a BAC microarray.

Fiegler, H. Accurate and reliable high-throughput detection of copy number variation in the human genome.
David, L. A high-resolution map of transcription in the yeast genome. USA.

The following additional data are included with the online version of this article: figures that show scatter plots of cell-cell Pearson correlation coefficients relating the cDNA array and oligo array data sets (Additional data file 1) and scatter plots of cell standard deviations across the set of 2, genes for the cDNA array and oligo array data sets (Additional data file 2).

An Excel sheet for the original Affymetrix oligo array data on 6, human and control transcripts (Additional data file 3). It includes the index, gene accession number (except for controls, which have Affymetrix's designations) and original oligo array gene expression data for the 60 cell lines (average difference intensity); for calculations, these original data were floored at 30 and log2-transformed.
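The flooring and log transformation described above can be written as a short sketch. The floor of 30 is taken from the text; the function name and the example intensities are hypothetical.

```python
import math


def floor_and_log2(values, floor=30.0):
    """Replace every value below `floor` with `floor`, then take log2.

    Flooring avoids taking the logarithm of zero or negative average
    difference intensities and limits the influence of values near noise.
    """
    return [math.log2(max(v, floor)) for v in values]


# Hypothetical average difference intensities for one transcript.
raw_intensities = [-12.0, 5.0, 30.0, 64.0, 1024.0]
transformed = floor_and_log2(raw_intensities)
print([round(x, 3) for x in transformed])
```

All three values at or below the floor map to the same transformed value, log2(30), while 64 and 1024 map to 6 and 10.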

An Excel sheet for the ratio data on 9, cDNA-clone database ratios obtained as described in text (Additional data file 4). These clusters were represented by 2, oligo transcripts and 3, cDNA clones (see Table 1). For this summary table, the pair with maximum correlation was used if there was more than one cDNA-oligo pair in each UniGene classification. An Excel file (Additional data file 6) for the consensus data set based on both cDNA and oligo arrays.
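Selecting the maximally correlated cDNA-oligo pair within one cluster can be sketched as follows. The identifiers, the four-cell-line profiles and the function names are hypothetical, and the sketch assumes every profile has non-zero variance.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def best_pair(cdna_profiles, oligo_profiles):
    """Return (cdna_id, oligo_id, r) for the pair with the highest r.

    Each argument maps an identifier to expression values measured across
    the same cell lines, in the same order.
    """
    best = None
    for c_id, c_values in cdna_profiles.items():
        for o_id, o_values in oligo_profiles.items():
            r = pearson(c_values, o_values)
            if best is None or r > best[2]:
                best = (c_id, o_id, r)
    return best


# Hypothetical cluster with two cDNA clones and one oligo probe set.
cdna = {"clone_A": [1.0, 2.0, 3.0, 4.0], "clone_B": [4.0, 1.0, 3.0, 2.0]}
oligo = {"probe_X": [2.0, 4.0, 6.0, 8.0]}

print(best_pair(cdna, oligo))
```

In this toy cluster clone_A tracks probe_X perfectly (r = 1) while clone_B is negatively correlated with it (r = -0.4), so the clone_A pair would represent the cluster.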

Values represent log means of the cDNA ratio data and oligo data for the 2, matched pairs summarized in cDNA-oligo It should be noted that the two data sets can be concatenated in a number of other ways, for example, after being normalized or transformed into ranks, and we provide the former.

Additional data file 7 provides descriptive information on the oligo-array transcripts listed in Additional data file 8. Included are index, gene accession number, UniGene cluster, description of gene function (when available), HUGO name of the gene, chromosome location of gene, and LocusLink; a corresponding Excel file (Additional data file 8) contains the Oligo database and includes the index, gene accession number and oligo array gene expression data for the 60 cell lines (average difference intensity); data were floored at 30 and log2-transformed.

Additional data file 9 provides descriptive information on the cDNA-array transcripts listed in Additional data file 10. The data were log2-transformed.


We thank T. Golub, E. Lander, J. Staunton, H. Coller, P. Tamayo and colleagues for their collaboration in generating the oligo-array dataset. We also thank D. Ross, M. Eisen, P. Brown, D. Botstein and colleagues for collaboration on the cDNA-array experiments. We are grateful to the many members of NCI Developmental Therapeutics Program whose work over the years has contributed to the pharmacological and target datasets for the NCI, and to their analysis.

We particularly thank E. Sausville, K. Paull, M. Boyd, M. Grever, D. Scudiero, A. Monks, J. Johnson, R. Schultz, V. Narayanan, R. Shoemaker, M. Alley, D. Zaharevitz, D. Covell, L. Rubinstein, R. Camalier and S. In our own group, we thank Ajay and L. Tanabe for computer programs that proved useful in this study.

Lee, J. Comparing cDNA and oligonucleotide array data: concordance of gene expression across platforms for the NCI cancer cells. Genome Biol 4, R82.

Received: 12 February. Revised: 14 July. Accepted: 27 October. Published: 25 November.

A researcher gives you a list of common housekeeping genes to help you with normalization. Housekeeping genes are a set of genes that are ubiquitously expressed in a relatively stable manner. Rather than normalizing to the average over all genes on the array, this form of normalization uses the average of the housekeeping genes.
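The two normalization schemes can be compared with a minimal sketch. The gene names and intensities are illustrative placeholders (ACTB and GAPDH are used only as familiar examples of housekeeping genes), and scaling by a mean is an assumption about how the average is applied.

```python
from statistics import mean


def normalize(intensities, reference_genes=None):
    """Divide every gene's intensity by the mean intensity of the reference
    genes. With reference_genes=None, all genes on the array are used
    (global normalization)."""
    genes = reference_genes if reference_genes is not None else list(intensities)
    scale = mean(intensities[g] for g in genes)
    return {gene: value / scale for gene, value in intensities.items()}


# Hypothetical intensities from a single array.
array = {"ACTB": 200.0, "GAPDH": 400.0, "GENE1": 100.0, "GENE2": 900.0}
housekeeping = ["ACTB", "GAPDH"]

global_norm = normalize(array)               # scale = mean of all four genes
hk_norm = normalize(array, housekeeping)     # scale = mean of ACTB and GAPDH

for gene in array:
    print(f"{gene:6s} global {global_norm[gene]:.3f}  "
          f"housekeeping {hk_norm[gene]:.3f}")
```

After housekeeping normalization the housekeeping genes average exactly 1 on every array by construction, so differences between arrays are expressed relative to them; after global normalization their values are free to differ between arrays.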

For your experiments, you are asked to compare the differences in gene intensities due to global versus housekeeping normalization. As a simple test, you notice that when you use global normalization, the housekeeping genes are significantly higher (but not saturated) in one array than in another.

They ask you why SAM failed to find any differentially expressed genes. What should you tell them? Naef, et al. Bioinformatics 19 2 :


