We would like to congratulate Dr. Mark Bayfield from York University, who won a $500 credit for services offered by Génome Québec. We would like to thank everyone who filled out the annual survey.
FlexArray is a desktop application, not a Web application. It needs to be installed on the user's machine.
Currently, FlexArray only runs on Windows, but a Mac / Linux version is feasible in the future if there is sufficient demand for it.
Yes. In order to perform calculations, FlexArray uses R, the open-source language for statistical computing (see the project web site). Calculation routines (algorithms) come in the form of R libraries (a.k.a. packages) that need to be downloaded and installed. Normally this is done only once, the first time you run FlexArray. If you experience difficulties with this procedure, please run FlexArray as an administrator.
In versions 1.5 and 1.6 of FlexArray we have spent a considerable amount of time making the installation more bullet-proof on recent versions of Windows. Here are things to consider or to verify if you are still having problems.
Different versions of the .NET framework are designed to co-exist on the same machine. By installing 2.0 you won't be affecting the other installation(s) of the Framework, or any applications that depend on it/them.
The general analysis flow is the following:
A set of video tutorials is available.
There are also some useful documents included in your installation - most of them can be accessed via the Help menu, and some other ones can be found in the "doc" subfolder of the installation directory. In particular, all new and changed features in versions 1.4, 1.5 and 1.6 are described in detail in a document called "New features.pdf" you will find there. You can also download this file from our web site: New-features-1.6.pdf.
First, you need to make sure that your data is normalized and saved in a text file. Then, you need to use the "Import custom normalized data" option (Data menu). Once the data is imported into a FlexArray project, you can view it in the data table and plots, and run statistical tests on it, just like you would with a fully supported technology.
It is possible to do this to some extent although it depends on what exactly you wish to do. This is normally done via the option of importing custom normalized data. So here's the outline of the procedure:
While this will make comparisons easier, it won't give you direct graphic comparisons of both array types / technologies, feature by feature. This is because FlexArray does not know how to map feature ids of one array type to the other. Currently, no mechanism exists in FlexArray to facilitate this. Your best option is to add an intermediate step performed in Excel or R or other software that will match feature ids between array types and resolve any issues due to duplication etc.
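The intermediate matching step can also be scripted. The Python sketch below is only an illustration and is not part of FlexArray. It assumes that, for each array type, you already have a table mapping feature ids to a shared identifier (here a gene symbol), and it resolves duplicates by averaging, which is one possible choice among several. All ids, values and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_gene(values, feature_to_gene):
    """Average the values of all features that map to the same gene.

    values: dict of feature id -> normalized expression value
    feature_to_gene: dict of feature id -> shared identifier (assumed available)
    """
    by_gene = defaultdict(list)
    for feature_id, value in values.items():
        gene = feature_to_gene.get(feature_id)
        if gene:  # features without a mapping are skipped
            by_gene[gene].append(value)
    return {gene: mean(vals) for gene, vals in by_gene.items()}


def match_platforms(genes_a, genes_b):
    """Keep only the genes present on both array types."""
    shared = sorted(set(genes_a) & set(genes_b))
    return {gene: (genes_a[gene], genes_b[gene]) for gene in shared}


# Hypothetical data for two array types
array_a = {"A_001": 8.0, "A_002": 6.0, "A_003": 5.0}
map_a = {"A_001": "GENE1", "A_002": "GENE2", "A_003": "GENE3"}
array_b = {"B_001": 7.0, "B_002": 9.0, "B_003": 4.0}
map_b = {"B_001": "GENE1", "B_002": "GENE1", "B_003": "GENE2"}

matched = match_platforms(
    collapse_to_gene(array_a, map_a),
    collapse_to_gene(array_b, map_b),
)
print(matched)  # {'GENE1': (8.0, 8.0), 'GENE2': (6.0, 4.0)}
```

The matched table can then be saved as a text file and imported as custom normalized data.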
If you have a valid CDF file for your chip type, make sure that you specify the folder name in which it is located when you import your Affymetrix data.
But even if you don't have a CDF file it isn't really a big problem: the CDF file specified during the import of raw data isn't required to perform analysis. At most, you will not be able to perform certain visualizations of your raw data.
This problem has been fixed in version 1.5 of FlexArray.
This is because R does not include control spots when listing probes using mm(), and FlexArray does include them in the data table.
Detection p-values are supposed to be a measure of trust that a particular feature has been detected for a given sample / array. The lower the value, the more likely it is that the signal for the feature is different from background.
For Illumina data, detection p-values are included in raw data files. For Affymetrix data they can be calculated in FlexArray using an algorithm.
Present / Absent Calls are made by applying simple thresholding rules to detection p-values. For Affymetrix data, any detection p-value that falls below 0.04 is assigned a Present call, and above 0.06 it is assigned an Absent call. Marginal calls are given to probe sets which have detection p-values between 0.04 and 0.06. For Illumina data, the detection threshold typically used is 0.05.
Of course, all such thresholds should be considered flexible, and can be adjusted if necessary. This is precisely why FlexArray does not work in terms of P/A Calls but rather in terms of detection p-values.
If you wish to use P/A Calls in your analysis, we suggest using the workflow described in the "New features.pdf" document included with the installation of the program.
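To make the thresholding rule for Affymetrix data concrete, here is a minimal Python sketch. It is not FlexArray code. The function name and the handling of values exactly equal to a threshold (treated here as Marginal) are assumptions; the default thresholds of 0.04 and 0.06 are the ones quoted above.

```python
def detection_call(p_value, present_below=0.04, absent_above=0.06):
    """Return 'P' (Present), 'M' (Marginal) or 'A' (Absent) for one detection p-value."""
    if p_value < present_below:
        return "P"
    if p_value > absent_above:
        return "A"
    # Everything between the two thresholds (bounds included, by assumption)
    return "M"


calls = [detection_call(p) for p in (0.001, 0.05, 0.2)]
print(calls)  # ['P', 'M', 'A']
```

Because the thresholds are ordinary parameters, they can be adjusted in the same spirit as described above.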
This doesn't exist yet as an end-user feature, but it can often be performed by our staff. Please contact us if you need help with this.
Please check the number format in your data file. FlexArray assumes that a dot is used as the decimal separator. If your file uses a comma as the decimal separator, the data will not be imported properly. You should adjust the data file manually before importing it into the program.
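If the file is large, the adjustment can be scripted instead of done by hand. The following Python sketch is a suggestion only and rests on one important assumption: the file is tab-delimited, so that a comma inside a field can only be a decimal separator. It should not be used on comma-separated files. The function name is hypothetical.

```python
import re

# A field that is a number written with a decimal comma, e.g. "7,25" or "-1,5e-3"
_DECIMAL_COMMA = re.compile(r"^[+-]?\d+,\d+([eE][+-]?\d+)?$")


def fix_decimal_commas(text, delimiter="\t"):
    """Replace decimal commas with dots in numeric fields only."""
    fixed_lines = []
    for line in text.splitlines():
        fields = line.split(delimiter)
        fields = [
            field.replace(",", ".") if _DECIMAL_COMMA.match(field.strip()) else field
            for field in fields
        ]
        fixed_lines.append(delimiter.join(fields))
    return "\n".join(fixed_lines)


sample = "ID\tDescription\tSample1\nprobe_1\tkinase, putative\t7,25"
print(fix_decimal_commas(sample))
```

Text fields that happen to contain a comma, such as the description in the example, are left untouched because they do not look like a number.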
Because of the greater variety of data file formats (and their versions) for one-color and two-color data, it is unfortunately possible that a particular data file will be rejected by the program. If this happens, please contact us.
Right-click on the "Raw data" node in the Analysis Pipeline window, and select from the context menu the normalization method you wish to apply.
It is possible for a certain class of algorithms, more specifically for Illumina normalization algorithms (lumi), and for all statistical tests. To do this, please highlight the features (rows) of interest in the data table, then run the algorithm, and check the "Use highlighted data table rows" box in the Run algorithm dialog box.
Please note, however, that this feature must be used with caution if you wish to apply an FDR algorithm to the results of the statistical test (which is a recommended practice). In such cases, the features to be filtered out must not be chosen by looking at the results of a previously executed statistical test, because such a procedure will create bias and will invalidate any adjusted p-values obtained.
In a typical microarray experiment, only a certain percentage of features (e.g. genes, targets) are found to be detected (often between 25% and 50%). It is therefore often a good idea to remove undetected (or absent) features from the analysis altogether.
In version 1.5, FlexArray offers an "algorithm" that flags all undetected features according to the user's criteria. The use of this feature is described in detail in the document "New features.pdf" included with your installation of FlexArray.
Please note that to use this feature you need to have so-called detection p-values for your array features. In the case of Illumina, they should be included in the raw data file. For Affymetrix arrays, detection p-values are calculated by an algorithm in FlexArray (P/A calls or MAS 5.0).
It must be understood, however, that depending on specific experimental considerations, the "filter" algorithm in FlexArray may not be adequate in some contexts.
Click on the little arrow (triangle) next to the "Raw data" node in the Analysis Pipeline. A list of samples will appear underneath it. Select the samples you wish to normalize together by a combination of Ctrl-clicks and Shift-clicks. Finally, right-click on the selection and choose the normalization method to be applied.
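As an illustration of what filtering on detection p-values can look like, here is a small Python sketch. It does not reproduce the FlexArray "filter" algorithm, whose actual criteria are described in "New features.pdf". The rule used here (a feature is kept if it is detected in at least a minimum number of samples) and all names and default values are assumptions made for the example.

```python
def detected_features(detection_p, max_p=0.05, min_samples=2):
    """Return the ids of features detected in at least `min_samples` samples.

    detection_p: dict of feature id -> list of detection p-values, one per sample
    """
    kept = []
    for feature_id, p_values in detection_p.items():
        n_detected = sum(1 for p in p_values if p <= max_p)
        if n_detected >= min_samples:
            kept.append(feature_id)
    return kept


# Hypothetical detection p-values for three features across three samples
example = {
    "feature_1": [0.001, 0.010, 0.300],
    "feature_2": [0.400, 0.700, 0.020],
    "feature_3": [0.900, 0.800, 0.950],
}
print(detected_features(example))  # ['feature_1']
```

Note that this kind of filter uses only detection p-values and not the results of a statistical test, which is what keeps it compatible with a later FDR correction.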
Bioconductor (the set of R packages that FlexArray uses to perform calculations) on Windows is notorious for memory problems. And then FlexArray itself adds to the problem because it too needs memory. This is especially a problem for Affymetrix data normalization because of the data size.
This problem can be circumvented in a number of ways:
This error message is typically followed by additional messages such as "Specified environment does not contain xxxx" or "Package xxxx not installed".
This error message means that the version of Bioconductor that you are using does not support a particular chip type. You have two options to normalize your data:
In FlexArray, the "standard" normalization algorithms (e.g. RMA or dChip) for Affymetrix data cannot be used to normalize Gene ST and Exon ST array data. Bioconductor now supports normalization using the Just RMA algorithm, but it results in missing probesets; for this reason we strongly discourage its use in version 1.6 of FlexArray and we have disabled it entirely in version 1.6.1.
It is therefore the APT "algorithm" that should be used with Gene ST and Exon ST data. This algorithm is simply a wrapper around the APT application (Affymetrix Power Tools) which needs to be downloaded and installed separately. The procedure is described in detail in document "New features.pdf" included with your installation of FlexArray.
Also, unless you wish to examine all raw data at probe level, we strongly suggest checking the "Skip import of actual data" box in the Data Import dialog box in order to save memory. The document mentioned above describes how to use this feature as well.
To obtain these files from Affymetrix:
Please see document "New features.pdf" included with your installation of FlexArray for more details about using APT in FlexArray.
Please update your installation of FlexArray to version 1.6.1 where this problem is fixed, and then re-run your analysis.
Statistical tests require that the experimental design be properly set up. To do this, you need to add at least one experimental factor. In the Experimental Design window click on the icon with the big green "+" sign (and a little square next to it), enter the factor name (e.g. "Time", "Genotype", "Condition" etc.), then in the table below insert as many levels as you need by clicking on the "+" icon. For every level, type its name directly in the table (e.g. "0h", "12h" etc.). Click OK. You can add up to two factors this way.
Finally, you need to assign levels to arrays. In the Experimental Design window, for each array select the proper level from the drop-down menu (click on the small triangle to open the menu). You can also select several rows in the Experimental Design, and right-click on the selection to assign factor levels to arrays.
In the Analysis Pipeline, right-click on the node with normalized data, and select "Unavailable algorithms". A message box will appear that will provide you more information on which statistical tests are not available, and why.
Typically, the reason may be one of the following:
There is no answer to this question that would always be valid, and this is why FlexArray allows you to choose between many tests.
The regular t-test is often used but it is not optimal in the context of microarray data because it treats every gene separately where - due to few replicates - the variance estimation will not be very accurate. All the other statistical tests have been created to work better with microarray data - they typically try to estimate the variance in a more global fashion (across multiple genes). In particular, Bayesian tests (such as cyber-T) have been found to perform well with microarray data, and have gained wide acceptance in the scientific community.
Your options include seeking recommendation of a statistician, performing a review of the literature and choosing the method that is most frequently used in a particular field of study, and running a few different tests and choosing the genes common to all of them.
If the normalized data have been logged (base 2), then the logged fold change is equal to the difference of the mean expression levels of the two groups being compared: log2(fc) = mean(group1) - mean(group2). The raw fold change is then equal to: fc = 2^log2(fc).
If the normalized data have not been logged, they are first log-transformed, and then the formulas above are used. This ensures that fold changes are the same regardless of whether the normalized data are logged or not.
For ANOVA, fold changes will be calculated if there are exactly two levels in your factor. The other factor will be disregarded when calculating the sample mean. For instance, let's assume that you have two factors, each with two levels: Genotype = WT, Mut, and Treatment = Yes, No. In this case you have 4 groups in total: WT+Yes, WT+No, Mut+Yes, Mut+No. Calculating the fold change for Genotype will amount to: mean ([WT+Yes] + [WT+No]) - mean ([Mut+Yes] + [Mut+No]) - as if Treatment did not exist.
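The fold change formulas can be written out as a short Python sketch. This is an illustration of the arithmetic only, not FlexArray code. It assumes base-2 logarithms and, for the ANOVA case, it reads "mean ([WT+Yes] + [WT+No])" as the mean over all samples of the two groups pooled together. Function names and expression values are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(group1, group2, already_logged=True):
    """Difference of the group means of log2 expression values."""
    if not already_logged:
        # Un-logged data are log-transformed first, as described above
        group1 = [math.log2(v) for v in group1]
        group2 = [math.log2(v) for v in group2]
    return mean(group1) - mean(group2)


def raw_fold_change(group1, group2, already_logged=True):
    """fc = 2 ^ log2(fc)"""
    return 2 ** log2_fold_change(group1, group2, already_logged)


# Two-group comparison on logged data
print(log2_fold_change([10.0, 11.0], [8.0, 9.0]))  # 2.0
print(raw_fold_change([10.0, 11.0], [8.0, 9.0]))   # 4.0

# Two-factor ANOVA: fold change for Genotype, ignoring Treatment
wt_yes, wt_no = [9.0, 10.0], [8.0, 9.0]
mut_yes, mut_no = [7.0, 8.0], [6.0, 7.0]
genotype_log2_fc = log2_fold_change(wt_yes + wt_no, mut_yes + mut_no)
print(genotype_log2_fc)  # 2.0
```

A fold change computed this way is below 1 when the first group has the lower mean, for example 0.25 for a log2 fold change of -2.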
Version 1.4.1 of FlexArray, as well as all older versions, contained two bugs in the ANOVA algorithm. One was applicable to experimental designs with more than two levels in at least one factor, and the other one to situations when the user did not select all contrasts in an ANOVA with two factors.
Therefore, if in doubt, open your project in the most recent version of FlexArray, and re-run the ANOVA.
For some reason, Affymetrix Gene ST data normalized with APT contains several hundred control probesets. We have contacted Affymetrix about this, and the company does not offer a recommendation on how such probesets should be handled. What we typically do is exclude all control probesets from statistical testing, keeping only those that have the value "main" in the "category" annotation. To do this:
TREAT does not work properly in FlexArray with old versions of R. Please install a more recent version of R (ideally 2.12.0 or more recent).
You need to click on the "Import annotations" button in the toolbar of the data table. Then select the file, specify its format, choose the rows and columns to be imported, and click "Finish". It is important to properly indicate the "feature id" column in the file - it should contain feature ids, e.g. probe names or target names, to be used to match information from the file with data already present in the FlexArray project. Please see the document "Import wizards.pdf" included with your installation of FlexArray for more details.
Generally, annotations should be in text files, comma- or tab-separated. The only exception to this rule is the option to import Illumina annotations directly in the BGX format, added to FlexArray in version 1.5.
If you performed normalization using Just RMA or using APT at gene level, you need to import the file named xxx.transcript.csv. If you performed normalization using APT at exon level, you need to import the file named xxx.probeset.csv.
This error will be displayed in two separate situations:
For more information about importing Illumina annotations please see document "Import wizards.pdf" included with your installation of FlexArray.
Also, please note that the Innovation Centre routinely adds the most up-to-date annotations directly to Illumina raw data files. To import them, you need to run the Import Wizard just as in other cases, but select the raw data file as the annotation file.
It is possible to import annotations for entities other than the feature ids of data in the program. Example: you may have some annotations for UniGene IDs, not Affymetrix probeset names. To import such annotations, first you need to import UniGene IDs from the standard Affymetrix annotation file for your type of array. Then you can import the other annotation file, by selecting UniGene ID as the column to match the annotations to (it is a dropdown in the Import Wizard for annotations).
First, in the Analysis Pipeline select the node that contains the data you wish to visualize. Then select the desired plot from the "Plot" drop-down in the Plot Viewer. For many plots you can also select the data source(s) in the two drop-downs underneath.
Some plots are only applicable to rows highlighted in the data table. In such cases, select the desired row or rows, and click the "Refresh plot" button in Plot Viewer's toolbar.
Run a statistical test on your data. The Volcano plot is the default plot for data nodes containing results of a statistical test, and for FDR nodes.
You need to be careful, though, to always work with the right type of volcano plot (there are two: volcano plot of p-values and of adjusted p-values). As you switch between nodes or add nodes to your selection in the Analysis Pipeline, the volcano plot of adjusted p-values may be reset to the one of raw p-values, or vice versa, potentially causing interpretation mistakes. This happens much less frequently in version 1.6 of FlexArray thanks to improvements in code.
This is caused by the fact that the data table does not contain the fold change column. To make it work, please add the node that contains your statistical test results to the selection in the Analysis Pipeline (by Ctrl-clicking it), and then click the "Create filter" button again.
The X axis shows the log2 of the fold change. This means that the further a data point is located towards the left or the right, the bigger the fold change. The Y axis shows the -log10 of the p-value, meaning that the higher a data point is located, the smaller its associated p-value. Putting this together, data points in the top-left and top-right parts of the plot are the most interesting from the point of view of differential expression. They are marked with bigger brown diamonds.
The two parameter boxes underneath the plot represent the thresholds used to decide which points are shown as big diamonds. Changing either of these values and pressing ENTER will move the blue lines on the plot, and include more or fewer data points in the region of interest (top-left and top-right sections of the plot).
The "Create filter" button (top-right corner of the plot area) will apply the selected thresholds to the data table, creating a filter that only contains the probe sets of interest. This can then be used to create a gene list.
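The geometry of the volcano plot can be summarized in a few lines of Python. This sketch is an illustration only. How FlexArray expresses its two thresholds internally is not stated here, so the parameters below (a minimum fold change and a maximum p-value) and their default values are assumptions, as are the function names.

```python
import math


def volcano_coordinates(fold_change, p_value):
    """Return the (x, y) position of a feature: log2 fold change, -log10 p-value."""
    return math.log2(fold_change), -math.log10(p_value)


def is_of_interest(fold_change, p_value, min_fold_change=2.0, max_p_value=0.05):
    """True if the point falls in the top-left or top-right region of the plot."""
    x, y = volcano_coordinates(fold_change, p_value)
    far_enough = abs(x) >= math.log2(min_fold_change)  # left or right of the vertical lines
    high_enough = y >= -math.log10(max_p_value)        # above the horizontal line
    return far_enough and high_enough


print(is_of_interest(4.0, 0.001))   # True: up-regulated, small p-value
print(is_of_interest(0.25, 0.001))  # True: down-regulated, small p-value
print(is_of_interest(1.2, 0.001))   # False: fold change too small
print(is_of_interest(4.0, 0.5))     # False: p-value too large
```

Taking the absolute value of the log2 fold change is what makes the region of interest symmetric, so that up-regulated and down-regulated features are treated alike.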
Normalize your data, and specify the experimental design. Then select the PCA plot from the "Plot" drop-down in the Plot Viewer. In FlexArray, PCA plots cannot be generated from raw data.
PCA is the abbreviation of Principal Components Analysis which, according to Wikipedia, is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. PCA can supply the user with a lower-dimensional picture of high-dimensional data such as microarray data.
In practice, when applied to microarray data, PCA plots are a great Quality Control tool. A point on the plot corresponds to a sample, and the farther apart two points are, the more their gene expression patterns differ. Once you enter your experimental design into FlexArray, points belonging to different experimental groups will be colored differently. This makes it easier to spot outliers (samples far away from their group) and noisy data (when samples from different groups are mixed together).
A "bad" PCA plot is usually a pretty good indication of problems with the experimental design or with the execution of the experiment. The PCA plot is not an oracle, though. It may very well happen that an experiment with a suboptimal PCA plot still produces good results after running statistical tests. When in doubt, you should try to confirm the PCA plot's conclusions using other graphs, and scatter plots in particular. Samples close to each other on the PCA plot should have scatter plots with little difference in expression levels, and vice versa.
You should also look carefully at the percentages listed in axis labels. They indicate what proportion of the global variance is explained along that particular axis. What is particularly useful in this context is to compare the percentages on the two main axes to the distribution of points on the plot. For example, if the X axis says 85%, and the Y axis says 8%, any differences in Y position of points are much less "important" than differences in X position. On the other hand, if the two numbers are 18% and 12%, respectively, any separation between groups found is much less strong.
Press the Control key on your keyboard, and then press the mouse button somewhere within the plot, and drag the mouse to select the area to be zoomed in. Later you can pan (scroll) the plot area by dragging it with the right mouse button. Click on the "Zoom reset" button in Plot Viewer's toolbar to reset the zoom to its original state.
Unfortunately, this feature is currently only supported on a limited number of plots, e.g. on the PCA plot.
This feature is supported on most plots, but not all of them. To select the desired points, either click near a data point, or drag a box around some points. To add points to the selection, press the Shift key, and then click or drag. To remove points from the selection, press the Alt key, and then click or drag.
This feature is not available to end-users. For certain plots, it may be possible to adjust the colors by modifying an R script. For other plots, it is not possible without modifying the source code of the application. Please contact us if you wish to discuss this matter further.
Gene lists can be created at almost any point of the analysis after the data normalization step. Select a number of rows in the data table, then click on the "Gene list" button in the toolbar of the data table, and choose the "Create from selected" option. As an alternative, you may apply a filter to the data table, and then choose the option "Create from all".
As a result, a new node will be added to the Analysis Pipeline. This node only contains the selected genes. You can use such nodes to compare gene lists among themselves (see next question). When selected together with a regular data node, a gene list node will cause the data in the data node to be automatically filtered.
Select two or more gene lists in the Analysis Pipeline by Ctrl-clicking them. If two or three gene lists are selected, you will see a Venn diagram. If more are selected, you can only compare the lists using the data table. The CAT plot can also be used to compare gene lists.
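If you export your gene lists, the same kind of comparison can be reproduced outside FlexArray with simple set operations. The Python sketch below is an illustration only. The function name, list names and gene ids are hypothetical.

```python
def compare_gene_lists(gene_lists):
    """Return the genes common to all lists and the genes unique to each list.

    gene_lists: dict of list name -> iterable of gene ids
    """
    sets = {name: set(genes) for name, genes in gene_lists.items()}
    common = set.intersection(*sets.values())
    unique = {}
    for name, genes in sets.items():
        # Union of every other list, so that only genes exclusive to `name` remain
        others = set().union(*(s for other, s in sets.items() if other != name))
        unique[name] = genes - others
    return common, unique


lists = {
    "test_a": ["G1", "G2", "G3"],
    "test_b": ["G2", "G3", "G4"],
    "test_c": ["G3", "G5"],
}
common, unique = compare_gene_lists(lists)
print(common)  # {'G3'}
print(unique)
```

The "common" set corresponds to the central region of a Venn diagram, and each "unique" set to the part of a circle that does not overlap with any other.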
In fact, you are obliged by the license agreement of FlexArray to refer to it in any publication that contains results obtained using the software.
Unfortunately, there is no peer-reviewed publication describing FlexArray. It has been our objective for a long time to write one, but other activities always get in the way.
In the license agreement, we propose the following way of citing FlexArray:
We realize that this may go against the citation policies of certain journals or against the rules followed in your lab. Even so, you must still refer to FlexArray in your publication if you have used it in your research. Some authors mention it in the acknowledgements section of their paper, while others simply mention the name of the application (at least) directly in the text. We hope that one of these ways works for you.
Currently, there is no formal manual. But there is a set of video tutorials that explain how to use the program. And this FAQ document, of course.
There are also some useful documents included in your installation - most of them can be accessed via the Help menu, and some other ones can be found in the "doc" subfolder of the installation directory.
The easiest way to do this is to: