We would like to congratulate Dr. Mark Bayfield from York University, who won a $500 credit for services offered by Génome Québec. We would like to thank everyone who filled out the annual survey.

Frequently asked questions

Questions related to bioinformatics

FlexArray



Installation
  1. What is the software delivery model of FlexArray?
  2. Can FlexArray run on a Mac, or a Linux system?
  3. When I ran FlexArray for the first time, the program prompted me to download R libraries. Is this normal?
  4. When I try to run FlexArray, I get error messages related to R, statconnDCOM, and/or COM objects. How do I fix this?
  5. How can I check the connection between FlexArray and R? How can I verify what version of R is used by FlexArray?
  6. I have a version of .NET Framework other than 2.0 already installed. Is there any way to install FlexArray without downgrading / upgrading the .NET Framework to version 2.0?
Answers - Questions related to bioinformatics
Installation
  1. What is the software delivery model of FlexArray?

    FlexArray is a desktop application, not a Web application. It needs to be installed on the user's machine.

  2. Can FlexArray run on a Mac, or a Linux system?

    Currently, FlexArray only runs on Windows, but a Mac / Linux version is feasible in the future if there is sufficient demand for it.

  3. When I ran FlexArray for the first time, the program prompted me to download R libraries. Is this normal?

    Yes. To perform calculations, FlexArray uses R, the open-source language for statistical computing (see the project web site). The computational routines (algorithms) come in the form of R libraries (a.k.a. packages) that need to be downloaded and installed. Normally this is done only once - the first time you run FlexArray. If you experience difficulties with this procedure, please run FlexArray as an administrator.

  4. When I try to run FlexArray, I get error messages related to R, statconnDCOM, and/or COM objects. How do I fix this?

    In versions 1.5 and 1.6 of FlexArray we have spent a considerable amount of time making the installation more bullet-proof on recent versions of Windows. Here are things to consider or to verify when you are still having problems.

    1. First of all, please read the error or warning message carefully, and try to follow any instructions or suggestions it may contain.
    2. If you installed FlexArray using only the MSI installer, please try the full installer as it contains code that properly sets up FlexArray's connection to R.
    3. Make sure you have R version 2.12.0 or more recent installed. Using an obsolete version of R may cause certain problems in FlexArray.

      • FlexArray's full installer comes with R version 2.12.1 included. Other versions can be downloaded from the following web site.
      • Please note that recent versions of statconnDCOM are not compatible with R versions earlier than 2.12.0.

    4. Make sure you have a recent version of statconnDCOM installed. It is normally downloaded and installed automatically by the full installer, and it then appears under "statconn" in the Start | Programs menu.

      • In case you wish to reinstall statconnDCOM manually, here's the link (please make sure to download the package called statconnDCOM, not any other package!).

    5. Check to see if FlexArray can connect to R properly. The next question in this FAQ explains how to do it.
    6. If you get errors from statconnDCOM's Basic Test program, please make sure library (package) rscproxy is installed in the version of R you wish to use in FlexArray. Normally it is supposed to be installed by the full installer, but if it fails for some reason here is how to fix this:

      • Close FlexArray.
      • Run R as an administrator (very important!)
      • Enter the following command:
        install.packages("rscproxy", lib=.Library, dependencies=TRUE, repos="http://lib.stat.cmu.edu/R/CRAN")
      • Make sure no error messages are displayed.
      • Close R.

    7. If for some reason FlexArray still cannot connect to R, here is something else you may want to try:

      • Open the R installation folder of the version of R you wish to use with FlexArray.
      • Go into the "bin\i386" subdirectory.
      • Run a program called RSetReg.exe (this has to be done as an administrator: right-click the program's icon, and choose "Run As" from the context menu). This program tells statconnDCOM (and hence FlexArray) to start using this particular version of R.

    8. Finally, make sure that you don't have the French version of the .NET Framework version 2.0 installed. If you do, please uninstall it, and install the English version instead.
  5. How can I check the connection between FlexArray and R? How can I verify what version of R is used by FlexArray?

    1. Go to the Start menu and then to Programs --> statconn.
    2. An icon called "Server 01 - Basic Test" should be there. Run it.
    3. A window will open. Click "Start R".
    4. In the box below you should see sections called "Server information" and "Connector information". The former refers to the version of statconnDCOM in use, and the latter specifies which version of R is being used by statconnDCOM (and hence by FlexArray).
    5. If you see any error messages, please refer to the previous question in this FAQ for troubleshooting suggestions.
  6. I have a version of .NET Framework other than 2.0 already installed. Is there any way to install FlexArray without downgrading / upgrading the .NET Framework to version 2.0?

    Different versions of the .NET framework are designed to co-exist on the same machine. By installing 2.0 you won't be affecting the other installation(s) of the Framework, or any applications that depend on it/them.



Data analysis
  1. What is the general data analysis flow in FlexArray?
  2. How can I learn more about how to perform data analysis in FlexArray?
  3. I would like to import and analyze data from a non-supported technology, e.g. methylation or proteomics. How can I do this?
  4. I would like to compare datasets from different arrays or different technologies in FlexArray. How can I do this?
Answers - Questions related to bioinformatics
Data analysis
  1. What is the general data analysis flow in FlexArray?

    The general analysis flow is the following:

    • Import the raw data.
    • Perform QC (Quality Control) of the raw data by looking at plots.
    • Create an experimental design.
    • Normalize the raw data.
    • Perform QC of the normalized data by looking at plots.
    • Run one or more statistical tests.
    • Perform a False Discovery Rate correction if desired (it is recommended).
    • Analyze the results of the test(s) by looking at plots and filtering the data table.
    • Generate gene lists.
    • Compare gene lists.
    • Export gene lists for further mining in other software packages.
  2. How can I learn more about how to perform data analysis in FlexArray?

    A set of video tutorials is available.

    There are also some useful documents included in your installation - most of them can be accessed via the Help menu, and some other ones can be found in the "doc" subfolder of the installation directory. In particular, all new and changed features in versions 1.4, 1.5 and 1.6 are described in detail in a document called "New features.pdf" you will find there. You can also download this file from our web site: New-features-1.6.pdf.

  3. I would like to import and analyze data from a non-supported technology, e.g. methylation or proteomics. How can I do this?

    First, you need to make sure that your data is normalized and saved in a text file. Then, you need to use the "Import custom normalized data" option (Data menu). Once the data is imported into a FlexArray project, you can view it in the data table and plots, and run statistical tests on it, just like you would with a fully supported technology.

  4. I would like to compare datasets from different arrays or different technologies in FlexArray. How can I do this?

    It is possible to do this to some extent although it depends on what exactly you wish to do. This is normally done via the option of importing custom normalized data. So here's the outline of the procedure:

    1. Import and normalize data from one array type / technology.
    2. Export normalized data as text.
    3. Start a new project.
    4. Import and normalize data from the other array type / technology.
    5. Import the text file exported in step 2 into the project as Custom Normalized Data.
    6. As a result, you will have results from both datasets in one project.

    While this will make comparisons easier, it won't give you direct graphic comparisons of both array types / technologies, feature by feature. This is because FlexArray does not know how to map feature ids of one array type to the other. Currently, no mechanism exists in FlexArray to facilitate this. Your best option is to add an intermediate step performed in Excel or R or other software that will match feature ids between array types and resolve any issues due to duplication etc.
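
    The intermediate matching step mentioned above could look roughly like the following Python sketch. It is only an illustration: the feature ids, the values, and the `a_to_b` mapping table are hypothetical, and in practice the mapping would have to be built from annotation files (for example via a shared gene identifier). The sketch assumes a one-to-one mapping and does not resolve duplicates.

```python
# Hypothetical normalized values, keyed by each platform's own feature ids.
array_a = {"A_001": 8.2, "A_002": 6.9, "A_003": 7.5}
array_b = {"B_17": 8.0, "B_42": 7.1}

# Hypothetical mapping from platform A feature ids to platform B feature ids.
# None means that no counterpart is known on platform B.
a_to_b = {"A_001": "B_17", "A_002": "B_42", "A_003": None}


def match_features(values_a, values_b, mapping):
    """Return (id_a, id_b, value_a, value_b) rows for features found on both arrays."""
    matched = []
    for id_a, value_a in values_a.items():
        id_b = mapping.get(id_a)
        if id_b is not None and id_b in values_b:
            matched.append((id_a, id_b, value_a, values_b[id_b]))
    return matched


matched_rows = match_features(array_a, array_b, a_to_b)
for row in matched_rows:
    print(*row, sep="\t")
```

    The resulting table can be saved as tab-separated text and inspected in Excel or R. Features without a counterpart are simply dropped here, which may or may not be what your analysis needs.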



Data Import
  1. When I import Affymetrix data, FlexArray shows a warning message related to the CDF library file. Is this a problem?
  2. When I import Affymetrix data with a CDF file, FlexArray displays the MM/PM status of probes, but this information appears to be wrong.
  3. The numbers of MM probes in FlexArray and obtained using nrow(mm(abatch)) in R are different. Why?
  4. How should I interpret detection p-values? How are they related to Present / Absent Calls?
  5. Is it possible to import detection p-values (or Present / Absent calls) along with custom normalized data?
  6. I imported my Illumina data into FlexArray, but the data table contains empty columns, and plots are not displayed. What happened?
  7. I'm having problems importing my one-color or two-color data into FlexArray.
Answers - Questions related to bioinformatics
Data import
  1. When I import Affymetrix data, FlexArray shows a warning message related to the CDF library file. Is this a problem?

    If you have a valid CDF file for your chip type, make sure that you specify the folder name in which it is located when you import your Affymetrix data.

    But even if you don't have a CDF file it isn't really a big problem: the CDF file specified during the import of raw data isn't required to perform analysis. At most, you will not be able to perform certain visualizations of your raw data.

  2. When I import Affymetrix data with a CDF file, FlexArray displays the MM/PM status of probes, but this information appears to be wrong.

    This problem has been fixed in version 1.5 of FlexArray.

  3. The numbers of MM probes in FlexArray and obtained using nrow(mm(abatch)) in R are different. Why?

    This is because R does not include control spots when listing probes using mm(), whereas FlexArray does include them in the data table.

  4. How should I interpret detection p-values? How are they related to Present / Absent Calls?

    Detection p-values are a measure of confidence that a particular feature has been detected for a given sample / array. The lower the value, the more likely it is that the signal for the feature is different from background.

    For Illumina data, detection p-values are included in raw data files. For Affymetrix data they can be calculated in FlexArray using an algorithm.

    Present / Absent Calls are made by applying simple thresholding rules to detection p-values. For Affymetrix data, any detection p-value that falls below 0.04 is assigned a Present call, and above 0.06 it is assigned an Absent call. Marginal calls are given to probe sets which have detection p-values between 0.04 and 0.06. For Illumina data, the detection threshold typically used is 0.05.

    Of course, all such thresholds should be considered flexible, and can be adjusted if necessary. This is precisely why FlexArray does not work in terms of P/A Calls but rather in terms of detection p-values.

    If you wish to use P/A Calls in your analysis, we suggest using the workflow described in the "New features.pdf" document included with the installation of the program.
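
    As an illustration of the thresholding rule described above, here is a minimal Python sketch. The function name and the letters "P", "M" and "A" are our own choices for this example, not FlexArray output. Treating values exactly equal to a threshold as Marginal is an assumption, since the text above only says "below", "above" and "between".

```python
def detection_call(p_value, present_below=0.04, absent_above=0.06):
    """Turn a detection p-value into a Present / Marginal / Absent call.

    The defaults are the Affymetrix thresholds quoted in the text.
    Values exactly on a threshold are treated as Marginal (an assumption).
    """
    if p_value < present_below:
        return "P"
    if p_value > absent_above:
        return "A"
    return "M"


calls = [detection_call(p) for p in (0.001, 0.05, 0.2)]
print(calls)

# A single cut-off (such as the 0.05 typically used for Illumina data) can be
# expressed by setting both thresholds to the same value.
illumina_call = detection_call(0.03, present_below=0.05, absent_above=0.05)
print(illumina_call)
```

    Because both thresholds are parameters, the same function can be re-run with adjusted cut-offs, which is the flexibility the answer above refers to.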

  5. Is it possible to import detection p-values (or Present / Absent calls) along with custom normalized data?

    This doesn't exist yet as an end-user feature, but it can often be performed by our staff. Please contact us if you need help with this.

  6. I imported my Illumina data into FlexArray, but the data table contains empty columns, and plots are not displayed. What happened?

    Please check the floating-point number format in your data file. FlexArray assumes that a dot is used as the decimal separator. If your file uses a comma as the decimal separator, the data will not be properly imported into the program. You should adjust the data file manually prior to importing it.
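
    If the file is large, the adjustment can be scripted. The Python sketch below is one possible approach, under two assumptions that you should verify for your own file: columns are separated by tabs (not commas), and numbers contain no thousands separators. The sample lines and ids are made up for the example.

```python
import re

# A comma that sits directly between two digits is assumed to be a decimal comma.
DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")


def commas_to_dots(line):
    """Replace decimal commas with dots in one line of a tab-separated file."""
    return DECIMAL_COMMA.sub(".", line)


# Made-up sample content standing in for a tab-separated raw data file.
sample = "ILMN_1\t123,45\t0,0012\nILMN_2\t98,7\t0,5"
converted = "\n".join(commas_to_dots(line) for line in sample.splitlines())
print(converted)
```

    Do not apply this to a comma-separated file, where it would merge adjacent numeric columns. Keep a copy of the original file before converting it.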

  7. I'm having problems importing my one-color or two-color data into FlexArray.

    Because of the greater variety of data file formats (and their versions) for one-color and two-color data, it is unfortunately possible that a particular data file will be rejected by the program. If this happens, please contact us.



Data normalization
  1. I imported my raw data into the program. How do I normalize it?
  2. How can I perform my analysis on a subset of genes / probe sets only?
  3. How can I remove undetected / absent features from my analysis?
  4. How can I perform my analysis on a subset of samples only?
  5. When I try to normalize my Affymetrix data I get an error that says "cannot allocate vector". How can I fix this?
  6. When I try to normalize my Affymetrix data I get an error that says "could not obtain CDF environment". How can I fix this?
  7. How do I normalize Affymetrix Gene ST or Exon ST data in FlexArray?
  8. I'm trying to normalize my Affymetrix data using APT. Where can I download the required library files?
  9. RG values calculated for two-color data are incorrect. What is the problem?
Answers - Questions related to bioinformatics
Data normalization
  1. I imported my raw data into the program. How do I normalize it?

    Right-click on the "Raw data" node in the Analysis Pipeline window, and select from the context menu the normalization method you wish to apply.

  2. How can I perform my analysis on a subset of genes / probe sets only?

    It is possible for a certain class of algorithms, more specifically for Illumina normalization algorithms (lumi), and for all statistical tests. To do it, please highlight the features (rows) of interest in the data table, then run the algorithm, and check the "Use highlighted data table rows" box in the Run algorithm dialog box.

    Please note, however, that this feature must be used with caution if you wish to apply an FDR algorithm to the results of the statistical test (which is a recommended practice). In such cases the features to be filtered out must not be chosen by looking at the results of a previously executed statistical test, because such a procedure creates bias and invalidates any adjusted p-values obtained.
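
    To see why such pre-selection invalidates adjusted p-values, consider the small Python sketch below. It uses the Benjamini-Hochberg procedure as an example of an FDR correction; this is an assumption made for illustration and is not necessarily the exact algorithm FlexArray applies. The p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for six features.
all_p = [0.01, 0.02, 0.03, 0.5, 0.8, 0.9]
adjusted_all = benjamini_hochberg(all_p)

# Biased procedure: keep only the three best features, then adjust.
cherry_picked = sorted(all_p)[:3]
adjusted_subset = benjamini_hochberg(cherry_picked)

print(adjusted_all[:3])
print(adjusted_subset)
```

    With all six features the three smallest adjusted p-values come out at about 0.06, whereas adjusting only the pre-selected three gives about 0.03. The second result looks significant at 0.05 only because the correction was told that fewer tests had been performed.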

  3. How can I remove undetected / absent features from my analysis?

    In a typical microarray experiment, only a certain percentage of features (e.g. genes, targets) are found to be detected (often between 25% and 50%). It is therefore often a good idea to remove undetected (or absent) features from the analysis altogether.

    Starting with version 1.5, FlexArray offers an "algorithm" that flags all undetected features according to the user's criteria. The use of this feature is described in detail in the document "New features.pdf" included with your installation of FlexArray.

    Please note that to use this feature you need to have so-called detection p-values for your array features. In the case of Illumina, they should be included in the raw data file. For Affymetrix arrays, detection p-values are calculated by an algorithm in FlexArray (P/A calls or MAS 5.0).

    It must be understood, however, that depending on specific experimental considerations, the "filter" algorithm in FlexArray may not be adequate in some contexts.

  4. How can I perform my analysis on a subset of samples only?

    Click on the little arrow (triangle) next to the "Raw data" node in the Analysis Pipeline. A list of samples will appear underneath it. Select the samples you wish to normalize together by a combination of Ctrl-clicks and Shift-clicks. Finally, right-click on the selection and choose the normalization method to be applied.

  5. When I try to normalize my Affymetrix data I get an error that says "cannot allocate vector". How can I fix this?

    Bioconductor (the set of R packages that FlexArray uses to perform calculations) on Windows is notorious for memory problems, and FlexArray itself adds to the problem because it needs memory too. This is especially an issue for Affymetrix data normalization because of the data size.

    This problem can be circumvented in a number of ways:

    • If you are a Nanuq user, you can have your data normalized on our servers.
    • If not, use the Just RMA algorithm which has been optimized for memory use. RMA is a recognized way of normalizing data, and Just RMA is its implementation that does not suffer from this memory problem unless you have less than 512 MB of system memory. You can also use Just MAS 5 for normalizing larger data sets.
    • You can use Affymetrix Power Tools to perform your analysis. APT is available in FlexArray as one of the normalization algorithms. Please see document "New features.pdf" included with your installation of FlexArray for more details.
    • You can try doing the analysis on a computer with more RAM. A 64-bit version of Windows might also help.
    • If you really want to use one of the other algorithms, you will need to perform the normalization outside of FlexArray (e.g. in the free Expression Console from Affymetrix, or in R on a more powerful Linux machine), and then import the results into the program using the "Import custom normalized data" option.
  6. When I try to normalize my Affymetrix data I get an error that says "Could not obtain CDF environment". How can I fix this?

    This error message is typically followed by additional messages such as "Specified environment does not contain xxxx" or "Package xxxx not installed".

    This error message means that the version of Bioconductor that you are using does not support a particular chip type. You have two options to normalize your data:

    • Update R and statconnDCOM to their most recent versions, and try again to run your analysis.
    • Use the APT "algorithm" to normalize your data (see answers below). Please see document "New features.pdf" included with your installation of FlexArray for more details about using APT in FlexArray.
  7. How do I normalize Affymetrix Gene ST or Exon ST data in FlexArray?

    In FlexArray, the "standard" normalization algorithms (e.g. RMA or dChip) for Affymetrix data cannot be used to normalize Gene ST and Exon ST array data. Bioconductor now supports normalization using the Just RMA algorithm, but it results in missing probesets; for this reason we strongly discourage its use in version 1.6 of FlexArray and we have disabled it entirely in version 1.6.1.

    It is therefore the APT "algorithm" that should be used with Gene ST and Exon ST data. This algorithm is simply a wrapper around the APT application (Affymetrix Power Tools) which needs to be downloaded and installed separately. The procedure is described in detail in document "New features.pdf" included with your installation of FlexArray.

    Also, unless you wish to examine all raw data at the probe level, we strongly suggest checking the "Skip import of actual data" box in the Data Import dialog box in order to save memory. The document mentioned above describes how to use this feature as well.

  8. I'm trying to normalize my Affymetrix data using APT. Where can I download the required library files?

    To obtain these files from Affymetrix:

    • Go to http://www.affymetrix.com.
    • At the top of the page, click Support, and then Affymetrix Microarray Solutions.
    • Select your appropriate chip from the drop-down, and check boxes "Annotation Files" and "Library Files" underneath.
    • Click Go to download the file(s).

    Please see document "New features.pdf" included with your installation of FlexArray for more details about using APT in FlexArray.

  9. RG values calculated for two-color data are incorrect. What is the problem?

    Please update your installation of FlexArray to version 1.6.1 where this problem is fixed, and then re-run your analysis.



Experimental design & Statistical tests
  1. I normalized my raw data but I am unable to run any statistical tests. Why?
  2. I created an experimental design but I'm still unable to run any statistical tests. Why?
  3. Which statistical test should I choose for my data?
  4. How does FlexArray calculate fold changes?
  5. I am getting strange results from the ANOVA algorithm. What is going on?
  6. I have run a statistical test on Affymetrix Gene ST array data. A lot of differentially expressed transcripts seem to be controls (e.g. "intronic normalization controls"). What is going on?
  7. I am trying to run limma (TREAT) on my two-color data, and I get the following error message: "Size of 'data' slot does not match the length of 'dataKind' slot". How can I fix this?
Answers - Questions related to bioinformatics
Experimental design & Statistical tests
  1. I normalized my raw data but I am unable to run any statistical tests. Why?

    Statistical tests require that the experimental design be properly set up. To do it you need to add at least one experimental factor. In the Experimental Design window click on the icon with the big green "+" sign (and a little square next to it), enter the factor name (e.g. "Time", "Genotype", "Condition" etc.), then in the table below insert as many levels as you need by clicking on the "+" icon. For every level, type its name directly in the table (e.g. "0h", "12h" etc.). Click OK. You can add up to two factors this way.

    Finally, you need to assign levels to arrays. In the Experimental Design window, for each array select the proper level from the drop-down menu (click on the small triangle to open the menu). You can also select several rows in the Experimental Design, and right-click on the selection to assign factor levels to arrays.

  2. I created an experimental design but I'm still unable to run any statistical tests. Why?

    In the Analysis Pipeline, right-click on the node with normalized data, and select "Unavailable algorithms". A message box will appear that will provide you more information on which statistical tests are not available, and why.

    Typically, the reason may be one of the following:

    • Your replicates are not set up properly. You should not assign a separate factor level to every replicate. FlexArray will treat as replicates any samples that have been assigned the same level across all experimental factors.
    • You need at least two samples per group (i.e. two replicates) in order to run statistical tests. Samples marked with "N/A" in the Experimental Design window are ignored.
    • The normalized data node contains data for more than two experimental groups. To compare individual groups you need to click on the little arrow (triangle) next to the normalized data node in the Analysis Pipeline. A tree of experimental groups will appear underneath it. Select the groups you wish to compare by clicking on the first group, and then Ctrl-clicking on the second one. Right-click on one of the selected groups, and choose the statistical test to be applied.
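
    The replicate rules listed above can be checked on paper, or with a short script such as the Python sketch below. The sample names and factor levels are hypothetical. Skipping a sample when any of its factors is "N/A" is our assumption about how such samples are ignored.

```python
from collections import defaultdict


def group_replicates(design):
    """Group samples that share the same level across all experimental factors."""
    groups = defaultdict(list)
    for sample, levels in design.items():
        if "N/A" in levels:
            continue  # assumed to be ignored, as described in the text
        groups[tuple(levels)].append(sample)
    return dict(groups)


def groups_too_small(groups, minimum=2):
    """Return the groups that have fewer than `minimum` replicates."""
    return [levels for levels, samples in groups.items() if len(samples) < minimum]


# Hypothetical one-factor design: sample name -> tuple of factor levels.
design = {
    "s1": ("WT",),
    "s2": ("WT",),
    "s3": ("Mut",),
    "s4": ("N/A",),
}
groups = group_replicates(design)
problems = groups_too_small(groups)
print(groups)
print(problems)
```

    In this example the "Mut" group has a single sample, so it is reported as too small for a statistical test.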
  3. Which statistical test should I choose for my data?

    There is no answer to this question that would always be valid, and this is why FlexArray allows you to choose between many tests.

    The regular t-test is often used, but it is not optimal in the context of microarray data because it treats every gene separately, and with few replicates the per-gene variance estimate is not very accurate. All the other statistical tests have been created to work better with microarray data - they typically try to estimate the variance in a more global fashion (across multiple genes). In particular, Bayesian tests (such as cyber-T) have been found to perform well with microarray data, and have gained wide acceptance in the scientific community.

    Your options include seeking recommendation of a statistician, performing a review of the literature and choosing the method that is most frequently used in a particular field of study, and running a few different tests and choosing the genes common to all of them.

  4. How does FlexArray calculate fold changes?

    If the normalized data have been logged then the logged fold change is equal to the difference of means of expression levels between the two groups being compared: log(fc) = mean(group1) - mean(group2). Raw fold change is then equal to: fc = 2^log(fc). As the second formula implies, logarithms are taken in base 2.

    If the normalized data have not been logged then they are first logged, and then the formulas above are used. This ensures that fold changes are the same regardless of whether the normalized data are logged or not.

    For ANOVA, fold changes will be calculated if there are exactly two levels in your factor. The other factor will be disregarded when calculating the sample mean. For instance, let's assume that you have two factors, each with two levels: Genotype = WT, Mut, and Treatment = Yes, No. In this case you have 4 groups in total: WT+Yes, WT+No, Mut+Yes, Mut+No. Calculating the fold change for Genotype will amount to: mean ([WT+Yes] + [WT+No]) - mean ([Mut+Yes] + [Mut+No]) - as if Treatment did not exist.
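
    The formulas above can be written out as a short Python sketch. The function names and expression values are made up for the example. Pooling the samples of both Treatment levels before taking the mean is our reading of the ANOVA rule described above.

```python
import math
from statistics import mean


def log2_fold_change(group1, group2, already_logged=True):
    """log(fc) = mean(group1) - mean(group2), computed on log2 values."""
    if not already_logged:
        group1 = [math.log2(v) for v in group1]
        group2 = [math.log2(v) for v in group2]
    return mean(group1) - mean(group2)


def raw_fold_change(log_fc):
    """fc = 2^log(fc)."""
    return 2 ** log_fc


# Made-up log2 expression values for one feature in a two-factor design.
groups = {
    ("WT", "Yes"): [10.0, 10.5],
    ("WT", "No"): [9.5, 10.0],
    ("Mut", "Yes"): [8.0, 8.5],
    ("Mut", "No"): [7.5, 8.0],
}

# Fold change for Genotype: pool both Treatment levels, as if Treatment did not exist.
wt = groups[("WT", "Yes")] + groups[("WT", "No")]
mut = groups[("Mut", "Yes")] + groups[("Mut", "No")]
log_fc = log2_fold_change(wt, mut)
fc = raw_fold_change(log_fc)
print(log_fc, fc)

# The same calculation starting from unlogged values.
unlogged_log_fc = log2_fold_change([1024, 1024], [256, 256], already_logged=False)
print(unlogged_log_fc)
```

    A negative logged fold change simply means that the second group has the higher mean; the raw fold change is then below 1.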

  5. I am getting strange results from the ANOVA algorithm. What is going on?

    Version 1.4.1 of FlexArray, as well as all older versions, contained two bugs in the ANOVA algorithm. One was applicable to experimental designs with more than two levels in at least one factor, and the other one to situations when the user did not select all contrasts in an ANOVA with two factors.

    Therefore, if in doubt, open your project in the most recent version of FlexArray, and re-run the ANOVA.

  6. I have run a statistical test on Affymetrix Gene ST array data. A lot of differentially expressed transcripts seem to be controls (e.g. "intronic normalization controls"). What is going on?

    For some reason, Affymetrix Gene ST data normalized with APT contains several hundred control probesets. We have contacted Affymetrix about this, and the company does not offer a recommendation on how such probesets should be handled. What we typically do is exclude all control probesets from statistical testing, keeping only those that have the value "main" in the "category" annotation. To do this:

    • Import annotations for your particular chip type and normalization level (gene or exon; see below). Make sure you import the 'category' column.
    • Create a custom filter on the 'category' column in the data table. Set the filter to be "equals 'main'" (no quotes).
    • Based on this filter create a gene list of all 'main' probesets.
    • Select both the gene list and the normalized data node in the Analysis Pipeline.
    • Highlight all rows in the data table (Ctrl+A).
    • When you run the statistical test, make sure you have the "Use highlighted data table rows" box checked.
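
    For readers who post-process exported tables outside FlexArray, the same selection can be expressed in a few lines of Python. This is only a sketch: the probeset ids and the non-"main" category labels below are hypothetical placeholders, and the column name 'category' is taken from the steps above.

```python
# Hypothetical annotated rows, as they might look after exporting the data table.
probesets = [
    {"id": "ps_1", "category": "main", "p_value": 0.001},
    {"id": "ps_2", "category": "control_a", "p_value": 0.0005},
    {"id": "ps_3", "category": "main", "p_value": 0.2},
    {"id": "ps_4", "category": "control_b", "p_value": 0.01},
]


def keep_main(rows):
    """Keep only the rows whose 'category' annotation equals 'main'."""
    return [row for row in rows if row.get("category") == "main"]


main_only = keep_main(probesets)
print([row["id"] for row in main_only])
```

    The selection depends only on the annotation, not on any test result, so it does not introduce the kind of bias that filtering on p-values would.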
  7. I am trying to run limma (TREAT) on my two-color data, and I get the following error message: "Size of 'data' slot does not match the length of 'dataKind' slot". How can I fix this?

    TREAT does not work properly in FlexArray with old versions of R. Please install a more recent version of R (ideally 2.12.0 or more recent).



Annotations
  1. How can I import annotations?
  2. What file formats are supported for annotations?
  3. What annotations should I download and import for Affymetrix Gene ST array data?
  4. When I try to import annotations, I get the following error message: "Annotations retrieved from the file did not match the data in the table". How can I fix this?
  5. I have annotations for a different entity than the one displayed as feature id in FlexArray. How can I import them?
Answers - Questions related to bioinformatics
Annotations
  1. How can I import annotations?

    Click the "Import annotations" button in the toolbar of the data table. Then select the file, specify its format, choose the rows and columns to be imported, and click "Finish". It is important to properly indicate the "feature id" column in the file - it should contain feature ids, e.g. probe names or target names, to be used to match information from the file with data already present in the FlexArray project. Please see the document "Import wizards.pdf" included with your installation of FlexArray for more details.

  2. What file formats are supported for annotations?

    Generally, annotations should be in text files, comma- or tab-separated. The only exception to this rule is the option to import Illumina annotations directly in the BGX format, added to FlexArray in version 1.5.

  3. What annotations should I download and import for Affymetrix Gene ST array data?

    If you performed normalization using Just RMA or using APT at gene level, you need to import the file named xxx.transcript.csv. If you performed normalization using APT at exon level, you need to import the file named xxx.probeset.csv.

  4. When I try to import annotations, I get the following error message: "Annotations retrieved from the file did not match the data in the table". How can I fix this?

    This error will be displayed in two separate situations:

    • When the annotations file does not match your data at all. In this case the number of mismatched rows indicated will be large.
      • Make sure the file contains annotations for the same chip type as your data.
      • Make sure the format specification matches the file format (usually CSV).
        • Look carefully at the file preview box. Make sure the columns are split properly, and that there are no quotes surrounding entries. Adjust the file format if you see anything weird.
        • On the Columns page of the wizard, one of the columns in the preview box should be selected with a right-click. This column turns blue, and it will be used by FlexArray to match annotations with data already present in your analysis project. Make sure you select the column containing probe set names (Affymetrix data) or target ids / probe ids (Illumina data).
        • Please note that starting from version 1.5, FlexArray will automatically locate the header row, and choose the appropriate feature id column in annotation files for Affymetrix and Illumina data. Please see document "New features.pdf" included with your installation of FlexArray for more details.
    • When the annotations file does in fact match your data but not perfectly. In this case the number of mismatched rows indicated will be much smaller than the number of features on the array.
      • This error may occur if the annotations match another version of the chip.
      • With some array types, in particular with Affymetrix Gene ST arrays, this error occurs even if the annotations match the chip version. As far as we have been able to establish, this error does not indicate the presence of a problem.
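
    To judge which of the two situations applies, it can help to count the mismatches yourself. The Python sketch below compares the feature ids of the data with those found in an annotation file. All ids are hypothetical, and reading the ids out of the actual files is left out.

```python
def mismatch_report(data_ids, annotation_ids):
    """Return the data feature ids missing from the annotations, and their fraction."""
    data_ids = set(data_ids)
    unmatched = data_ids - set(annotation_ids)
    fraction = len(unmatched) / len(data_ids) if data_ids else 0.0
    return sorted(unmatched), fraction


# Hypothetical feature ids from the project and from two annotation files.
data_ids = ["ps_1", "ps_2", "ps_3", "ps_4"]
wrong_chip = ["x_1", "x_2"]
near_match = ["ps_1", "ps_2", "ps_3"]

wrong_chip_report = mismatch_report(data_ids, wrong_chip)
near_match_report = mismatch_report(data_ids, near_match)
print(wrong_chip_report)
print(near_match_report)
```

    A fraction close to 1 points to the wrong file or a wrong format specification, while a small fraction points to a version difference between the annotations and the chip.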

    For more information about importing Illumina annotations please see document "Import wizards.pdf" included with your installation of FlexArray.

    Also, please note that the Innovation Centre routinely adds the most up-to-date annotations directly to Illumina raw data files. To import them, you need to run the Import Wizard just as in other cases, but select the raw data file as the annotation file.

  5. I have annotations for a different entity than the one displayed as feature id in FlexArray. How can I import them?

    It is possible to import annotations for entities other than the feature ids of data in the program. Example: you may have some annotations for UniGene IDs, not Affymetrix probeset names. To import such annotations, first you need to import UniGene IDs from the standard Affymetrix annotation file for your type of array. Then you can import the other annotation file, by selecting UniGene ID as the column to match the annotations to (it is a dropdown in the Import Wizard for annotations).
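
    Conceptually, this is a two-step lookup: feature id to UniGene ID, then UniGene ID to the new annotation. The Python sketch below shows the idea on made-up values; the function name and all ids are hypothetical, and the code does not reflect how the Import Wizard is implemented.

```python
def annotate_via_key(feature_to_key, key_to_annotation):
    """Attach annotations to features through an intermediate key (e.g. a UniGene ID).

    Features whose key is missing or has no annotation get None.
    """
    return {
        feature: key_to_annotation.get(key)
        for feature, key in feature_to_key.items()
    }


# Step 1: ids imported from the standard annotation file (made-up values).
probe_to_unigene = {"probe_1": "UG_10", "probe_2": "UG_20", "probe_3": None}
# Step 2: the other annotation file, keyed by the intermediate id.
unigene_to_note = {"UG_10": "pathway A", "UG_20": "pathway B"}

print(annotate_via_key(probe_to_unigene, unigene_to_note))
```

    Features without an intermediate id (probe_3 in this example) simply end up without the new annotation.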



Plots
  1. How can I access other plot types?
  2. How can I obtain a Volcano plot?
  3. When I click the "Create filter" button on a volcano plot of adjusted p-values (after FDR correction), an error message is displayed, and no filter is applied to the data table. How can I make it work?
  4. How should I interpret a Volcano plot?
  5. How can I obtain a PCA plot?
  6. How can I interpret a PCA plot?
  7. How can I zoom in on a plot?
  8. How can I make FlexArray display labels (probe set names, gene names, etc.) next to data points in plots?
  9. How can I select some data points in a plot, and see their associated values in the data table?
  10. Can I change the colors displayed in plots?
Answers - Questions related to bioinformatics
Plots
  1. How can I access other plot types?

    First, in the Analysis Pipeline select the node that contains the data you wish to visualize. Then select the desired plot from the "Plot" drop-down in the Plot Viewer. For many plots you can also select the data source(s) in the two drop-downs underneath.

    Some plots are only applicable to rows highlighted in the data table. In such cases, select the desired row or rows, and click the "Refresh plot" button in Plot Viewer's toolbar.

  2. How can I obtain a Volcano plot?

    Run a statistical test on your data. The Volcano plot is the default plot for data nodes containing results of a statistical test, and for FDR nodes.

    You need to be careful, though, to always work with the right type of volcano plot (there are two: the volcano plot of p-values and the volcano plot of adjusted p-values). As you switch between nodes or add nodes to your selection in the Analysis Pipeline, the volcano plot of adjusted p-values may be reset to the one of raw p-values, or vice versa, potentially causing interpretation mistakes. This happens much less frequently in version 1.6 of FlexArray thanks to improvements in the code.

  3. When I click the "Create filter" button on a volcano plot of adjusted p-values (after FDR correction), an error message is displayed, and no filter is applied to the data table. How can I make it work?

    This happens because the data table does not contain the fold change column. To make it work, add the node that contains your statistical test results to the selection in the Analysis Pipeline (by Ctrl-clicking it), and then click the "Create filter" button again.

  4. How should I interpret a Volcano plot?

    The X axis shows the log2 of the fold change, so the farther a data point lies to the left or to the right, the bigger the fold change. The Y axis shows the -log10 of the p-value, so the higher a data point is located, the smaller its associated p-value. Putting this together, data points in the top-left and top-right parts of the plot are the most interesting from the point of view of differential expression. They are marked with bigger brown diamonds.

    The two parameter boxes underneath the plot hold the thresholds used to decide which points are drawn as big diamonds. Changing either value and pressing ENTER moves the blue lines on the plot, and includes more or fewer data points in the region of interest (the top-left and top-right sections of the plot).

    The "Create filter" button (top-right corner of the plot area) will apply the selected thresholds to the data table, creating a filter that only contains the probe sets of interest. This can then be used to create a gene list.
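
    To make the axis transformations concrete, here is a minimal Python sketch that computes the two coordinates of a point and checks it against a pair of thresholds. The function names, the example values, and the default thresholds (two-fold change, p-value of 0.05) are assumptions chosen for illustration; they are not FlexArray's defaults or its actual code. The fold change is assumed to be a positive ratio.

```python
import math


def volcano_coordinates(fold_change, p_value):
    """Return (x, y) = (log2 fold change, -log10 p-value) for one feature."""
    return math.log2(fold_change), -math.log10(p_value)


def is_of_interest(fold_change, p_value, min_fold_change=2.0, max_p_value=0.05):
    """True if the point falls in the top-left or top-right region.

    The default thresholds are illustrative assumptions, not FlexArray's defaults.
    """
    x, _ = volcano_coordinates(fold_change, p_value)
    return abs(x) >= math.log2(min_fold_change) and p_value <= max_p_value


# (fold change, p-value) pairs, made up for illustration.
for fc, p in [(4.0, 0.001), (0.25, 0.001), (1.2, 0.001), (4.0, 0.2)]:
    x, y = volcano_coordinates(fc, p)
    print(f"x={x:+.2f}  y={y:.2f}  of interest: {is_of_interest(fc, p)}")
```

    A fold change of 0.25 (four-fold down) lands as far to the left as a fold change of 4 lands to the right, which is why both top corners of the plot matter.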

  5. How can I obtain a PCA plot?

    Normalize your data, and specify the experimental design. Then select the PCA plot from the "Plot" drop-down in the Plot Viewer. In FlexArray, PCA plots cannot be generated from raw data.

  6. How can I interpret a PCA plot?

    PCA is the abbreviation of Principal Components Analysis which, according to Wikipedia, is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. PCA can supply the user with a lower-dimensional picture of high-dimensional data such as microarray data.

    In practice, when applied to microarray data, PCA plots are a great Quality Control tool. Each point on the plot corresponds to a sample, and the farther apart two points are, the more their gene expression patterns differ. Once you enter your experimental design into FlexArray, points belonging to different experimental groups are colored differently. This makes it easier to spot outliers (samples far away from their group) and noisy data (when samples from different groups are mixed together).

    A "bad" PCA plot is usually a pretty good indication of problems with the experimental design or with the execution of the experiment. The PCA plot is not an oracle, though. It may very well happen that an experiment with a suboptimal PCA plot still produces good results after running statistical tests. When in doubt, you should try to confirm the PCA plot's conclusions using other graphs, scatter plots in particular. Samples close to each other on the PCA plot should have scatter plots with few differences in expression levels, and vice versa.

    You should also look carefully at the percentages listed in the axis labels. They indicate what proportion of the global variance is explained along that particular axis. It is particularly useful to compare the percentages on the two main axes to the distribution of points on the plot. For example, if the X axis says 85% and the Y axis says 8%, any differences in the Y position of points are much less "important" than differences in the X position. On the other hand, if the two numbers are 18% and 12%, respectively, the two axes together account for only 30% of the variance, so any separation between groups found on the plot is much less convincing.
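
    The percentage on each axis is the variance along that component divided by the total variance. The Python sketch below reproduces that arithmetic for the two situations described above; the function name and the variance values are invented for the example and do not come from FlexArray.

```python
def explained_variance_percent(component_variances):
    """Convert per-component variances into percentages of the total variance."""
    total = sum(component_variances)
    return [100.0 * variance / total for variance in component_variances]


# Made-up variances for two hypothetical experiments.
dominant_first_axis = explained_variance_percent([170, 16, 14])
spread_out = explained_variance_percent([18, 12] + [5] * 14)

for name, percents in [("dominant first axis", dominant_first_axis),
                       ("spread out", spread_out)]:
    first_two = percents[0] + percents[1]
    print(f"{name}: X={percents[0]:.0f}%  Y={percents[1]:.0f}%  "
          f"both axes={first_two:.0f}%")
```

    In the first case the two plotted axes capture 93% of the variance, so the plot is a faithful summary of the data; in the second they capture only 30%, so most of the variation between samples is not visible on the plot at all.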

  7. How can I zoom in on a plot?

    Hold down the Ctrl key, press the mouse button somewhere within the plot, and drag the mouse to select the area to zoom in on. Afterwards you can pan (scroll) the plot area by dragging it with the right mouse button. Click the "Zoom reset" button in the Plot Viewer's toolbar to reset the zoom to its original state.

  8. How can I make FlexArray display labels (probe set names, gene names, etc.) next to data points in plots?

    Unfortunately, this feature is currently only supported on a limited number of plots, e.g. on the PCA plot.

  9. How can I select some data points in a plot, and see their associated values in the data table?

    This feature is supported on most plots, but not all of them. To select the desired points, either click near a data point, or drag a box around some points. To add points to the selection, press the Shift key, and then click or drag. To remove points from the selection, press the Alt key, and then click or drag.

  10. Can I change the colors displayed in plots?

    This feature is not available to end-users. For certain plots, it may be possible to adjust the colors by modifying an R script. For other plots, it is not possible without modifying the source code of the application. Please contact us if you wish to discuss this matter further.



Gene lists
  1. How can I create a gene list in FlexArray?
  2. How can I compare multiple gene lists?
  3. How can I visualize genes from my list on plots, e.g. on scatter plots?
  4. How can I make a gene list based on gene ontology assignments?
Answers - Questions related to bioinformatics
Gene lists
  1. How can I create a gene list in FlexArray?

    Gene lists can be created at almost any point of the analysis after the data normalization step. Select a number of rows in the data table, then click the "Gene list" button in the toolbar of the data table, and choose the "Create from selected" option. Alternatively, apply a filter to the data table, and then choose the "Create from all" option.

    As a result, a new node will be added to the Analysis Pipeline. This node only contains the selected genes. You can use such nodes to compare gene lists among themselves (see next question). When selected together with a regular data node, a gene list node will cause the data in the data node to be automatically filtered.

  2. How can I compare multiple gene lists?

    Select two or more gene lists in the Analysis Pipeline by Ctrl-clicking them. If two or three gene lists are selected, you will see a Venn diagram. If more are selected, you can only compare the lists using the data table. The CAT plot can also be used to compare gene lists.
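
    The regions of a two-list Venn diagram correspond to simple set operations. The following Python sketch shows them on made-up gene names; the function name and the lists are hypothetical, and the code is only meant to illustrate what the diagram displays.

```python
def venn_regions(list_a, list_b):
    """Split two gene lists into the three regions of a two-circle Venn diagram."""
    a, b = set(list_a), set(list_b)
    return {"only_a": a - b, "shared": a & b, "only_b": b - a}


# Made-up gene names, for illustration only.
treatment_1 = ["GENE_A", "GENE_B", "GENE_C"]
treatment_2 = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]

for region, genes in venn_regions(treatment_1, treatment_2).items():
    print(region, sorted(genes))
```

    With three lists the same idea applies, but there are seven regions instead of three.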

  3. How can I visualize genes from my list on plots, e.g. on scatter plots?

    • Select the gene list node at the same time as the node with data that you wish to visualize.
    • Clear all filters from the data table. The table will now display only data corresponding to genes from your list.
    • Highlight all rows in the data table by pressing Ctrl+A.
    • The corresponding data points will become highlighted on the plot.

  4. How can I make a gene list based on gene ontology assignments?

    • Import annotations containing gene ontology assignments, or other information you wish to use as the basis for your list.
    • Create a custom filter on the appropriate annotation column in the data table:
      • Click the triangle in the column's header.
      • Select "(Custom)".
      • Select the "like" operator from the drop-down on the left.
      • Enter the keyword to search for in the edit box on the right, adding the '%' character (a wildcard, similar to '*' in Windows file names) before and after, like this: %inflammation%.
      • Click OK.
    • A filter will be applied to the data table, leaving only the features that contain matching text in the selected annotation column.
    • Unfortunately, it is currently impossible to create such filters spanning multiple columns. If you create custom filters on more than one column, the AND operation (conjunction) is applied to the results, which is usually not what is desired.
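
    The Python sketch below illustrates how a '%' pattern matches text, and the difference between combining columns with AND and with OR. It works on a made-up row outside FlexArray (for instance, on exported data) and is not the program's filtering code. The function names are hypothetical, and two assumptions are made: only '%' acts as a wildcard, and matching ignores case.

```python
import re


def like_to_regex(pattern):
    """Translate a pattern using '%' wildcards into a compiled regular expression.

    Assumptions: only '%' is treated as a wildcard, and matching ignores case.
    """
    literal_parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile(".*".join(literal_parts), re.IGNORECASE | re.DOTALL)


def matches_any_column(row, columns, pattern):
    """True if at least one of the given columns matches (OR across columns)."""
    regex = like_to_regex(pattern)
    return any(regex.fullmatch(row.get(column) or "") for column in columns)


def matches_all_columns(row, columns, pattern):
    """True only if every given column matches (AND across columns)."""
    regex = like_to_regex(pattern)
    return all(regex.fullmatch(row.get(column) or "") for column in columns)


# A made-up annotation row.
row = {"GO process": "regulation of inflammation", "Description": "kinase"}
columns = ["GO process", "Description"]
print(matches_any_column(row, columns, "%inflammation%"))   # True
print(matches_all_columns(row, columns, "%inflammation%"))  # False
```

    The row is kept under OR because one column mentions the keyword, but it is dropped under AND because the other column does not.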



Other
  1. What is the proper way of citing FlexArray?
  2. Is there a manual or a searchable Help tool?
  3. How can I export data directly from FlexArray to an Excel file?
Answers - Questions related to bioinformatics
Other
  1. What is the proper way of citing FlexArray?

    The license agreement of FlexArray obliges you to refer to the software in any publication that contains results obtained using it.

    Unfortunately, there is no peer-reviewed publication describing FlexArray. It has been our objective for a long time to write one, but other activities always get in the way.

    In the license agreement, we propose the following way of citing FlexArray:

    Michal Blazejczyk, Mathieu Miron, Robert Nadon (2007). FlexArray: A statistical data analysis software for gene expression microarrays. Génome Québec, Montreal, Canada, URL http://genomequebec.mcgill.ca/FlexArray

    We realize that this may go against the citation policies of certain journals or the rules followed in your lab. Even so, you must still refer to FlexArray in your publication if you have used it in your research. Some authors mention it in the acknowledgements section of their paper; others simply mention the name of the application (at least) directly in the text. We hope that one of these ways works for you.

  2. Is there a manual or a searchable Help tool?

    Currently, there is no formal manual. But there is a set of video tutorials that explain how to use the program. And this FAQ document, of course.

    There are also some useful documents included in your installation: most of them can be accessed via the Help menu, and others can be found in the "doc" subfolder of the installation directory.

  3. How can I export data directly from FlexArray to an Excel file?

    The easiest way to do this is to:

    • Organize the data table as you wish by selecting one or more nodes in the Analysis Pipeline, and then moving bands or columns around.
    • If desired, highlight only certain rows in the data table (or apply one or more gene lists by selecting them in the pipeline).
    • Click the "Export data" icon in the data table's toolbar.
    • Choose "Export to text".
    • Choose between exporting all rows or selected rows only.
    • Choose the data columns you wish to be exported (by default, all data columns from the data table are selected for export).
    • At the bottom of the dialog box, choose the Clipboard as the destination.
    • Click OK.
    • Switch to Excel, open a new worksheet, and paste the contents of the Clipboard.