I have created some R code for you to go through that corresponds to the topics we covered today, specifically data preprocessing, data exploration, and estimation of cellular heterogeneity and batch effects. I have focused on data manipulation with minfi for the sake of certain analyses we will be performing tomorrow. The figure below, from the minfi manual, gives an idea of the minfi workflow.

[Figure: minfi workflow]


The worksheet to go through is located here. To run this code, I would suggest using a smaller subset of 12 samples, which you can specify by using the attached CSV as your phenotype file. Below are some questions to think about and explore after you have gone through all the code on that page.



  1. Do the different preprocessing techniques have differing impacts on the distribution of methylation values?
    • Required knowledge: 
      • performing different preprocessing approaches
      • different methods to assess methylation distribution: MDS plots, density plots, and probe type plots
  2. Choose a gene of interest. Do the methylation values seem similar across that gene? Does the mean across samples vary at each site?
    • Required knowledge:
      • How to subset data to a specific gene
      • General Approach:
        • get mean across samples for each CpG
        • plot MAPINFO against methylation average across samples
        • BONUS: calculate averages separately for cases and controls
      • Advanced: plot methylation level for each individual at each site across the gene
  3. Considering one specific gene, compare the impact of two different preprocessing techniques on the methylation values. Do they change?
    • Required knowledge:
      • Similar to above question, but now considering two different processing approaches
      • Get the average across samples for each method, then plot the per-site averages from one method against those from the other
  4. Are the cell proportions associated with disease status?
    • Required knowledge:
      • Estimating cell proportions using Houseman approach
      • Creating controls vs cases boxplots for each of the cell types
  5. Do the estimated SVAs seem to be correlated with any of the estimated cell proportions? Does this seem to correspond to the ISVA results?
    • Required knowledge:
      • Estimate SVAs
      • Plot these SVAs against some of the variables in the pData frame
      • Identify significant ISVAs
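
For question 2, the general approach could be sketched as below. This is only an illustration on simulated data: the `beta` matrix stands in for the beta values you would pull out of your minfi object, and the `anno` table, its `gene` and `MAPINFO` columns, and the gene name `GENE_A` are all hypothetical placeholders for your own annotation.

```r
set.seed(1)

# Simulated stand-ins (assumptions, not course data):
# 'beta' mimics a CpG x sample matrix of beta values;
# 'anno' mimics a probe annotation table with hypothetical columns.
n_cpg <- 30
n_samp <- 12
beta <- matrix(runif(n_cpg * n_samp), nrow = n_cpg,
               dimnames = list(paste0("cg", seq_len(n_cpg)),
                               paste0("S", seq_len(n_samp))))
anno <- data.frame(gene = rep(c("GENE_A", "GENE_B", "GENE_C"), each = 10),
                   MAPINFO = sort(sample(1000000:2000000, n_cpg)),
                   row.names = rownames(beta))
status <- rep(c("case", "control"), each = 6)

# Subset to the CpGs annotated to one gene
in_gene <- anno$gene == "GENE_A"
gene_beta <- beta[in_gene, , drop = FALSE]
pos <- anno$MAPINFO[in_gene]

# Mean across samples for each CpG, overall and by group (the BONUS step)
mean_all <- rowMeans(gene_beta)
mean_case <- rowMeans(gene_beta[, status == "case", drop = FALSE])
mean_ctrl <- rowMeans(gene_beta[, status == "control", drop = FALSE])

# Plot genomic position against average methylation
ord <- order(pos)
plot(pos[ord], mean_all[ord], type = "b", ylim = c(0, 1),
     xlab = "MAPINFO", ylab = "Mean beta across samples")
lines(pos[ord], mean_case[ord], type = "b", col = "red")
lines(pos[ord], mean_ctrl[ord], type = "b", col = "blue")
legend("topright", legend = c("all", "cases", "controls"),
       col = c("black", "red", "blue"), lty = 1)
```

For the advanced version, `matplot(pos[ord], gene_beta[ord, ], type = "l")` would draw one line per individual instead of the averages.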
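
For question 3, one possible way to compare two preprocessing approaches on a single gene is sketched below. Both matrices are simulated: `beta_raw` and `beta_norm` are hypothetical names standing in for the beta values of the same CpGs after two different preprocessing methods, and the small random shift is an assumption made purely so there is something to plot.

```r
set.seed(3)

# Simulated stand-ins (assumptions): beta values for the CpGs of one gene
# under two preprocessing methods, same CpGs and same samples in both.
n_cpg <- 10
n_samp <- 12
cpgs <- paste0("cg", seq_len(n_cpg))
beta_raw <- matrix(runif(n_cpg * n_samp, 0.05, 0.95), nrow = n_cpg,
                   dimnames = list(cpgs, NULL))
# Second "method": the first plus a small simulated shift, clipped to [0, 1]
beta_norm <- pmin(pmax(beta_raw + rnorm(n_cpg * n_samp, sd = 0.03), 0), 1)

# Average across samples for each CpG, per method
mean_raw <- rowMeans(beta_raw)
mean_norm <- rowMeans(beta_norm)

# Plot the per-site averages against each other; points on the dashed
# identity line are sites where the two methods agree
plot(mean_raw, mean_norm, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Mean beta, method 1", ylab = "Mean beta, method 2")
abline(0, 1, lty = 2)

# Per-site difference and overall agreement between methods
diff_by_site <- mean_norm - mean_raw
print(round(diff_by_site, 3))
print(cor(mean_raw, mean_norm))
```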
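
For question 4, the case versus control boxplots might look something like the sketch below. The proportions here are simulated rather than estimated: `props` is a hypothetical samples x cell types matrix standing in for the output of the Houseman-style estimation, the cell type labels are assumed, and the Wilcoxon test is just one reasonable choice of test that the worksheet does not prescribe.

```r
set.seed(2)

# Simulated stand-in (assumption): a samples x cell types matrix of
# estimated proportions, with each row summing to 1.
cell_types <- c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Gran")
raw <- matrix(rgamma(12 * 6, shape = 2), nrow = 12,
              dimnames = list(paste0("S", 1:12), cell_types))
props <- raw / rowSums(raw)
status <- factor(rep(c("case", "control"), each = 6))

# One boxplot per cell type, cases vs controls
op <- par(mfrow = c(2, 3))
for (ct in cell_types) {
  boxplot(props[, ct] ~ status, main = ct,
          xlab = "", ylab = "Estimated proportion")
}
par(op)

# A simple two-group test per cell type (Wilcoxon rank-sum)
p_values <- sapply(cell_types, function(ct) {
  wilcox.test(props[, ct] ~ status)$p.value
})
print(round(p_values, 3))
```

With only 12 samples these tests have little power, so the boxplots are probably the more informative part.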
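
For question 5, a rough way to check surrogate variables against cell proportions is a correlation matrix plus a scatterplot of any pair that stands out. Everything below is simulated: `sv` is a hypothetical samples x SV matrix standing in for the surrogate variables you estimated, and the first one is deliberately constructed to track the `Gran` column so that the example has a visible association.

```r
set.seed(4)

# Simulated stand-ins (assumptions): cell proportions and two surrogate
# variables for 12 samples. SV1 is built to follow the 'Gran' proportion;
# SV2 is pure noise.
n_samp <- 12
cell_types <- c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Gran")
raw <- matrix(rgamma(n_samp * 6, shape = 2), nrow = n_samp,
              dimnames = list(paste0("S", seq_len(n_samp)), cell_types))
props <- raw / rowSums(raw)
sv <- cbind(SV1 = props[, "Gran"] + rnorm(n_samp, sd = 0.02),
            SV2 = rnorm(n_samp))

# Correlation of every surrogate variable with every cell type
cor_mat <- cor(sv, props)
print(round(cor_mat, 2))

# Look more closely at one pair
plot(props[, "Gran"], sv[, "SV1"],
     xlab = "Estimated Gran proportion", ylab = "SV1")
print(cor.test(sv[, "SV1"], props[, "Gran"])$p.value)
```

The same `cor()` call works for any numeric columns of the pData frame in place of `props`.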
