Methylumi data import

I cannot believe how long this took to figure out. I guess this happens when you haven’t used a package for ages, are too lazy to re-read the manual, and give up every time you hit the same problem. (In my defense… I gave up because I had a perfectly working, albeit longer, alternative script that did the exact same thing.) So why spend time fixing it? Because I won’t need to make little changes to the script every time my data is slightly different, which means SAVING TIME IN THE LONG RUN!

This is just a small tip on how to import Illumina Infinium HumanMethylation450 data into R using the methylumi package from Bioconductor. It also works for HumanMethylation27 data. I kept hitting the same two errors when using the methylumiR function, so I’m writing down what happened and how to solve them as a reminder for next time. The methylumiR function creates a MethyLumiSet object, which contains slots for the following:

  • phenoData – variables that describe the samples
  • featureData – description of each probe
  • exprs – average beta values, which correspond to the fraction of DNA methylation at each CpG site (0 = unmethylated, 1 = fully methylated)
  • pvals – detection P values
  • unmethylated – Signal_A intensities, which correspond to the unmethylated signal
  • methylated – Signal_B intensities, which correspond to the methylated signal
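
As a rough illustration of how these slots relate to each other, the sketch below recomputes a beta value from the two signal intensities. It assumes the usual Illumina-style definition, beta = methylated / (methylated + unmethylated + 100). The offset of 100, the function name beta_from_signals and the toy intensities are my assumptions for illustration, not values taken from a real export.

```r
# Toy intensities for three CpG probes (made-up numbers, not real data).
signal_a <- c(cg1 = 900,  cg2 = 5000, cg3 = 100)  # unmethylated (Signal_A)
signal_b <- c(cg1 = 9000, cg2 = 5000, cg3 = 100)  # methylated (Signal_B)

# Assumed Illumina-style beta: M / (M + U + offset), with offset = 100.
# beta_from_signals is a hypothetical helper, not part of methylumi.
beta_from_signals <- function(meth, unmeth, offset = 100) {
  meth / (meth + unmeth + offset)
}

beta <- beta_from_signals(signal_b, signal_a)
print(round(beta, 3))
```

The AVG_Beta values exported by GenomeStudio may differ slightly from this calculation, so treat it as a sanity check rather than a replacement for the exported column.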

Steps:
1) In GenomeStudio, export the entire “Sample Methylation Profile” data set. Include all columns (including all the probe annotation columns). For subcolumns, only “AVG_Beta”, “Signal_A”, “Signal_B” and “Detection Pval” are necessary. Let’s call this file “all.txt”.

2) Export the “Samples Table”. This isn’t actually necessary, but I like to use it since it has the “Sentrix Barcode” and “Sample Section” information. It also lets me check that my sample IDs are in the same order as in my all.txt file. Let’s call this one “sample.txt”.
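
A quick way to confirm that the two files list the samples in the same order is sketched below. It assumes the per-sample columns in all.txt are named like “<sample>.AVG_Beta”, which is how my exports look; adjust the pattern if yours differ. The vectors all_header and sample_ids are small made-up stand-ins for the real files.

```r
# Stand-in for the column-name line of "all.txt":
# one annotation column plus the per-sample subcolumns.
all_header <- c("TargetID",
                "S1.AVG_Beta", "S1.Signal_A", "S1.Signal_B", "S1.Detection Pval",
                "S2.AVG_Beta", "S2.Signal_A", "S2.Signal_B", "S2.Detection Pval")

# Stand-in for the sample IDs read from "sample.txt".
sample_ids <- c("S1", "S2")

# Pull the sample names out of the AVG_Beta columns
# (assumed naming scheme: "<sample>.AVG_Beta").
beta_cols  <- grep("\\.AVG_Beta$", all_header, value = TRUE)
ids_in_all <- sub("\\.AVG_Beta$", "", beta_cols)

# TRUE only if both files list the same samples in the same order.
same_order <- identical(ids_in_all, sample_ids)
print(same_order)
```

With a real export you would take all_header from the column-name line of all.txt and sample_ids from the sample table.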

This is usually where I stopped, and then I hit the following error:
> samp <- read.table("sample.txt", sep="\t", na.strings=c("", " ", "NA"), as.is=TRUE, header=TRUE, row.names=1)
> DATA <- methylumiR("all(test=1000).txt", sampleDescriptions=samp)
Error in if (!sampleIDcol) { : argument is of length zero

So the problem is that the “sample.txt” file needs to have a column named “SampleID”.

After fixing this, the second most common problem I had was the following:
Error in if (labelCol) { : argument is of length zero

This error occurs because the file also needs a “SampleLabel” column. With both columns added, I’m good. So here are steps 3 and 4 of using methylumiR.
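
To catch both errors before calling methylumiR, a small pre-flight check like the one below can help. This is only a sketch: missing_sample_cols is a hypothetical helper of mine, not part of methylumi, and the toy table stands in for the real sample.txt. Note that read.table with header=TRUE turns a “Sample ID” header into Sample.ID by default, which is why the toy table uses that name.

```r
# Hypothetical helper: report which of the required columns are missing.
missing_sample_cols <- function(samp,
                                required = c("SampleID", "SampleLabel")) {
  setdiff(required, colnames(samp))
}

# Toy sample table that still carries the unmodified column name.
samp <- data.frame(Sample.ID    = c("S1", "S2"),
                   Sample_Group = c("ctrl", "case"),
                   stringsAsFactors = FALSE)

# Lists the columns that still need to be added before importing.
print(missing_sample_cols(samp))
```

An empty result means both columns are present and the table is ready to pass as sampleDescriptions.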

3) Rename the “Sample ID” column to “SampleID”. Also duplicate that column and name the copy “SampleLabel”. Alternatively, add a “SampleLabel” column containing the names you want to use as sample names in the MethyLumiSet object.

[Optional]
3b) Include extra columns in the “sample.txt” file. Any extra columns will be deposited into the phenoData and can be accessed using pData(DATA).
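
Steps 3 and 3b can also be done in R instead of editing the file by hand. The sketch below writes a tiny made-up stand-in for sample.txt to a temporary file, then renames and duplicates the ID column. The “Index” first column, the “Tissue” extra column and all the values are assumptions for illustration; your Samples Table may look different.

```r
# Write a tiny stand-in for "sample.txt" to a temporary file (made-up values).
sample_file <- tempfile(fileext = ".txt")
writeLines(c("Index\tSample ID\tSentrix Barcode\tSample Section\tTissue",
             "1\tS1\t1234567890\tR01C01\tblood",
             "2\tS2\t1234567890\tR02C01\tsaliva"),
           sample_file)

# check.names = FALSE keeps "Sample ID" as written instead of "Sample.ID".
samp <- read.table(sample_file, sep = "\t", header = TRUE, row.names = 1,
                   as.is = TRUE, check.names = FALSE,
                   na.strings = c("", " ", "NA"))

# Step 3: rename the ID column and duplicate it as the label column.
colnames(samp)[colnames(samp) == "Sample ID"] <- "SampleID"
samp$SampleLabel <- samp$SampleID

# Step 3b: extra columns such as "Tissue" simply stay in the table.
print(colnames(samp))
```

The resulting samp data frame is what would then be passed as sampleDescriptions in step 4.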

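4) Use the following script (my exported file happened to be named “all(test=1000).txt”; substitute the name of your own “all.txt”):
> samp <- read.table("sample.txt", sep="\t", na.strings=c("", " ", "NA"), as.is=TRUE, header=TRUE, row.names=1)
> DATA <- methylumiR("all(test=1000).txt", sampleDescriptions=samp)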

This should generate a MethyLumiSet object whose average beta values can be accessed using betas(DATA).

No more extremely long scripts that can be replaced by this awesome function!
If you have any tips to share or questions, please feel free to leave a comment!