Logfold change (Limma Package) in methylation association studies Entering edit mode 3.0 years ago ritz ▴ 10 Hi all, I need some advice. I am doing an analysis to look for an association between methylation and stress. I use lmfit model. My stress variable is continuous and I am not comparing any stress or control groups. An example of my code:- design <- model.matrix(~ stress + age + gender + smoking + Plate , pheno) sv <- sva(getM(eset.nosex), design, vfilter = 5e4) ##surrogate varaiblesdesign <- cbind(design, sv$sv) fit <- lmFit(getM(eset.nosex), design)fit <- eBayes(fit) names(fit) topTable(fit, 2) How would one interpret log fold change in terms of methylation (especially when there is no group comparison and only an association study)? Arent estimates(coefficients) a better measurement for methylation in association studies? thankyou! limma logfoldchange • 1.8k views ADD COMMENT • link updated 4 hours ago by Papyrus ★ 2.9k • written 3.0 years ago by ritz ▴ 10 Entering edit mode I think the Nonetheless, even without getting to the logFCs, because you're fitting the models in M-values (which is the usually recommended method), even the coefficient values will be difficult to interpret biologically. It may be better to interpret/filter the CpGs using the beta values (or even covariate-adjusted beta values). ADD COMMENT • link 3.0 years ago by Papyrus ★ 2.9k Entering edit mode Thank you for your reply. Do you mean the coefficients or the estimates? Yes, I did check fit$coefficients. It gives coefficients for each variable in the design matrix. head(fit$coefficients) cg14817997 1.3965045 0.02149242 0.052808421 0.02321043cg26928153 2.3554372 -0.02522342 -0.036111524 0.04056885cg16269199 0.0455470 0.02982565 -0.114791997 0.06911121cg13869341 0.2278520 0.05470675 -0.187123556 0.07042281cg14008030 0.2342564 0.07319959 0.001113212 0.02870625cg12045430 -3.7704926 -0.06934915 0.168677371 -0.03177391 Smoking Plate2cg14817997 -0.16372064 0.6785156 cg26928153 -0.04224105 -0.5071137 cg16269199 -0.05919517 -0.8600021 cg13869341 -0.32086356 0.6786138 cg14008030 -0.23856418 2.3189791 cg12045430 0.23110099 -1.7975213 But what I would like to know is just for the variable of interest:- stress topTable(fit2, 2) cg13547817 -0.3738731 3.5772747 -5.624865 1.955557e-07 0.1581173 5.915087cg16692227 -0.3559524 0.9775661 -5.455437 4.043169e-07 0.1634560 5.334972cg03287633 -0.1867909 3.0880473 -5.290847 8.107974e-07 0.1738228 4.778687cg17474383 -0.2511642 -0.0911020 -5.226623 1.060811e-06 0.1738228 4.563692cg21858516 -0.2169699 -0.7935851 -5.223461 1.074899e-06 0.1738228 4.553137cg16294325 -0.2948937 3.3952271 -5.110209 1.719679e-06 0.2317423 4.177111 Usually, when you run a normal lm model (without lmfit) we get estimates, std. error, t value, and pvalue. Which is pretty easy to interpret. Yes, true beta values are better for biological interpretation, but after running the model in lmfit (using m values) I can convert to beta values and that should work right? meth_beta = topTable(fit2, 2 , sort.by="p", number = nrow(getM(eset.nosex))) thanks again. ADD REPLY • link 3.0 years ago by ritz ▴ 10 Entering edit mode Yes, I meant that, if you fit a continuous covariate (e.g. stress), the (I just did a Unless we're using different terminology, I was referring to the coefficients/estimates as the same thing (they represent the mean of change in Y for unit of change in X, thus being a measure of effect size; if you had factors, they would be the means or changes in means between groups). Regarding going "back" from M-values to b-values, I don't think you can convert coefficients in M-value scale to coefficients in b-value scale, because the M- and b-value relationship is non-linear. For example, a change from 0 to 2 in M-value can be a 30% change in beta, while from 2 to 4, the beta change is much less (see Fig. 1 here). I would rather directly use the original beta values. Maybe you could also estimate the coefficients using beta values (with ADD REPLY • link 3.0 years ago by Papyrus ★ 2.9k Entering edit mode Thank you! Yes, I did check the $logfc measurements and fir$coefficients and they are the same. That helps a lot. yes, we cannot change the m-value coefficients to coefficients on the b-value scale. I did read the paper and they too advise using m-values for the analysis and b-values for the biological interpretation. I do use the beta values for plotting. If we need b-value-based coefficients then that would mean running the analysis with b-values and that doesn't make sense. It like a loop. But thanks a ton, the logfc was a great help. ADD REPLY • link 3.0 years ago by ritz ▴ 10 Entering edit mode Yes, I would also do everything on M-values. The reason I suggested also fitting the analyses on b-values is because: 1) with lmFit you have a very fast way of obtaining stress coefficients in b-value scale, which you could us as (biologically interpretable) measures of change 2) because you use other covariates in the models (e.g. gender, age, smoking...), these stress coefficients will be adjusted for these variables and will reflect better what you are really testing in your models. This is because when you do the models you are not actually looking at changes in "raw" b-values, or M-values, in a sense. In these models the effects (the coefficients) of your variable of interest are "adjusted" for the other covariates. If you only looked at the "raw" b-values it would not be exactly the same that you were actually testing within limma. ADD REPLY • link 3.0 years ago by Papyrus ★ 2.9k Entering edit mode That makes very good sense, and very understandable.I would prefer to use m-values too. I also find it easier to work with limma. I just feel it needs to be more 'methylation-friendly' (as in more guides and more examples). Limma is more expression array-based. Thank you so much ! ADD REPLY • link 3.0 years ago by ritz ▴ 10 Entering edit mode Hello, I'd like to come back to your post because I find myself in the same situation and am therefore particularly interested in your suggestions. I was wondering when you suggested applying lmfit to betavalues to take into account any associated covariates for better biological interpretation : how would you suggest interpreting logFC in this case? For example, if I want a methylation difference of at least 20% between my conditions, what is the threshold for logFC in this case? For my results, I used the mvalues to select differentially methylated CpGs and I now want to use the beta values to filter this list of CpGs to select only those with a methylation difference of at least 20%. With this in mind, I'm thinking of using the beta values as you suggested but would welcome your suggestions to make sure I'm interpreting the logFC provided by the limma package correctly. Many thanks, ADD REPLY • link 1 day ago by hgui • 0 Entering edit mode I typically do testing on M-values, filter by FDR and then also assign a filter of minimum b-value/methylation difference (depending on the project). So I really don't use the logFC variable (which is in M-value units); rather, I create an additional variable of the mean difference in b-values between the groups that were compared and use it to filter. I typically do it on the raw methylation values but it is true that if you have many covariates, you may want to apply this on the "corrected" b-values (or check and compare). To get "corrected" b-values you can use the ADD REPLY • link 4 hours ago by Papyrus ★ 2.9k logFC
column actually gives the coefficients in this case, did you check in your fit$coefficients
object and compare? (Intercept) Stress Gender2 age
logFC AveExpr t P.Value adj.P.Val B
logFC
column actually gives the same (estimates/coefficients) as those estimates/coefficients contained in fit$coefficients
. To check, reorder the 2 objects (the $logFC
column from topTable
and the fit$coefficients
) to have the ordered CpG sites in the same objects and see that they are the same numbers: i.e. the $logFC
, if you extract the second coefficient, should be the same as the Stress
column in fit
)lmFit
in my own data for a continuous covariate (age) and checked that the logFC
contained the same as the age coefficient in fit$coefficients
).lmFit
) and use those for ranking or filtering CpGs by how much they change in beta value, if you want a more biological interpretation, while using the p-values from the tests done with the M-values.removeBatchEffect
function from limma
. But you can check and see if your results differ a lot by this or not.
Login before adding your answer.