Logfold change (Limma Package) in methylation association studies (2024)

Logfold change (Limma Package) in methylation association studies

Entering edit mode

3.0 years ago

ritz ▴ 10

Hi all,

I need some advice. I am doing an analysis to look for an association between methylation and stress. I use lmfit model. My stress variable is continuous and I am not comparing any stress or control groups.

An example of my code:- design <- model.matrix(~ stress + age + gender + smoking + Plate , pheno)

sv <- sva(getM(eset.nosex), design, vfilter = 5e4) ##surrogate varaiblesdesign <- cbind(design, sv$sv) fit <- lmFit(getM(eset.nosex), design)fit <- eBayes(fit)

names(fit)

topTable(fit, 2)

How would one interpret log fold change in terms of methylation (especially when there is no group comparison and only an association study)? Arent estimates(coefficients) a better measurement for methylation in association studies?

thankyou!

limma logfoldchange • 1.8k views

ADD COMMENTlink updated 4 hours ago by Papyrus &starf; 2.9k • written 3.0 years ago by ritz &utrif; 10

Entering edit mode

I think the logFC column actually gives the coefficients in this case, did you check in your fit$coefficients object and compare?

Nonetheless, even without getting to the logFCs, because you're fitting the models in M-values (which is the usually recommended method), even the coefficient values will be difficult to interpret biologically. It may be better to interpret/filter the CpGs using the beta values (or even covariate-adjusted beta values).

ADD COMMENTlink 3.0 years ago by Papyrus &starf; 2.9k

Entering edit mode

Thank you for your reply.

Do you mean the coefficients or the estimates?

Yes, I did check fit$coefficients. It gives coefficients for each variable in the design matrix.

head(fit$coefficients)

 (Intercept) Stress Gender2 age

cg14817997 1.3965045 0.02149242 0.052808421 0.02321043cg26928153 2.3554372 -0.02522342 -0.036111524 0.04056885cg16269199 0.0455470 0.02982565 -0.114791997 0.06911121cg13869341 0.2278520 0.05470675 -0.187123556 0.07042281cg14008030 0.2342564 0.07319959 0.001113212 0.02870625cg12045430 -3.7704926 -0.06934915 0.168677371 -0.03177391 Smoking Plate2cg14817997 -0.16372064 0.6785156 cg26928153 -0.04224105 -0.5071137 cg16269199 -0.05919517 -0.8600021 cg13869341 -0.32086356 0.6786138 cg14008030 -0.23856418 2.3189791 cg12045430 0.23110099 -1.7975213

But what I would like to know is just for the variable of interest:- stress

topTable(fit2, 2)

 logFC AveExpr t P.Value adj.P.Val B

cg13547817 -0.3738731 3.5772747 -5.624865 1.955557e-07 0.1581173 5.915087cg16692227 -0.3559524 0.9775661 -5.455437 4.043169e-07 0.1634560 5.334972cg03287633 -0.1867909 3.0880473 -5.290847 8.107974e-07 0.1738228 4.778687cg17474383 -0.2511642 -0.0911020 -5.226623 1.060811e-06 0.1738228 4.563692cg21858516 -0.2169699 -0.7935851 -5.223461 1.074899e-06 0.1738228 4.553137cg16294325 -0.2948937 3.3952271 -5.110209 1.719679e-06 0.2317423 4.177111

Usually, when you run a normal lm model (without lmfit) we get estimates, std. error, t value, and pvalue. Which is pretty easy to interpret.

Yes, true beta values are better for biological interpretation, but after running the model in lmfit (using m values) I can convert to beta values and that should work right?

meth_beta = topTable(fit2, 2 , sort.by="p", number = nrow(getM(eset.nosex)))

thanks again.

ADD REPLYlink 3.0 years ago by ritz &utrif; 10

Entering edit mode

Yes, I meant that, if you fit a continuous covariate (e.g. stress), the logFC column actually gives the same (estimates/coefficients) as those estimates/coefficients contained in fit$coefficients. To check, reorder the 2 objects (the $logFC column from topTable and the fit$coefficients) to have the ordered CpG sites in the same objects and see that they are the same numbers: i.e. the $logFC, if you extract the second coefficient, should be the same as the Stress column in fit)

(I just did a lmFit in my own data for a continuous covariate (age) and checked that the logFC contained the same as the age coefficient in fit$coefficients).

Unless we're using different terminology, I was referring to the coefficients/estimates as the same thing (they represent the mean of change in Y for unit of change in X, thus being a measure of effect size; if you had factors, they would be the means or changes in means between groups).

Regarding going "back" from M-values to b-values, I don't think you can convert coefficients in M-value scale to coefficients in b-value scale, because the M- and b-value relationship is non-linear. For example, a change from 0 to 2 in M-value can be a 30% change in beta, while from 2 to 4, the beta change is much less (see Fig. 1 here).

I would rather directly use the original beta values. Maybe you could also estimate the coefficients using beta values (with lmFit) and use those for ranking or filtering CpGs by how much they change in beta value, if you want a more biological interpretation, while using the p-values from the tests done with the M-values.

ADD REPLYlink 3.0 years ago by Papyrus &starf; 2.9k

Entering edit mode

Thank you!

Yes, I did check the $logfc measurements and fir$coefficients and they are the same. That helps a lot.

yes, we cannot change the m-value coefficients to coefficients on the b-value scale. I did read the paper and they too advise using m-values for the analysis and b-values for the biological interpretation. I do use the beta values for plotting. If we need b-value-based coefficients then that would mean running the analysis with b-values and that doesn't make sense. It like a loop.

But thanks a ton, the logfc was a great help.

ADD REPLYlink 3.0 years ago by ritz &utrif; 10

Entering edit mode

Yes, I would also do everything on M-values. The reason I suggested also fitting the analyses on b-values is because:

1) with lmFit you have a very fast way of obtaining stress coefficients in b-value scale, which you could us as (biologically interpretable) measures of change

2) because you use other covariates in the models (e.g. gender, age, smoking...), these stress coefficients will be adjusted for these variables and will reflect better what you are really testing in your models. This is because when you do the models you are not actually looking at changes in "raw" b-values, or M-values, in a sense. In these models the effects (the coefficients) of your variable of interest are "adjusted" for the other covariates. If you only looked at the "raw" b-values it would not be exactly the same that you were actually testing within limma.

ADD REPLYlink 3.0 years ago by Papyrus &starf; 2.9k

Entering edit mode

That makes very good sense, and very understandable.I would prefer to use m-values too. I also find it easier to work with limma. I just feel it needs to be more 'methylation-friendly' (as in more guides and more examples). Limma is more expression array-based.

Thank you so much !

ADD REPLYlink 3.0 years ago by ritz &utrif; 10

Entering edit mode

Hello, I'd like to come back to your post because I find myself in the same situation and am therefore particularly interested in your suggestions. I was wondering when you suggested applying lmfit to betavalues to take into account any associated covariates for better biological interpretation : how would you suggest interpreting logFC in this case? For example, if I want a methylation difference of at least 20% between my conditions, what is the threshold for logFC in this case?

For my results, I used the mvalues to select differentially methylated CpGs and I now want to use the beta values to filter this list of CpGs to select only those with a methylation difference of at least 20%. With this in mind, I'm thinking of using the beta values as you suggested but would welcome your suggestions to make sure I'm interpreting the logFC provided by the limma package correctly.

Many thanks,

ADD REPLYlink 1 day ago by hgui • 0

Entering edit mode

I typically do testing on M-values, filter by FDR and then also assign a filter of minimum b-value/methylation difference (depending on the project). So I really don't use the logFC variable (which is in M-value units); rather, I create an additional variable of the mean difference in b-values between the groups that were compared and use it to filter.

I typically do it on the raw methylation values but it is true that if you have many covariates, you may want to apply this on the "corrected" b-values (or check and compare). To get "corrected" b-values you can use the removeBatchEffect function from limma. But you can check and see if your results differ a lot by this or not.

ADD REPLYlink 4 hours ago by Papyrus &starf; 2.9k

Login before adding your answer.

Logfold change (Limma Package) in methylation association studies (2024)
Top Articles
Latest Posts
Article information

Author: Margart Wisoky

Last Updated:

Views: 5621

Rating: 4.8 / 5 (78 voted)

Reviews: 85% of readers found this page helpful

Author information

Name: Margart Wisoky

Birthday: 1993-05-13

Address: 2113 Abernathy Knoll, New Tamerafurt, CT 66893-2169

Phone: +25815234346805

Job: Central Developer

Hobby: Machining, Pottery, Rafting, Cosplaying, Jogging, Taekwondo, Scouting

Introduction: My name is Margart Wisoky, I am a gorgeous, shiny, successful, beautiful, adventurous, excited, pleasant person who loves writing and wants to share my knowledge and understanding with you.