Title: | Regression Toward the Mean |
---|---|
Description: | In repeated-measures studies with extremely large or small values, the subjects' measurements are, on average, commonly closer to the mean of the underlying population. Interpreting changes in the mean in such situations can lead to biased results because the values were not randomly selected; they come from truncated sampling. This method estimates the range of means for which treatment effects are likely to occur when regression toward the mean is present. Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua's algorithm. BMC Medical Research Methodology. <doi:10.1186/1471-2288-8-52>. Acknowledgments: We would like to acknowledge "Lena Roth" and "Nico Steckhan" for the package's initial updates (Q3 2024) and continued supervision and guidance. Both have contributed to discussing and integrating these methods into the package, ensuring they are up-to-date and contextually relevant. |
Authors: | Daniela Recchia [aut, cre], Thomas Ostermann [ctb], Julian Stein [ctb] |
Maintainer: | Daniela Recchia <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2 |
Built: | 2025-02-15 05:41:30 UTC |
Source: | https://github.com/cran/regtomean |
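The regression-toward-the-mean effect named in the package description can be illustrated with a small simulation. The sketch below is not part of the package; the variable names, the noise levels, and the cutoff of 40 are assumptions chosen only for illustration. It draws two noisy measurements of the same underlying score, selects subjects with low first measurements (truncated sampling), and compares the two means of that selected group.

```r
# Minimal simulation of regression toward the mean (illustrative only).
set.seed(123)
n <- 10000

# Hypothetical model: each subject has a stable true score plus
# independent measurement noise at each occasion. No treatment is applied.
true_score <- rnorm(n, mean = 50, sd = 10)
before <- true_score + rnorm(n, sd = 5)
after  <- true_score + rnorm(n, sd = 5)

# Truncated sampling: keep only subjects with an extreme (low) first score.
selected <- before < 40

mean_before_sel <- mean(before[selected])
mean_after_sel  <- mean(after[selected])

# The selected group's second mean moves back toward the population mean (50)
# even though nothing changed between the two measurements.
cat("Selected group, before:", round(mean_before_sel, 2), "\n")
cat("Selected group, after: ", round(mean_after_sel, 2), "\n")
```

A naive before/after comparison on the selected group would read this shift as an improvement, which is the bias the package's methods are meant to account for.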
This function calculates the correlation for the data and Cohen's d effect sizes, both based on pooled and on treatment standard deviations. It can optionally display the results in an HTML widget.
cordata(Before, After, within = TRUE, data = NULL)
Before |
a numeric vector giving the data values for the first (before) measure. |
After |
a numeric vector giving the data values for the second (after) measure. |
within |
A logical indicating whether the effect sizes should be computed based on paired samples ( |
data |
an optional data frame containing the variables in the formula. By |
This function computes the correlation between two measures and calculates Cohen's d effect sizes using both pooled and treatment standard deviations.
- If within = TRUE, the effect sizes are computed assuming paired samples.
- If within = FALSE, the effect sizes are computed assuming independent samples.
The results are returned as a data frame and also displayed in an HTML widget in the RStudio Viewer or default web browser.
Returns a table containing the correlation, the effect size based on the pooled standard deviation, and the effect size based on the treatment standard deviation.
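The quantities listed above can be sketched in base R. This is a hedged illustration, not the package's implementation: the helper name effect_sketch, the pooled-SD formula (root mean of the two variances), the use of the "after" standard deviation as the treatment SD, and the sign convention (after minus before) are all assumptions made for this example.

```r
# Illustrative data (made up for this sketch, not the language_test dataset).
before <- c(10, 12, 9, 14, 11, 13, 8, 15)
after  <- c(12, 15, 10, 15, 14, 15, 11, 16)

# Hypothetical helper: correlation plus two Cohen's d variants.
effect_sketch <- function(before, after) {
  diff_mean <- mean(after) - mean(before)

  # Assumption: pooled SD as the root of the average of both variances.
  sd_pooled <- sqrt((var(before) + var(after)) / 2)

  # Assumption: "treatment" SD taken as the SD of the after measure.
  sd_treatment <- sd(after)

  data.frame(
    correlation = cor(before, after),
    d_pooled    = diff_mean / sd_pooled,
    d_treatment = diff_mean / sd_treatment
  )
}

res <- effect_sketch(before, after)
print(res)
```

The two effect sizes differ only in the denominator, so they diverge when the spread of the scores changes between the two measurements.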
Daniela Recchia, Thomas Ostermann.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.
cordata("Before","After",data=language_test)
A dataset with scores from 8 students who failed a high school test and could not get their diploma. They repeated the exam and got new scores.
data("language_test")
A data frame with 8 observations on the following 9 variables.
Student
a numeric vector
Before
a numeric vector
After
a numeric vector
a numeric vector
Cross
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Daniela Recchia, Thomas Ostermann.
McClave, J. T., & Dietrich, F. H. (1988). Statistics. New York: Dellen Publishing.
data(language_test) ## maybe str(language_test) ; plot(language_test) ...
This function calculates and plots the treatment and regression effects for the before and after measures, together with their p-values.
meechua_eff.CI(x,n,se_after)
x |
a data frame containing the results from |
n |
the original sample size (number of observations) from data. |
se_after |
the estimated standard error from |
After running meechua_reg, the model coefficients mod_coef and the standard error se_after are used as input to this function to estimate treatment and regression effects.
Two plots are produced: the first, "Treatment Effect and p-value", and the second, "Confidence Intervals" for mu.
Daniela Recchia, Thomas Ostermann
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua's algorithm. BMC Medical Research Methodology.
# Initialize environment explicitly
# regtomean_env <- new.env(parent = emptyenv())

# Generate example data
language_test <- data.frame(
  Before = rnorm(100, mean = 50, sd = 10),
  After = rnorm(100, mean = 55, sd = 10)
)

# Replicate data
mee_chua <- replicate_data(0, 100, "Before", "After", data = language_test)
mee_chua <- mee_chua[order(mee_chua$mu), ]

# Perform regression analysis and store results
results <- meechua_reg(mee_chua)
mod_coef <- results$mod_coef
se_after <- results$se_after

# Call meechua_eff.CI
meechua_eff.CI(mod_coef, 100, se_after)
meechua_reg
This function plots all four diagnostic plots for each linear regression model: "Residuals vs Fitted", "Normal Q-Q", "Scale-Location" and "Residuals vs Leverage".
meechua_plot(models = NULL, env = regtomean_env)
models |
A list containing the estimated linear models, typically the output of |
env |
An environment where the models are stored. The default is |
For each model in models, four diagnostic plots are produced. The plots are numbered consecutively: numbers 1 to 4 belong to the first model, numbers 5 to 8 to the second model, and so on.
Diagnostic plots for the set of models from meechua_reg.
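The four diagnostic plots named above are the ones base R draws for a fitted linear model. The sketch below is a minimal illustration with base R only and the built-in cars dataset; it does not call meechua_plot, and the two models are arbitrary stand-ins for the list of models that meechua_reg would produce.

```r
# Two stand-in linear models (illustrative; not produced by meechua_reg).
fit1 <- lm(dist ~ speed, data = cars)
fit2 <- lm(dist ~ poly(speed, 2), data = cars)
models <- list(fit1, fit2)

# Arrange four panels per page and restore the layout afterwards.
old_par <- par(mfrow = c(2, 2))

for (m in models) {
  # which = c(1, 2, 3, 5) selects "Residuals vs Fitted", "Normal Q-Q",
  # "Scale-Location" and "Residuals vs Leverage" in plot.lm().
  plot(m, which = c(1, 2, 3, 5))
}

par(old_par)
```

With two models this produces eight plots in total, four per model, matching the consecutive numbering described above.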
Daniela Recchia, Thomas Ostermann.
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua's algorithm. BMC Medical Research Methodology.
# Generate example data
language_test <- data.frame(
  Before = rnorm(100, mean = 50, sd = 10),
  After = rnorm(100, mean = 55, sd = 10)
)

# Replicate data
mee_chua <- replicate_data(50, 60, "Before", "After", data = language_test)
mee_chua_sort <- mee_chua[order(mee_chua$mu), ]

# Perform regression analysis
results <- meechua_reg(mee_chua_sort)

# Plot models
meechua_plot(results$models)
This function fits linear models for a subset of data frames.
meechua_reg(x)
x |
Data to be used in the regression. |
The data used for the regression must be sorted by mu.
A set of linear models is estimated, and the model coefficients are stored in mod_coef.
The estimated standard error for the after measure is also stored in se_after for use in other functions.
A table containing the estimates for each mu.
The variables models, mod_coef, and se_after are stored globally for further analysis if to_global is set to TRUE. In either case, the values are returned.
The models are saved in an object called mee_chua, which is not automatically printed but is saved in the environment.
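The idea of fitting one linear model per value of mu can be sketched in base R. This is an assumption-laden illustration rather than the package's code: the grid mu_grid, the object names stacked and coef_table, and the choice to regress after on the mu-shifted before values are all hypothetical.

```r
set.seed(1)
dat <- data.frame(
  Before = rnorm(30, mean = 50, sd = 10),
  After  = rnorm(30, mean = 55, sd = 10)
)

# Hypothetical grid of candidate population means.
mu_grid <- c(45, 50, 55)

# Stack one copy of the data per mu, shifting the before measure by mu.
stacked <- do.call(rbind, lapply(mu_grid, function(m) {
  data.frame(mu = m, before = dat$Before - m, after = dat$After)
}))
stacked <- stacked[order(stacked$mu), ]

# Fit one linear model per mu.
models <- lapply(split(stacked, stacked$mu), function(d) {
  lm(after ~ before, data = d)
})

# Collect intercept, slope and the intercept's standard error per mu.
coef_table <- do.call(rbind, lapply(names(models), function(m) {
  s <- summary(models[[m]])$coefficients
  data.frame(
    mu           = as.numeric(m),
    intercept    = s["(Intercept)", "Estimate"],
    slope        = s["before", "Estimate"],
    se_intercept = s["(Intercept)", "Std. Error"]
  )
}))
print(coef_table)
```

Shifting the predictor by mu leaves the slope unchanged and moves only the intercept and its standard error, which is why the per-mu results differ in those two columns.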
Daniela Recchia, Thomas Ostermann.
Ostermann, T., Willich, Stefan N. & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua's algorithm. BMC Medical Research Methodology.
# Generate example data
language_test <- data.frame(
  Before = rnorm(100, mean = 50, sd = 10),
  After = rnorm(100, mean = 55, sd = 10)
)

# Replicate data
replicate_data <- function(start, end, Before, After, data) {
  mu <- seq(start * 100, end * 100, by = (end - start))
  mu <- rep(mu, each = nrow(data))
  before <- data[[Before]] - mu / 100
  after <- data[[After]]
  mee_chua <- data.frame(mu = mu, before = before, after = after)
  return(mee_chua)
}

mee_chua <- replicate_data(0, 1, "Before", "After", data = language_test)
mee_chua <- mee_chua[order(mee_chua$mu), ]  # Sort by 'mu'

# Run the regression and collect the results
reg_results <- meechua_reg(mee_chua)

# Access the results
mod_coef <- reg_results$mod_coef
se_after <- reg_results$se_after

# Display the results
print(mod_coef)
print(se_after)
Based on the data before and after the intervention and the regression models from the function meechua_reg, this function plots the t-statistics and p-values for a given range of mu to assess whether the intervention has a significant impact on the measurements, accounting for regression to the mean.
plot_mu(x, n, se_after, lower = F, alpha = 0.05)
x |
A data frame containing the results from |
n |
The original sample size (number of observations) of the data. |
se_after |
The estimated standard error from |
lower |
A boolean value specifying the direction of the one-sided tests. For |
alpha |
Specifies the significance threshold for the p-values of the corresponding one-sided tests. The default is |
A list containing the most significant mu, its t-statistic and p-value, and the range of mu for which the treatment impact is significant.
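The returned values can be pictured with a small base R sketch. It rests on several assumptions that the text above does not confirm: the test statistic is taken as (intercept - mu) / se with n - 2 degrees of freedom, the one-sided test looks at the upper tail, and the numbers in coef_table are invented for illustration rather than produced by meechua_reg.

```r
# Hypothetical per-mu regression output (illustrative numbers only).
coef_table <- data.frame(
  mu           = c(48, 50, 52, 54, 56),
  intercept    = c(55.0, 55.2, 55.4, 55.6, 55.8),
  se_intercept = c(1.2, 1.0, 1.0, 1.2, 1.5)
)
n <- 30        # assumed original sample size
alpha <- 0.05  # significance threshold

# Assumed test statistic: distance of the intercept from mu in SE units.
coef_table$t <- (coef_table$intercept - coef_table$mu) / coef_table$se_intercept

# Assumed direction: one-sided upper-tail p-value.
coef_table$p <- pt(coef_table$t, df = n - 2, lower.tail = FALSE)

# Most significant mu and the range of mu with p below alpha.
best <- coef_table[which.min(coef_table$p), ]
sig  <- coef_table[coef_table$p < alpha, ]

result <- list(mu = best$mu, t = best$t, p = best$p, range = range(sig$mu))
str(result)
```

Reporting a range rather than a single value reflects that the population mean is unknown: the conclusion holds for every mu inside the range.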
Julian Stein
Ostermann, T., Willich, S. N., & Luedtke, R. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua's algorithm. BMC Medical Research Methodology.
# Generate example data
language_test <- data.frame(
  Before = rnorm(100, mean = 50, sd = 10),
  After = rnorm(100, mean = 55, sd = 10)
)

mee_chua <- replicate_data(0, 1, "Before", "After", data = language_test)
mee_chua <- mee_chua[order(mee_chua$mu), ]  # Sort by 'mu'

# Run the regression and collect the results
reg_results <- meechua_reg(mee_chua)

# Access the results
mod_coef <- reg_results$mod_coef
se_after <- reg_results$se_after

# n is the original sample size (100 rows in the example data)
plot_mu(mod_coef, 100, se_after)
This function plots the t-statistics and p-values for a range of mu values, based on the provided summary statistics of the two measurements. It helps visualize whether the intervention has a significant impact on the measurements, accounting for regression to the mean.
plot_t( mu_start, mu_end, n, y1_mean, y2_mean, y1_std, y2_std, cov, lower = F, alpha = 0.05, r_insteadof_cov = F )
mu_start |
Numeric. The starting value of |
mu_end |
Numeric. The ending value of |
n |
Numeric. The original sample size (number of observations) of the data. |
y1_mean |
Numeric. The mean of the first measurement. |
y2_mean |
Numeric. The mean of the second measurement. |
y1_std |
Numeric. The standard deviation of the first measurement. |
y2_std |
Numeric. The standard deviation of the second measurement. |
cov |
Numeric. The covariance between the two measurements, or if |
lower |
Logical. If |
alpha |
Numeric. The significance threshold for the p-values of the one-sided tests. The default is |
r_insteadof_cov |
Logical. If |
A ggplot2 plot with two y-axes: one showing p-values and the other showing t-statistics. The function also prints key values, including the most significant mu, the minimal p-value, and the range of mu where the treatment effect is significant.
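Because plot_t works from summary statistics, it may help to see how a regression-based t-statistic can be obtained without the raw data. The sketch below uses standard least-squares identities for regressing the second measurement on the mu-shifted first measurement. The helper name t_from_summary, the statistic (intercept - mu) / se, and the meaning given to lower here are assumptions for illustration and are not confirmed as the package's internals.

```r
# Hypothetical helper: per-mu t-statistics from summary statistics only.
t_from_summary <- function(mu, n, y1_mean, y2_mean, y1_std, y2_std, cov,
                           lower = FALSE) {
  # Least-squares slope and intercept of y2 on (y1 - mu).
  slope <- cov / y1_std^2
  intercept <- y2_mean - slope * (y1_mean - mu)

  # Residual variance with n - 2 degrees of freedom.
  resid_var <- (n - 1) / (n - 2) * (y2_std^2 - cov^2 / y1_std^2)

  # Standard error of the intercept.
  se <- sqrt(resid_var * (1 / n + (y1_mean - mu)^2 / ((n - 1) * y1_std^2)))

  # Assumed test statistic and one-sided p-value
  # (upper tail unless lower = TRUE, a convention chosen for this sketch).
  t_stat <- (intercept - mu) / se
  p <- pt(t_stat, df = n - 2, lower.tail = lower)

  data.frame(mu = mu, intercept = intercept, se = se, t = t_stat, p = p)
}

# Same summary statistics as in the plot_t example of this page.
res <- t_from_summary(
  mu = seq(0, 10, by = 0.5), n = 50,
  y1_mean = 5, y2_mean = 5, y1_std = 2, y2_std = 2, cov = 0.5
)
head(res)
```

The intercept and its standard error computed this way agree with what lm() reports on raw data with the same means, standard deviations and covariance, so the summary statistics carry all the information this calculation needs.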
Julian Stein
Ostermann, T., Willich, S. N., & Luedtke, R. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua's algorithm. BMC Medical Research Methodology.
# Example usage of the plot_t function
plot_t(
  mu_start = 0, mu_end = 10, n = 50,
  y1_mean = 5, y2_mean = 5, y1_std = 2, y2_std = 2, cov = 0.5
)

plot_t(
  mu_start = 0, mu_end = 10, n = 50,
  y1_mean = 5, y2_mean = 5, y1_std = 2, y2_std = 2, cov = 0.5,
  lower = TRUE, alpha = 0.1
)
This function replicates the "before" and "after" values 100 times, given a start and end reference for the population mean (mu).
replicate_data(start, end, Before, After, data)
start |
A numeric value specifying the start value for |
end |
A numeric value specifying the end value for |
Before |
A numeric vector giving the data values for the first ("before") measurement. |
After |
A numeric vector giving the data values for the second ("after") measurement. |
data |
An optional data frame containing the |
To overcome the limitations of Mee and Chua's test regarding the population mean (mu), this function performs a replication of the data over a specified range of values. The replicated data is used for systematically estimating the unknown population mean (mu). Further analyses are based on this new dataset.
A data frame containing the replicated dataset, which includes the columns mu, before, and after.
Daniela Recchia, Thomas Ostermann.
Ostermann, T., Willich, Stefan N., & Luedtke, Rainer. (2008). Regression toward the mean - a detection method for unknown population mean based on Mee and Chua's algorithm. BMC Medical Research Methodology.
Galton, F. (1886). Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, 15, 246-263.
# Example usage of replicate_data replicate_data(0, 100, "Before", "After", data = language_test)