In this section, we will explore statistical analysis in R, covering descriptive statistics, hypothesis testing, and regression analysis.
Descriptive statistics summarize a dataset so you can understand it at a glance. The most common are measures of central tendency (mean, median, mode) and measures of spread (variance, standard deviation, range), usually paired with graphical summaries such as histograms and box plots.
# Descriptive statistics example
data <- c(12, 15, 18, 22, 25, 28, 31)
# Mean and median
mean_data <- mean(data)
mean_data
## [1] 21.57143
median_data <- median(data)
median_data
## [1] 22
# Variance and standard deviation
variance_data <- var(data)
variance_data
## [1] 48.28571
sd_data <- sd(data)
sd_data
## [1] 6.948792
# Histogram
hist(data, main = "Histogram of Data", xlab = "Value", ylab = "Frequency")
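The example above covers the mean, median, variance, standard deviation, and a histogram, but not the mode, range, or box plot mentioned earlier. The sketch below fills those in with base R. Note that `stat_mode` is a hypothetical helper written for this illustration, since base R's `mode()` reports an object's storage type rather than the most frequent value; the small vector with a repeated value is also made up for the demonstration.

```r
# Same sample as in the descriptive statistics example
data <- c(12, 15, 18, 22, 25, 28, 31)

# range() returns the minimum and maximum; diff() turns them into a single spread value
data_range <- range(data)
data_range          # 12 31
diff(data_range)    # 19

# stat_mode is a hypothetical helper (not part of base R):
# it tabulates the values and returns the most frequent one(s)
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

# Illustrative vector with one repeated value
stat_mode(c(12, 15, 15, 18, 22))   # 15

# Five-number summary plus mean, then the matching box plot
summary(data)
boxplot(data, main = "Box Plot of Data", ylab = "Value")
```

When every value occurs exactly once, as in `data`, this helper returns all of them, which is a reminder that the mode is mostly useful for discrete or repeated measurements.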
Hypothesis testing lets you make inferences about a population from sample data. In R, you can run hypothesis tests for means, proportions, variances, and more. Here’s an example of a two-sample t-test comparing the means of two groups. By default, t.test() runs Welch's version of the test, which does not assume the two groups have equal variances:
# Hypothesis testing example (t-test)
group1 <- c(10, 12, 14, 16, 18)
group2 <- c(14, 15, 16, 17, 18)
t_test_result <- t.test(group1, group2)
t_test_result
##
## Welch Two Sample t-test
##
## data: group1 and group2
## t = -1.2649, df = 5.8824, p-value = 0.2537
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.887742 1.887742
## sample estimates:
## mean of x mean of y
##        14        16
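To act on the test result in code rather than read it off the printout, you can pull individual fields out of the object that `t.test()` returns. The sketch below assumes a conventional significance level of 0.05 (the name `alpha` is chosen here for illustration) and also shows the pooled-variance Student's t-test for comparison.

```r
group1 <- c(10, 12, 14, 16, 18)
group2 <- c(14, 15, 16, 17, 18)

# Welch test (the default), as in the example above
t_test_result <- t.test(group1, group2)

# The result is a list; these components hold the key numbers
t_test_result$statistic   # t value
t_test_result$p.value     # about 0.2537, matching the printed output
t_test_result$conf.int    # 95% confidence interval for the difference in means

# alpha is an assumed significance level, not something t.test() sets for you
alpha <- 0.05
reject_null <- t_test_result$p.value < alpha
reject_null               # FALSE: no evidence of a difference at the 5% level

# Student's t-test with pooled variance, only appropriate if
# equal variances are a reasonable assumption
pooled_result <- t.test(group1, group2, var.equal = TRUE)
pooled_result$parameter   # df = 8 (n1 + n2 - 2)
```

Because the p-value is above 0.05 and the confidence interval includes 0, these samples do not give enough evidence to conclude that the two group means differ.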
Regression analysis helps you model and analyze relationships between variables. You can perform linear regression, logistic regression, and other types of regression analysis in R. Here’s a simple linear regression example:
# Linear regression example
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
linear_model <- lm(y ~ x)
summary(linear_model)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##    1    2    3    4    5
## -0.6  0.6  0.8 -1.0  0.2
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.8000     0.9381   1.919   0.1508
## x             0.8000     0.2828   2.828   0.0663 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8944 on 3 degrees of freedom
## Multiple R-squared: 0.7273, Adjusted R-squared: 0.6364
## F-statistic: 8 on 1 and 3 DF, p-value: 0.06628
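Once a model is fitted, you will usually want to use it rather than only print its summary. As a minimal sketch, the code below extracts the coefficients, predicts `y` for a new value of `x` (the value 6 is an arbitrary choice for illustration), and overlays the fitted line on a scatter plot.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
linear_model <- lm(y ~ x)

# Named vector of estimates: intercept 1.8, slope 0.8
model_coefs <- coef(linear_model)
model_coefs

# 95% confidence intervals for the intercept and slope
confint(linear_model)

# Predict y for a new x; newdata must use the same column name as the formula
new_prediction <- predict(linear_model, newdata = data.frame(x = 6))
new_prediction   # 1.8 + 0.8 * 6 = 6.6

# Scatter plot of the data with the fitted regression line
plot(x, y, main = "Linear Regression Fit", xlab = "x", ylab = "y")
abline(linear_model)
```

With only five points, the slope's p-value (0.0663) is above 0.05, so the fitted line is best read as a description of this sample rather than strong evidence of a relationship.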
In this section, we’ve covered the essentials of statistical analysis in R: summarizing data with descriptive statistics, comparing two groups with a t-test, and modeling a relationship with linear regression. Together, these tools let you describe a dataset, test a claim about it, and quantify how one variable changes with another.
Feel free to explore more advanced statistical techniques and datasets to deepen your understanding of statistics with R.