R LEARNING RESOURCES
Compiled by the Professional Programs and Accreditation Committee (PPAC)
Many FHS students have shared that they would benefit from additional support in learning R, especially when preparing for practicums, thesis work, or data-oriented roles after graduation. R has a learning curve, but there are excellent, accessible, and often free resources that can help you build confidence step by step.
Below, we’ve curated a set of high-quality learning tools, ranging from beginner tutorials to more advanced data analysis training. These resources were selected for clarity, relevance to public health analytics, and accessibility for learners at different levels.
Recommended R Learning Resources
1.
Level: Absolute Beginner
Cost: Mixed (free + paid)
Why it’s useful: Codecademy provides an interactive, hands-on environment, letting you write R code directly in your browser with immediate feedback. The modules cover basics such as data types, functions, and data frames, building toward intermediate skills.
Best for: Students who learn best by doing and prefer interactive exercises over videos or readings.
2.
Level: Beginner
Cost: Free
Why it’s useful: This is a curated list of the best introductory R tutorials, compiled by instructors at Princeton. It breaks down learning pathways clearly (e.g., intro R, data wrangling, visualization) and links to trusted, vetted modules. Great for students who want a structured self-study plan without enrolling in a full course.
Best for: Students who want a guided pathway with multiple options.
3.
Level: Beginner to Intermediate
Cost: Free
Why it’s useful: Developed by our institution’s own HEAL Lab, this guide introduces essential R skills in a public health–relevant context. Students learn how to load, clean, recode, and analyze survey data using tidyverse, with examples based on real health datasets. It is particularly helpful for practicum projects using Qualtrics, REDCap, or community survey data.
Best for: MPH/MSc students doing applied quantitative work; students preparing for practicum or thesis analytics.
4.
Level: Beginner to Intermediate
Cost: Free to audit; certificate optional
Why it’s useful: Part of the internationally recognized Johns Hopkins Data Science Specialization, this course provides foundational skills in R programming with high production value. Topics include control structures, functions, debugging, simulation, and basic data analysis. It includes quizzes, assignments, and community support.
Best for: Students who want a structured, instructor-led course with clear learning outcomes.
5.
Level: Beginner to Advanced
Cost: Mixed (free + paid certificates)
Why it’s useful: Harvard hosts several well-designed R courses, including Data Science: R Basics, Statistics and R, and more advanced modules. These courses emphasize conceptual understanding and statistical application.
Best for: Students wanting rigorous, academically oriented training aligned with public health and biostatistics.
6.
Level: Beginner to Advanced
Cost: Free
Why it’s useful: Statistical Horizons offers a curated list of high-quality R tutorials, cheat sheets, reference guides, and advanced resources. It’s a great hub for discovering new textbooks, packages, and guides, including materials on regression, causal inference, and multilevel modeling.
Best for: Students who already know some R and want to expand into more advanced topics or methods.
Tips for Success
- Start with one resource (don’t try to do them all).
- Practice on real datasets—your thesis, practicum, or publicly available data from StatsCan or WHO.
- Pair learning R with community support (TA hours, peers, research groups).
- Expect a learning curve—everyone struggles at first, even experienced analysts.
- For FHS-specific help, connect with your supervisor, TAs, or the HEAL Lab methods tutorials.
Using AI Tools for R Coding
AI tools (such as ChatGPT) can be excellent companions while learning R, but only if you know how to prompt them effectively. The guide below shows you how to use AI tools to write cleaner code, troubleshoot errors, and accelerate your learning while still understanding what you are doing.
NOTE: Generative AI software can be helpful in supporting coding activities. Before you begin using AI tools, we strongly recommend you familiarize yourself with any policies governing the use of these tools in your class, work/practicum placement, or with your community partners.
1. Start with a Clear Research Task
AI tools work best when they know exactly what you’re trying to do.
Good prompts include:
- Your research question
- The dataset you are using
- The variables (including names & types)
- The analysis you intend to run
EXAMPLE PROMPT
I’m analyzing survey data. My research question is: Are women more likely than men to report moderate-to-high loneliness?
Variables:
- gender (factor: male, female, nonbinary)
- lonely_score (numeric 0–10 scale)
- lonely_cat (factor: low, moderate, high)
Please generate tidyverse code to calculate mean loneliness by gender and run a chi-square test on lonely_cat by gender. Also tell me if this is the correct test, or if an alternative analytic approach should be used.
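As a rough sketch of what a reply to a prompt like this might contain, the R code below computes mean loneliness by gender and runs a chi-square test of lonely_cat by gender. The toy data, the cut points used to build lonely_cat, and the object names mean_by_gender and chi_result are assumptions made purely for illustration.

```r
library(dplyr)

# Toy data invented for illustration only; replace with your own data frame.
set.seed(42)
df <- data.frame(
  gender = factor(sample(c("male", "female", "nonbinary"), 200, replace = TRUE)),
  lonely_score = sample(0:10, 200, replace = TRUE)
)

# Assumed cut points for the categories (not taken from any real codebook).
df <- df %>%
  mutate(
    lonely_cat = cut(
      lonely_score,
      breaks = c(-Inf, 3, 6, Inf),
      labels = c("low", "moderate", "high")
    )
  )

# Mean loneliness score (and group size) by gender.
mean_by_gender <- df %>%
  group_by(gender) %>%
  summarise(mean_lonely = mean(lonely_score, na.rm = TRUE), n = n())

# Cross-tabulate gender by category, then test for independence.
tab <- table(df$gender, df$lonely_cat)
chi_result <- chisq.test(tab)

print(mean_by_gender)
print(chi_result)
```

Because the research question compares women and men on moderate-to-high loneliness, a two-by-two comparison or a logistic regression may fit it more directly than a test across all three genders and three categories, which is why the prompt asks the AI tool to comment on the choice of test.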
2. Always Provide Variable Names, Types, and Sample Values
R code depends heavily on variable names and structures. Without them, AI has to guess and often guesses wrong.
EXAMPLE PROMPT
Here are the variables in my dataset:
- age (numeric)
- income (numeric, continuous)
- education (factor with HighSchool, College, University)
- health_status (factor, ordered: poor < fair < good < very_good < excellent)
Please write R code using tidyverse to:
1. Recode age into categories,
2. Summarize income by education, and
3. Plot health_status by education.
This ensures the code is tailored to your dataset.
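The three tasks in the prompt above could be sketched as follows. This is one possible version, assuming the dplyr and ggplot2 packages are installed; the toy data, the age cut points, and the names age_cat, income_by_edu, and p are illustrative assumptions.

```r
library(dplyr)
library(ggplot2)

# Toy data invented for illustration; variable names follow the prompt.
set.seed(1)
health_levels <- c("poor", "fair", "good", "very_good", "excellent")
df <- data.frame(
  age = sample(18:90, 150, replace = TRUE),
  income = round(runif(150, 15000, 120000)),
  education = factor(
    sample(c("HighSchool", "College", "University"), 150, replace = TRUE),
    levels = c("HighSchool", "College", "University")
  ),
  health_status = factor(
    sample(health_levels, 150, replace = TRUE),
    levels = health_levels, ordered = TRUE
  )
)

# 1. Recode age into categories (the cut points are an assumption).
df <- df %>%
  mutate(age_cat = case_when(
    age < 35 ~ "18-34",
    age < 65 ~ "35-64",
    TRUE     ~ "65+"
  ))

# 2. Summarize income by education.
income_by_edu <- df %>%
  group_by(education) %>%
  summarise(
    mean_income   = mean(income, na.rm = TRUE),
    median_income = median(income, na.rm = TRUE),
    n             = n()
  )

# 3. Plot health_status by education as stacked proportions.
p <- ggplot(df, aes(x = education, fill = health_status)) +
  geom_bar(position = "fill") +
  labs(x = "Education", y = "Proportion", fill = "Health status")

print(income_by_edu)
print(p)
```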
3. Share Your Data Structure
Use str(df), glimpse(df), or a sample of your data.
EXAMPLE PROMPT
Here is glimpse(df) output. Use this structure to write code that…
Rows: 835
Columns: 6
$ id <int>
$ gender <chr>
$ age <dbl>
$ lonely_score <dbl>
$ lonely_cat <chr>
$ province <chr>
This helps your AI tool generate correct, runnable code.
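A minimal sketch of producing that kind of structure summary is shown below. The small data frame is made up for illustration and only mirrors the column names above; glimpse() comes from dplyr, while str(), head(), and dput() are base R.

```r
library(dplyr)

# Made-up rows that mirror the column names in the example above.
df <- data.frame(
  id = 1:4,
  gender = c("female", "male", "nonbinary", "female"),
  age = c(34, 52, 27, 61),
  lonely_score = c(3, 7, 5, 8),
  lonely_cat = c("low", "high", "moderate", "high"),
  province = c("BC", "ON", "BC", "AB"),
  stringsAsFactors = FALSE
)

# Compact overviews of column names and types, suitable for pasting into a prompt.
glimpse(df)
str(df)

# dput() prints R code that recreates the object, so a small sample
# can be shared in a form the AI tool can run directly.
dput(head(df, 3))

# capture.output() stores the printed structure as text (one string per line).
structure_text <- capture.output(str(df))
```

Sharing only the structure, or a few made-up rows, is often enough for the AI tool to write code that matches your column names and types.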
4. Ask for Explanations, Not Just Code
You learn faster if your AI tool explains why something works.
GOOD EXAMPLE PROMPT
Write code to run a logistic regression predicting lonely_cat from age and gender.
Also explain what each line of code does, and how to interpret the coefficients.
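For reference, a commented reply to this kind of prompt might resemble the sketch below. Binary logistic regression needs a two-level outcome, so this version assumes lonely_cat is first collapsed into moderate-to-high versus low; that recoding, the toy data, and the names lonely_modhigh, fit, and odds_ratios are assumptions for illustration.

```r
# Toy data invented for illustration only.
set.seed(7)
n <- 300
df <- data.frame(
  age = sample(18:85, n, replace = TRUE),
  gender = factor(sample(c("male", "female", "nonbinary"), n, replace = TRUE)),
  lonely_cat = factor(
    sample(c("low", "moderate", "high"), n, replace = TRUE),
    levels = c("low", "moderate", "high")
  )
)

# Collapse the three categories into a binary outcome:
# 1 = moderate or high loneliness, 0 = low loneliness.
df$lonely_modhigh <- as.integer(df$lonely_cat %in% c("moderate", "high"))

# Fit the logistic regression; family = binomial requests the logit link.
fit <- glm(lonely_modhigh ~ age + gender, data = df, family = binomial)

# Coefficients are on the log-odds scale.
summary(fit)

# Exponentiate to get odds ratios with Wald 95% confidence intervals.
odds_ratios <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(odds_ratios)
```

Each gender coefficient compares that group with the reference level (the first factor level, here female), and an odds ratio above 1 indicates higher odds of moderate-to-high loneliness relative to that reference.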
5. Troubleshoot Errors With Copy/Paste
When R throws an error:
- Copy the error message
- Copy the code block you ran
- Tell your AI tool what you expected to happen
ERROR TROUBLESHOOTING PROMPT
I ran this code:
df %>% mutate(lonely_score = as.numeric(lonely_score))
But I got the warning:
Warning message: NAs introduced by coercion
Can you explain why this happened and how I can fix it? Here is a sample of the data in lonely_score.
This is the single most useful pattern for debugging.
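One way to investigate this particular message yourself is sketched below: find the values that as.numeric() cannot convert, then decide how to handle them. The sample values and the treatment of "N/A" and empty strings as missing are assumptions for illustration.

```r
library(dplyr)

# Made-up sample: numbers stored as text, mixed with non-numeric entries.
df <- data.frame(
  lonely_score = c("3", "7", "N/A", "10", "", "five"),
  stringsAsFactors = FALSE
)

# List the distinct values that fail to convert to a number.
bad_values <- df %>%
  filter(
    !is.na(lonely_score),
    is.na(suppressWarnings(as.numeric(lonely_score)))
  ) %>%
  distinct(lonely_score)
print(bad_values)

# Recode the assumed missing-value codes to NA explicitly, then convert.
# Anything still non-numeric (such as "five") also becomes NA here, so
# review bad_values first and decide how each entry should be handled.
df_clean <- df %>%
  mutate(
    lonely_score = na_if(trimws(lonely_score), "N/A"),
    lonely_score = na_if(lonely_score, ""),
    lonely_score = suppressWarnings(as.numeric(lonely_score))
  )
print(df_clean)
```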
Sample "Perfect Prompt" Template
My goal: [state your research question]
Dataset: [name in R (e.g., df or data) + brief description]
Variables: [include name + type + sample values]
What I need: [graph? model? cleaning? Explain in plain language.]
Context: [assignment? thesis? practicum?]
Important constraints: [tidyverse only? no loops? etc.]
Example Data Structure: [str() output or similar.]
Please provide code, an explanation, and an example output.