This course is offered through Coursera. You can add it to your Accredible profile to organize your learning, find others learning the same thing, and showcase evidence of your learning on your CV with Accredible's export features.
Course Date: 01 September 2014 to 29 September 2014 (4 weeks)
Learn how to use regression models, the most important statistical analysis tool in the data scientist's toolkit. This is the seventh course in the Johns Hopkins Data Science Specialization.
Jeff Leek is an Assistant Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health and co-editor of the Simply Statistics Blog. He received his Ph.D. in Biostatistics from the University of Washington and is recognized for his contributions to genomic data analysis and statistical methods for personalized medicine. His data analyses have helped us understand the molecular mechanisms behind brain development, stem cell self-renewal, and the immune response to major blunt force trauma. His work has appeared in the top scientific and medical journals Nature, Proceedings of the National Academy of Sciences, Genome Biology, and PLoS Medicine. He created Data Analysis as a component of the year-long statistical methods core sequence for Biostatistics students at Johns Hopkins. The course has won a teaching excellence award, voted on by the students at Johns Hopkins, every year Dr. Leek has taught the course.
Roger D. Peng is an Associate Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health and co-editor of the Simply Statistics blog. He received his Ph.D. in Statistics from the University of California, Los Angeles and is a prominent researcher in the areas of air pollution and health risk assessment and statistical methods for environmental data. He created the course Statistical Programming at Johns Hopkins as a way to introduce students to the computational tools for data analysis. Dr. Peng is also a national leader in the area of methods and standards for reproducible research and is the Reproducible Research editor for the journal Biostatistics. His research is highly interdisciplinary and his work has been published in major substantive and statistical journals, including the Journal of the American Medical Association and the Journal of the Royal Statistical Society. Dr. Peng is the author of more than a dozen software packages implementing statistical methods for environmental studies, methods for reproducible research, and data distribution tools. He has also given workshops, tutorials, and short courses in statistical computing and data analysis.
Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist’s toolkit. This course covers regression analysis, least squares, and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, will be covered as well, along with the analysis of residuals and variability. The course will also cover modern thinking on model selection and novel uses of regression models, including scatterplot smoothing.
Will I get a Statement of Accomplishment after completing this class? Yes. Students who successfully complete the class will receive a Statement of Accomplishment signed by the instructor.
What resources will I need for this class? Students must have the latest version of R and RStudio installed.
How does this course fit into the Data Science Specialization? This is the seventh course in the sequence. Although it isn't a requirement, we recommend that you first take The Data Scientist's Toolbox and R Programming.
In this course, students will learn how to fit regression models, how to interpret coefficients, and how to investigate residuals and variability. Students will also learn special cases of regression models, including the use of dummy variables and multivariable adjustment. Extensions to generalized linear models, especially Poisson and logistic regression, will be reviewed.
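To give a flavor of what fitting a model, reading its coefficients, and checking its residuals look like in practice, here is a minimal sketch of simple linear regression by ordinary least squares. It is not taken from the course materials: the course itself uses R, Python is used here only for illustration, and the function names and the small data set (`hours`, `score`) are made up for this example.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Least squares fit of y = intercept + slope * x for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed outcome minus fitted value, one per observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data: hours studied and quiz score (illustrative only).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 68.0]

intercept, slope = fit_simple_regression(hours, score)
res = residuals(hours, score, intercept, slope)

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print("residuals:", [round(r, 2) for r in res])
```

The slope is read as the expected change in the outcome for a one-unit increase in the predictor, and the intercept as the expected outcome when the predictor is zero. With an intercept in the model, the least squares residuals sum to zero, so any pattern left in them (rather than their average) is what signals a poor fit.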
The course consists of weekly lecture videos and quizzes, plus a final peer-assessed project.