Course: Causal Diagrams: Draw Your Assumptions Before Your Conclusions
Length: 9 weeks, 2-3 hrs/wk
Instructor: Miguel Hernán
The first part of this course comprises five lessons that introduce the theory of causal diagrams and describe its applications to causal inference. The fifth lesson provides a simple graphical description of the bias of conventional statistical methods for confounding adjustment in the presence of time-varying covariates. The second part of the course presents a series of case studies that highlight the practical applications of causal diagrams to real-world questions from the health and social sciences.
I had absolutely no idea what to expect when I signed up for this course. The subtitle – “Draw your assumptions before your conclusions” – sounded something like one of those decision-making questionnaires from self-help books, but it was taught by a Harvard epidemiologist so that didn’t seem right. Something about graphic design? Project management? Yes, I knew it had something to do with statistics and data science. Yes, I’m allergic to statistics, which always turns into something awful like summing squares or coaxing spreadsheets to sum squares. I’ve thus far avoided data science, an even worse mess because it’s usually under the auspices of computer science people, and you know how they can be (yes, I’m kidding – back in the Days of the Mainframe, I was what in the business world passed for tech support, which meant we called IBM if a reboot didn’t fix the problem).
But the teaser video sounded interesting, and the medical foundation greatly appealed to me. I figured I’d give it a week. I ended up completing the course. Even got a passing grade – and a good passing grade, at that. But I’m not getting carried away: most of the graded questions allowed multiple attempts.
I found it to be an exceptionally well-done course: organized, clear, nicely delivered, and progressing from very basic concepts to more complicated material little by little. Keep in mind, I’m an absolute newbie to all of this; someone who’s done some work in data science, or has a wider view of how this fits into the whole subject of data science, might feel differently. More than anything else, this all reminded me of tracing logic trees in that UMelbourne Logic course I liked so much (which, sadly, never made the jump to Coursera’s new and “improved” – ahem – platform).
Little things meant a lot. Like large, clear, high-contrast graphics. Granted, the salient images were mostly just letters, numbers, and arrows, but I appreciated the legibility that hand-drawn diagrams on a board (or fancy but hard-to-read and harder-to-screenclip renditions) sometimes lack. The lectures were repetitive enough to build up some kind of vocabulary. The step-by-step approach was perfect for me; again, someone with a stronger background in the field might have found this a bit annoying, but that’s what fast-forward is for. I was also delighted to see an explanation for Simpson’s Paradox that actually made sense to me, an explanation that didn’t involve batting averages or student test scores but related to a research case; it tied together causation and weighted averaging for me in a way I hadn’t seen before. Interestingly, a couple of days after I encountered that lecture, MinutePhysics released a video about Simpson’s Paradox that so closely mirrored the lecture that I had to wonder whether Henry Reich was enrolled in the MOOC.
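The weighted-averaging view of the paradox is easy to see with numbers. Here’s a minimal sketch with made-up counts (my own toy figures, not the course’s research case): treatment A beats treatment B within every stratum, yet loses in the crude comparison, because A was given mostly to the severe cases and the stratum sizes act as the weights.

```python
# Hypothetical (cured, total) counts per stratum for each treatment.
# These numbers are my own illustration, not from the course.
data = {
    "A": {"mild": (19, 20), "severe": (50, 100)},
    "B": {"mild": (85, 100), "severe": (9, 20)},
}

def rate(cured, total):
    return cured / total

# Within every stratum, A has the higher cure rate...
assert rate(*data["A"]["mild"]) > rate(*data["B"]["mild"])      # 0.95 > 0.85
assert rate(*data["A"]["severe"]) > rate(*data["B"]["severe"])  # 0.50 > 0.45

def crude(treatment):
    """Overall rate: a weighted average of stratum rates,
    weighted by how many patients each stratum contributed."""
    cured = sum(c for c, n in data[treatment].values())
    total = sum(n for c, n in data[treatment].values())
    return cured / total

# ...but the crude rates flip, because A went mostly to severe cases.
print(f"A overall: {crude('A'):.3f}")  # 0.575
print(f"B overall: {crude('B'):.3f}")  # 0.783
```

Whether the stratified or the crude comparison is the right one is exactly the causal question the diagrams are there to answer.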
Each module began with a case study: the effect of estrogen on uterine cancer, folic acid and birth defects, etc. Somewhere in there was a problem, usually a contradiction between studies using different statistical methods, or a result that didn’t make sense (could cigarette smoking prevent dementia in older people? No, of course not, but what does it mean when the numbers say that?). This would lead into a discussion of the module topic – confounding, or selection bias, or measurement bias – and about 45 minutes of video, divided into short segments, to explain how the problem arose and how it could be fixed. A final recap of the case, showing how the module topic played into the real-life research and how causal diagrams resolved the problem, ended the week.
Graded material included short quizzes after most video segments, and a weekly quiz. The final exam was a series of four case studies (only two were required), each discussed at length in an interview with a different investigator, plus questions relating to the issues raised by those studies. This was great in a couple of ways. It’s always nice to see how someone else talks about a subject, since everyone uses slightly different language and sees different things as central. It also presented questions on new issues without the same degree of shepherding and hand-holding. I found the first one quite manageable, the second one a bit trickier, and the third one very difficult. At this writing, I haven’t looked at the fourth one yet.
If I may digress (and it’s my blog, who’s gonna stop me?), I created a kind of study guide on Cerego for this course. While it’s clearly best for pure fact memorization, I’m finding that just figuring out the key points and the best Cerego format for them is a form of studying; then the spaced-recall feature worked quite well to keep reminding me about d-separation rules and different structures as I moved through the weeks. I’m still new to creating my own sets and am pretty clumsy at it, but I was impressed with how well it worked here for incorporating – not just remembering – things like “a conditioned collider opens a path but a conditioned non-collider blocks it.”
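That collider rule is concrete enough to write down. Here’s a minimal sketch of the path-blocking rule in Python – my own toy implementation, with names I made up, and simplified to ignore descendants of colliders – just to show the asymmetry the course hammers on: conditioning blocks a non-collider but opens a collider.

```python
def path_blocked(path, edges, conditioned):
    """Is this path blocked, given the conditioning set?

    path:        list of node names along the path, in order
    edges:       set of (tail, head) pairs giving arrow directions
    conditioned: set of nodes we adjust (condition) on

    Simplification: the full d-separation rule also treats a collider
    as open if any *descendant* is conditioned on; skipped here.
    """
    # Walk each interior node v with its neighbors u and w on the path.
    for u, v, w in zip(path, path[1:], path[2:]):
        collider = (u, v) in edges and (w, v) in edges  # arrows meet head-to-head at v
        if collider and v not in conditioned:
            return True   # an unconditioned collider blocks the path
        if not collider and v in conditioned:
            return True   # a conditioned non-collider blocks the path
    return False          # no blocker found: association can flow

# A -> C <- B : C is a collider on the path A-C-B.
collider_edges = {("A", "C"), ("B", "C")}
print(path_blocked(["A", "C", "B"], collider_edges, set()))   # True: blocked
print(path_blocked(["A", "C", "B"], collider_edges, {"C"}))   # False: conditioning on C opened it

# A -> M -> Y : M is a non-collider (a mediator).
chain_edges = {("A", "M"), ("M", "Y")}
print(path_blocked(["A", "M", "Y"], chain_edges, set()))      # False: open
print(path_blocked(["A", "M", "Y"], chain_edges, {"M"}))      # True: conditioning on M blocked it
```

Seeing the two `if` branches mirror each other is, for me, the code version of what Cerego’s spaced recall was drilling in.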
To be honest, I was kind of Done by the time I got to the cases in Week 5. Remember, I’m a tourist in these parts, and while it was a very nice place to visit, I’m not sure I’d want to live there. And I have other things starting, so I needed to clear the boards. But I’m very glad I wandered in. I have no idea how the course would work for typical data science students, and I wouldn’t imagine anyone else would be particularly interested. But for me, always looking for a way in to the math I can’t seem to understand, it was another huge success.