Let’s talk about CS50 — the famed computer science (CS) course taught by David J. Malan. (David hereafter.)
This post may read like an advertisement, but it is most definitely not one: the course is so well known at this point that it promotes itself without additional marketing. Setting all of the praise aside, my goal here is to answer these two questions:
Why is it difficult to learn programming?
In spite of this difficulty, what makes CS50 work so well?
Programming is not an easy subject to learn, but it’s even harder to teach — there are too many fundamentals to cover in order to make sense of the inner workings of the computer, as they dictate how programs are constructed to operate based on these principles. Skipping ahead into programming would be akin to playing piano without any understanding of music theory: You can force yourself to memorize the full sequence of piano keys to press and eventually be able to play a song after enough repetitions, but you’d be mastering the song rather than learning the instrument itself.
The opposite is to just dive straight into a particular programming language and work on one project after another, which seems to be the approach that a vast majority of commercial coding courses take nowadays. But just as I have described above, the downside is that students will miss out on a lot of the contextual knowledge, and they will have to fill in those gaps sooner rather than later (or even pay a price for causing real damage down the road).
The strength of CS50 is that it manages to find a great balance between the two and present them in a cohesive fashion. The knowledge is still challenging to acquire and there are plenty of difficult assignments for students to groan over (even though it’s labeled as an introductory-level course, it isn’t easy by any means), but it doesn’t lean too much into either theory or practice.
As a popular college course in a fast-moving field, its materials have to be updated annually to accommodate growing demand and to keep up with new technology trends. For our purposes here, though, we’ll just focus on the current version of CS50x (CS50 through edX) and avoid dwelling on its history; I don’t have enough firsthand information to narrate that faithfully anyway.
Learning How to (Kind of) Program
Over the years, I have made several attempts to learn how to program, from my first “CS 101” college class a decade ago to the random tutorials and books that followed. It may surprise you that I made little headway in developing the skill. (There’s also a slim chance that I may just be a dimwit, but hopefully not.) To be clear, it’s not that I didn’t learn anything; I can code some and can read code written by others, but I still can’t create any sizable program unless I follow a detailed guide of some sort.
To use a cooking analogy: I can serve you a plain fried egg, but I’d be stumped if you were to ask me to prepare anything else without handing me a recipe.
At some point during this back-and-forth, it dawned on me that the vast majority of programming education and learning resources out there are, simply put, inadequate. This also includes many college courses. Speaking from experience: I taught myself more about web development during the first year at my first job than I ever learned in my three programming courses in school. I suspect that a not insignificant number of people who have struggled in school were also on the receiving end of poor instruction (and this isn’t limited to programming).
(David’s success as a teaching professor is an exceptional case. Most university professors are not incentivized to improve their teaching quality because their career success hinges on their research output, often more so than on student satisfaction; such is the “publish or perish” path of thorns to tenure. In fact, students attending elite universities often receive a less satisfactory education than their peers in small colleges, because professors delegate their teaching duties to graduate students in order to focus on research.
I was slightly luckier than others, as my first CS professor did care about our learning outcomes. Unfortunately, he was not very good at imparting knowledge to beginners. This was a man who had spent his entire career conquering advanced math problems, and his inability to explain difficult concepts in intuitive terms made him a poor fit to teach an introductory class. His heart was in the right place, but his skill was not.)
That is, until I decided to start CS50x, and that’s when I finally realized what I had been missing out on this entire time. I can only blame myself for letting it be buried in my bookmarks throughout all those years.
As of this writing, I am about halfway through the course, and suffice it to say that it has been an arduous but rewarding experience (the pointer sections still haunt me). Despite having accumulated a fair bit of knowledge in the past, the lack of a proper foundation is proving to be the biggest hindrance to my progress at the moment.
Satisfying a Massive and Diverse Audience
David once reported that around two-thirds of the on-campus CS50 enrollees were entirely new to computer science. In spite of its reputation as a challenging and time-consuming commitment, hundreds of students still flocked to it.
And naturally, when it was put up on edX as an open course, countless people around the world also jumped on board. In fact, at over 3 million enrollments, the CS50x adaptation of the original course is the most popular offering on the platform.
That said, I’m confident that a fair number of online enrollees are hoping to use this as a springboard into an entry-level software development job — they might just be disappointed to find out it won’t be the case. There’s plenty of useful information to learn here, but this knowledge isn’t going to prepare one to become employable in a short amount of time.
Rather, the academic focus of CS50 may only hinder their objective of “getting a coding job in three months”. While the course is an excellent gateway into the world of CS, its programming portion only covers introductory-level content, and may not satisfy the requirements for many jobs in the industry.
(Side note: Coding education is a lucrative business and is seeing continuous growth thanks to the prosperity of the tech industry. Cue the proverb of “selling shovels during a gold rush” and all that.
I have a lot of reservations about commercial coding bootcamps, or any type of technical bootcamp in general. With just a dozen weeks of accelerated education and no prior experience of any sort, it’s unrealistic to expect beginners to take on responsibilities that require comprehensive training. But there’s no shortage of unscrupulous or clueless employers who are willing to hire new coders to sweat over broken systems, just as there’s an endless supply of new entrants desperate for any semblance of work experience.)
Who is this course for, then? When David took over the reins of CS50, he stated his goal as such:
... (before dot-com bubble burst) some of that era’s students might have enrolled for academically wrong reasons (e.g., dreams of getting rich quick). So we did not necessarily want all of them back. But among those 386 were some good computer scientists, some of whose potential might not have been realized had they not been coaxed from other departments. We wanted to find precisely those students in 2007.
For Harvard students who want to pursue a CS degree, this course will certainly help put them on the right path.
While the same may not apply to enrollees of the open course, there are also not many comparable (free!) alternatives out there if you want a firm working knowledge of core CS principles. Whether or not your main objective is to become well-versed in programming, if you want to learn CS the right way, and don’t mind a worthy challenge, you could certainly do much worse than CS50x.
There are also extra versions that are listed under the CS50 brand, targeting additional demographics and needs:
CS50’s Understanding Technology for complete beginners who are interested in a broad overview of today’s computing technology landscape.
CS50 “lite” versions for non-technical professionals looking to reinforce their business knowledge with a better grasp of modern technologies.
CS50 advanced courses for learners who’ve completed the original CS50 and want a deeper dive into a particular subject area.
The Curriculum Design
I wrote earlier that many commercial courses aim to gear students up with a given programming language (or a popular stack) as quickly as possible, in order to help them become job-ready in a short time window. It’s a pragmatic idea, but a faulty one nonetheless, especially considering the complexity of today’s technology landscape.
While CS50 is still very much centered around programming, the curriculum is designed to teach the subject as a general concept and doesn’t rely on any one language as a crutch.
To use an analogy: You aren’t just learning how to use a hammer to pound in a nail. You’re also understanding the rationale behind using a hammer to deliver an impact to a small area, and how to operate it without crumbling your phalanges.
As of the current iteration of CS50x, the curriculum is designed to roughly proceed in this sequence:
Basic CS concepts and an introduction to the Scratch visual programming language.
This establishes the rudimentary building blocks. Rather than abstracting them away behind code, Scratch gives students a taste of how programming works by visualizing its constructs.
Progressively advanced concepts that build on top of each other, presented through the C programming language.
Moving from visual programming in Scratch to a “real” language like C will be a steep transition for many, especially considering that the latter lacks the conveniences that other modern languages offer. However, starting early with this challenge allows students to assess how much they have internalized their newly acquired knowledge, and it demystifies some of the more abstract ideas that are foundational to computing, such as how memory is structured and how to manipulate it to achieve various tasks.
Pivot into Python and present the same concepts introduced in the early weeks through a new lens.
At the halfway mark, the curriculum switches to covering a certain set of tools every week, beginning with Python and its libraries. This unit can be a bit of an eye-opener for people who don’t have any prior programming experience, as Python is demonstrably more straightforward to use than C; many of the previous assignments can now be solved with a fraction of the original C code. Given the stark contrast, many students will surely develop a genuine appreciation for the progress of modern computing.
Processing and storing data, and using SQL (Structured Query Language) to perform complex data operations.
It’s only natural that the course gets to the topic of data operations at some point, as input and output (“put one thing in and get another thing out”) is a common aspect of programming. Continuing with Python as the main tool, this unit mixes in SQL to expand the students’ toolbox, so they learn how to work with data both inside and outside of Python programs.
Wrapping up with web technologies and development tools.
The last two units are an overview of the core technologies that power the internet, including some of the essential tools that are used to build web ecosystems, split into so-called client-side and server-side development. After finishing these, every student should be ready to start building complete web applications using the knowledge accumulated thus far.
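To make the Python and SQL units above a little more concrete, here is a minimal sketch of what “mixing in SQL” from a Python program can look like. It is not taken from the course materials: the `shows` table, its columns, the `top_rated` helper, and the sample rows are all invented for illustration, and it relies only on Python’s standard-library `sqlite3` module.

```python
import sqlite3


def top_rated(rows, minimum):
    """Return the titles rated at least `minimum`, best first, using SQL."""
    # An in-memory database keeps this sketch self-contained.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE shows (title TEXT, rating REAL)")
    db.executemany("INSERT INTO shows (title, rating) VALUES (?, ?)", rows)
    # The ? placeholders keep the data separate from the query text.
    cursor = db.execute(
        "SELECT title FROM shows WHERE rating >= ? ORDER BY rating DESC",
        (minimum,),
    )
    titles = [row[0] for row in cursor]
    db.close()
    return titles


# Hypothetical sample data, made up for this example.
SAMPLE = [("Show A", 8.1), ("Show B", 6.4), ("Show C", 9.0)]

if __name__ == "__main__":
    print(top_rated(SAMPLE, 8.0))
```

The filtering and sorting are expressed in SQL, while Python only moves data in and out of the database, which is roughly the division of labor that the unit introduces.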
…
All units across the ten-week curriculum have problem sets for students to put their learning to the test. For each of them, there’s a standard version that covers the new concepts introduced in the current week; a few of them also have an optional “more comfortable” problem with advanced specifications for students who are seeking an extra challenge.
(David once stated that this format is intentional. If the required assignments were too difficult, they would only frustrate people who are still new to computing. Making the advanced problems optional scratches the itch of people who are more experienced in programming without impeding the rest.)
The course progression culminates in a final project, which should put the students’ learning to the ultimate test. After all, programming isn’t just for solving code problems; its full power is only revealed when it’s used for invention.
And, again, if it’s not obvious yet: CS50 is not an easy course by any means. The materials are well presented and there are plenty of community resources to help students answer their questions, but it nevertheless demands a great deal of time and dedication.
The Infrastructure and Communities
For a long time, CS50 was just a regular course offered on-campus. It was only after David took it over that he began to put the materials online. While this was done to help Harvard students review the lecture, people outside of the university began to find their way to those materials — that marked the origin of the massively successful edX course that we see today.
That said, unlike many other courses on that platform, which depend on its built-in functionality, CS50 runs on infrastructure that David and his team have been developing themselves from the beginning. With a fully configured coding environment in the cloud, students can bypass most of the hassle of setup and just focus on working through the assignments. The automated system for checking and grading problem sets also frees the teaching team from having to play the role of IT support, so they can focus on more important tasks.
In fact, students can work through the entire course without using edX at all, because everything is available on the CS50x OpenCourseWare site.
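For readers wondering what automated checking of problem sets involves at its simplest, here is a toy sketch. It is emphatically not how CS50’s own tooling works; the `check` and `summarize` helpers, the `student_abs` submission, and the test cases are all hypothetical. It only illustrates the general idea of running a submission against known inputs and comparing the results with the expected outputs.

```python
def check(func, cases):
    """Run `func` on each (args, expected) pair and record the outcome."""
    results = []
    for args, expected in cases:
        try:
            actual = func(*args)
            passed = actual == expected
        except Exception as error:
            # A crash counts as a failed check rather than stopping the run.
            actual, passed = repr(error), False
        results.append(
            {"args": args, "expected": expected, "actual": actual, "passed": passed}
        )
    return results


def summarize(results):
    """Build a one-line report from the results returned by check()."""
    passed = sum(1 for result in results if result["passed"])
    return f"{passed} of {len(results)} checks passed"


# Hypothetical submission with a deliberate bug: negatives are not handled.
def student_abs(n):
    return n


# Hypothetical test cases: (arguments, expected result).
CASES = [((3,), 3), ((-3,), 3), ((0,), 0)]

if __name__ == "__main__":
    print(summarize(check(student_abs, CASES)))
```

A real grading system also has to run untrusted code safely and at scale, which is where most of the engineering effort goes.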
On the surface, it is certainly not obvious how much work was done to make sure the system can support thousands of students seamlessly running their code in the cloud and getting back automated responses in the snap of a finger. As the proverb goes, Rome wasn’t built in a day: the modern CS50 is the culmination of almost fifteen years’ worth of labor.
Here’s one of the many examples: In order to make the management of code assignments easier and provide a better experience for the students, the team moved their coding environments from the local computing clusters at Harvard to Amazon’s EC2 platform in late 2008. Incidentally, this also became an opportunity to teach students about cloud computing concepts such as virtualization and scalability.
(Considering Amazon’s platform had only exited beta right around that time, this likely meant that the CS50 team was one of the earliest adopters to use it for teaching purposes. Most college instructors aren’t keen to switch to the newest technologies for their core systems due to maintenance and stability concerns, so this serves as clear proof of David’s commitment to ensuring student success.)
Its nature as an open course also brings significant advantages. Because there are so many students everywhere, people can find a community of their preference and have their questions answered one way or another, and these continuous exchanges between students create a rich and diverse knowledge base that benefits any newcomers. Anonymity is another added bonus here: As there are forums outside of edX that don’t require users to link their course profile to their account, less comfortable learners have room to ask “dumb” questions. (Somehow, this is one aspect that’s commonly overlooked in traditional course environments.)
The Success and Vision of CS50
While there’s no denying that the positive reception of CS50 is the fruit of the team’s labor, we have to acknowledge that the growing demand for tech skills has also greatly contributed to the course’s success. Just as I wrote earlier: If we were to survey the students of CS50x, a majority of them would likely state that they hope to land a software development job eventually.
Right after the 2001 dot-com bubble in the US, enrollment in computer science programs saw a significant drop across the board. This slump continued for a few years before enrollment recovered, and then it soon began to spike upwards. The resurgence roughly coincided with the economic recovery after the 2008 recession, and it picked up even more steam as even the White House showed its support for CS education.
It goes without saying that the course we know today only exists because of David. He might not have set out to create a phenomenon of this scale, but this is where we have arrived, and it’s likely that there’s much more to come.
(Here’s a somewhat related fun fact: David has seen so much success from his work in CS50 that he is one of the top earners in the Harvard school system, alongside high-ranking administrators and executive staff.)
From watching the lectures, you can tell that David has put a lot of thought and care into his instruction; delivering the same material for over fifteen years seems to have only honed his teaching skills. And what amuses me the most? Teaching CS to a bunch of complete beginners is supposed to be really difficult, but he manages to make it seem _easy_ and natural. As someone who has sat through many underwhelming (or even downright disastrous) technical courses in the past, I know that this is not at all as effortless as it appears to be.
But David is not only looking to create a successful course. He has mentioned on several occasions that there is a larger vision of promoting technology literacy, especially for demographics that might not have been sufficiently exposed to this field. Indeed, we can infer his commitment from a few observations:
Aside from the contents of the main course, the CS50 team goes the extra mile to host events for students to further their understanding of the broader discipline, whether it’s a movie screening about the AI technology used in gaming or seminars featuring industry practitioners.
During the Harvard Family Weekend of 2021, students welcomed their family and friends to attend a cybersecurity awareness talk that’s intended for the general public.
In recent years, David has made an effort to push the Harvard course towards a different grading system. Rather than receiving a letter grade, everyone will instead be judged on a pass-fail basis, so as to encourage students who are wary of its steep learning curve to still register for it.
Musings on the Future of Programming
After that long-winded analysis of CS50 and programming education, it may be odd for me to ask this question now:
Why is it still so difficult to visualize how a program will behave as you make changes to it?
But that’s the question that comes to mind whenever I’m struggling to work with a sizable project — there are just too many things to keep track of when you’re about to make changes that could impact multiple parts of the structure.
In times like those, I’m often reminded of one of Bret Victor’s talks. This forty-five-minute presentation is packed with eye-opening ideas that illustrate some of the fundamental barriers of contemporary programming practices. If you’ve ever worked on a large software project before, you know how mentally demanding it is to unpack layers of abstraction buried in a complex codebase, with files and data that are scattered across different places. (Which is often made worse by people not doing a good enough job documenting their work.)
Rather than relying on inputting precise code, one could argue that more programming tools should offer a more visually intuitive interface, thus lowering the barrier to entry, although naysayers will claim that this would adversely impact power users instead. Regardless, the burden I described above remains a problem: Making a tool easier to pick up and use may not improve the comprehension of abstract logic at all.
(It’s difficult to predict when technology will mature to the point of allowing us to manipulate tangible objects to “program” our physical environment, while seeing the “outputs” in action at the same time.)
Speaking from experience, beginners in programming are presented with three layers of challenge:
There’s the concrete and rote mechanics of using a language, which is akin to being handed a big box of Lego blocks with thousands of varying forms. It’s not always easy to remember all the different shapes that are available and how they would fit together nicely...
And then there is the mental formulation of algorithms, which can be compared to the ability to come up with a step-by-step blueprint that informs how the Lego blocks should be placed together to construct a completed design...
Finally, there’s the ability to visualize and gauge the impact on the program flow as you manipulate different parts of the code: can you predict and observe what will happen if you attempt to swap out a few dozen pieces of a massive Lego model?
In an ideal world, most if not all of our mental capacity should be dedicated to the formulation part — so we can focus on creating solutions and inventions — while our tools would shoulder the baggage of the other two activities on our behalf. But the reality is that the programmer is often juggling all three of them at once, while rapidly switching context back and forth. An experienced mind can do better at this balancing act, but the mental toll doesn’t go away.
I have to admit that this is a somewhat frustrating conclusion to arrive at. It’s unlikely that we’ll see any meaningful breakthrough in this regard anytime soon. Perhaps the person who will be responsible for changing this future is enrolled in CS50x as we speak, or at least I’d like to imagine so, because it is certainly a good starting point for a budding computer scientist.
Of possible interest: aside from the links littered throughout this post, here are some of the CS50-related papers that I dug into to learn more about this topic:
Sharp, C., van Assema, J., Yu, B., Zidane, K., & Malan, D. J. (2020, June). An Open-Source, API-Based Framework for Assessing the Correctness of Code in CS50. In Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education (pp. 487-492).
Malan, D. J. (2010, March). Reinventing CS50. In Proceedings of the 41st ACM technical symposium on Computer science education (pp. 152-156).
Malan, D. J., Yu, B., & Lloyd, D. (2020, February). Teaching academic honesty in CS50. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (pp. 282-288).
Malan, D. J. (2010). Moving CS50 into the cloud.
Malan, D. J. (2021, March). Toward an Ungraded CS50. In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education (pp. 1076-1081).
Malan, D. J. (2013, March). CS50 sandbox: secure execution of untrusted code. In Proceedings of the 44th ACM technical symposium on Computer science education (pp. 141-146).
Forbes, J., Malan, D. J., Pon-Barry, H., Reges, S., & Sahami, M. (2017, March). Scaling introductory courses using undergraduate teaching assistants. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on computer science education (pp. 657-658).