Python or Ruby?
I teach high school math and science, and over the last couple years I have started to teach an Introduction to Programming class. The class is framed around general programming skills and concepts, but students learn to program in Python during the class. As students have started to develop a real interest in programming, I began to wonder if Python is the best language to be teaching them. I want to teach students a language that will serve them well, not just whatever language I happen to know best. To help think through the question of language choice, I decided to take a look at which language has a greater presence on Hacker News.
This investigation was not meant to prove that either language is better than the other. I know you can do just about anything you want in either language, especially relating to web development. I also know that whichever language is featured more on HN is not necessarily the best language to teach. This was just a simple experiment to see which language has a greater presence on HN. So the question I focused on was:
“Which language, Python or Ruby, has a greater presence on the front page of Hacker News?”
I also had a few other goals in mind with this project. I wanted to learn more about data visualization, and about professional coding conventions. I wanted a break from a couple larger projects I’ve been working on, and I wanted to finish the project in a short amount of time.
I wanted three to four weeks of data from the front page of HN, and thought I would have to write a scraper of some sort. But I found an API for HN data, which was straightforward to use. I wrote a short script to compile data from the front page of HN, and set up a cron job to grab this data once every hour.
The script originally ran on a server I had running at my school. But my school is being torn down for a remodel, and we had to move everything out of the building over the last few weeks. I didn’t want the server running at home, so I finally read a short tutorial and set up my first micro instance on Amazon. Now I should be able to reliably collect data on this project for a full year if I care to do so. It was very satisfying to poke around the AWS interface, and I learned a lot from setting up the micro instance. This was one of the clear benefits of choosing a small project, and then following it wherever it led.
I wrote an embarrassingly long script to analyze the data that was collected. As a fairly new dad who programs on the side, I work at five in the morning when I can, and sometimes get a little programming in just before bed. So I was often tempted to “just add a few more functions here”, rather than breaking the script into more manageable classes. But it would be easy to clean up the code if there is ever a reason to do so.
To put a number on the concept of front page “presence”, I came up with a simple formula based on each article’s upvotes, number of comments, and rank on the front page:
presence = upvotes + comments + front_page_rank/30
I decided not to weight upvotes or comments. Dividing front page rank by 30 seems to represent the influence of an article’s rank on the front page accurately. The top article certainly stands out more than the rest, as do the next few articles. But as we scroll down the front page, each article’s individual position is less significant than the simple fact that the article has made it to the front page. Raw presence points are normalized for each hour and for each day.
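For readers who want to see the arithmetic, here is a minimal sketch of the formula and the normalization step. It is not the actual analysis script; the function names and the sample numbers are made up for illustration, and it assumes that "normalized" means each article's raw points are divided by the total for that hour (or day), so the shares add up to 1.

```python
def presence(upvotes, comments, front_page_rank):
    """Raw presence points, following the formula in the post as written."""
    return upvotes + comments + front_page_rank / 30


def normalize(raw_points):
    """Scale raw points so they sum to 1 (assumed meaning of 'normalized')."""
    total = sum(raw_points)
    if total == 0:
        # No activity in this period; avoid dividing by zero.
        return [0.0 for _ in raw_points]
    return [points / total for points in raw_points]


# Hypothetical one-hour snapshot: (upvotes, comments, front_page_rank).
snapshot = [(120, 45, 1), (80, 10, 2), (15, 3, 30)]

raw = [presence(*article) for article in snapshot]
normalized = normalize(raw)
```

With this approach, an article's normalized score for an hour is simply its share of all the presence points on the front page during that hour.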
I made three graphs from this data. The first graph is an hourly line graph of the front page presence of all Python articles combined, and all Ruby articles combined. The second graph is an interactive bar graph of all Python and all Ruby articles, with each article’s front page presence shown as a distinct segment of that day’s bar. The third graph includes all articles for each day, even if they have nothing to do with Python or Ruby.
Articles are considered Python-related if their titles include any of the words “python”, “django”, or “flask”. Articles whose titles include the words “ruby” or “rails” are considered Ruby articles.
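A rough sketch of this keyword rule might look like the following. The function name and the sample titles are hypothetical, and matching whole words case-insensitively is an assumption; the real script may have matched differently.

```python
import re

PYTHON_WORDS = {"python", "django", "flask"}
RUBY_WORDS = {"ruby", "rails"}


def classify(title):
    """Return the set of language labels whose keywords appear in a title."""
    # Lowercase the title and split it into alphanumeric words.
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    labels = set()
    if words & PYTHON_WORDS:
        labels.add("python")
    if words & RUBY_WORDS:
        labels.add("ruby")
    return labels


# Hypothetical titles, just to show the three possible outcomes.
print(classify("Django 1.4 released"))
print(classify("Why I moved from Rails to Flask"))
print(classify("SpaceX docks with ISS"))
```

A rule this simple will misfile some titles, such as one about Monty Python or about rails of the railroad kind, so it is best treated as a rough filter.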
The hourly presence of Python and Ruby articles surprised me. I thought they would be about even, or maybe Ruby would be featured a bit more often. I was surprised to see how much more presence Python articles have than Ruby. Python articles appear on the front page more often, and when Python articles appear they receive more attention than Ruby articles do.
The graph of daily Python and Ruby articles is interesting in that it shows the relative contribution of each article to that language’s front page presence. The graph is interactive; clicking on any article’s bar segment brings you to that article’s discussion page on HN. It is also interesting to see the top articles from this time period. If you are a Python or Ruby dev, it might be worth looking at any of these articles that you missed.
The daily graph of all articles is also interesting. This graph shows us how prominent, or not prominent, Python and Ruby articles are in relation to all articles over this time period. Looking at the list of top articles, we can see some major events from this time period, such as SpaceX’s rendezvous with the ISS.
I am very happy to have worked on this project. It was satisfying to play with data visualization tools such as matplotlib and ReportLab. I also wrote rudimentary tests, asserting that the normalized points for each hour and each day, for all articles, add up to 1. Watching those tests fail once made me want to write tests for every significant program I write, for the rest of my life. It was very satisfying to have a bug caught automatically, and know I wouldn’t be scratching my head at some subtly incorrect behavior later on. This project also made me pay more attention to the kinds of Python and Ruby articles that appeared, which drew my attention to a number of topics I might not have paid attention to otherwise.
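A test of that kind can be very small. The sketch below is only an illustration of the idea, not the tests from the project; the function name and the data layout (a dict mapping each hour or day to its list of normalized points) are assumptions.

```python
import math


def check_normalized(points_by_period):
    """Assert that the normalized points in every period sum to 1."""
    for period, points in points_by_period.items():
        total = sum(points)
        # Compare with a small tolerance, since floats rarely sum exactly.
        assert math.isclose(total, 1.0, abs_tol=1e-9), (period, total)


# Hypothetical normalized data for two hours.
check_normalized({"hour 0": [0.5, 0.3, 0.2], "hour 1": [0.25, 0.75]})
```

If a bug in the normalization code drops or double counts an article, some period stops summing to 1 and the assertion names the period and the bad total.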
I have one specific thought on why Python has a stronger front page presence on HN than Ruby. While the Django and Rails frameworks both serve similar purposes, people seem to use Python for a larger set of domains than Ruby. Ruby articles tend to be all about Rails and web development. Python articles may be about web development, but might just as easily relate to scientific computing, data visualization, or a number of other domains.
As far as teaching goes, I will happily stay with Python for the time being. I like that Python is highly visible in both web development and a number of other domains. I aim to embed programming in math and science classes, and projects like matplotlib, NumPy, SciPy, and others give us plenty to work with. I love the Python community, and look forward to bringing students to PyCon next year. That said, I have nothing against Ruby. The language we focus on doesn’t matter too much as long as students are learning core programming concepts, and starting to use best practices in software development. Students who learn to program well in Python will pick up Ruby quite easily if they want to at a later time.
I have no pressing need to do much more with this project. I will probably generate these three graphs from time to time, just to see if anything interesting comes up. I will be curious to see if there is a spike around PyCon or RubyConf, and to see what the data looks like after six months to a year. Otherwise, I look forward to going back to my longer-term projects, and applying some of what I have learned from this experiment to these projects.
Update: I have updated this project with data from the last 6 months.