For as long as I can remember, I've been toying with numbers. As an undergraduate student in the late 1970s, I began taking statistics courses, learning ways to examine and analyze data to uncover some meaning.
Back then, I had a scientific calculator that made statistical calculations much easier than ever before. In the early '90s, as a graduate student in educational psychology working on t-tests, correlations, and ANOVA[1], I did my calculations by meticulously writing text files that were fed into an IBM mainframe. The mainframe was an improvement over my handheld calculator, but a single spacing error could invalidate an entire run, and the whole workflow remained tedious.
For writing papers, and especially my thesis, I needed a way to create charts from my data and embed them in word processing documents. I was fascinated by Microsoft Excel and its number-crunching capabilities, as well as the myriad charts I could create from the computed results. But there were costs at every step of the way. In the 1990s, alongside Excel, there were other proprietary packages like SAS and SPSS+, but their learning curves were steep for my already cramped graduate student schedule.

Fast forward to the present
More recently, due to my budding interest in data science, combined with my keen interest in Linux and open source software, I've read a lot of data science articles and listened to many data science speakers talk about their work at Linux conferences. As a result, I became very interested in the programming language R, an open source environment for statistical computing.
At first, it was just a spark. That spark grew when I talked to my friend Michael J. Gallagher, PhD, about how he used R in his