Data Science

The data science field is relatively new. As such, the title "Data Scientist" can mean different things depending on whom you ask. In this section I aim to accomplish two things. First, I want to provide brief context so that less familiar readers can develop a general understanding of data science. Second, I want to showcase a few data science projects I have worked on outside of academics.

Data science can be described generally as the science of extracting knowledge from data. The goal of obtaining such knowledge is usually to enable strategic decisions in a variety of contexts. Such decisions arise in academics as well as in business (which research opportunities to pursue or which investments to make, respectively). In both fields, the data must be collected, condensed, analyzed, and presented.

One harmful misconception surrounding data science is that its applications are only relevant to what is often termed "real-world" data. This has led many firms to believe that candidates with primarily academic backgrounds have no relevant experience. This is simply wrong, and it has hurt many excellent applicants. First, using the term "real-world" solely to describe "business-minded" data implies that physical, chemical, biological, or ecological data do not exist in the real world, which is just silly. Second, almost all of the techniques in the quantitative biological sciences (mathematical modeling, fitting, critical thinking, etc.) are applicable to any data-oriented problem - they are transferable skills. Third, most of the confusion is simply a difference in terminology.

For example, in regression problems, what research scientists term "independent variables" (x), data scientists term "features." What research scientists term the "dependent variable" (y), data scientists term the "target," and the model's estimates of it are "predictions." These simplifications can become more complicated, but at the end of the day both fields use mathematical models and statistics to explain behavior. In biophysical science, fitted equations are used to understand the detailed thermodynamics and kinetics linked to macromolecular processes such as drug and ligand binding, protein folding, aggregation, transcription factor DNA binding, and allostery - all across the entire independent variable space. In data science, the available features are used to make predictions. For some great reviews on data science algorithms, make sure to check out Robin Thottungal's post.
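To make the terminology point concrete, here is a minimal sketch, assuming Python with NumPy. The data are simulated, and the function names (fit_line, predict) are hypothetical and chosen only for illustration. The same least-squares fit is read two ways: a research scientist interprets the fitted parameters, while a data scientist uses them to predict the target for new feature values.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of the model y = slope * x + intercept."""
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def predict(x_new, slope, intercept):
    """Evaluate the fitted line at new x values."""
    return slope * np.asarray(x_new, dtype=float) + intercept


# Simulated data (an assumption for illustration): y = 2x + 1 plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)  # "independent variable" or "feature"
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)  # "dependent variable" or "target"

# Research-science reading: the fitted parameters are the quantities of interest.
slope, intercept = fit_line(x, y)
print(f"fitted slope = {slope:.3f}, intercept = {intercept:.3f}")

# Data-science reading: the fitted model is used to predict unseen cases.
y_hat = predict([12.0, 15.0], slope, intercept)
print("predictions:", y_hat)
```

Both readings rest on the identical fit; only the vocabulary and the question being asked differ.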

The link menu on the left of this page contains a project showcase. I have completed all of these projects, but make sure to check back soon, as I will be continually updating the link content. If there is a particular project you are interested in learning more about before it is posted, feel free to contact me. Enjoy!