Summer fellowships in data-intensive science and cloud computing with the Open Science Data Cloud

By Maria Patterson

The Open Science Data Cloud’s (OSDC) summer research fellowship abroad is back for a fourth year, and the application deadline has been extended through April 30th, with acceptance on a rolling basis. The fellowship is a fully funded 6-8 week program for scientists to gain hands-on experience with big data and cloud computing and to conduct research in an international setting. The program is primarily for graduate students, but upper-level undergraduates, postdocs, and early-career scientists may also be accepted. The fellowship is sponsored by NSF through a Partnerships for International Research and Education (PIRE) grant.

Fellows are paired with mentors at an academic institution abroad and spend the summer working on projects focused on learning about and developing tools for making scientific big data analysis and storage more efficient.  I participated as a fellow last summer after finishing my Astronomy PhD and found it to be a fantastic learning experience.  

In graduate school for Astronomy or most domain-specific sciences, we spend the majority of our time focused on a narrow research question or field. Many of the tools we use for our analysis are highly specialized for our particular task. As our data sets grow ever larger, research fields across many domains face the same problem: how can we make scientific discovery easier in the era of big data? It’s a question that all scientists should have brewing in the back of their heads, but it rarely seems to come up in practice. While I learned an incredible amount about the field of galaxy evolution in my graduate Astronomy program, I personally found that this forward-thinking, broader context was lacking from my graduate experience, and the OSDC summer fellowship was the perfect complement.

Last year, the program began with a one-week workshop on big data and cloud computing, in which all of the fellows, mentors, and organizers gathered in Edinburgh, Scotland at the University of Edinburgh School of Informatics before dispersing to their respective international institutions for the remainder of the summer. During the workshop, I met with and heard presentations by a variety of researchers (computer scientists, engineers, software developers, geologists and earth scientists, biologists and bioinformaticians, physicists, and mathematical modelers) from the US, the UK, Brazil, China, Japan, and the Netherlands. I had never been to a scientific conference with such a diverse group of participants. It was a great way to connect with and learn from people who might have a different approach to the same problem.

I learned about different types of databases and their pros and cons in the context of how they might be used. I learned about workflows for data analysis. I learned about how bioinformaticians are sharing, managing, and analyzing large amounts of genomic data. I learned about Hadoop-based methods for analyzing large amounts of Earth satellite imagery. I logged on to a ‘cloud’ for the first time. I learned about virtual machines and used them for the first time. I would later realize how much time and effort my graduate school collaborators and I would have saved had we put our shared data on a cloud and used virtual machines pre-configured by the one guy who could actually figure out how to properly install the very specific software we needed. I now use the cloud every day and launch a virtual machine with different software and different resources depending on the project I am working on. I can’t believe I never did this before.

The OSDC summer fellowship completely changed my approach to scientific data analysis. The OSDC itself is just starting to take off, with nearly 1 PB of scientific data from a variety of fields, plus compute and storage resources available to scientific researchers. If you are interested, find out more about the OSDC here: opensciencedatacloud.org and the OSDC summer PIRE fellowship here: pire.opensciencedatacloud.org.