Recently there has been an explosion of interest in applying informatics tools, particularly machine learning, to materials and other domains of science and engineering. Several features make domain-specific machine learning applications in science and engineering exceptionally well-suited to undergraduate research. First, relative to many research problems, these applications are often simple to understand and easy to explore with limited background. Second, research projects in this area develop high-value skills for future employment or postgraduate education. These include technical skills in data science, statistics, programming, and specific domains, as well as broader skills in project management, teamwork, and communication. Third, many problems require only a laptop and free software, or relatively inexpensive computing resources (e.g., a small amount of GPU time). Motivated by these opportunities, I initiated the Informatics Skunkworks [1]. The group's goal is to engage undergraduates in research dedicated to realizing the potential of informatics for science and engineering, with a focus on materials problems. We have had more than 100 participants since 2015 and now typically have more than 30 per semester. The projects have had a significant impact on students. This is shown most quantitatively by a strong record of conference presentations, published papers, student awards, and student placements in top graduate programs and companies. We face three major challenges: (i) how to give students enough information to enable research but not so much that they cannot learn it quickly, (ii) how to allow students to make progress quickly without extensive programming or machine learning expertise, and (iii) how to provide high-quality mentoring given constraints on mentor experience and time.
To address (i), we have developed a set of modules on key machine learning topics (e.g., machine vision, or how to use specific codes) [2]. These modules target undergraduates with no background who need to quickly gain a practical working knowledge of the material. To address (ii), we have developed the MAterials Simulation Toolkit – Machine Learning (MAST-ML) package [3], along with useful practice datasets [4]. MAST-ML allows full machine learning project workflows to be executed from a simple input file, with no programming skills and limited machine learning background. To address (iii), we are exploring increased program structure and student co-mentoring, but we are still far from robust solutions. Our teaching modules and MAST-ML tools allow students to make progress in just a few hours. They support not only extended research projects but also class projects and laboratory exercises in this area. In this talk I will describe how we structure the skunkworks, some of the projects we have explored (and their successes and challenges), the resources we have developed to enable this work, and ongoing challenges and opportunities. I will also discuss our vision for the future and our efforts to expand the skunkworks across multiple institutions. I hope this talk will help start collaborations with others who share these interests, so that we can develop integrated efforts going forward.
1. https://skunkworks.engr.wisc.edu/
2. https://bit.ly/2WyBZW9
3. https://github.com/uw-cmg/MAST-ML
4. https://figshare.com/articles/MAST-ML_Education_Datasets/7017254