Cloud Computing

Cloud computing offers an exciting alternative to conventional compute-platform paradigms in the field of scientific computing. CMI's cloud computing project (involving Chris Hill, Constantinos Evangelinos, John Marshall, Glenn Flierl and Lodovica Illari) is exploring ways in which MITgcm can be run as an internet-based application providing an alternative to running on a specific machine or system.

CMI's current thrusts in this area are towards:

  • Advancing the computer science associated with getting MITgcm to run on a generic cloud provider data center.
  • Developing and packaging an integrated suite of MITgcm-driven, cloud-based educational applications for use in K-12 and beyond.

High Performance Computing and the Cloud.

While many scientific problems lend themselves to ensemble techniques involving large numbers of relatively small calculations run in parallel, most computing centers give priority to massively parallel simulations and are not geared towards many-task-computing applications: it is often easier to run a single 4096-processor job than 2048 two-processor jobs.
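As a rough illustration of the many-task pattern described above, the sketch below dispatches a set of small, independent ensemble members to a pool of workers. It is not MITgcm or project code: `run_member` is a hypothetical stand-in for one small calculation, and a thread pool is used only to keep the example self-contained. A real CPU-bound model would more likely use separate processes, batch jobs, or cloud instances.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def run_member(seed):
    """Hypothetical stand-in for one small ensemble calculation.

    Each member is independent: it depends only on its own seed.
    """
    rng = random.Random(seed)
    # Toy "model": the mean of 1000 samples perturbed around 1.0.
    samples = [1.0 + rng.gauss(0.0, 0.1) for _ in range(1000)]
    return sum(samples) / len(samples)


def run_ensemble(n_members, max_workers=4):
    """Run n_members independent tasks on a pool of max_workers workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the seeds.
        return list(pool.map(run_member, range(n_members)))


if __name__ == "__main__":
    results = run_ensemble(16)
    print(len(results), "members, ensemble mean =", sum(results) / len(results))
```

Because the members never communicate, throughput depends on how many workers the scheduler will grant at once, which is exactly where a batch system tuned for a few very large jobs can become the bottleneck.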

In addition, shared supercomputing centers, and even more moderately sized clusters, tend to be geared towards batch computing (jobs must be queued, notice given and permission sought). They are therefore less well suited to handling large peak demand with the hard deadlines that can occur in, for example, a field experiment where data must be analyzed in a timely fashion (on the order of hours) to optimize operational decision making in real time.
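To make the peak-demand point concrete, here is a back-of-the-envelope sizing sketch. The function `workers_needed` and the numbers in the usage example are hypothetical and are not taken from the project's experiments. It assumes identical, independent tasks that each occupy one worker and ignores queueing, start-up and I/O time.

```python
import math


def workers_needed(n_tasks, minutes_per_task, deadline_minutes):
    """Smallest number of workers that finishes n_tasks before the deadline.

    Assumes identical, independent tasks, one task per worker at a time,
    and no scheduling or I/O overhead (an idealization).
    """
    if minutes_per_task > deadline_minutes:
        raise ValueError("a single task already exceeds the deadline")
    # Whole tasks that one worker can complete before the deadline.
    tasks_per_worker = math.floor(deadline_minutes / minutes_per_task)
    return math.ceil(n_tasks / tasks_per_worker)


if __name__ == "__main__":
    # Hypothetical case: 600 members, 20 minutes each, results due in 3 hours.
    print(workers_needed(600, 20, 180))
```

In this hypothetical case each worker can complete 9 tasks in time, so 67 workers are needed for a few hours and none afterwards. That short-lived burst is awkward to obtain from a queued batch system but natural to rent on demand.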

Constantinos Evangelinos, in collaboration with Pierre Lermusiaux, Jinshan Xu and Patrick Haley in the Mechanical Engineering Department at MIT, has been looking at an oceanic error subspace statistical estimation (ESSE) problem. ESSE is an uncertainty prediction and data assimilation methodology employed for real-time ocean forecasts, and it involves a large number of ensemble calculations. The problem serves as a vehicle for exploring the execution characteristics and challenges of a distributed workflow on a large dedicated cluster, and the usability of enhancing this with runs on Amazon EC2 and the TeraGrid, with the concomitant I/O challenges.


Educational possibilities.

In removing the need for platform-specific special access and expertise, this technology is also seen as opening the door to powerful new educational possibilities. CITE (Cloud-computing Infrastructure and Technology for Education) is an NSF STCI-funded project aimed at supporting the development of middleware that will enable numerical models to be run on commercial compute farms (like EC2) via cloud computing, and which can be exploited in ongoing and future classroom educational activities.

The two linked goals of the CITE project are:

  • The development of technology that will be suitable for many educational scenarios, including providing access to parallel computing resources in classrooms (K-12 through university). Students and teachers will be able to run and interact with numerical models developed by leading researchers without the overhead of supporting software distributed to desktops in a school or the logistical headache of maintaining a cluster resource. Commercial compute farms will be exploited in which the technical 'nitty-gritty' is outsourced to these specialized providers.
  • Showcasing the technology in a 'virtual fluid laboratory' that will be used in ongoing undergraduate and graduate courses being offered at MIT (e.g., in 12.804, 12.307) and collaborating universities.
 
CITE (Cloud-computing Infrastructure and Technology for Education) - A taste of things to come? Here one of the team's children demonstrates how this technology makes fluid modeling "child's play"...

Because the software technology that is being developed is very general, it can be applied to make computer models part of almost any course available in, for example, MIT's OpenCourseWare initiative. It can release the power of compute clusters to anyone with a low-cost laptop or 'netbook' computer.


Read an MITgcm news story about work to develop earth science visualization tools intended for use in a cloud-computing environment.

 

 

Publications

Evangelinos, C., P.F.J. Lermusiaux, J. Xu, P.J. Haley, and C.N. Hill, 2010,
Many Task Computing for Real-Time Uncertainty Prediction and Data Assimilation in the Ocean, IEEE Transactions on Parallel and Distributed Systems, Special Issue on Many-Task Computing, I. Foster, I. Raicu and Y. Zhao (Guest Eds.), submitted.

Evangelinos, C., P.F.J. Lermusiaux, J. Xu, P.J. Haley, and C.N. Hill, 2009,
Many Task Computing for Multidisciplinary Ocean Sciences: Real-Time Uncertainty Prediction and Data Assimilation, 2nd Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS'09) at SC09.

Evangelinos, C. and C.N. Hill, 2008,
Cloud computing for parallel scientific HPC application: feasibility of running coupled atmosphere-ocean climate model on Amazon's EC2, Cloud-Computing and Its Applications conference, CCA-08, extended abstract.