In the Asia-Pacific region, Singapore's scientific, research and engineering base may be smaller in scale than those of China, Japan or Korea, but it has long had the accumulated expertise and user base to justify large computer systems, even up to supercomputer grade. A*CRC is a unique combination of diverse high-end hardware resources run by a team of experts to support the growing user base.
A Choice of Systems for Every User
Top Singapore universities, NUS and NTU, as well as various institutes and even companies involved in rendering farms or oil exploration simulations, have been using large computers here, whether vector or parallel single-image supercomputers, or clusters of standardised servers. As usage increased, leading to many disparate systems in various locations, the Government agency A*STAR created the A*STAR Computational Resource Centre (A*CRC) to provide a centralised high-end supercomputing resource base, at least for the numerous institutes under the A*STAR umbrella as well as other research bodies in Singapore.
Located on the 17th floor of the humongous Fusionopolis building, a behemoth sitting on top of the soon-to-open One-North Circle Line MRT station in the Ayer Rajah area, A*CRC has one of the world's rare heavy-duty supercomputer data centres located on a high floor of a tall building. The centre is divided into five functional areas: Computational Systems, Storage, Networking, Operations, and HPC Software. The two dozen staff, headed by Dr Marek Michalewicz, Director, and Stephen Wong, Deputy Director in charge of systems, among others, provide support for over seven hundred local, and some international, users on a variety of systems. A*CRC operates another data centre with large storage, located at the Biopolis and aimed mainly at bioinformatics users.
Now, A*CRC can't yet compete in size with the multi-petaflop systems found in Japan or China, but the centre has a diversity of hardware platforms that makes it interesting: from its 2048-core Westmere-EX single-image shared-memory machine with 12 TB RAM, the largest of its type in the region, to a 3,800+ core Xeon DP Fujitsu general-purpose cluster, an AMD Opteron SMP cluster, and one of the largest IBM Power7-based RISC clusters in the region.
The first one would obviously make hackers salivate: imagine a Linux (or Windows, as the machine could run it too) task prompt showing 2048 cores available with 12,000 GB of system memory. And there's a good reason for it. For too long there has been an overemphasis on the TOP500 supercomputer ranking, in terms of peak and tested GFLOPs or TFLOPs according to the (in)famous Linpack linear algebra benchmark, yet the underlying system architecture can influence real application performance even more. Dr Marek's team has, over the past few years, consciously diversified the available resources for varied usage types. So, whether your application can scale to 1,000 cores in a single task, or you just dump 1,000 simple unconnected threads at the same time, the centre has the right kind of system to run either kind of job, plus many other types.
There are three types of computer systems right now at A*CRC:
- A cluster of a large number of small, dual-processor nodes, for heavy loads of many serial jobs as well as highly parallel codes that run well with message passing (MPI). The Fujitsu BX900 450-node cluster of dual Xeon DP 'small' nodes, each with two CPU sockets and 24 GB or more RAM, all connected via generic InfiniBand QDR, is the main resource here. Its 45 TFLOPs performance capability puts it close to the ~50 TFLOPs class where the world's Top 200 systems start right now.
- A cluster of a smaller number of larger nodes, more suitable for applications with a large memory footprint and a higher degree of CPU and thread communication. There are two main systems here. One is a brand new IBM p755 Power7 cluster of 30 nodes, each with 32 cores running at 3.3 GHz and 128 GB RAM. The other is an older HP-made 32-node cluster of 8-socket quad-core Opterons, whose excellent PCIe I/O bandwidth and 128 GB RAM per node make it good for hosting multiple GPGPUs, for instance.
- An SMP machine, aimed at very large tera-scale tasks in terms of both processor and memory load: the Westmere-EX machine mentioned above, an SGI Altix UV with 256 eight-core Intel Xeons sharing 12 TB RAM, all under one Linux OS boot. Each of the cores runs at 2.66 GHz with a peak of 4 flops/cycle, offering a theoretical peak performance of over 21 TFLOPs in a single machine.
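The "over 21 TFLOPs" figure for the SGI Altix UV can be checked with simple arithmetic: total cores times clock speed times flops per cycle. The snippet below is only a rough sketch of that calculation using the numbers quoted above; the function name `peak_tflops` and its parameters are made up for illustration, and theoretical peak says nothing about what a real application will sustain.

```python
def peak_tflops(sockets: int, cores_per_socket: int,
                clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in TFLOPs: cores x clock (GHz) x flops per cycle / 1000.

    Hypothetical helper for illustration; not part of any A*CRC tooling.
    """
    total_cores = sockets * cores_per_socket
    peak_gflops = total_cores * clock_ghz * flops_per_cycle
    return peak_gflops / 1000.0


# Figures quoted in the article for the SGI Altix UV:
# 256 eight-core Xeons at 2.66 GHz, 4 flops per cycle per core.
altix_uv = peak_tflops(sockets=256, cores_per_socket=8,
                       clock_ghz=2.66, flops_per_cycle=4)
print(f"{altix_uv:.2f} TFLOPs")  # prints "21.79 TFLOPs"
```

The same formula applies to the other systems, provided the flops-per-cycle figure for each processor type is known.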
What would be good to add to the resource, I feel, is a nice visualisation cluster driving a ~100 Mpixel 3-D CAVE-style VR facility. Such a facility is normally driven by dozens of tightly interconnected dual-CPU workstations, each with a powerful OpenGL GPU or two, which use Chromium or a similar parallel OpenGL software layer to parallelise real-time 3-D across dozens of GPUs at the same time. CAVE, incidentally, is a recursive acronym for Cave Automatic Virtual Environment, which immerses the user in a virtual-world 'cave'. A*CRC's sister institute IHPC has had a small setup of this type for a while already.
Machines alone, even with petabytes of associated storage, huge ventilation, power and backup facilities, and hardware support staff monitoring the usage constantly, are not enough. The A*CRC expert team helps users acquire the right applications, commercial or open source, and, where needed, additional hardware that can be housed on the premises, and also helps with user code tuning and optimisation. And, as more users from within and outside A*STAR are added to the roster, including NUS, NTU, associated schools, commercial users and selected international collaborations, the pressure to increase resources is constant. It may sound unbelievable to an outside user, but most of the A*CRC system resources are in constant use, every moment, at 90% to 100% utilisation. Imagine your computer's CPU loaded at that level for, say, 5 years of its life, non-stop!
Therefore, to keep up with user requirements and with competing centres in the region, A*CRC has to increase its capacity rapidly. After all, Japan and China both have multiple petaflop-class systems, including the world's largest, at RIKEN in Kobe, which is reaching close to 10 PFLOPs. While we can't have a petaflop machine in Singapore as of today, a slightly smaller one, say a half-petaflop cluster of large core-count nodes, would make very good sense right now, to attract more users to our shores and provide a good base for an even larger national-level supercomputing core. This takes an investment in systems, infrastructure and of course people, but would cement Singapore's role as the high performance computing centre of the region.
In Part 2 of the story, we will look at some good application and code examples of how A*CRC users actually make full use of thousands of cores and terabytes of memory, all at one go, as well as the challenges in configuring, benchmarking and maintaining huge supercomputer systems of this sort, especially in our hot, humid climate here.