Building a Tesla Cluster


The Tesla Cluster section provides a variety of information on building clusters with CUDA/Tesla. Miruware carries the related hardware products; please see the HW Products menu. We plan to cover topics such as hardware setup and OS configuration (Linux, Windows).

 

The Tesla S1070 used in building a Tesla cluster

 

 

A Tesla cluster installed at a research institute in France

 

This is a Japanese team's GPU cluster built from 380 GTX 295 cards (760 GPUs in total), delivering a peak of 175 TFlops and, at present, a sustained 57 TFlops. When the paper was written in November, the sustained figure was 47 TFlops because of hardware failures. The cluster won the Gordon Bell Prize at SC09.

 

"42 Tflops Hierarchical N-body Simulations on GPUs with Applications in both Astrophysics and Turbulence", Tsuyoshi Hamada, Rio Yokota, Keigo Nitadori, Tetsu Narumi, Kenji Yasuoka, Makoto Taiji, and Kiyoshi Oguri, SC09 Gordon Bell prize winner, Nov 2009

 

 

 

 

The initial system was built with 256 9800 GTX+ cards and 33 GTX 295 cards, but after the power supplies were replaced it was upgraded to a 380-card GTX 295 system.


 

In fact, a total of 421 cards were purchased, but 72 of them failed to work properly because of memory errors and similar faults; 31 were replaced, and the system now operates with 380 cards. Had Tesla cards been used, such problems would probably not have occurred.

 

 

The group that developed this GPU cluster originally built a cluster for N-body simulation using FPGAs, but turned to GPU clusters for better price/performance. It can be regarded as a small, university-scale offshoot of the GRAPE project.


[FPGA-based N-body simulator PROGRAPE-4: 240 GFlops, 2006]

 

The current GRAPE project has reached 2 PFlops with GRAPE-DR, the successor to the GRAPE-6 generation.

Machine    Year  Peak        Notes
GRAPE-2    1990  40 Mflops   IEEE single/double
HARP-1     1993  180 Mflops  Force and its time derivative
GRAPE-4    1995  1 Tflops    Single-chip pipeline
GRAPE-6    2002  64 Tflops   6 pipelines in one chip
GRAPE-DR   2008  2 Pflops    New architecture

 

The following is build information for the cluster that NCSA constructed using Dell servers and Tesla S1070 units. It is provided for reference.

NCSA to add 47 teraflops of compute power with new heterogeneous system

Installation has begun on a new computational resource at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. Lincoln will deliver peak performance of 47.5 teraflops and is designed to push the envelope in the use of heterogeneous processors for scientific computing. The system is expected to be online in October, bringing NCSA's total computational resources to nearly 155 teraflops.

"Achieving performance at the petascale and exascale and beyond may well depend on a heterogeneous mix of processors," said NCSA Director Thom Dunning. "The use of novel architectures for scientific computing is part of ongoing work at NCSA."

Lincoln will consist of 192 compute nodes (Dell PowerEdge 1950 III dual-socket nodes with quad-core Intel Harpertown 2.33GHz processors and 16GB of memory) and 96 NVIDIA Tesla S1070 accelerator units. Each Tesla unit provides 345.6 gigaflops of double-precision performance and 16GB of memory.
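As a quick sanity check (my own back-of-the-envelope arithmetic, assuming 4 double-precision flops per core per clock on the Harpertown CPUs), the listed hardware does add up to the quoted 47.5-teraflop peak:

// Back-of-the-envelope check of Lincoln's quoted 47.5 TFlops peak.
// Assumes 4 double-precision flops/core/clock on Harpertown; illustrative only.
#include <cstdio>
int main() {
    double cpu = 192.0 * 2 * 4 * 2.33e9 * 4;   // nodes * sockets * cores * Hz * flops/clock
    double gpu = 96.0 * 345.6e9;               // S1070 units * double-precision flops each
    printf("CPU %.1f TF + GPU %.1f TF = %.1f TF peak\n",
           cpu / 1e12, gpu / 1e12, (cpu + gpu) / 1e12);   // ~14.3 + ~33.2 = ~47.5
    return 0;
}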

Lincoln's InfiniBand interconnect fabric will be linked to the interconnect fabric of Abe, the 89-teraflop cluster that is currently NCSA's largest resource. This will enable certain applications to run across the entire complex, providing a peak "Abe Lincoln" performance of 136 teraflops.

NCSA's Innovative Systems Laboratory has worked with researchers in many disciplines, from weather modeling to biomolecular simulation, to explore the use of many-core processors, field-programmable gate arrays (FPGAs), and other novel architectures as accelerators for scientific computing. The center maintains a 16-node research cluster, called QP, which includes hardware donated by NVIDIA. NCSA and its collaborators have seen significant speed-ups on a number of applications, including a chemistry direct SCF code, the NAMD molecular dynamics code, and the WRF weather forecasting and research code.

"The NCSA GPU cluster, one of the largest of its kind, is an invaluable resource as we search to solve new classes of weather and climate problems at petascale," said John Michalakes, a scientist at the National Center for Atmospheric Research and the University of Colorado. "Beyond scalable inter-node parallelism, we must have much faster nodes themselves—and applications able to exploit these architectures. QP and its successor Lincoln provide both, giving us a springboard to solving this next tier of earth science problems."

"We anticipate that even more applications will be able to take advantage of Lincoln, given the diverse characteristics of our early-adopter applications," said John Towns, leader of NCSA's Persistent Infrastructure Directorate.

Other University of Illinois efforts also drive heterogeneous computing. Wen-mei Hwu, the Sanders-AMD Endowed Chair in Electrical and Computer Engineering, leads a project to develop application algorithms, programming tools, and software for accelerators at Illinois' Institute for Advanced Computing Applications and Technologies. Hwu also leads the NVIDIA CUDA Center of Excellence at Illinois. Schools receiving this accreditation integrate the CUDA software environment into their curriculum. CUDA is a software development tool that allows programmers to run scientific codes like WRF and NAMD on many-core processors.
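For readers unfamiliar with what such code looks like, here is a minimal, self-contained CUDA example (a toy vector addition, not code from WRF, NAMD, or any NCSA project):

// Toy CUDA example: element-wise vector addition. Illustrative only.
#include <cstdio>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;                          // device buffers
    cudaMalloc((void**)&d_a, bytes);
    cudaMalloc((void**)&d_b, bytes);
    cudaMalloc((void**)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);   // 256 threads per block
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", h_c[0]);                 // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}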

"There is a whole new constellation of parallel processing architectures now entering the mainstream," said Hwu. "It is crucial that we begin making use of them to drive scientific discovery and that we prepare the next generation of researchers to harness them."

 

For more detailed technical information, see the following:

http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/Intel64TeslaCluster/Doc/

 

 

 

 

The following is introductory material on the supercomputer that Australia's CSIRO built using Tesla S1070 units.

Speeding up science: CSIRO's CPU-GPU supercomputer cluster

CSIRO's latest supercomputer cluster will be among the world's first to combine traditional CPUs with more powerful graphics processing units or GPUs, providing a world class computational and simulation science facility to advance priority CSIRO science.


CSIRO GPU supercomputer configuration

The new CSIRO high performance computing cluster will deliver more than 200 teraflops of computing performance and will consist of the following components:

·         100 Dual Xeon E5462 Compute Nodes (i.e. a total of 800 2.8GHz compute cores) with 16GB of RAM, 500GB SATA storage and DDR InfiniBand interconnect

·         50 Tesla S1070 (200 GPUs with a total of 48 000 streaming processor cores)

·         96 port DDR InfiniBand Switch expandable to 144 ports

·         80 Terabyte Hitachi NAS file system.

The cluster is supplied by Xenon Systems of Melbourne and is located in Canberra, Australia.
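The core counts quoted in the component list above follow directly from the per-node and per-unit configurations; a small consistency check (the figure of 240 streaming processors per GPU is the standard Tesla T10 number and is stated here as an assumption):

// Consistency check of the quoted CSIRO core counts; illustrative arithmetic only.
#include <cstdio>
int main() {
    int cpu_cores = 100 * 2 * 4;        // nodes * sockets * cores per quad-core E5462
    int gpus      = 50 * 4;             // S1070 units * 4 GPUs per unit
    int sp_cores  = gpus * 240;         // assumed 240 streaming processors per GPU
    printf("%d CPU cores, %d GPUs, %d streaming processor cores\n",
           cpu_cores, gpus, sp_cores);  // 800, 200, 48000
    return 0;
}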

 

GPU technology

The unique feature of the CSIRO graphics processing unit (GPU) supercomputer is the use of NVIDIA GPU technology to deliver outstanding computational performance at low cost and low energy demand.

A single Tesla S1070 can deliver up to 4.14 TFlops of single precision floating point performance and 345 GFlops of double precision floating point performance.
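Those per-unit peaks match the commonly quoted Tesla T10 derivation; a rough check, assuming a 1.44 GHz shader clock, 240 single-precision cores issuing up to 3 flops per clock (dual-issued MAD + MUL), and 30 double-precision units at 2 flops per clock in each of the four GPUs:

// Rough derivation of the quoted S1070 peaks (assumptions stated above).
#include <cstdio>
int main() {
    double clk = 1.44e9;                          // assumed shader clock in Hz
    double sp_per_gpu = 240 * 3 * clk;            // 240 SP cores, 3 flops/clock
    double dp_per_gpu = 30 * 2 * clk;             // 30 DP units, 2 flops/clock
    printf("S1070 (4 GPUs): %.2f TFlops SP, %.1f GFlops DP\n",
           4 * sp_per_gpu / 1e12, 4 * dp_per_gpu / 1e9);   // ~4.15 TF, ~345.6 GF
    return 0;
}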

The GPU technology can be accessed using the CUDA parallel computing architecture or using new compiler technology released by the Portland Group.

CSIRO science applications have already seen 10-100x speedups on NVIDIA GPUs.

 

Driving computational and simulation science

The CSIRO GPU supercomputer will support computational and simulation science research across CSIRO.

The technology can be used to support CSIRO research in the areas of:

·         computational biology

·         climate and weather

·         multi-scale modelling

·         computational fluid dynamics

·         computational chemistry

·         astronomy and astrophysics

·         computational imaging and visualisation

·         advanced materials modelling

·         computational geosciences.

 

E-research agenda

E-research underpins the future delivery of great science.

Modern scientific research is increasingly generating vast quantities of highly complicated data. Making the most of the rich information it contains is the key to success.

Handling and interrogating this information requires advances in data management, computing and collaboration tools – of which the CSIRO GPU Supercomputer is one.

Our e-research agenda is supporting CSIRO’s Transformational Capability Platforms, building our capability to sustain and accelerate the delivery of solutions for national challenges.

The Platforms are:

·         computational and simulation science

·         transformational biology

·         sensors and sensor networks

·         advanced materials.

To find out more about CSIRO's new Central Processing Unit (CPU)-GPU supercomputer cluster and its capabilities, contact:

Dr John A Taylor

Phone: 61 2 6216 7077 

Alt Phone: 61 2 6216 7000 

Email: John.A.Taylor@csiro.au

 

For more details, see the following link:

http://www.csiro.au/resources/GPU-cluster.html