Presentation: NAMD-BluegeneL


This presentation on NAMD-BluegeneL contains 48 slides.

Slides and text of this presentation


Slide 1

Achieving Strong Scaling on Blue Gene/L: Case Study with NAMD
Sameer Kumar, Blue Gene Software Group, IBM T. J. Watson Research Center, Yorktown Heights, NY
sameerk@us.ibm.com

Slide 2

Outline
- Motivation
- NAMD and Charm++
- BGL techniques
  - Problem mapping
  - Overlap of communication with computation
  - Grain size
  - Load-balancing
  - Communication optimizations
- Summary

Slide 3

Blue Gene/L

Slide 4


Slide 5

Application Scaling
- Weak: problem size increases with the number of processors
- Strong: constant problem size
  - Linear to sub-linear decrease in computation time with the number of processors
  - Cache performance
  - Communication overhead
  - Communication-to-computation ratio
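
The sketch below shows how the two regimes are usually quantified. Strong-scaling speedup is T(1)/T(p) for a fixed problem. Its efficiency is that speedup divided by p, so 1.0 is linear and anything lower is sub-linear. Weak-scaling efficiency instead compares the base run with a p-times-larger problem on p processors. The function names and timings are hypothetical, not NAMD measurements.

```python
from __future__ import annotations


def strong_scaling(t1: float, tp: float, p: int) -> tuple[float, float]:
    """Speedup and efficiency for a FIXED problem size run on p processors."""
    speedup = t1 / tp
    efficiency = speedup / p  # 1.0 would be perfectly linear scaling
    return speedup, efficiency


def weak_scaling_efficiency(t1: float, tp: float) -> float:
    """Efficiency when the problem grows in proportion to p.

    t1: time for the base problem on 1 processor;
    tp: time for a p-times-larger problem on p processors.
    Ideal weak scaling keeps tp == t1, i.e. efficiency 1.0.
    """
    return t1 / tp


# Hypothetical timings in seconds per step, for illustration only.
speedup, eff = strong_scaling(t1=100.0, tp=0.5, p=256)
print(f"strong: speedup {speedup:.0f}, efficiency {eff:.2f}")
print(f"weak:   efficiency {weak_scaling_efficiency(1.0, 1.25):.2f}")
```

Strong scaling is the harder target because the per-processor work shrinks while the communication does not shrink as fast.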

Slide 6

Scaling on Blue Gene/L
- Several applications have demonstrated weak scaling
- Strong scaling on a large number of benchmarks still needs to be achieved

Slide 7

NAMD and Charm++

Slide 8

NAMD: A Production MD program

Slide 9


Slide 10

Molecular Dynamics in NAMD
- Collection of [charged] atoms, with bonds
- Newtonian mechanics
- Tens to hundreds of thousands of atoms (10,000 to 500,000)
- At each time step:
  - Calculate forces on each atom
    - Bonds
    - Non-bonded: electrostatic and van der Waals
      - Short-distance: every time step
      - Long-distance: using PME (3D FFT)
      - Multiple time stepping: PME every 4 time steps
  - Calculate velocities and advance positions
- Challenge: femtosecond time step, millions of steps needed!
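
A toy integrator can illustrate the per-step structure above. This is a minimal sketch assuming a generic impulse-style multiple-time-stepping scheme (the r-RESPA pattern), not NAMD's actual integrator. A stiff spring stands in for the short-range forces computed every step, and a soft spring stands in for the PME term applied every k = 4 steps. All names and constants are illustrative assumptions.

```python
# Toy 1D system: unit masses held by two harmonic springs (hypothetical stand-ins).
FAST_K = 1.0   # stiff spring: plays the role of short-range forces (every step)
SLOW_K = 0.01  # soft spring: plays the role of long-range PME forces (every k steps)


def fast_force(x):
    return [-FAST_K * xi for xi in x]


def slow_force(x):
    return [-SLOW_K * xi for xi in x]


def energy(x, v):
    return sum(0.5 * vi * vi + 0.5 * (FAST_K + SLOW_K) * xi * xi for xi, vi in zip(x, v))


def mts_run(x, v, dt, k, n_outer):
    """Impulse-style multiple time stepping (r-RESPA pattern).

    Fast forces are integrated with velocity Verlet at step dt; slow forces are
    applied as half-kicks of size k*dt around each block of k fast steps.
    """
    big_dt = k * dt
    for _ in range(n_outer):
        v = [vi + 0.5 * big_dt * f for vi, f in zip(v, slow_force(x))]  # slow half-kick
        for _ in range(k):  # k ordinary velocity-Verlet steps with fast forces
            v = [vi + 0.5 * dt * f for vi, f in zip(v, fast_force(x))]
            x = [xi + dt * vi for xi, vi in zip(x, v)]
            v = [vi + 0.5 * dt * f for vi, f in zip(v, fast_force(x))]
        v = [vi + 0.5 * big_dt * f for vi, f in zip(v, slow_force(x))]  # slow half-kick
    return x, v


x0, v0 = [1.0, -0.5], [0.0, 0.2]
x1, v1 = mts_run(x0, v0, dt=0.01, k=4, n_outer=250)  # slow term every 4 steps
print(energy(x0, v0), energy(x1, v1))                 # should agree closely

# At a 1 fs time step, 1 ns of simulated time needs 10**6 steps.
steps_per_ns = round(1e-9 / 1e-15)
print(steps_per_ns)
```

The last two lines show why "millions needed" is routine. Even a single nanosecond at a femtosecond step is a million force evaluations.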

Slide 11

NAMD Benchmarks

Slide 12

Parallel MD: Easy or Hard?
- Easy
  - Tiny working data
  - Spatial locality
  - Uniform atom density
  - Persistent repetition
  - Multiple time-stepping

Slide 13

NAMD Computation
- Application data divided into data objects called patches
  - Sub-grids determined by the cutoff
- Computation performed by migratable computes
  - 13 computes per patch pair, and hence much more parallelism
  - Computes can be further split to increase parallelism
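
A figure like 13 falls out of enumerating pairwise computes on a grid of patches. Assume patches at least one cutoff wide and periodic boundaries. Then each patch interacts only with its 26 surrounding patches. Creating one compute per unordered neighboring pair gives 26 / 2 = 13 pair computes per patch, plus one self compute per patch. The function below is a hypothetical illustration, not NAMD code. It requires at least 3 patches along each axis so that the 26 neighbors are distinct.

```python
from itertools import product


def patch_computes(dims):
    """Return (self_computes, pair_computes) for a periodic grid of patches.

    Each patch is assumed to be at least one cutoff wide, so it only interacts
    with patches whose index differs by at most 1 along every axis.
    Requires dims >= 3 along each axis.
    """
    nx, ny, nz = dims
    patches = list(product(range(nx), range(ny), range(nz)))
    self_computes = [(p, p) for p in patches]
    pairs = set()
    for (x, y, z) in patches:
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            if (dx, dy, dz) == (0, 0, 0):
                continue
            q = ((x + dx) % nx, (y + dy) % ny, (z + dz) % nz)
            pairs.add(frozenset(((x, y, z), q)))  # unordered: one compute per pair
    return self_computes, pairs


selfs, pairs = patch_computes((4, 4, 4))
print(len(selfs), len(pairs), len(pairs) / len(selfs))  # 64 patches, 13 pair computes each
```

Because each compute is a separate migratable object, the load balancer can place these 13-per-patch work units independently of where the patches live.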

Slide 14

NAMD
- Scalable molecular dynamics simulation
- 2 types of objects, patches and computes, to expose more parallelism
- Requires more careful load balancing

Slide 15

Communication-to-Computation Ratio
- Scalable
  - Constant with the number of processors
  - In practice, grows at a very small rate

Slide 16

Charm++ and Converse
- Charm++: object-based, asynchronous, message-driven parallel programming paradigm
- Converse: communication layer for Charm++
  - Send, recv, and progress at the node level
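
Picture the message-driven execution model as a per-processor scheduler. It repeatedly takes the next available message and runs the target object's entry method to completion, so no object ever blocks waiting for data. The toy below is a hypothetical Python illustration of that idea only. It is not the Charm++ or Converse API; real Charm++ entry methods are C++ methods declared in .ci interface files.

```python
from collections import deque


class Scheduler:
    """Toy single-processor message-driven scheduler (hypothetical)."""

    def __init__(self):
        self.queue = deque()
        self.objects = {}

    def register(self, name, obj):
        self.objects[name] = obj
        obj.sched = self  # lets objects send messages from inside entry methods

    def send(self, target, method, *args):
        # Asynchronous send: enqueue the message and return immediately.
        self.queue.append((target, method, args))

    def run(self):
        # Take whichever message is available and run its entry method to completion.
        while self.queue:
            target, method, args = self.queue.popleft()
            getattr(self.objects[target], method)(*args)


class PairCompute:
    """Waits for coordinates from two patches, then sends a stand-in 'force' back."""

    def __init__(self, expected):
        self.expected = expected
        self.received = []

    def recv_coords(self, patch, coords):
        self.received.append((patch, coords))
        if len(self.received) == self.expected:  # all inputs present: do the work
            total = sum(sum(c) for _, c in self.received)  # toy "force": a plain sum
            self.sched.send("patchA", "recv_force", total)


class Patch:
    def __init__(self):
        self.forces = []

    def recv_force(self, force):
        self.forces.append(force)


sched = Scheduler()
sched.register("patchA", Patch())
sched.register("pair", PairCompute(expected=2))
sched.send("pair", "recv_coords", "patchA", [1.0, 2.0])
sched.send("pair", "recv_coords", "patchB", [3.0])
sched.run()
print(sched.objects["patchA"].forces)  # [6.0]
```

The compute only does work once both inputs have arrived, in whatever order they came. This is what lets communication latency hide behind other objects' computation.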

Slide 17

Optimizing NAMD on Blue Gene/L

Slide 18

Single Processor Performance
- Worked with IBM Toronto for 3 weeks
  - Inner loops slightly altered to enable software pipelining
  - Aliasing issues resolved through the use of #pragma disjoint (*ptr1, *ptr2)
  - 40% serial speedup
- Current best performance is with 440
- Continued efforts with Toronto to get good 440d performance

Slide 19

NAMD on BGL
- Advantages
  - Both application and hardware are 3D grids
  - Large 4 MB L3 cache
    - On a large number of processors, NAMD will run from L3
  - Higher bandwidth for short messages
    - Half of peak bandwidth is reached quickly
  - Six outgoing links from each node
  - No OS daemons

Slide 20

NAMD on BGL
- Disadvantages
  - Slow embedded CPU
  - Small memory per node
  - Low bisection bandwidth
  - Hard to scale full electrostatics
  - Limited support for overlap of computation and communication
  - No cache coherence

Slide 21

BGL Parallelization
- Topology-driven problem mapping
- Load-balancing schemes
- Overlap of computation and communication
- Communication optimizations

Slide 22

Problem Mapping

Slide 23

Problem Mapping

Slide 24

Problem Mapping

Slide 25

Problem Mapping
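
The problem-mapping slides survive here only as titles. Slide 19 notes that both the application and the hardware are 3D grids. One simple illustration is therefore to map the patch grid onto the torus by scaling coordinates, then check how many torus hops separate adjacent patches, compared with a random placement. The scaling rule and all function names are assumptions for illustration, not the mapping NAMD uses.

```python
import random
from itertools import product


def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b on a 3D torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def scale_map(patch, patch_dims, torus_dims):
    """Hypothetical topology-aware mapping: scale each patch coordinate onto the torus."""
    return tuple(p * t // n for p, n, t in zip(patch, patch_dims, torus_dims))


def max_neighbor_hops(patch_dims, torus_dims, mapper):
    """Largest torus distance between any two face-adjacent (periodic) patches."""
    worst = 0
    for p in product(*(range(n) for n in patch_dims)):
        for axis in range(3):
            q = list(p)
            q[axis] = (q[axis] + 1) % patch_dims[axis]
            worst = max(worst, torus_hops(mapper(p), mapper(tuple(q)), torus_dims))
    return worst


patch_dims = torus_dims = (8, 8, 8)


def aware(p):
    return scale_map(p, patch_dims, torus_dims)


# Topology-oblivious baseline: patches assigned to a random permutation of nodes.
nodes = list(product(range(8), repeat=3))
random.Random(0).shuffle(nodes)
rank = {p: i for i, p in enumerate(product(range(8), repeat=3))}


def oblivious(p):
    return nodes[rank[p]]


print(max_neighbor_hops(patch_dims, torus_dims, aware))      # 1
print(max_neighbor_hops(patch_dims, torus_dims, oblivious))  # typically several hops
```

With the scaled mapping, every patch's neighbors sit on the same node or one link away. That keeps patch-to-compute traffic off the shared links in the middle of the torus.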

Slide 26

Two Away Computation
- Each data object (patch) is split along a dimension
  - Patches now interact with neighbors of neighbors
- Makes the application more fine-grained
  - Improves load balancing
- Messages of smaller size are sent to more processors
  - Improves torus bandwidth
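
The neighbor-count side of this trade-off is easy to quantify. Assuming uniform atom density, halving a cutoff-wide patch along x halves its atoms, and hence each coordinate message. Interactions now reach two patches along x, so the number of neighboring patches grows from 3*3*3 - 1 = 26 to 5*3*3 - 1 = 44. The helper names and the per-atom byte count are hypothetical.

```python
def neighbor_patches(away=(1, 1, 1)):
    """Other patches within reach when interactions span `away` patches along each axis."""
    count = 1
    for a in away:
        count *= 2 * a + 1
    return count - 1


def per_patch_traffic(atoms_per_patch, bytes_per_atom, split_x):
    """(neighbor count, message bytes) per patch, assuming uniform atom density.

    Splitting a cutoff-wide patch in half along x halves its atoms (so each
    coordinate message halves), but interactions then reach two patches along x.
    """
    away = (2, 1, 1) if split_x else (1, 1, 1)
    atoms = atoms_per_patch / 2 if split_x else atoms_per_patch
    return neighbor_patches(away), atoms * bytes_per_atom


# Hypothetical numbers: 600 atoms per cutoff-sized patch, 24 bytes (3 doubles) per atom.
print(per_patch_traffic(600, 24, split_x=False))  # (26, 14400)
print(per_patch_traffic(600, 24, split_x=True))   # (44, 7200.0)
```

The output mirrors the slide: smaller messages go to more destinations. That spreads traffic across more of the six torus links per node.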

Slide 27

Two Away X

Slide 28

Load Balancing Steps

Slide 29

Load-balancing Metrics
- Balancing load
- Minimizing communication hop-bytes
  - Place computes close to patches
  - Biased through placement of proxies on near neighbors
- Minimizing the number of proxies
  - Affects the connectivity of each data object
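
The first two metrics can be combined in a simple greedy heuristic. Take computes heaviest first and place each on the processor that minimizes its resulting load plus a weighted hop-bytes term. Hop-bytes here means message bytes multiplied by torus hops to each patch the compute needs. This is a hypothetical sketch of the idea, not NAMD's load balancer. The weight `alpha`, the data layout, and the omission of the proxy-count metric are all simplifying assumptions.

```python
def torus_hops(a, b, dims):
    """Minimal hop count on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def greedy_place(computes, patch_home, procs, dims, alpha=0.001):
    """Place computes heaviest-first, minimizing load + alpha * hop-bytes.

    computes:   {name: (work, [(patch, message_bytes), ...])}
    patch_home: {patch: processor coordinate}
    Returns ({compute: processor}, {processor: total work}).
    """
    load = {p: 0.0 for p in procs}
    placement = {}
    for name, (work, deps) in sorted(computes.items(), key=lambda kv: -kv[1][0]):
        def cost(proc):
            hop_bytes = sum(nbytes * torus_hops(proc, patch_home[patch], dims)
                            for patch, nbytes in deps)
            return load[proc] + work + alpha * hop_bytes
        best = min(procs, key=cost)  # ties go to the first processor in `procs`
        placement[name] = best
        load[best] += work
    return placement, load


def total_hop_bytes(placement, computes, patch_home, dims):
    return sum(nbytes * torus_hops(placement[c], patch_home[patch], dims)
               for c, (_, deps) in computes.items() for patch, nbytes in deps)


# Hypothetical example: 4 processors on a ring (a 4x1x1 torus), one patch per processor.
dims = (4, 1, 1)
procs = [(i, 0, 0) for i in range(4)]
patch_home = {name: procs[i] for i, name in enumerate("ABCD")}
computes = {a + b: (2.0, [(a, 1000), (b, 1000)]) for a, b in ["AB", "BC", "CD", "DA"]}

placement, load = greedy_place(computes, patch_home, procs, dims)
print(placement)
print(load, total_hop_bytes(placement, computes, patch_home, dims))
```

In this small case, every compute lands next to one of its two patches and the load stays perfectly even. With uneven work, `alpha` controls how much locality the heuristic gives up for balance.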

Slide 30

Overlap of Computation and Communication
- Each FIFO has 4 packet buffers
- The progress engine should be called every 4,400 cycles
- Overhead of about 200 cycles per call
  - 5% increase in computation
- The remaining time can be used for computation
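
The 5% figure follows from the two cycle counts on the slide: one progress call of about 200 cycles for every 4,400 cycles of computation.

```latex
\frac{t_{\text{progress}}}{t_{\text{interval}}}
  = \frac{200\ \text{cycles}}{4400\ \text{cycles}} \approx 0.045 \approx 5\%
```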

Slide 31

Network Progress Calls
- NAMD makes progress engine calls from the compute loops
- Typical interval is 10,000 cycles, dynamically tunable
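
The sketch below shows a compute loop that services the network. It counts work and calls a progress hook whenever a tunable budget has elapsed since the last call. This is a minimal sketch under stated assumptions. Work is counted in loop iterations rather than CPU cycles, and `progress` is a hypothetical stand-in for the messaging layer's advance routine. The overhead estimate assumes that the roughly 200-cycle call cost from slide 30 also holds at the 10,000-cycle setting.

```python
def compute_with_progress(items, work, progress, poll_every):
    """Apply `work` to each item, calling `progress()` after every `poll_every` items.

    `poll_every` plays the role of the tunable progress interval; it is counted
    in loop iterations here, whereas the slide measures it in CPU cycles.
    """
    results = []
    since_poll = 0
    for item in items:
        results.append(work(item))
        since_poll += 1
        if since_poll >= poll_every:
            progress()  # hypothetical hook: drain incoming packets, push outgoing ones
            since_poll = 0
    return results


def progress_overhead(call_cycles, interval_cycles):
    """Extra time spent in progress calls, as a fraction of compute time."""
    return call_cycles / interval_cycles


polls = []
squares = compute_with_progress(range(10), lambda x: x * x, lambda: polls.append(1), poll_every=3)
print(len(polls))                               # 3 polls for 10 items
print(f"{progress_overhead(200, 10_000):.1%}")  # 2.0%
```

A longer interval lowers the polling overhead, but the interval cannot exceed the time it takes the 4-packet FIFOs to fill. That is why the interval needs to be tunable.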

Slide 32

MPI Scalability
- Charm++ MPI driver
  - Iprobe-based implementation
  - Higher progress overhead of MPI_Test
  - Statically pinned FIFOs for point-to-point communication

Slide 33

Charm++ Native Driver
- BGX Message Layer (developed by George Almási)
  - Lower progress overhead
  - Active messages
    - Easily design complex communication protocols
  - Dynamic FIFO mapping
  - Low-overhead remote memory access
  - Interrupts
- The Charm++ BGX driver was developed by Chao Huang over this summer

Slide 34

BG/L Msglayer
(This slide is taken from G. Almási’s talk on the “new” msglayer.)

Slide 35

Optimized Multicast

Slide 36

Communication Pattern in PME

Slide 37

PME
- Plane decomposition for the 3D FFT
- PME objects placed close to patch objects on the torus
- PME optimized through an asynchronous all-to-all with dynamic FIFO mapping
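
Plane (slab) decomposition of the 3D FFT can be sketched with NumPy by simulating the processors as a list of arrays. It runs in three phases:

1. Each simulated processor owns a block of x-planes and does 2D FFTs locally.
2. An all-to-all transpose gives each processor a block of y-columns spanning all x.
3. A final local 1D FFT along x completes the transform.

The function name and the random input are assumptions; the random data stands in for the PME charge grid. The real code performs the transpose as messages over the torus.

```python
import numpy as np


def slab_fft3d(grid, nprocs):
    """3D FFT with a plane (slab) decomposition over `nprocs` simulated processors."""
    nx, ny, nz = grid.shape
    assert nx % nprocs == 0 and ny % nprocs == 0, "grid must divide evenly"

    # Phase 1: processor i owns a block of x-planes and FFTs each plane in (y, z).
    x_slabs = [np.fft.fft2(s, axes=(1, 2)) for s in np.split(grid, nprocs, axis=0)]

    # Phase 2: all-to-all transpose. Processor j collects the j-th block of y
    # from every x-slab, so it now holds all x values for its y-range.
    y_slabs = [
        np.concatenate([np.split(s, nprocs, axis=1)[j] for s in x_slabs], axis=0)
        for j in range(nprocs)
    ]

    # Phase 3: the remaining 1D FFT along x is now entirely local.
    y_slabs = [np.fft.fft(s, axis=0) for s in y_slabs]
    return np.concatenate(y_slabs, axis=1)


rng = np.random.default_rng(0)
charges_on_grid = rng.standard_normal((8, 8, 4))
result = slab_fft3d(charges_on_grid, nprocs=4)
print(np.allclose(result, np.fft.fftn(charges_on_grid)))  # True
```

Phase 2 is the only step that moves data between processors. Every processor sends a piece to every other one, which is why the slide singles out the all-to-all for optimization.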

Slide 38

Performance Results

Slide 39

BGX Message Layer vs MPI
- The fully non-blocking version performed below par on MPI
  - Polling overhead is high for a list of posted receives
- The BGX message layer works well with asynchronous communication

Slide 40

Blocking vs Overlap

Slide 41

Effect of Network Progress
(Projections timeline of a 1024-node run without aggressive network progress)
- Network progress is not aggressive enough: communication gaps eat up utilization

Slide 42

Effect of Network Progress (2)

Slide 43

Virtual Node Mode

Slide 44

Spring vs Now

Slide 45

Summary

Slide 46

Summary
- Demonstrated good scaling to 4k processors for ApoA1, with a speedup of 2100
  - Still working on 8k results
- ATPase scales well to 8k processors, with a speedup of 4000+
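
These figures can also be expressed as parallel efficiency E = S/p. This assumes that "4k" and "8k" mean 4,096 and 8,192 processors and that speedup is measured against a single processor; the slide states neither. Under those assumptions, the summary figures correspond roughly to:

```latex
E = \frac{S}{p}, \qquad
E_{\text{ApoA1}} = \frac{2100}{4096} \approx 0.51, \qquad
E_{\text{ATPase}} \ge \frac{4000}{8192} \approx 0.49
```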

Slide 47

Lessons Learned
- Eager messages lead to contention
- Rendezvous messages don’t perform well with mid-size messages
- Topology optimizations are a big winner
- Overlap of computation and communication is possible
  - Overlap, however, makes compute load less predictable
- The lack of operating system daemons enables massive scaling
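
A simple latency/bandwidth cost model shows why the rendezvous handshake hurts below large sizes. An eager send moves the payload at once. A rendezvous send first exchanges a request-to-send and a clear-to-send, adding two one-way latencies that remain a noticeable fraction of the total for mid-size messages. The constants are hypothetical placeholders, not Blue Gene/L measurements. The model also ignores the buffering and contention costs that the slide attributes to eager messages, which are what push large messages toward rendezvous.

```python
LATENCY_US = 3.0            # hypothetical one-way network latency (microseconds)
BANDWIDTH_B_PER_US = 150.0  # hypothetical link bandwidth (bytes per microsecond)


def eager_time(nbytes):
    """Eager: the payload is sent immediately (the receiver buffers it if unexpected)."""
    return LATENCY_US + nbytes / BANDWIDTH_B_PER_US


def rendezvous_time(nbytes):
    """Rendezvous: request-to-send, clear-to-send, then the payload (3 one-way trips)."""
    return 3 * LATENCY_US + nbytes / BANDWIDTH_B_PER_US


for size in (256, 4_096, 65_536, 1_048_576):
    e, r = eager_time(size), rendezvous_time(size)
    print(f"{size:>9} B  eager {e:9.1f} us  rendezvous {r:9.1f} us  (+{r / e - 1:.0%})")
```

The "adaptive eager" item under future plans amounts to choosing the switch-over point between these two curves at run time rather than fixing it.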

Slide 48

Future Plans
- Experiment with new communication protocols
  - Remote memory access
  - Adaptive eager
  - Fast asynchronous collectives
- Improve load-balancing
  - Newer distributed strategies
  - Heavy processors dynamically unload to neighbors
- Pencil decomposition for PME
- Using the double hummer (the Blue Gene/L dual FPU)
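
The "heavy processors dynamically unload to neighbors" item can be pictured as first-order diffusion. In each round, a processor that is heavier than a neighbor hands that neighbor a fraction of the difference. This is a hypothetical sketch of that general idea on a 1D ring with continuous load. A real scheme would use torus neighbors and migrate discrete compute objects, and `alpha` and the round count are assumptions.

```python
def diffuse(loads, neighbors, alpha=0.25, rounds=50):
    """First-order diffusion load balancing (hypothetical sketch).

    Each round, every processor that is heavier than a neighbor hands that
    neighbor `alpha` times the load difference. Total load is conserved.
    `alpha` should stay below 1 / (max number of neighbors) for stability.
    """
    loads = list(loads)
    for _ in range(rounds):
        delta = [0.0] * len(loads)
        for i, nbrs in enumerate(neighbors):
            for j in nbrs:
                if loads[i] > loads[j]:  # only the heavier side sends
                    moved = alpha * (loads[i] - loads[j])
                    delta[i] -= moved
                    delta[j] += moved
        loads = [x + d for x, d in zip(loads, delta)]
    return loads


n = 8
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]  # 1D ring of processors
start = [16.0] + [0.0] * (n - 1)                       # one heavily loaded processor
end = diffuse(start, ring)
print([round(x, 2) for x in end])                      # close to 2.0 everywhere
```

Each processor only ever talks to its immediate neighbors, which keeps the balancing traffic local on the torus. The trade-off is that spreading a large imbalance takes several rounds.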


