__NOTOC__
<div style="font-size:20px; padding-left:5px; padding-top:5px; padding-bottom:5px; border-style:solid; border-width:3px; border-color:#1E90FF; font-style:bold">'''APEnet+''' is the new generation of our 3D network adapters targeting hybrid CPU-GPU-based HPC platforms.</div>
 
 
==Project Background==
Many scientific computations need multi-node parallelism to meet ever-increasing requirements in both space (memory) and time (speed). The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may result in large overheads due to the bookkeeping of memory buffers. Additionally, the most demanding problems may easily employ more than a PetaFlops of sustained computing power, requiring thousands of GPUs orchestrated via some parallel programming model, mainly the Message Passing Interface (MPI).

<gallery heights=200px widths=300px caption="Pictures of APEnet boards">
File:Apenet+_2.jpg        | APEnet+ board, front view
File:Apenet+_1.jpg        | APEnet+ board, 3D Torus network cable connectors
File:ApenetPlus_Board.jpg | APEnet+ 3 links test board, based on a commercial development board
</gallery>

<gallery heights=500px widths=1000px>
Image:Tech_figure.jpg | APEnet+ board technical details.
</gallery>

==APEnet+ aim and features==
The project targets the development of a low-latency, high-bandwidth direct network, supporting state-of-the-art wire speeds and PCIe x8 Gen2 while improving the price/performance ratio as the cluster size scales. The network interface provides hardware support for the RDMA programming model. A Linux kernel driver, a set of low-level RDMA APIs and an OpenMPI library driver are available, allowing painless porting of standard applications.
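Because the OpenMPI library driver sits below the standard MPI interface, an existing MPI application needs no source changes to run over APEnet+. As an illustration (a minimal sketch in plain MPI; nothing below is APEnet+-specific, the interconnect is chosen by the MPI runtime at launch time), a point-to-point exchange looks like this:

<pre>
/* Standard MPI point-to-point exchange; no APEnet+-specific code appears here. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank = 0, size = 0;
    double buf[1024] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            /* rank 0 sends the buffer to rank 1 */
            MPI_Send(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* rank 1 receives it; the transport underneath may be APEnet+ */
            MPI_Recv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received the buffer\n");
        }
    }

    MPI_Finalize();
    return 0;
}
</pre>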

===Highlights===
* APEnet+ is a packet-based direct network of point-to-point links with 2D/3D toroidal topology.
* Packets have a fixed-size envelope (header + footer) and are auto-routed to their final destinations according to wormhole dimension-ordered static routing, with deadlock avoidance (a sketch of the routing rule follows this list).
* Error detection is implemented via CRC at packet level.
* Basic RDMA capabilities, e.g. RDMA PUT and SEND, together with address translation for registered memory, are implemented at the firmware level; RDMA GET is under development.
* Fault-tolerance features (to be added starting from 2012).
* Direct access to GPU memory using PCI Express peer-to-peer transactions (NVIDIA Fermi GPUs only).
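
Dimension-ordered static routing on a torus resolves one coordinate at a time (X, then Y, then Z), always taking the shorter way around the ring in that dimension. The following is a generic sketch of that rule in C, not the actual APEnet+ firmware logic:

<pre>
/* Generic dimension-ordered (X, then Y, then Z) routing step on a 3D torus.
 * Illustrative sketch of the routing rule only, not APEnet+ firmware code. */
#include <stdio.h>

enum dir { LOCAL, X_PLUS, X_MINUS, Y_PLUS, Y_MINUS, Z_PLUS, Z_MINUS };

/* Choose the outgoing link for a packet at node 'here' destined to 'dest'
 * on a torus of the given size, resolving dimensions in fixed X-Y-Z order. */
enum dir next_hop(const int here[3], const int dest[3], const int size[3])
{
    const enum dir plus[3]  = { X_PLUS,  Y_PLUS,  Z_PLUS  };
    const enum dir minus[3] = { X_MINUS, Y_MINUS, Z_MINUS };

    for (int d = 0; d < 3; d++) {
        int delta = (dest[d] - here[d] + size[d]) % size[d];
        if (delta == 0)
            continue;                  /* this dimension is already resolved */
        /* take the shorter way around the ring in dimension d */
        return (delta <= size[d] / 2) ? plus[d] : minus[d];
    }
    return LOCAL;                      /* packet has reached its destination */
}

int main(void)
{
    int here[3] = {0, 0, 0}, dest[3] = {3, 1, 0}, size[3] = {4, 4, 4};
    printf("next hop: %d\n", (int)next_hop(here, dest, size));
    return 0;
}
</pre>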
<br style="clear: both" />

===GPU I/O accelerator===
'''APEnet+''' has the ability to take part in so-called PCIe peer-to-peer (P2P) transactions [http://www.pcisig.com/members/downloads/specifications/pciexpress/PCI_Express_Base_r2_1_04Mar09_cb.pdf]; '''APEnet+ is the first non-NVIDIA device with specialized hardware blocks to support the [http://developer.nvidia.com/object/gpudirect.html NVIDIA GPUDirect peer-to-peer inter-GPU protocol]'''. This means that the APEnet+ network board can target GPU memory with ordinary RDMA semantics, with no CPU involvement and no intermediate copies. In this way, true zero-copy, inter-node GPU-to-host, host-to-GPU or GPU-to-GPU transfers can be achieved, with substantial reductions in latency.
<br>

<gallery widths=400px heights=250px>
File:Apenet_nop2p_hor.png | Traditional data flow: data transfers via off-the-shelf network interconnects (e.g. InfiniBand) involve the CPU and require intermediate copies.
File:Apenet_p2p_hor.png   | APEnet+ data flow: APEnet+ is the first non-NVIDIA device with specialized hardware blocks to support the NVIDIA GPUDirect peer-to-peer inter-GPU protocol, enabling zero-copy, inter-node GPU-to-host, host-to-GPU or GPU-to-GPU transfers with substantial reductions in latency.
</gallery>
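
The figures above contrast the traditional staged flow with the peer-to-peer one. In application code the difference is whether a host staging buffer and an explicit device-to-host copy are needed before the data is handed to the communication layer. The sketch below illustrates the two flows with standard CUDA and MPI calls; the direct variant assumes a CUDA-aware MPI build that accepts device pointers, and none of this is APEnet+ driver code:

<pre>
/* Illustrative contrast between staged and zero-copy GPU sends.
 * Standard CUDA runtime and MPI calls only; the direct variant assumes a
 * CUDA-aware MPI build, so that MPI_Send accepts a device pointer. */
#include <cuda_runtime.h>
#include <mpi.h>
#include <stdlib.h>

#define COUNT (1 << 20)

/* Traditional flow: GPU data is first copied into a host staging buffer,
 * then sent; the CPU and an intermediate copy are involved. */
void send_staged(const double *dev_buf, int dest)
{
    double *host_buf = malloc(COUNT * sizeof(double));
    cudaMemcpy(host_buf, dev_buf, COUNT * sizeof(double), cudaMemcpyDeviceToHost);
    MPI_Send(host_buf, COUNT, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    free(host_buf);
}

/* Peer-to-peer flow: the device pointer is handed straight to the
 * communication layer; with a P2P-capable interconnect the NIC reads GPU
 * memory over PCIe with no CPU involvement and no staging copy. */
void send_direct(const double *dev_buf, int dest)
{
    MPI_Send((void *)dev_buf, COUNT, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
}
</pre>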
===Performance===
<gallery widths=380px heights=300px caption="APEnet+ bandwidth and latency">
File:bandwidth_deliv_2013.jpg   | Bandwidth test
File:G2G latency deliv 2013.jpg | Roundtrip latency test
</gallery>

===Acknowledgements===
This work was partially supported by the '''EU Framework Programme 7 project [http://www.euretile.eu EURETILE]''' under grant number 247846. We would like to thank Massimiliano Fatica and Timothy Murray of '''[http://www.nvidia.com NVIDIA Corp.]''' for supporting the GPU P2P developments.
 
===GPU Cluster installation===
* Where: APE lab (INFN Roma 1)
* What: [[GPUcluster]]
 
==APEnet+ Public Documentation==

* [[Media:Flyer2012.pdf|APEnet+ flyer]] <span style="font-weight:bold; color:blue">&larr; DOWNLOAD APEnet+ FLYER!!</span>
* [[APEnet+_Photo_Gallery|APEnet+ Photo Gallery]]
* [[APEnet+_Publications|APEnet+ Publications]]

----
Internal links (require login):<br>
[[APEnet+_HW|APEnet+ HW]], [[APEnet+_SW|APEnet+ SW]], [[APEnet+_specs|APEnet+ specification]], [[NextDeadlinesForPubblication|Next Deadlines for Publication]]
