December 04, 2012 by Tony DeYoung
The new APP SDK 2.8 includes dozens of new and improved samples for OpenCL, Aparapi and C++ AMP that deliver significantly faster performance than APP SDK 2.7 – up to 2.3x faster on average in nine key benchmarks.
The APP SDK 2.8 also includes a preview version of AMD’s new open source C++ template library, codename “Bolt.”
Bolt is an STL-compatible template library of data-parallel primitives. It provides a standard way to develop an application that can execute on either a regular CPU or any available OpenCL™ capable accelerated compute unit, with a single code path.
The APP SDK 2.8 also improves and extends OpenCL capabilities by adding support for the Direct3D 11 sharing Khronos extension as well as 64-bit atomics.
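The single-code-path idea can be sketched with a plain STL call. The example below is a minimal sketch using only the C++ standard library. The helper name `sorted_copy` is hypothetical, and the Bolt call shown in the comment is an assumption about how the preview library mirrors the STL interface, not something confirmed by this post.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper for illustration only: returns a sorted copy of the input.
std::vector<int> sorted_copy(std::vector<int> values) {
    // Standard-library version: runs on the CPU.
    std::sort(values.begin(), values.end());

    // Assumption: with an STL-compatible library such as Bolt, the same
    // iterator-based call shape would be kept and only the namespace would
    // change, for example:
    //   bolt::cl::sort(values.begin(), values.end());
    // The library would then pick a CPU or an OpenCL device at run time.

    assert(std::is_sorted(values.begin(), values.end()));
    return values;
}
```

Calling `sorted_copy({3, 1, 2})` yields a vector holding 1, 2, 3. The point of the sketch is that the calling code does not change when the backend does.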
November 15, 2012 by Tony DeYoung
At SC12, AMD not only received the #1 award for SANAM, a powerful and energy-efficient supercomputer powered primarily by GPUs, it also announced an expansion of its software ecosystem by launching a series of tools that enable HPC developers to take advantage of GPU compute with programming methodologies that integrate OpenCL. (See press release).
Powered by 420 AMD FirePro S10000 dual-GPU server graphics cards, the SANAM supercomputer was ranked #2 on the Green500 list overall and #1 among GPU-powered systems, beating out the Tesla K20X-based systems. The FirePro-powered system can sustain 420 TFLOPS, providing a system energy efficiency of over 2.3 GFLOPS per watt, or 2,351 million calculations per second per watt.
Maturing OpenCL Tools for HPC Developers
Accelereyes ArrayFire: Accelereyes is dedicated to delivering fast, simple GPU software. The general availability release of ArrayFire, a GPU software acceleration library that provides hundreds of functions already optimized for speed by top GPU computing experts, allows for easy integration into C, C++, Fortran and Python applications;
Portland Group (PGI) Accelerator compilers: PGI Accelerator Fortran, C and C++ compilers target the AMD line of APUs as well as the AMD line of discrete GPU accelerators. PGI continues to work closely with AMD to extend its PGI Accelerator directive-based compilers. The goal is to generate code directly for AMD GPU accelerators, and to generate heterogeneous x64+GPU executable files that automatically use both the CPU and GPU compute capabilities of AMD APUs;
CAPS Entreprise HMPP compiler: CAPS Entreprise is a leading provider of solutions for deploying applications on “many-core” systems. The CAPS source-to-source HMPP compiler is based on C, C++, and Fortran directives and supports the OpenACC and OpenHMPP standards. With help from AMD, the compiler incorporates a powerful OpenCL parallel data generator;
AMD CodeXL: AMD CodeXL is a comprehensive tool suite that enables developers to harness the benefits of AMD CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL kernel analysis capabilities, enhancing accessibility for software developers to enter the era of heterogeneous computing. AMD CodeXL is available as both a Visual Studio extension and as a standalone user interface application for Windows and Linux.
June 27, 2012 by Tony DeYoung
Current OpenCL implementations are limited to single heterogeneous systems rather than heterogeneous CPU/GPU clusters.
SnuCL is an OpenCL open-source framework that extends the original OpenCL semantics to the heterogeneous cluster environment. The target cluster consists of a single host node and multiple compute nodes connected by Gigabit or InfiniBand switches. The host node contains multiple CPU cores and each compute node consists of a cluster of multiple CPU cores and multiple GPUs. For the programmer, SnuCL provides an illusion of a single heterogeneous system. A GPU or a set of CPU cores becomes an OpenCL compute device. SnuCL allows the application to utilize compute devices in a compute node as if they were in the host node. Thus, with SnuCL, OpenCL applications written for a single heterogeneous system with multiple OpenCL compute devices can run on the cluster without any modifications. SnuCL achieves both high performance and ease of programming in a heterogeneous cluster environment.
SnuCL consists of a runtime and a compiler. The SnuCL compiler is based on the OpenCL C compiler in the SNU-Samsung OpenCL framework. Currently, the SnuCL compiler supports x86, ARM, and PowerPC CPUs, AMD GPUs, and Nvidia GPUs.
May 24, 2012 by Tony DeYoung
Par4All is an open source, automatic parallelizing and optimizing compiler (workbench) for C and Fortran sequential programs. The purpose of the source-to-source compiler is to adapt existing applications to various hardware targets such as multicore systems, high-performance computers, and GPUs. It creates a new source code and thus allows the original source code of the application to remain unchanged.
When used with a C or Fortran application, Par4All automatically generates a portion of parallel code that can be passed to OpenMP, CUDA (which is then suitable for compiler processing on an NVIDIA GPU), and OpenCL. The generated code is readable and completely traceable to the original code, and the whole process works like a typical compiler job.
The new v1.4 introduces loop-processing enhancements for CUDA and OpenCL kernel generation. In addition, dependencies resulting from accesses to global variables are analyzed more finely to assess parallelism.
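To give a feel for what a source-to-source parallelizer does, here is a hand-written sketch of a sequential loop next to an OpenMP-annotated equivalent. It is not actual Par4All output. The function names are hypothetical, and it is written in C++ with a C-style loop body, although Par4All itself targets C and Fortran. Without OpenMP enabled at compile time, the pragma is simply ignored and the loop runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original sequential code: scale every element by k.
void scale_sequential(std::vector<double>& v, double k) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] *= k;
    }
}

// Illustrative parallel form (hypothetical, written by hand): iterations are
// independent, so the loop can carry an OpenMP work-sharing directive.
void scale_openmp(std::vector<double>& v, double k) {
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        v[i] *= k;
    }
}

// Self-check: both versions must produce the same result.
void check_equivalence() {
    std::vector<double> a = {1.0, 2.0, 3.0, 4.0};
    std::vector<double> b = a;
    scale_sequential(a, 1.5);
    scale_openmp(b, 1.5);
    assert(a == b);
}
```

The original function stays untouched and the parallel variant lives beside it, which mirrors the post's point that the generated code is new source rather than a modification of the application.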
March 08, 2012 by Tony DeYoung
The CO-PRocessing THReads (COPRTHR) SDK (open source, GPLv3 license) provides several OpenCL-related libraries and tools for developers targeting many-core compute technology and hybrid CPU/GPU/APU computing architectures. The new v1.4 adds:
- Offline OpenCL compiler
- Redesigned OpenCL implementation for x86_64
- OpenCL implementation for multicore ARM processors
- Preview of replacement for Khronos ICD loader
- Improved build system