AMD has released version 1.0 of CodeXL, a unified developer tool suite that helps developers quickly identify performance issues and programming errors in applications without modifying source code.
CodeXL includes comprehensive GPU debugging, GPU and CPU profiling, static OpenCL kernel analysis and a standalone user interface on Windows and Linux for enhanced accessibility and navigation.
Highlights of AMD CodeXL v1.0 include:
GPU Debugger – provides comprehensive debugging of OpenCL and OpenGL API calls and OpenCL kernels on AMD APUs/GPUs. It lets you step into OpenCL kernels from API calls in real time, set breakpoints and debug inside the kernel, view all variable values and track API call histories – all on a single computer with a single GPU.
CPU Profiler – a profiling suite that helps you identify, investigate and tune application performance on AMD CPUs. It pinpoints time-critical hotspots in your code with time-based, event-based and instruction-based sampling, and lets you narrow profiling to a single process and capture profiling data for OpenCL code running on the CPU. In addition, call graph profiling provides a butterfly view of your function calls with the trace history.
GPU Profiler – a complete GPU profiler that you can use to discover bottlenecks in your OpenCL and DirectCompute applications and find ways to improve performance on AMD APUs/GPUs. It collects and visualizes GPU counter data, application traces, kernel occupancy and hotspot analysis, with comprehensive timeline and summary views of host activity, kernel execution and the data transfers between them.
Static Analyzer – a handy utility for analyzing your OpenCL application statically, without having to run it on the actual hardware. It lets you compile, analyze and disassemble your OpenCL code, estimate kernel performance and view the disassembly of the generated hardware kernel.
For further information about CodeXL, visit the CodeXL homepage.
The new APP SDK 2.8 includes dozens of new and improved samples for OpenCL, Aparapi and C++ AMP that deliver significantly faster performance than APP SDK 2.7 – up to 2.3x faster on average in nine key benchmarks.
The APP SDK 2.8 also includes a preview version of AMD’s new open source C++ template library, codename “Bolt.”
Bolt is an STL-compatible template library of data-parallel primitives. It provides a standard way to develop an application that, from a single code path, can execute either on a regular CPU or on any available OpenCL™-capable accelerated compute unit.
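To make the "STL-compatible" idea concrete, here is a minimal sketch of the kind of data-parallel primitives involved, written against the standard C++ library only so that it compiles without the SDK. The helper names squared_sorted_desc and sum_of are hypothetical. The assumption, not confirmed by this announcement, is that Bolt's preview exposes calls with the same iterator-based shape in its own namespace, so that code like this could be retargeted by changing the namespace and headers.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: squares every element, then sorts in descending order.
// Both steps are classic data-parallel primitives (transform and sort).
// This sketch uses std::transform and std::sort; a Bolt version is assumed
// to use equivalently shaped calls from the Bolt headers instead.
std::vector<int> squared_sorted_desc(const std::vector<int>& input) {
    std::vector<int> out(input.size());

    // Element-wise map: each output depends on exactly one input element.
    std::transform(input.begin(), input.end(), out.begin(),
                   [](int x) { return x * x; });

    // Sort with a user-supplied comparator (largest value first).
    std::sort(out.begin(), out.end(), std::greater<int>());

    // Postcondition check: the result really is in descending order.
    assert(std::is_sorted(out.begin(), out.end(), std::greater<int>()));
    return out;
}

// Hypothetical helper: reduction (sum) over the whole range.
long long sum_of(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0LL);
}
```

Calling squared_sorted_desc on the values 3, -1 and 2 yields 9, 4, 1, and sum_of on that result yields 14. The point of the sketch is the shape of the code: the algorithm is expressed once through iterator-based primitives, and where it runs is left to the library.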
APP SDK 2.8 also improves and extends OpenCL capabilities by adding support for the Direct3D 11 sharing Khronos extension as well as 64-bit atomics.
At SC12, AMD not only took the #1 award for a powerful and energy-efficient supercomputer (SANAM) powered primarily by GPUs, but also announced an expansion of its software ecosystem, launching a series of tools that enable HPC developers to take advantage of GPU compute with programming methodologies that integrate OpenCL. (See press release.)
Green500 ranking
Powered by 420 AMD FirePro S10000 dual-GPU server graphics cards, the SANAM supercomputer was ranked #2 overall on the Green500 list and #1 among GPU-powered systems, beating out the Tesla K20X-based systems. The FirePro-powered system can sustain 420 TFLOPS at a system energy efficiency of over 2.3 GFLOPS per watt, that is, 2,351 million calculations per second per watt.
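As a quick sanity check on those figures, the sustained throughput and the efficiency together imply a system power draw. The sketch below does that arithmetic; the function name implied_power_kw is hypothetical, and the result is only as precise as the rounded numbers quoted above (it works out to roughly 179 kW), not a measured value.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: derives the power draw implied by a sustained
// throughput (in TFLOPS) and an energy efficiency (in GFLOPS per watt).
// power [W] = throughput [GFLOPS] / efficiency [GFLOPS/W]
double implied_power_kw(double sustained_tflops, double gflops_per_watt) {
    assert(gflops_per_watt > 0.0);

    const double sustained_gflops = sustained_tflops * 1000.0;  // TFLOPS -> GFLOPS
    const double watts = sustained_gflops / gflops_per_watt;    // GFLOPS / (GFLOPS/W)
    return watts / 1000.0;                                      // W -> kW
}
```

Calling implied_power_kw(420.0, 2.351) uses the two figures from the paragraph above: 420 TFLOPS sustained and 2,351 million calculations per second per watt.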
Maturing OpenCL Tools for HPC Developers
Accelereyes ArrayFire: Accelereyes is dedicated to delivering fast, simple GPU software. ArrayFire, now in general availability, is a GPU software acceleration library that provides hundreds of functions already optimized for speed by top GPU computing experts and integrates easily into C, C++, Fortran and Python applications;
Portland Group (PGI) Accelerator compilers: PGI Accelerator Fortran, C and C++ compilers target the AMD line of APUs as well as the AMD line of discrete GPU accelerators. PGI continues to work closely with AMD to extend its PGI Accelerator directive-based compilers. The goal is to generate code directly for AMD GPU accelerators, and to generate heterogeneous x64+GPU executable files that automatically use both the CPU and GPU compute capabilities of AMD APUs;
CAPS Entreprise HMPP compiler: CAPS Entreprise is a leading provider of solutions for deploying applications on “many-core” systems. The CAPS source-to-source HMPP compiler is based on C, C++ and Fortran directives and supports the OpenACC and OpenHMPP standards. With help from AMD, the compiler incorporates a powerful OpenCL parallel data generator;
AMD CodeXL: AMD CodeXL is a comprehensive tool suite that enables developers to harness the benefits of AMD CPUs, GPUs and APUs. It includes powerful GPU debugging, comprehensive GPU and CPU profiling, and static OpenCL kernel analysis capabilities, making it easier for software developers to enter the era of heterogeneous computing. AMD CodeXL is available both as a Visual Studio extension and as a standalone user interface application for Windows and Linux.
AMD helped define the specifications for Heterogeneous System Architecture (HSA) and is a founding member of the HSA Foundation. This video, narrated in a very cool British accent, describes what HSA is and why it is an important evolution for efficient computing.
Sapphire PGS (Professional Graphics Solution) announced its FirePro bundle, which combines an AMD FirePro A300 APU and a Sapphire PGS A3M motherboard. The A300 series APU enables Discrete Compute Offload (DCO) technology, which lets you add discrete FirePro GPUs in parallel with the APU graphics for extended OpenCL and GPGPU performance.