AMD helped define the specifications for Heterogeneous System Architecture (HSA) and is a founding member of the HSA Foundation. This video, narrated in a very cool British accent, describes what HSA is and why it is an important evolution for efficient computing.
ARM announced that it has submitted OpenCL 1.1 Full Profile conformance test results to Khronos for the Mali™-T604 GPU. What is unique here is that the submission is for the Full Profile rather than the Embedded Profile, even though the primary targets are the embedded and mobile markets.
Key features that define the full profile include:
Native support for 64-bit integer maths (including vector data types and operations). This helps in areas such as pointer arithmetic for large address spaces and can benefit many applications including multimedia encoders, decoders and encryption software.
Hardware-accelerated support for 3D images, useful for tasks such as volumetric modeling.
Same precision and accuracy as desktop implementations.
Built-in atomic functions accelerated in hardware.
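To make the point about 64-bit integers and large address spaces concrete, here is a minimal sketch in plain host-side C (not OpenCL kernel code, and not taken from ARM's announcement). The function names `byte_offset_32` and `byte_offset_64` are made up for illustration. It shows how a byte offset computed in 32-bit arithmetic wraps around once a buffer grows past 4 GiB, while the 64-bit version does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of element `index` in an array of
   `elem_size`-byte elements, computed in 32-bit unsigned arithmetic.
   The result is truncated modulo 2^32, so it silently wraps for
   buffers larger than 4 GiB. */
uint32_t byte_offset_32(uint32_t index, uint32_t elem_size) {
    return index * elem_size;
}

/* Hypothetical helper: the same offset computed in 64-bit arithmetic.
   The assert documents the precondition that the product fits in 64 bits. */
uint64_t byte_offset_64(uint64_t index, uint64_t elem_size) {
    assert(elem_size == 0 || index <= UINT64_MAX / elem_size);
    return index * elem_size;
}
```

For example, element 600,000,000 of an array of 8-byte values sits at byte 4,800,000,000. `byte_offset_64` returns that value, whereas `byte_offset_32` returns 505,032,704, which points at the wrong place in the buffer.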
This week, founding members AMD, ARM, Imagination Technologies, MediaTek Inc., Samsung Electronics Co., Ltd. and Texas Instruments (TI) came together to form the HSA Foundation, a non-profit consortium that will promote the broad and open industry adoption of HSA (Heterogeneous System Architecture).
The HSA Foundation is driving industry standards and is making computing easier, more energy efficient and cost effective in the following areas:
Client and low power: HSA makes for highly interactive computing where there is a stream of sensor data being used to make decisions and either filter, manipulate or find things – a tremendous benefit to video and image processing, photo manipulation, compression, augmented reality, and more.
Server: HSA supports cloud applications such as media servers, data and video analytics, HPC applications, and streamed gaming, with greater ease of programming and better performance per watt, all resulting in a lower TCO.
Mobile: As power and form factor are critical, HSA will help smartphones and tablets advance as mobile computing platforms, where the demand is for more compute for interactive applications, visualizations and graphics, all with improved battery life. Standards based on HSA will help application developers innovate with the right platform capabilities.
HSA members are building a heterogeneous compute ecosystem, rooted in industry standards, for combining scalar processing on the CPU with parallel processing on the GPU while enabling high-bandwidth access to memory and high application performance at low power consumption. HSA defines interfaces for parallel computation using CPU, GPU and other programmable and fixed-function devices, and supports a diverse set of high-level programming languages, thereby creating the next foundation in general-purpose computing.
The CARP project (Correct and Efficient Accelerator Programming) aims to develop a new programming language called PIL (Portable Intermediate Language) to improve the programmability of accelerated systems, particularly systems accelerated with GPUs. PIL will receive compiled inputs from domain-specific languages and then target its output to industry-standard OpenCL.
Compiler optimizations are targeted to provide:
A performance increase of at least 4x when comparing optimized code with non-optimized code, on multiple platforms
A reduction in energy consumption of at least 20 percent
Automatic detection of at least 70 percent of known functional errors
A reduction of several orders of magnitude in time taken to design an application to run efficiently across multiple accelerator platforms
The Portland Group (a wholly owned subsidiary of STMicroelectronics) announced the PGI OpenCL Compiler for ARM, an OpenCL framework that will initially target ST-Ericsson’s NovaThor U8500 SoC, which is based on a dual-core Cortex-A9 CPU coupled with an ARM Mali 400 MP GPU. The framework includes a PGI OpenCL compiler that treats multi-core ARM CPUs as a compute device and complements OpenCL for GPUs.
In the OpenCL programming model, the host CPU controls all operation of a compute device. The device can be a GPU, another CPU, the host CPU itself running in multi-core mode, or some other type of compute device. The PGI OpenCL framework comprises five core components:
PGI OpenCL device compiler - compiles OpenCL kernels for parallel execution on multi-core ARM processors
PGCL driver - a command-level driver for processing source files containing C99, C++ or OpenCL program units, including support for static compilation of OpenCL kernels
OpenCL host compilers - the PGCL driver uses the Android native development kit versions of gcc and g++ to compile OpenCL host code
OpenCL Platform Layer - a library of routines to query platform capabilities and create execution contexts from OpenCL host code
OpenCL Runtime Layer - a library of routines and an extensible runtime system used to set up and execute OpenCL kernels on multi-core ARM
The initial release provides OpenCL 1.1 embedded profile support. The PGI OpenCL framework runs on Linux/x86 compilation host platforms and is integrated with the Android NDK toolchain to generate binary executables for ST-Ericsson NovaThor platforms running the Android operating system.
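The host/device split described above can be sketched in plain C. This is only a toy model under stated assumptions, not the OpenCL API and not PGI's implementation: the names `kernel_fn`, `vec_add_args`, `vec_add_kernel` and `run_on_device` are hypothetical, and the "device" here simply runs the work-items one after another, where a real runtime would spread them across cores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument bundle for one kernel launch (not an OpenCL type). */
typedef struct {
    const float *a;
    const float *b;
    float *out;
} vec_add_args;

/* Hypothetical kernel signature: invoked once per work-item with its id. */
typedef void (*kernel_fn)(size_t global_id, void *args);

/* Kernel body: each work-item handles exactly one element, much as an
   OpenCL kernel would after calling get_global_id(0). */
void vec_add_kernel(size_t global_id, void *args) {
    vec_add_args *p = (vec_add_args *)args;
    p->out[global_id] = p->a[global_id] + p->b[global_id];
}

/* Stand-in for the compute device: the host hands over a kernel and a
   global work size, and the "device" executes every work-item.
   Sequential here; a real runtime would run them in parallel. */
void run_on_device(kernel_fn kernel, size_t global_size, void *args) {
    assert(kernel != NULL);
    for (size_t id = 0; id < global_size; ++id) {
        kernel(id, args);
    }
}
```

The host side would fill two input arrays, wrap them in a `vec_add_args`, and call `run_on_device(vec_add_kernel, n, &args)`. The point of the sketch is the division of labor: the host decides what runs and on how many work-items, while the kernel only sees its own id.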