AMD Keynote Lectures

Speaker Biography:

Dr. Timour Paltashev has worked in Silicon Valley for over 20 years. He holds Ph.D. (1987) and Doctor of Technical Sciences (1994) degrees in computer engineering. His professional field covers visual/graphics, perceptual, and deep-learning high-performance computing architectures; he holds 28+ US patents and has over 10 patent applications pending. At present he works in the AMD Radeon Technology Group; previously he worked at Vivante/VeriSilicon Corporation, S3 Graphics Inc., and S3 Inc. His academic activity includes long-term experience advising Ph.D. candidates and teaching graduate-level courses in computer graphics, high-performance computer architecture, and system-on-chip design at leading universities in the USA and the Russian Federation.

Other AMD Lectures:

Introduction to AMD Computing and Graphics Platforms for Server and Client Domains

Sunday, March 10 at 12:00 pm
Building 9, Hall 1

AMD combines breakthrough graphics architecture with cutting-edge software to power platforms that can handle today's most challenging, important, and graphics-intensive applications, including gaming, machine intelligence, and virtual and augmented reality, with the Radeon Vega GPU product line. The well-known high-performance Ryzen, Athlon, and EPYC microprocessors and chipsets deliver powerful, efficient performance for consumer and commercial devices such as desktops, notebooks, and servers. AMD's leadership in high-performance graphics and compute design uniquely enables it to differentiate solutions for customers and partners. From embedded products that power medical imaging devices and digital signage to semi-custom processors for leading game consoles and beyond, AMD technology is everywhere.

AMD Radeon GPU Architecture Overview based on Polaris and Vega 10 examples

Sunday, March 10 at 3:00 pm
Building 2, Level 5, Room 5220

A detailed overview of the evolution of graphics processing cores and of the Graphics Core Next (GCN) GPU computing core: from fixed-function geometry transformation engines (1st era) through simple shaders (2nd era) to graphics parallel cores (3rd era). The rise of unified shaders with VLIW and SIMD architectures, together with the adoption of single-precision 32-bit IEEE-754-compliant FP ALUs, turned GPU cores into general-purpose machines and drove the widespread promotion of GPGPU techniques from the mid-2000s on.
AMD GCN is a new design providing faster performance, higher efficiency, and new compute and graphics features, including the world's first GPU on 7-nm FinFET technology, and it delivers superior architectural scalability from a small laptop APU to powerful data-center discrete GPU computing nodes. It offers unlimited memory resources and samplers, all valid UAV formats, and a public assembly language able to support C/C++ dialects. Architectural support for traps, exceptions, and debugging is combined with the ability to share a virtual x86-64 address space with CPU cores in the same node and across a big system.

ROCm – Radeon Open Compute Initiative with Completely Open Source Software Stack and Collaboration with Academic and Industrial Community

Monday, March 11 at 10:00 am
Building 2, Level 5, Room 5220

The ROCm software platform supports GPU acceleration through the open-source Heterogeneous Compute Compiler (HCC) and the Heterogeneous-compute Interface for Portability (HIP), which let developers process code more easily using a C++ dialect registered as a Clang input in the LLVM stack. It provides full machine control for heterogeneous compute on CPU+GPU platforms. ROCm offers a rich system runtime supporting multiple APIs and programming languages, and the platform also includes a comprehensive ecosystem of development tools and libraries that help port code written for CUDA to C++ HIP. MIOpen is a free, highly optimized open-source library for GPU accelerators that enables popular high-performance machine intelligence frameworks such as Keras, Caffe/Caffe2, PyTorch, TensorFlow, MXNet, and others.
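As an illustration of CUDA-to-HIP porting (a minimal sketch not taken from the lecture; the kernel name and buffers are hypothetical, and building it requires the ROCm/hipcc toolchain), a vector-add kernel in HIP looks almost identical to its CUDA original, with `cuda*` runtime calls replaced by their `hip*` counterparts:

```cpp
// Hypothetical HIP sketch: vector add ported from CUDA.
// cudaMalloc/cudaMemcpy become hipMalloc/hipMemcpy; the
// <<<grid, block>>> launch becomes hipLaunchKernelGGL.
#include <hip/hip_runtime.h>
#include <vector>

__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // same as in CUDA
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
    float *da, *db, *dc;
    hipMalloc(&da, n * sizeof(float));
    hipMalloc(&db, n * sizeof(float));
    hipMalloc(&dc, n * sizeof(float));
    hipMemcpy(da, a.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, b.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipLaunchKernelGGL(vec_add, dim3(n / 256), dim3(256), 0, 0, da, db, dc, n);
    hipMemcpy(c.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

In practice the `hipify` tools shipped with ROCm perform this mechanical translation automatically, so the kernel body itself can remain unchanged.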

GPU in High-Performance Computing and Machine Learning: Special Features in Architecture, Firmware and Software Libraries

Monday, March 11 at 12:00 pm
Auditorium between Buildings 4 & 5, Level 0

Graphics accelerators for real-time visualization first appeared in the mid-1970s as fixed-function (FF) matrix multipliers and rasterization blocks used in flight simulators. Originally built from LSI logic components, by the 1990s they were implemented as single- or multi-chip ASIC-based accelerators with an FF set supporting graphics API functionality (OpenGL and MS D3D). In the early 2000s the first GPUs with limited programmability appeared, followed by a new generation of fully programmable, IEEE-754-compliant GPUs that could be used for general-purpose computing. Adding 64-bit double-precision floating-point support made the new GPUs applicable to HPC. HPC applications drove significant modifications of a GPU architecture originally targeted at real-time vertex and pixel stream processing: HPC-capable GPUs received I/O and memory hierarchy architectures adjusted for multiprocessing clusters with complex interconnect topologies. Extending GPU deployment to the machine learning domain added support for low- and mixed-precision operations as well as tensor-processing acceleration units combined within the plurality of SIMD compute units. Special highly optimized libraries and firmware were developed to support BLAS and other HPC utilities along with machine learning libraries such as cuDNN and MIOpen.

AMD GPU Product Line and ROCm Software Features to Support Machine Learning Applications

Monday, March 11 at 3:00 pm
Building 3, Level 5, Room 5220

Recent advances in machine intelligence algorithms mapped to high-performance GPUs are enabling orders-of-magnitude acceleration in processing and understanding large volumes of data, producing insights in near real time. Radeon Instinct MI50/MI60 is a blueprint for an open software ecosystem for machine intelligence, helping to speed inference insights and algorithm training. The Radeon Instinct family of products is designed to be the building blocks of a new era of deep learning and HPC datacenters. AMD is designing and optimizing Radeon Instinct server accelerator products and software platforms to bring customers cost-effective machine and deep learning inference, training, and edge-training solutions, where workloads can take the greatest advantage of the accelerators' highly parallel computing capabilities. Radeon Instinct products with the ROCm software stack are also ideal for accelerating data-centric HPC-class systems in academia, government labs, energy, life sciences, finance, automotive, and other industries.

AMD Multicore CPU Families: Zen CPU Core Based Ryzen and EPYC Processors Architecture

Wednesday, March 13 at 10:00 am
Building 19, Level 3, Hall 2

The Ryzen family of processors was a new beginning for AMD and can now seriously rival Intel's processor market share, both among mainstream users and among companies running servers. The first 8-core Ryzen processors were followed by the 16-core Ryzen Threadripper, later complemented by the world's first 32-core desktop processor, the second-generation AMD Ryzen™ Threadripper™ on 12-nm technology. It offers up to 64 threads for unparalleled computing power and a stunning cache of up to 80 MB. Following the Opteron heritage, the new EPYC server processor offers industry-leading core density (up to 32 cores) and memory and I/O capacity. It provides definite leadership in 2-socket performance and unprecedented single-socket capability, as well as unique security features for HPC and data centers. AMD has committed to a long-term server roadmap, which continued with EPYC Rome and its 64 cores built on advanced 7-nm semiconductor technology. An overview of architectural features and specific hints for all types of users will be discussed.

Zen and Zen2 CPU core Microarchitecture Overview based on AMD HotChips presentations

Wednesday, March 13 at 3:00 pm
Building 19, Level 3, Hall 2

The totally new high-performance Zen CPU cores were designed from the ground up for an optimal balance of performance and power, with Simultaneous Multithreading (SMT) running 2 threads per core for high throughput. They have a new high-bandwidth, low-latency cache system in an energy-efficient FinFET design that scales from client to enterprise-class products. An all-new micro-op cache and up to 20 MB of unified cache provide superior memory hierarchy performance. Two AES (Advanced Encryption Standard) units support security for users and applications. A Zen core can fetch four x86 instructions per cycle into an op cache holding 2K instructions. It has 4 integer units and 2 floating-point units with 128-bit FMACs built as 4 pipes (2 FADD and 2 FMUL). A deep pipeline and a large rename space of 168 registers allow 192 instructions in flight with 8-wide retire. The memory hierarchy is fed by 2 load/store units providing up to 72 out-of-order loads. A 4-way 64 KB I-cache, an 8-way 32 KB D-cache, an 8-way 512 KB L2 cache, and a large shared L3 cache compose a very efficient memory hierarchy.