[ParallelComputing]CUDA & HIP & DPC++ & TBB Notes
keywords: ParallelComputing, CUDA, HIP, DPC++
Books
CUDA Books
Learn CUDA Programming, published by Packt
https://github.com/PacktPublishing/Learn-CUDA-Programming
SYCL Books
Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL
https://www.apress.com/gp/book/9781484255735
https://github.com/Apress/data-parallel-CPP
Docs
Heterogeneous Computing
Heterogeneous computing
https://en.wikipedia.org/wiki/Heterogeneous_computing
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2964860/
A Heterogeneous Parallel Processor for High-Speed Vision Chip
https://www.researchgate.net/publication/310823555_A_Heterogeneous_Parallel_Processor_for_High-Speed_Vision_Chip
CUDA
Numba supports CUDA GPU programming by directly compiling a restricted subset of Python code into CUDA kernels and device functions following the CUDA execution model.
https://numba.readthedocs.io/en/stable/cuda/index.html
This notebook is an attempt to teach beginner GPU programming in a completely interactive fashion. Instead of providing text with concepts, it throws you right into coding and building GPU kernels.
https://github.com/srush/GPU-Puzzles
AMD Official Docs
AMD’s Performance Guide is a nice collection of tips on how to program the GCN and RDNA architectures efficiently.
https://gpuopen.com/performance/
ROCm Docs
AMD ROCm TensorFlow
https://rocmdocs.amd.com/en/latest/Deep_learning/Deep-learning.html
[First on the web] A guide to running PyTorch natively on AMD GPUs, no container (Docker) required
https://zhuanlan.zhihu.com/p/67940936
Building PyTorch on ROCm
https://lernapparat.de/pytorch-rocm/
DPC++ Docs
A Standards-Based, Cross-Architecture Language
https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-compiler.html
Intel Data Parallel C++ Tutorial
https://github.com/jeffhammond/dpcpp-tutorial
SYCL Docs
Look ma, no CUDA! Programming GPUs with modern C++ and SYCL
https://nazavode.github.io/blog/sycl/
Accelerating your C++ on GPU with SYCL
https://blog.tartanllama.xyz/sycl/
C++ Single-source Heterogeneous Programming for Acceleration Offload
https://www.khronos.org/sycl/
GPU based Source
Cross-Platform Frameworks
stdgpu: Efficient STL-like Data Structures on the GPU
https://github.com/stotko/stdgpu
CUDA Source
Samples for CUDA developers that demonstrate features in the CUDA Toolkit.
https://github.com/NVIDIA/cuda-samples
Thin C++-flavored wrappers for the CUDA Runtime API
https://github.com/eyalroz/cuda-api-wrappers
HIP Source
HIP: C++ Heterogeneous-Compute Interface for Portability
https://github.com/ROCm-Developer-Tools/HIP
HIP RT is a ray tracing library for HIP, making it easy to write ray-tracing applications in HIP.
https://gpuopen.com/hiprt/
https://github.com/GPUOpen-LibrariesAndSDKs/HIPRT/
OpenCL Source
A C++ GPU Computing Library for OpenCL
https://github.com/boostorg/compute
SYCL Source (OpenCL Based)
Open Source Parallel STL implementation
https://github.com/KhronosGroup/SyclParallelSTL
Experimental fusion of triSYCL with Intel SYCL upstreaming effort into Clang/LLVM.
https://github.com/triSYCL/sycl
CPU based Source
TBB (CPU) Source
Official Threading Building Blocks (TBB) GitHub repository.
https://github.com/oneapi-src/oneTBB
The commercial Intel® TBB distribution is available here:
https://software.intel.com/en-us/tbb
SIMD Instructions (CPU) Source
The Vector Class Library is a C++ tool that allows programmers to use Single Instruction Multiple Data (SIMD) instructions to process data in parallel.
https://github.com/vectorclass
Vector class library, latest version
https://github.com/vectorclass/version2
Data parallel C++ mathematical object library
https://github.com/paboyle/Grid
Concurrent Data Structures
A C++ library of Concurrent Data Structures
https://github.com/khizmax/libcds
Parallel Utils & Frameworks
Simple header-only implementation of “parallel_for” and “parallel_map” for C++11
https://github.com/yuki-koyama/parallel-util
Powerful multi-threaded coroutine dispatcher and parallel execution engine
https://github.com/bloomberg/quantum
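To make the "parallel_for" and "parallel_map" idea mentioned above concrete, here is a minimal C++11 sketch built only on std::thread. It is not the API of parallel-util or any other library listed here: the names parallel_for_sketch and parallel_map_sketch, the contiguous-chunk scheduling, and the num_threads parameter are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not taken from any listed library): call body(i) for
// every i in [0, n), splitting the range into contiguous chunks, one per thread.
template <typename Body>
void parallel_for_sketch(std::size_t n, Body body, std::size_t num_threads = 0)
{
    if (n == 0) return;
    if (num_threads == 0) {
        // hardware_concurrency() may return 0 when the value is unknown.
        num_threads = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    }
    num_threads = std::min(num_threads, n);
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        // Each worker owns a disjoint index range, so body(i) runs exactly once per i.
        workers.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& w : workers) w.join();
}

// Hypothetical helper: apply f to each element of `in` in parallel.
// Assumes f returns by value, that the result type is default-constructible,
// and that it is not bool (std::vector<bool> is not safe for concurrent writes).
template <typename T, typename F>
auto parallel_map_sketch(const std::vector<T>& in, F f)
    -> std::vector<decltype(f(in[0]))>
{
    std::vector<decltype(f(in[0]))> out(in.size());
    parallel_for_sketch(in.size(), [&](std::size_t i) { out[i] = f(in[i]); });
    return out;
}
```

A call such as parallel_map_sketch(values, [](int x) { return x * x; }) returns the squared values in their original order. The sketch spawns fresh threads on every call, which keeps it short; a production library would more likely reuse a thread pool, and on many toolchains the program must be linked with thread support (for example -pthread).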
Memory
Distributed Memory Dense Matrix Computations
Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction
https://github.com/elemental/Elemental
Platform
ROCm Platform
ROCm Software Platform Repository
https://github.com/ROCmSoftwarePlatform
Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://github.com/ROCmSoftwarePlatform/pytorch