Parallel Computing
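Stanford's CS 149 Parallel Computing course covers the principles and techniques of modern heterogeneous parallel computing, with a focus on CUDA programming, memory management, data transfer, and parallel algorithms. Students learn to use tools such as CUDA, OpenCL, and OpenACC for efficient parallel computation, and to write optimized implementations of common computation patterns such as matrix operations, convolution, and reduction.
The course also covers advanced performance-tuning techniques, including memory coalescing, data reuse, and atomic operations, which help students understand the parallel programming models behind hardware acceleration. Through hands-on projects and labs, students gain the skills to design and optimize parallel computing systems for real applications. It is suited to students interested in high-performance computing and GPU programming.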
⭐⭐⭐
Language | Audio | Subtitles |
---|---|---|
Chinese | ❌ | ✅ |
English | ✅ | ✅ |
1. Course Overview
2. Introduction to Heterogeneous Parallel Computing
3. Portability and Scalability in Heterogeneous Parallel Computing
4. Introduction to CUDA Data Parallelism and Threads
5. Introduction to CUDA Memory Allocation and Data Movement API
6. Introduction to CUDA Kernel-Based SPMD Parallel Programming
7. Kernel-Based Parallel Programming: Multidimensional Kernel Configuration
8. Kernel-Based Parallel Programming: Basic Matrix-Matrix Multiplication
9. Kernel-Based Parallel Programming: Thread Scheduling
10. Control Divergence
11. Memory Model and Locality: CUDA Memories
12. Tiled Parallel Algorithms
13. Tiled Matrix Multiplication
14. Tiled Matrix Multiplication Kernel
15. Handling Boundary Conditions in Tiling
16. A Tiled Kernel for Arbitrary Matrix Dimensions
17. Performance Considerations: DRAM Bandwidth
18. Performance Considerations: Memory Coalescing in CUDA
19. Parallel Computation Patterns: Convolution
20. Parallel Computation Patterns: Tiled Convolution
21. Parallel Computation Patterns: 2D Tiled Convolution Kernel
22. Parallel Computation Patterns: Data Reuse in Tiled Convolution
23. Parallel Computation Patterns: Reduction
24. Parallel Computation Patterns: A Basic Reduction Kernel
25. Parallel Computation Patterns: A Better Reduction Kernel
26. Parallel Computation Patterns: Scan (Prefix Sum)
27. Parallel Computation Patterns: A Work-Inefficient Scan Kernel
28. Parallel Computation Patterns: A Work-Efficient Parallel Scan Kernel
29. Parallel Computation Patterns: More on Parallel Scan
30. Parallel Computation Patterns: Histogramming
31. Parallel Computation Patterns: Atomic Operations
32. Parallel Computation Patterns: Atomic Operations in CUDA
33. Parallel Computation Patterns: Atomic Operations Performance
34. Parallel Computation Patterns: A Privatized Histogram Kernel
35. Efficient Host-Device Data Transfer: Pinned Host Memory
36. Efficient Host-Device Data Transfer: Task Parallelism in CUDA
37. Efficient Host-Device Data Transfer: Overlapping Data Transfer with Computation
38. Related Programming Models: OpenCL Data Parallelism Model
39. Related Programming Models: OpenCL Device Architecture
40. Related Programming Models: OpenCL Host Code (Part 1)
41. Related Programming Models: OpenCL Host Code (Cont.)
42. Related Programming Models: OpenACC
43. Related Programming Models: OpenACC Details
44. Related Parallel Models: C++ AMP
45. Related Parallel Models: C++ AMP Advanced Concepts
46. Related Parallel Models: Introduction to Heterogeneous Supercomputing and M
47. Conclusions and Future Directions