
10.1 Stanford CS 149 Parallel Computing

Course Name

Parallel Computing

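Course Summary
Stanford CS 149 Parallel Computing covers the principles and techniques of modern heterogeneous parallel computing, focusing on CUDA programming, memory management, data transfer, and parallel algorithms. Students learn to use tools such as CUDA, OpenCL, and OpenACC for efficient parallel computation, and to write optimized implementations of common computation patterns such as matrix operations, convolution, and reduction.

The course also covers advanced performance-tuning techniques, including memory coalescing, data reuse, and atomic operations, which help students understand the parallel programming models behind hardware acceleration. Through hands-on projects and labs, students gain the skills to design and optimize parallel computing systems for real applications. It suits students interested in high-performance computing and GPU programming.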
Recommendation Rating

⭐⭐⭐

Audio and Subtitles
Chinese
English

Course Outline
1. Course Overview
2. Introduction to Heterogeneous Parallel Computing
3. Portability and Scalability in Heterogeneous Parallel Computing
4. Introduction to CUDA Data Parallelism and Threads
5. Introduction to CUDA Memory Allocation and Data Movement API
6. Introduction to CUDA Kernel-Based SPMD Parallel Programming
7. Kernel-Based Parallel Programming: Multidimensional Kernel Configuration
8. Kernel-Based Parallel Programming: Basic Matrix-Matrix Multiplication
9. Kernel-Based Parallel Programming: Thread Scheduling
10. Control Divergence
11. Memory Model and Locality: CUDA Memories
12. Tiled Parallel Algorithms
13. Tiled Matrix Multiplication
14. Tiled Matrix Multiplication Kernel
15. Handling Boundary Conditions in Tiling
16. A Tiled Kernel for Arbitrary Matrix Dimensions
17. Performance Considerations: DRAM Bandwidth
18. Performance Considerations: Memory Coalescing in CUDA
19. Parallel Computation Patterns: Convolution
20. Parallel Computation Patterns: Tiled Convolution
21. Parallel Computation Patterns: 2D Tiled Convolution Kernel
22. Parallel Computation Patterns: Data Reuse in Tiled Convolution
23. Parallel Computation Patterns: Reduction
24. Parallel Computation Patterns: A Basic Reduction Kernel
25. Parallel Computation Patterns: A Better Reduction Kernel
26. Parallel Computation Patterns: Scan (Prefix Sum)
27. Parallel Computation Patterns: A Work-Inefficient Scan Kernel
28. Parallel Computation Patterns: A Work-Efficient Parallel Scan Kernel
29. Parallel Computation Patterns: More on Parallel Scan
30. Parallel Computation Patterns: Histogramming
31. Parallel Computation Patterns: Atomic Operations
32. Parallel Computation Patterns: Atomic Operations in CUDA
33. Parallel Computation Patterns: Atomic Operations Performance
34. Parallel Computation Patterns: A Privatized Histogram Kernel
35. Efficient Host-Device Data Transfer: Pinned Host Memory
36. Efficient Host-Device Data Transfer: Task Parallelism in CUDA
37. Efficient Host-Device Data Transfer: Overlapping Data Transfer with Computation
38. Related Programming Models: OpenCL Data Parallelism Model
39. Related Programming Models: OpenCL Device Architecture
40. Related Programming Models: OpenCL Host Code (Part 1)
41. Related Programming Models: OpenCL Host Code (Cont.)
42. Related Programming Models: OpenACC
43. Related Programming Models: OpenACC Details
44. Related Parallel Models: C++ AMP
45. Related Parallel Models: C++ AMP Advanced Concepts
46. Related Parallel Models: Introduction to Heterogeneous Supercomputing and M
47. Conclusions and Future Directions
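
For a rough feel of the reduction pattern named in items 23 to 25, here is a minimal CPU-side sketch in Python. It is not taken from the course materials: the function name `tree_reduce_sum` is hypothetical, and the inner loop only simulates what a block of GPU threads would do in parallel between barrier synchronizations. It assumes a sum reduction and pads the input to a power of two with the identity value 0.

```python
def tree_reduce_sum(values):
    """Sum a sequence with a tree-shaped reduction (sequential simulation)."""
    data = list(values)
    n = len(data)
    if n == 0:
        return 0

    # Pad to the next power of two with 0, the identity for addition,
    # so every step can pair elements without bounds checks.
    size = 1
    while size < n:
        size *= 2
    data.extend([0] * (size - n))

    # Halve the stride each step. On a GPU, every value of `tid` below would
    # be a separate thread, and each step would end with a barrier.
    stride = size // 2
    while stride >= 1:
        for tid in range(stride):  # simulated threads 0..stride-1
            data[tid] += data[tid + stride]
        stride //= 2

    # After about log2(size) steps the total sits in element 0.
    return data[0]


if __name__ == "__main__":
    print(tree_reduce_sum([1, 2, 3, 4, 5]))
```

In this layout the active "threads" at each step are the contiguous range 0 to stride-1. That is the kind of arrangement usually favored in GPU reduction kernels to limit control divergence, though the kernels presented in the lectures may differ in detail.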
Study Guide