What every programmer should know about memory (Part 1), translation
Published: 2019-05-25


What Every Programmer Should Know About Memory
Ulrich Drepper
Red Hat, Inc.
drepper@redhat.com
November 21, 2007

Abstract

As CPU cores become both faster and more numerous, the limiting factor for most programs is now, and will be for some time, memory access. Hardware designers have come up with ever more sophisticated memory handling and acceleration techniques–such as CPU caches–but these cannot work optimally without some help from the programmer. Unfortunately, neither the structure nor the cost of using the memory subsystem of a computer or the caches on CPUs is well understood by most programmers. This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.

1 Introduction

In the early days computers were much simpler. The various components of a system, such as the CPU, memory, mass storage, and network interfaces, were developed together and, as a result, were quite balanced in their performance. For example, the memory and network interfaces were not (much) faster than the CPU at providing data.

This situation changed once the basic structure of computers stabilized and hardware developers concentrated on optimizing individual subsystems. Suddenly the performance of some components of the computer fell significantly behind and bottlenecks developed. This was especially true for mass storage and memory subsystems which, for cost reasons, improved more slowly relative to other components.

The slowness of mass storage has mostly been dealt with using software techniques: operating systems keep most often used (and most likely to be used) data in main memory, which can be accessed at a rate orders of magnitude faster than the hard disk. Cache storage was added to the storage devices themselves, which requires no changes in the operating system to increase performance. (Changes are needed, however, to guarantee data integrity when using storage device caches.) For the purposes of this paper, we will not go into more details of software optimizations for the mass storage access.
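The idea of keeping frequently used data in main memory can be illustrated with a small sketch. This is not how any particular operating system implements its cache; it is a toy model that assumes a least-recently-used (LRU) eviction policy, and the names `BlockCache`, `backing_store`, and `read` are hypothetical, chosen only for this example. A Python dictionary stands in for the slow disk.

```python
from collections import OrderedDict


class BlockCache:
    """Toy in-memory cache in front of a slow block store (hypothetical names)."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # dict standing in for the disk
        self.capacity = capacity            # maximum number of blocks kept in memory
        self.blocks = OrderedDict()         # block number -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            # Fast path: served from memory; mark the block as most recently used.
            self.blocks.move_to_end(block_no)
            self.hits += 1
            return self.blocks[block_no]
        # Slow path: fetch from the backing store and keep a copy in memory.
        self.misses += 1
        data = self.backing_store[block_no]
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            # Assumed policy: evict the least recently used block.
            self.blocks.popitem(last=False)
        return data


disk = {n: f"block-{n}" for n in range(8)}
cache = BlockCache(disk, capacity=2)
for n in (0, 1, 0, 2, 0, 1):
    cache.read(n)
print(cache.hits, cache.misses)  # prints: 2 4
```

Block 0 is read repeatedly and stays in memory, while block 1 is evicted to make room for block 2 and has to be fetched again. Recency is only an approximation of "most likely to be used"; real systems use more elaborate policies.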

Unlike storage subsystems, removing the main memory as a bottleneck has proven much more difficult and almost all solutions require changes to the hardware. Today these changes mainly come in the following forms:

  • RAM hardware design (speed and parallelism).
  • Memory controller designs.
  • CPU caches.
  • Direct memory access (DMA) for devices.

For the most part, this document will deal with CPU caches and some effects of memory controller design. In the process of exploring these topics, we will explore DMA and bring it into the larger picture. However, we will start with an overview of the design for today’s commodity hardware. This is a prerequisite to understanding the problems and the limitations of efficiently using memory subsystems. We will also learn about, in some detail, the different types of RAM and illustrate why these differences still exist.

This document is in no way all inclusive and final. It is limited to commodity hardware and further limited to a subset of that hardware. Also, many topics will be discussed in just enough detail for the goals of this paper. For such topics, readers are recommended to find more detailed documentation.

When it comes to operating-system-specific details and solutions, the text exclusively describes Linux. At no time will it contain any information about other OSes. The author has no interest in discussing the implications for other OSes. If the reader thinks s/he has to use a different OS they have to go to their vendors and demand they write documents similar to this one.

One last comment before the start. The text contains a number of occurrences of the term “usually” and other, similar qualifiers. The technology discussed here exists in many, many variations in the real world and this paper only addresses the most common, mainstream versions. It is rare that absolute statements can be made about this technology, thus the qualifiers.

1.1 Document Structure

This document is mostly for software developers. It does not go into enough technical details of the hardware to be useful for hardware-oriented readers. But before we can go into the practical information for developers a lot of groundwork must be laid.

To that end, the second section describes random-access memory (RAM) in technical detail. This section’s content is nice to know but not absolutely critical to be able to understand the later sections. Appropriate back references to the section are added in places where the content is required so that the anxious reader could skip most of this section at first.

The third section goes into a lot of details of CPU cache behavior. Graphs have been used to keep the text from being as dry as it would otherwise be. This content is essential for an understanding of the rest of the document. Section 4 describes briefly how virtual memory is implemented. This is also required groundwork for the rest.

Section 5 goes into a lot of detail about Non Uniform Memory Access (NUMA) systems.

Section 6 is the central section of this paper. It brings together all the previous sections’ information and gives programmers advice on how to write code which performs well in the various situations. The very impatient reader could start with this section and, if necessary, go back to the earlier sections to freshen up the knowledge of the underlying technology.

Section 7 introduces tools which can help the programmer do a better job. Even with a complete understanding of the technology it is far from obvious where in a non-trivial software project the problems are. Some tools are necessary.

In section 8 we finally give an outlook of technology which can be expected in the near future or which might just simply be good to have.


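Summary:

As CPU cores become faster and more numerous, the limiting factor for most programs is now memory access. Hardware designers have devised many sophisticated memory handling and acceleration techniques, such as CPU caches, but these cannot work optimally without help from the programmer. Unfortunately, most programmers do not understand well either the structure or the cost of using a computer's memory subsystem and CPU caches. This article explains the structure of the memory subsystems used in modern commodity hardware, why CPU caches were developed, how they work, and what programs should do to get the best performance out of them.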

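Introduction:

Early computers were much simpler. Components such as the CPU, memory, mass storage, and network interfaces were developed together and therefore had fairly balanced performance. For example, memory and network interfaces were not (much) faster than the CPU at providing data.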

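This changed once the basic structure of computers stabilized and hardware developers concentrated on optimizing individual subsystems. The performance of some components suddenly fell behind and became a bottleneck. This was especially true of mass storage and memory, which for cost reasons improved more slowly than other components.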

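The slowness of mass storage has mostly been addressed in software: the operating system keeps frequently used data in main memory, where it can be accessed orders of magnitude faster than on the hard disk. Caches were also added to the storage devices themselves (such caches can leave data inconsistent, which raises the question of how dirty data is handled), and this improves performance without any change to the operating system. We will not go further into software optimizations for mass storage here.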

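Unlike storage subsystems, removing main memory as a bottleneck has proven much harder, and almost all solutions require changes to the hardware.

Today these changes mainly take the following forms: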

  • RAM hardware design (speed and parallelism)
  • Memory controller design
  • CPU caches
  • DMA (direct memory access, which bypasses the CPU)

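This article deals mostly with CPU caches and some effects of memory controller design. Along the way we will look at DMA and place it in the larger picture. We start, however, with an overview of the design of today's commodity hardware, which is a prerequisite for understanding the problems and limitations of using memory subsystems efficiently. We will also look at the different types of RAM and explain why these differences still exist.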

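This article cannot cover everything; it is limited to commodity hardware, and to a subset of that. Many topics are discussed only in as much detail as this article needs, and readers should consult other documentation for more.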

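Wherever operating-system-specific details and solutions come up, the text describes Linux only. It never contains information about other operating systems, and the author has no interest in discussing them. Readers who believe they must use a different OS should ask their vendor to write a similar document.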

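One last note before we start: the technology discussed here exists in many variations in the real world, and this article addresses only the most common, mainstream versions. That is why the text uses "usually" and similar qualifiers; absolute statements about this technology are rarely possible.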

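1.1 Document Structure

This document is mainly for software developers. It does not go into enough hardware detail to be useful to hardware-oriented readers. Before we get to the practical information for developers, however, a lot of groundwork has to be laid.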

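To that end, section 2 describes RAM in technical detail. This material is nice to know but not essential for understanding the later sections. Back references to it are added wherever it is needed, so the impatient reader can skip most of it at first.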

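Section 3 goes into a lot of detail about CPU cache behavior, with graphs to keep the text from being too dry. This content is essential for the rest of the document.

Section 4 briefly describes how virtual memory is implemented. This is also required groundwork.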

Section 5 goes into detail about NUMA (Non Uniform Memory Access) systems.

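Section 6 is the core of the article. It brings together the information from the earlier sections and advises programmers on how to write code that performs well in various situations. The very impatient reader can start here and go back to the earlier sections when needed.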

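Section 7 introduces tools that help programmers do a better job.

Section 8 gives an outlook on technology that can be expected in the near future or that would simply be good to have.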

Reposted from: http://zbgji.baihongyu.com/
