www.clouddoll.com


CPU Technology (2)

10. FSB (external clock)


The external clock (often loosely called the FSB) is the base frequency of the CPU and of the whole computer system, measured in MHz (megahertz). In early computers, memory ran synchronously with the motherboard at the external clock, so the external clock can be understood as the frequency at which the CPU and memory were directly linked and kept in step. In current systems the two need not be equal, but the external clock still matters: most other frequencies in the system are derived from it by multiplying by a ratio, which may be greater or less than 1.


Any discussion of the processor's external clock (FSB) involves two closely related concepts: the multiplier and the core clock. The core clock is the CPU's operating frequency; the multiplier is the ratio of the core clock to the external clock. The three are related by: core clock = external clock × multiplier.
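
The relationship above can be checked with a few lines of Python. This is only a sketch: the function name and the sample values are illustrative assumptions, not figures taken from any particular CPU data sheet.

```python
def core_clock_mhz(external_clock_mhz, multiplier):
    """Core clock = external clock (FSB) x multiplier."""
    return external_clock_mhz * multiplier


# Hypothetical sample values, chosen only to show the arithmetic.
print(core_clock_mhz(100, 4.5))  # 450.0
print(core_clock_mhz(200, 14))   # 2800
```

Raising either input raises the result, which is why both the external clock and the multiplier are targets when the core clock is to be increased.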


Before the 486, CPU clock speeds were still relatively low, and the core clock generally equaled the external clock. After the 486 appeared, CPU operating frequencies kept rising, but other PC components (expansion cards, hard drives, and so on) were limited by their manufacturing process and could not keep up with higher frequencies, which in turn held back the CPU. Clock multiplication was introduced to solve this: the CPU's internal frequency becomes a multiple of the external clock, so the core clock can be raised by raising the multiplier. In other words, clock multiplication lets external devices keep working at a lower external clock while the CPU core runs at a multiple of it.


In the Pentium era the CPU's external clock was generally 60/66 MHz. Starting with the Pentium II 350 it was raised to 100 MHz, and current CPUs have reached 200 MHz. Because the external clock and the memory bus frequency are normally the same, raising the external clock also raises the rate of data exchange with memory, which has a marked effect on the overall speed of the computer.


The external clock and the front-side bus (FSB) frequency are easily confused. The front-side bus speed is the speed of the bus between the CPU and the northbridge chip, and it more directly reflects the rate at which the CPU exchanges data with the outside. The external clock, by contrast, is defined by the oscillation rate of a digital clock signal: a 100 MHz external clock means the signal oscillates one hundred million times per second, and it mainly determines the frequency of PCI and other buses. The two concepts are confused mainly because for a long time (chiefly before and just as the Pentium 4 appeared) the front-side bus frequency and the external clock were identical, so the external clock was often simply called the front-side bus, and the misunderstanding stuck. As technology developed, the front-side bus needed to run faster than the external clock, and QDR (Quad Data Rate) or similar techniques were adopted to achieve this. These work on the same principle as AGP 2X or 4X, making the front-side bus frequency 2, 4, or more times the external clock. Only since then has the distinction between the front-side bus and the external clock drawn attention.
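
To make the distinction concrete, the sketch below derives an effective front-side bus rate from the external clock and the number of transfers per clock (4 for QDR). The function names are hypothetical, and the 64-bit bus width used for the bandwidth figure is an assumption made for illustration.

```python
def effective_fsb_mhz(external_clock_mhz, transfers_per_clock):
    """Effective front-side bus rate: external clock x transfers per clock."""
    return external_clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(effective_mhz, bus_width_bits=64):
    """Peak transfer rate in MB/s; the 64-bit bus width is an assumed default."""
    return effective_mhz * bus_width_bits / 8


fsb = effective_fsb_mhz(200, 4)     # QDR: four transfers per clock
print(fsb)                          # 800
print(peak_bandwidth_mb_s(fsb))     # 6400.0
```

With one transfer per clock the two figures coincide, which is the situation that led to the external clock being called the front-side bus.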


11. Multiplier


The CPU multiplier, in full the clock multiplier factor, is the ratio between the CPU core's operating frequency and the external clock (FSB). In theory the multiplier ranges from 1.5 upward without limit, but note that it moves in steps of 0.5. Since the core clock is the external clock times the multiplier, raising either one raises the CPU frequency.


Before the concept of the multiplier existed, the CPU clock and the system bus ran at the same speed. As CPUs grew faster, multiplier technology emerged in response. It lets the system bus run at a relatively low frequency while the CPU speed is raised through the multiplier, in principle without limit. The CPU frequency formula then becomes: core clock = external clock × multiplier. The multiplier is the factor by which the CPU clock exceeds the system bus clock; with the external clock unchanged, a higher multiplier means a higher CPU frequency.


12. Manufacturing process


The "CPU manufacturing process" usually refers to the precision with which the circuits and electronic components are fabricated, and the wires connecting the components are made, during CPU production. This precision has typically been expressed in microns (a unit of length; 1 micron is one thousandth of a millimeter), and the trend is toward nanometers (1 nanometer is one thousandth of a micron). The higher the precision, the more advanced the process: more components and connections can be made from the same material, which raises the CPU's integration density and also lowers its power consumption.


The micron figure of a manufacturing process refers to the distance between circuit elements inside the IC. The trend in manufacturing is toward higher density: a denser IC design means that an IC of the same area can hold a denser, more complex circuit. Progress in microelectronics comes mainly from continuous improvement of process technology, which shrinks the feature size of devices and so raises integration, lowers power consumption, and improves device performance. Chip manufacturing processes after 1995 have moved through 0.5 micron, 0.35 micron, 0.25 micron, 0.18 micron, 0.15 micron, and 0.13 micron, and the 0.09 micron process is the development goal for next-generation CPUs.
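
As a rough illustration of why a smaller feature size raises density, the sketch below assumes an idealized model in which the area of each component scales with the square of the feature size. That model is an assumption made here for illustration; real processes do not scale this cleanly.

```python
def relative_density(old_feature_um, new_feature_um):
    """Idealized gain in components per unit area when the feature size shrinks.

    Assumes component area is proportional to the square of the feature size.
    """
    return (old_feature_um / new_feature_um) ** 2


print(relative_density(0.5, 0.25))              # 4.0
print(f"{relative_density(0.18, 0.13):.2f}")    # 1.92
```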


13. L2 cache capacity


The CPU cache (cache memory) is temporary storage located between the CPU and main memory. Its capacity is much smaller than main memory, but its exchange speed is much faster. The data in the cache is only a small part of what is in memory, but it is the part the CPU is about to access in the near future. When the CPU needs a large amount of data, it can bypass memory and fetch directly from the cache, which speeds up reads. Adding a cache to the CPU is therefore a highly efficient solution: the whole memory system (cache + main memory) combines the speed of the cache with the capacity of main memory. The cache has a large impact on CPU performance, mainly because of the order in which the CPU exchanges data and the bandwidth between the CPU and the cache.


The cache works as follows. When the CPU needs to read a piece of data, it first looks in the cache. If the data is found, it is read immediately and handed to the CPU. If not, it is read from the relatively slow main memory and handed to the CPU, and at the same time the block containing that data is loaded into the cache, so that later reads of the whole block come from the cache without going to memory again.
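
The read path described above can be sketched in Python. This is a simplified model under stated assumptions: the class name, the block size of 4 words, and the unlimited cache capacity (no eviction) are all choices made for illustration, not properties of any real CPU.

```python
BLOCK_SIZE = 4  # hypothetical block size, in words


class SimpleCache:
    """Toy read-only cache that loads whole blocks on a miss (never evicts)."""

    def __init__(self, memory):
        self.memory = memory   # backing store: a list of words
        self.blocks = {}       # block number -> list of words in that block
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_no, offset = divmod(address, BLOCK_SIZE)
        if block_no in self.blocks:
            self.hits += 1
        else:
            # Miss: fetch the whole block from memory and keep it in the cache.
            self.misses += 1
            start = block_no * BLOCK_SIZE
            self.blocks[block_no] = self.memory[start:start + BLOCK_SIZE]
        return self.blocks[block_no][offset]


memory = list(range(100, 116))
cache = SimpleCache(memory)
values = [cache.read(address) for address in range(8)]
print(values)                     # [100, 101, 102, 103, 104, 105, 106, 107]
print(cache.hits, cache.misses)   # 6 2
```

Only the first read of each block goes to memory; the following three reads in the same block are served from the cache.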


This mechanism gives CPU cache reads a very high hit rate (up to about 90% on most CPUs): about 90% of the data the CPU reads next is already in the cache, and only about 10% must be read from memory. This greatly reduces the time the CPU spends reading memory directly, and means the CPU rarely has to wait for data. In general, the CPU reads from the cache first and from main memory second.


In the earliest CPUs the cache was a single unit of very low capacity; Intel began dividing the cache into levels in the Pentium era. The cache integrated in the CPU core could no longer meet the CPU's needs, and manufacturing limits prevented a large increase in its capacity. Cache was therefore added on the same circuit board as the CPU or on the motherboard. The cache integrated in the CPU core was then called the L1 cache, and the external one the L2 cache. The L1 cache is further divided into a data cache (D-Cache) and an instruction cache (I-Cache), which hold data and the instructions that operate on that data respectively. Both can be accessed by the CPU at the same time, which reduces conflicts caused by contention for the cache and improves processor performance. When Intel launched the Pentium 4, it also added an L1 execution trace cache with a capacity of 12K micro-ops.


As manufacturing processes advanced, the L2 cache could easily be integrated into the CPU core as well, and its capacity grew year by year. Defining L1 and L2 by whether the cache sits inside the CPU is therefore no longer accurate. With the L2 cache integrated into the core, the former large gap between the L2 cache speed and the CPU clock also disappeared: the L2 cache now runs at the same frequency as the core and can deliver data to the CPU faster.


The L2 cache is one of the keys to CPU performance: with the core unchanged, increasing the L2 capacity can improve performance substantially. High-end and low-end CPUs built on the same core often differ in L2 cache size, which shows how important the L2 cache is to the CPU.


When the CPU finds the data it needs in the cache, this is called a hit; when the data is not in the cache (a miss), the CPU goes to memory. In theory, in a CPU with an L2 cache, the hit rate for reads from the L1 cache is 80%. That is, the useful data the CPU finds in the L1 cache accounts for 80% of the total, and the remaining 20% is read from the L2 cache. Because the data to be executed cannot be predicted exactly, the hit rate for reads from the L2 cache is also about 80% (the useful data read from L2 accounts for 16% of the total). Whatever remains has to be fetched from memory, but this is a very small proportion. Current high-end CPUs also carry an L3 cache, designed to catch data that misses in the L2 cache. In a CPU with an L3 cache, only about 5% of the data needs to be fetched from memory, which further improves CPU efficiency.
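
The percentages for the two-level case follow from simple arithmetic, sketched below. The function name is a hypothetical one chosen for illustration, and the 80% hit rates are the theoretical figures quoted in the text.

```python
def access_shares(l1_hit_rate, l2_hit_rate):
    """Share of all reads served by L1, by L2, and by main memory."""
    from_l1 = l1_hit_rate
    from_l2 = (1 - l1_hit_rate) * l2_hit_rate   # L2 only sees L1 misses
    from_memory = 1 - from_l1 - from_l2
    return from_l1, from_l2, from_memory


l1, l2, mem = access_shares(0.80, 0.80)
print(f"{l1:.0%} {l2:.0%} {mem:.0%}")   # 80% 16% 4%
```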


To keep the hit rate high when the CPU accesses the cache, the cache contents must be replaced according to some algorithm. A common choice is the least recently used (LRU) algorithm, which evicts the line that has been accessed least in the recent period. A counter is kept for each line: on a hit, the LRU algorithm clears the counter of the line that was hit and adds 1 to the counters of all other lines. When a replacement is needed, the line with the largest counter value is evicted. This is an efficient and sound method: clearing the counter on each hit pushes data that was once used heavily but is no longer needed out of the cache, improving cache utilization.
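
The counter scheme described above can be sketched as follows. The class and method names are hypothetical, and the handling of a newly loaded line (its counter starts at 0 while the others are incremented) is an assumption, since the text only specifies what happens on a hit.

```python
class CounterLRUCache:
    """Toy cache using per-line age counters to pick the LRU victim."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = {}   # tag -> cached data
        self.age = {}     # tag -> counter (0 = most recently used)

    def access(self, tag, load):
        """Return True on a hit, False on a miss (loading the line via load)."""
        hit = tag in self.lines
        if not hit:
            if len(self.lines) >= self.capacity:
                # Evict the line with the largest counter value.
                victim = max(self.age, key=self.age.get)
                del self.lines[victim]
                del self.age[victim]
            self.lines[tag] = load(tag)
        # Age every line, then clear the counter of the line just used.
        for other in self.age:
            self.age[other] += 1
        self.age[tag] = 0
        return hit


cache = CounterLRUCache(capacity=2)
results = [cache.access(tag, load=str.lower) for tag in ["A", "B", "A", "C"]]
print(results)               # [False, False, True, False]
print(sorted(cache.lines))   # ['A', 'C']
```

In the usage example, the hit on "A" resets its counter, so "B" has the largest counter and is the line evicted when "C" is loaded.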


L1 cache capacity is generally between 4 KB and 64 KB; L2 cache capacities include 128 KB, 256 KB, 512 KB, 1 MB, 2 MB, and so on. L1 capacity differs little between products, whereas L2 capacity is the key to raising CPU performance. How far L2 capacity can grow is determined by the manufacturing process: more capacity inevitably means more transistors inside the CPU, and integrating a larger cache within the limited CPU area places higher demands on the process.


14. Core voltage


The CPU operating voltage (supply voltage) is the voltage the CPU needs in order to work normally. Any electrical device needs power to operate and has a corresponding rated voltage, and the CPU is no exception. CPU operating voltages show a clear downward trend. Low voltage has three main advantages:


A low-voltage CPU lowers the chip's total power consumption. Lower power consumption reduces the system's running cost, which matters greatly for portable and mobile systems: the existing battery can work longer, greatly extending battery life;


Lower power consumption means less heat, and a CPU that does not run hot works better with the rest of the system;


Lowering the voltage is an important factor in raising the CPU clock frequency.


CPU voltage divides into two parts, the core voltage and the I/O voltage. The core voltage drives the CPU core; the I/O voltage drives the I/O circuits. The core voltage is usually less than or equal to the I/O voltage.


In early CPUs (the 286 to 486 era) the core voltage and the I/O voltage were the same, usually 5 V. Because the manufacturing process was relatively backward, the CPU generated too much heat and its life was shortened. CPU integration was very low at the time, however, whereas today's CPUs are far more highly integrated, so current CPUs actually produce more heat. As manufacturing processes have improved, CPU voltages have fallen steadily in recent years. Current desktop CPU core voltages are usually 2 V or less, and notebook CPUs run lower still, which greatly reduces power consumption, extends battery life, and reduces heat. Today's CPUs use dedicated voltage ID (VID) pins to signal the voltage regulator embedded in the motherboard, which automatically sets the correct voltage level.
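
A rough sense of how much the voltage drop matters can be had from the usual CMOS approximation that dynamic power is proportional to capacitance × voltage² × frequency. That formula is not stated in the text above and is used here only as an assumption; the function name is likewise hypothetical.

```python
def relative_dynamic_power(old_voltage, new_voltage):
    """Ratio of new to old dynamic power at the same frequency and capacitance.

    Assumes the common approximation P ~ C * V^2 * f.
    """
    return (new_voltage / old_voltage) ** 2


# Core voltage lowered from 5 V to 2 V, everything else held equal.
print(f"{relative_dynamic_power(5.0, 2.0):.2f}")   # 0.16
```

Under this approximation, going from 5 V to 2 V alone would cut dynamic power to about one sixth, before accounting for higher clock speeds and transistor counts.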


Many motherboards for newer CPUs provide special jumpers or software settings that let the user adjust the CPU operating voltage manually as needed. Many experiments show that a moderate increase in core voltage when overclocking strengthens the CPU's internal signals and helps considerably in raising CPU performance. It also raises the CPU's power consumption and heat output and shortens its life, so ordinary users are advised not to do this.


15. Hyper-Threading technology


To improve CPU performance, manufacturers usually raise the clock frequency and increase the cache capacity. But CPU frequencies are already climbing fast, and pushing performance further through higher clocks and larger caches runs into manufacturing-process limits and high cost.


Raising the clock frequency and enlarging the cache can indeed improve performance, but improving CPU performance this way is technically quite difficult. In practice, for several reasons, the CPU's execution units are not fully used by applications. If the CPU cannot read data normally (a bus or memory bottleneck), the utilization of its execution units drops sharply. In addition, most threads being executed lack ILP (Instruction-Level Parallelism, the simultaneous execution of multiple instructions). Both factors mean that current CPUs do not deliver their full performance. Intel therefore took a different approach to improving performance: letting the CPU run multiple threads at the same time so that it works more efficiently. This is the so-called Hyper-Threading ("HT") technology. Hyper-Threading uses special hardware instructions to make two logical cores appear as two physical chips, so that a single processor can exploit thread-level parallelism. This makes it compatible with multithreaded operating systems and software, reduces CPU idle time, and improves CPU efficiency.


With Hyper-Threading, applications can use different parts of the chip at the same time. A single-threaded chip can process many thousands of instructions per second, but at any given moment it can operate on only one instruction stream. Hyper-Threading lets the chip process multiple threads at the same time, which improves its performance.


Hyper-Threading lets one CPU run multiple programs at the same time, sharing that CPU's resources; in theory it behaves like two CPUs executing two threads simultaneously. To do this the P4 processor needs one additional Logical CPU Pointer (logical processing unit), so the die area of the HT-capable P4 is about 5% larger than that of the earlier P4. The remaining parts, such as the ALU (integer arithmetic unit), the FPU (floating-point unit), and the L2 cache, are shared.


Although Hyper-Threading can execute two threads at once, it is not like two real CPUs, each with its own independent resources. When both threads need the same resource at the same time, one of them must pause and yield until the resource is free again. The performance of a Hyper-Threading CPU is therefore not equal to that of two CPUs.


Intel's P4 HT has two operating modes, Single Task Mode and Multi Task Mode. When a program does not support multi-processing, the system halts one logical CPU and concentrates resources on the other, so that a single-threaded program is not slowed down by an idle logical CPU. The halted logical CPU still waits for work and occupies some CPU resources, however, so a Hyper-Threading CPU running a program in Single Task Mode may fall slightly short of a CPU without Hyper-Threading, though the gap is small. In other words, Hyper-Threading can even reduce system performance when running single-threaded software, and this is especially likely when single-threaded software runs on a multithreaded operating system.


Note that a CPU with Hyper-Threading needs chipset and software support before the technology can be used to full effect. Chipsets that currently support Hyper-Threading include the following: Intel i845GE and i845PE and SiS R658 (RDRAM), SiS645DX, and SiS651 support it directly; Intel i845E and i850E support it after a BIOS upgrade; VIA P4X400 and P4X400A can support it but have not received official authorization. On the operating system side, Microsoft Windows XP, Microsoft Windows 2003, and Linux kernel 2.4.x and later support Hyper-Threading.
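
On an operating system that supports Hyper-Threading, each logical processor is presented to software as a CPU. As a small illustration, Python's standard `os.cpu_count()` reports the number of logical CPUs the system exposes; the wrapper function below is a hypothetical helper, and the result depends entirely on the machine it runs on.

```python
import os


def logical_cpu_count():
    """Number of logical CPUs reported by the OS, or None if undetermined."""
    return os.cpu_count()


count = logical_cpu_count()
print(f"Logical CPUs visible to the operating system: {count}")
```

This count does not by itself say whether Hyper-Threading is active, since it does not distinguish logical processors from physical cores.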


16. 3DNow!


The 3DNow! instruction set, proposed by AMD, appeared before the SSE instruction set and was widely used in AMD's K6-2, K6-3, and Athlon (K7) processors. The 3DNow! instruction set is in fact an extension of 21 machine-code instructions.


Unlike Intel's MMX technology, which focuses on integer arithmetic, the 3DNow! instruction set is aimed mainly at 3D modeling, coordinate transformation, and rendering effects in 3D applications; with software support it can raise 3D performance considerably. Enhanced 3DNow! was later developed for the Athlon. These AMD SIMD instructions serve the same purpose as Intel's SSE. Because of the commercial success of Intel's Pentium III, software support for SSE is more widespread than for 3DNow!. AMD went on to expand Enhanced 3DNow! to 52 instructions, including some SSE code, so that software optimized for SSE also gains performance.


17. Multimedia instruction sets


The CPU relies on instructions to compute and to control the system, and every CPU is designed with an instruction set matched to its hardware circuits. The strength of the instruction set is also an important measure of a CPU, and the instruction set is one of the most effective tools for improving microprocessor efficiency. In terms of mainstream architectures, instruction sets divide into complex instruction sets (CISC) and reduced instruction sets (RISC). In terms of specific applications, Intel's MMX (Multi Media Extended), SSE, and SSE2 (Streaming SIMD Extensions 2) and AMD's 3DNow! are all extended instruction sets that strengthen the CPU's multimedia, graphics, image, and Internet processing. These extended instruction sets are what is usually meant by the "CPU instruction set."


1. Use of the reduced instruction set


In the first decades after the computer was invented, as computers grew more powerful, their internal components multiplied and their instruction sets grew ever more complex, until overly complicated instructions seriously hurt efficiency. Research later found that 80% of programs use only 20% of the instruction set. On the basis of this finding the reduced instruction set (RISC) was proposed, a profound revolution in computer architecture. The basic idea of RISC is to address the shortcomings of CISC instruction systems (too many instruction types, irregular instruction formats, too many addressing modes) by reducing the number of instruction types, regularizing instruction formats, and simplifying addressing modes. This facilitates parallel processing inside the processor and makes more efficient use of VLSI devices, significantly improving processor performance.


The RISC instruction set has many features, of which the most important are:


Few instruction types, regular formats: a RISC instruction set usually has only one or a few formats. Instructions are of a single length (typically 4 bytes) and are aligned on word boundaries. Field positions, especially the position of the opcode, are fixed.


Simplified addressing modes: almost all instructions use register addressing, and there are generally no more than 5 addressing modes in total. More complex modes, such as indirect addressing, are synthesized in software from the simple ones.


Heavy use of register operations: most operations in a RISC instruction set are register to register, and only simple Load and Store operations access memory. Each instruction therefore accesses at most one memory address, and memory access is never mixed with arithmetic.


Simplified processor structure: a RISC instruction set greatly simplifies the design of the processor's controller and other functional units, and does not require large numbers of dedicated registers. In particular, instruction operations can be implemented directly in hardware circuits rather than in microprograms as in CISC processors. A RISC processor therefore needs no microprogram control store and can execute instructions quickly and directly.


Well suited to VLSI technology: with LSI and VLSI, a processor (or even several processors) can be placed on a single chip. The RISC architecture brings many benefits to single-chip processor design, helping performance and simplifying the design and implementation of VLSI chips. With VLSI technology, the workload and cost of manufacturing a RISC processor are much lower than for a CISC processor.


Stronger parallel processing: RISC instruction sets make very effective use of pipelining, superpipelining, and superscalar techniques to achieve instruction-level parallelism and improve processor performance. The parallel techniques commonly used inside processors today were largely developed and matured on RISC architectures.
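
The first feature in the list, fixed-length instructions with fields at fixed positions, can be illustrated with a small sketch. The 32-bit layout below (6-bit opcode, three 5-bit register fields) is a hypothetical format invented for this example and does not correspond to any specific RISC architecture.

```python
# Hypothetical 32-bit instruction layout, for illustration only:
#   bits 31-26: opcode   bits 25-21: rd   bits 20-16: rs1   bits 15-11: rs2


def encode(opcode, rd, rs1, rs2):
    """Pack the fields into one fixed-length 32-bit instruction word."""
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


def decode(word):
    """Extract each field with a fixed shift and mask; no format lookup needed."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rd": (word >> 21) & 0x1F,
        "rs1": (word >> 16) & 0x1F,
        "rs2": (word >> 11) & 0x1F,
    }


word = encode(opcode=0x20, rd=3, rs1=1, rs2=2)
print(len(word.to_bytes(4, "big")))   # 4
print(decode(word))
```

Because every field sits at a known position, decoding is a handful of shifts and masks, which is the property that makes such formats simple to implement directly in hardware.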


It is precisely because of these advantages that the RISC architecture has been widely used in high-end systems, while CISC dominates desktop systems. RISC is now steadily penetrating the desktop area as well, and it is expected eventually to dominate.


2. CPU extended instruction sets


CPUs differ little in their basic functions, and their basic instruction sets are much the same, but many manufacturers have added extended instruction sets to boost performance in a particular area. An extended instruction set defines new data types and instructions that greatly increase data-processing capability in that area, but it requires software support.


MMX instruction set


The MMX (MultiMedia eXtension) instruction set is a multimedia instruction enhancement introduced by Intel in 1996. It comprises 57 multimedia instructions, which can process several data items at once and still proceed normally when a result exceeds the representable range (saturating arithmetic); with software support, better performance can be achieved. The advantage of MMX is that operating systems of the time could run MMX programs without any modification. The drawback is equally plain: MMX instructions and x87 floating-point instructions cannot execute at the same time, and intensive switching between the two is needed for correct operation, which inevitably degrades overall system performance.


SSE instruction set


The SSE (Streaming SIMD Extensions) instruction set was first introduced by Intel in the Pentium III processor. Well before the formal launch of the PIII, Intel had already publicized through various channels the so-called KNI (Katmai New Instructions) set, the predecessor of SSE, which much of the media once called the next version of MMX, that is, MMX2. In fact, "KNI" was the name Intel originally gave to the instruction set of its next-generation chip, while "MMX2" was purely a label that hardware reviewers and the media attached to KNI on the basis of impression. Intel never officially announced anything called MMX2.


The instruction set that was finally launched as SSE was also called "Internet SSE." SSE comprises 70 instructions: 50 SIMD (single instruction, multiple data) floating-point instructions that speed up 3D graphics computation, 12 enhanced MMX integer instructions, and 8 instructions that optimize streaming block transfers of data in memory. In theory these instructions benefit currently popular multimedia workloads such as image processing, floating-point computation, 3D computation, video processing, and audio processing. SSE instructions and 3DNow! instructions are incompatible with each other, but SSE covers most of the functions of 3DNow!, implemented in a different way. SSE is compatible with MMX instructions, and through SIMD it can process multiple floating-point values in parallel in a single clock cycle, effectively raising floating-point speed.
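
The idea of one instruction operating on several packed values can be modeled in plain Python. The sketch below packs four single-precision floats into 16 bytes (128 bits, the width of an SSE register) and adds two such packed values lane by lane. It only models the data layout and the result of a packed add; the Python loop runs sequentially and is not itself SIMD, and the function name is a hypothetical one.

```python
import struct


def packed_add_ps(a_bytes, b_bytes):
    """Model of a packed single-precision add: four lanes added independently."""
    a = struct.unpack("<4f", a_bytes)
    b = struct.unpack("<4f", b_bytes)
    return struct.pack("<4f", *(x + y for x, y in zip(a, b)))


a = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
b = struct.pack("<4f", 10.0, 20.0, 30.0, 40.0)
print(len(a) * 8)                              # 128
result = struct.unpack("<4f", packed_add_ps(a, b))
print(result)                                  # (11.0, 22.0, 33.0, 44.0)
```

On hardware with SSE, the four additions are carried out by a single instruction rather than by four separate scalar operations.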


Later, in response to AMD's 3DNow!+ instruction set, Intel developed SSE2 on the basis of SSE, adding instructions that greatly improve the performance of the P4 processor. By the end of the P4's design, Intel had added a new set of 144 instructions, the SSE2 instruction set. Like earlier SIMD extensions, SSE2 executes a single instruction on multiple data targets at once (SIMD; a good way of compensating for low IPC is to do more work per instruction). Most importantly, SSE2 can perform double-precision floating-point arithmetic on 128-bit operands. This ability to work with more precise floating-point numbers makes SSE2 a key accelerator for multimedia programs, 3D engineering, and workstation-type tasks. What matters, though, is that software be properly optimized to use it.


3DNow! instruction set


The 3DNow! instruction set, proposed by AMD, appeared before the SSE instruction set and was widely used in AMD's K6-2, K6-3, and Athlon (K7) processors. The 3DNow! instruction set is in fact an extension of 21 machine-code instructions.


Unlike Intel's MMX technology, which focuses on integer arithmetic, the 3DNow! instruction set is aimed mainly at 3D modeling, coordinate transformation, effect rendering, and similar 3D applications; with software support it can greatly improve 3D processing performance. Enhanced 3DNow! was later developed for the Athlon. These AMD SIMD instructions serve the same purpose as Intel's SSE. Because of the commercial success of Intel's Pentium III, software support for SSE is more common than for 3DNow!. AMD went on to expand Enhanced 3DNow! to 52 instructions, including some SSE code, so that software optimized for SSE can also achieve better performance.