What is CPU and its function?


Central processing unit (CPU)

An Intel Core i7 CPU installed on a motherboard

Die photograph of an AMD Phenom quad-core processor

The central processing unit (CPU) is one of the main components of a computer. Its primary function is to interpret computer instructions and process the data in computer software. Before the 1970s, central processing units were composed of multiple discrete units. Later, CPUs manufactured as integrated circuits were developed; these highly integrated components are the so-called microprocessors. The most complex CPU circuitry can now be made into a single small but powerful unit: the so-called core.

Broadly speaking, "central processing unit" refers to a class of logic machines that can execute complex computer programs. This loose definition easily includes early computers that existed before the term "CPU" came into common use. In any case, the term and its abbreviation have been in widespread use in the electronic computing industry since at least the early 1960s. Although the CPU has developed enormously in physical form, design and manufacture, and the specific tasks it executes, its basic operating principle has not changed.

Early CPUs were usually custom-designed for large, application-specific computers. This expensive approach of customizing CPUs for particular applications has largely given way to the development of inexpensive, standardized classes of processors suited to one or more purposes. The trend toward standardization began in the era of discrete-transistor mainframes and minicomputers and accelerated with the advent of the integrated circuit (IC). ICs allow increasingly complex CPUs to be designed and manufactured to very small tolerances (on the order of micrometers). The standardization and miniaturization of CPUs have made these electronic components ever more common in modern life. Modern processors are found in everything from cars and cell phones to children's toys.

History

EDVAC, one of the first stored-program electronic computers.

Before the advent of today's CPUs, computers like the ENIAC had to be physically rewired before they could execute different programs. Because their wiring had to be redone for each new program, these machines are often called "fixed-program computers." Since the term "CPU" refers to a device that executes software (computer programs), the first devices that may properly be called CPUs arrived with the stored-program computer.

The idea of a stored-program computer was already present in the design of the ENIAC, but was ultimately omitted so that the machine could be finished sooner. On June 30, 1945, before ENIAC was completed, the mathematician John von Neumann distributed the paper entitled "First Draft of a Report on the EDVAC," outlining a stored-program computer that would finally be completed in August 1949. [1] EDVAC was designed to perform a certain number of instructions (operations) of various types, which could be combined to form useful programs for EDVAC to execute. Significantly, the programs written for EDVAC were stored in high-speed computer memory rather than specified by the physical wiring of the machine. This design overcame a severe limitation of the ENIAC: the considerable time and effort required to rewire the computer for a new program. Under von Neumann's design, the program (software) EDVAC executed could be replaced simply by changing the contents of its memory [Note 1].

It is worth noting that although von Neumann is most often credited with the design of the stored-program computer because of his work on EDVAC, other researchers who preceded him, such as Konrad Zuse, had proposed similar ideas. In addition, the Harvard Mark I, completed before EDVAC, used a so-called Harvard architecture, employing punched paper tape rather than electronic memory to store its programs. The key difference between the von Neumann and Harvard architectures is that the latter stores and handles CPU instructions and data separately, while the former uses the same memory space for both. Most modern CPUs follow the von Neumann design, but the Harvard architecture is also common.

As digital devices, all CPUs deal with discrete states and therefore require some kind of switching element to differentiate between and change those states. Before the commercial acceptance of the transistor, electrical relays and vacuum tubes were commonly used for this purpose. Although these devices were far faster than purely mechanical designs, they suffered from various reliability problems. For example, building DC sequential logic circuits out of relays requires additional hardware to cope with contact bounce. Vacuum tubes do not suffer from contact bounce, but they must warm up before use, and they eventually fail outright [Note 2]. Usually, when a tube failed, the CPU had to be diagnosed to locate the damaged component so it could be replaced. Early electronic (vacuum-tube) computers were therefore faster but less reliable than electromechanical (relay) computers. Tube computers like EDVAC tended to average eight hours between failures, whereas the slower but earlier Mark I failed very rarely. [2] In the end, tube-based CPUs came to dominate because their significant speed advantage outweighed their reliability problems. Most of these early synchronous CPUs ran at low clock rates compared with modern microelectronic designs (see the discussion of clock frequency below). Clock frequencies from 100 kHz to 4 MHz were common at the time, limited largely by the speed of the switching devices they were built with.

Discrete-transistor and integrated-circuit central processing units

CPU, magnetic core memory, and MSI bus interface of a PDP-8/I.

The design of CPUs grew more complex as various technologies made it possible to build smaller and more reliable electronic devices. The first such improvement came with the advent of the transistor. Transistorized CPUs of the 1950s and 1960s no longer had to be built from bulky, unreliable, and fragile switching elements such as relays and vacuum tubes. With this improvement, more complex and reliable CPUs were built onto one or more printed circuit boards containing discrete (individual) components, moving toward smaller size and greater reliability.

During this period, methods of fitting many transistors into a compact space gained popularity. The integrated circuit (IC) allowed a large number of transistors to be manufactured on a single semiconductor die, or "chip." At first, only very basic, non-specialized digital circuits (such as NOR gates) were miniaturized into ICs. CPUs based on these prepackaged ICs are known as "small-scale integration" (SSI) devices. SSI ICs, such as those used in the Apollo Guidance Computer, usually contained a few dozen transistors. Building an entire CPU out of SSI ICs required thousands of individual chips, but still consumed much less space and power than earlier discrete-transistor designs. As microelectronic technology advanced, increasing numbers of transistors could be placed on a single IC, reducing the number of individual ICs needed for a complete CPU. "Medium-scale integration" (MSI) and "large-scale integration" (LSI) ICs increased transistor counts to hundreds and then thousands.

In 1964, IBM introduced its System/360 computer architecture, which was used in a series of computers of differing speed and performance that could all run the same programs. This was a significant feat at a time when most computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM employed the concept of microprogramming (microcode), which is still widely used in modern CPUs. [3] The System/360 architecture was so successful that it dominated the mainframe market for decades and left a legacy that continues in modern mainframes such as the IBM zSeries, which uses a similar architecture. In the same year (1964), Digital Equipment Corporation (DEC) introduced another influential computer aimed at the scientific and research markets, the PDP-8. DEC later introduced the very famous PDP-11, a product originally planned to be built from SSI ICs but implemented with LSI components once LSI technology matured. In contrast to its SSI and MSI predecessors, the first LSI implementation of the PDP-11 contained a CPU composed of only four LSI ICs. [4]

Transistor-based computers had several advantages over their predecessors. Aside from greater reliability and lower power consumption, they also allowed the CPU to run much faster, because the switching time of a transistor is far shorter than that of a relay or vacuum tube. Thanks to the increased reliability and dramatically shorter switching times, CPU clock frequencies in the tens of megahertz were reached during this period. In addition, while discrete-transistor and IC CPUs were in heavy use, new high-performance designs such as SIMD (single instruction, multiple data) vector processors began to appear. These early experimental designs later gave rise to specialized supercomputers such as those made by Cray.

How the central processing unit operates 

The fundamental operation of most CPUs, regardless of physical form, is to execute a sequence of instructions stored in what is called a program. Discussed here are devices that follow the common von Neumann architecture. The program is stored in computer memory as a series of numbers. Almost all von Neumann CPUs operate in four stages: fetch, decode, execute, and writeback.

The first stage, fetch, retrieves an instruction (represented as a number or sequence of numbers) from program memory. The location in program memory is specified by the program counter (PC), which holds a value identifying the current position in the program. In other words, the program counter keeps track of the CPU's place in the current program. After an instruction is fetched, the PC is incremented by the length of the instruction in memory units [Note 3]. Often the instruction must be fetched from relatively slow memory, causing the CPU to stall while waiting for the instruction to arrive. This issue is largely addressed by the caches and pipeline architectures of modern processors (see below).

The instruction the CPU fetches from memory determines what the CPU will do. In the decode stage, the instruction is broken up into parts that have significance to other portions of the CPU. The way a numerical value is interpreted as an instruction is defined by the CPU's instruction set architecture (ISA) [Note 4]. One group of bits in the instruction, called the opcode, indicates which operation is to be performed. The remaining parts of the number usually provide information required by that instruction, such as the operands of an addition operation. Such operands may be given as a constant value (called an immediate value) or as a place to locate a value: a register or a memory address, as determined by the addressing mode. In older designs, the portion of the CPU responsible for instruction decoding was an unchangeable hardware device. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram can sometimes be rewritten on a finished CPU, making it possible to change how the CPU decodes instructions.
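To make the opcode/operand split concrete, here is a minimal sketch of decoding a 16-bit instruction word. The field layout (4-bit opcode, two 4-bit register fields, 4-bit immediate) and the opcode table are invented for this illustration; they do not describe any real ISA.

```python
# Hypothetical 16-bit instruction format, for illustration only:
# bits 15-12: opcode, bits 11-8: destination register,
# bits 7-4: source register, bits 3-0: immediate value.
OPCODES = {0x1: "LOAD", 0x2: "ADD", 0x3: "JMP"}  # made-up opcode table

def decode(word: int):
    opcode = (word >> 12) & 0xF   # top four bits select the operation
    dst    = (word >> 8) & 0xF    # destination register number
    src    = (word >> 4) & 0xF    # source register number
    imm    = word & 0xF           # immediate constant
    return OPCODES[opcode], dst, src, imm

print(decode(0x2310))  # ('ADD', 3, 1, 0): add the value in R1 into R3
```

Real decoders do the same kind of bit-field extraction, but in hardware (or microcode) and over far more elaborate encodings.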

After the fetch and decode stages comes the execute stage. During this stage, the various CPU components capable of performing the desired operation are connected. For example, if an addition is requested, the arithmetic logic unit (ALU) is connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the circuitry to perform simple arithmetic and logical operations (such as addition and bitwise operations) on its inputs. If the addition produces a result too large for the CPU to handle, an arithmetic overflow flag may be set in a flags register (see the discussion of integer range below).

The final stage, writeback, simply "writes back" the results of the execute stage in some form. Results are often written to an internal CPU register for quick access by subsequent instructions. In other cases, results may be written to slower but larger and cheaper main memory. Some types of instructions manipulate the program counter rather than directly producing result data. These are generally called "jumps" and enable behavior such as loops, conditional execution (via conditional jumps), and functions in programs [Note 5]. Many instructions also change the state of bits in a flags register. These flags can be used to influence program behavior, since they often indicate the outcome of various operations. For example, a "compare" instruction evaluates two values and sets bits in the flags register according to the result of the comparison; a subsequent jump instruction can then use these flags to determine program flow.

After the instruction is executed and its results written back, the value of the program counter is incremented and the entire process repeats, with the next instruction cycle normally fetching the next sequential instruction. If the completed instruction was a jump, the program counter is instead loaded with the address of the instruction that was jumped to, and program execution continues normally. In CPUs more complex than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally known as the "classic RISC pipeline," which is in fact quite common among the simple CPUs used in many electronic devices (often called microcontrollers) [Note 6].
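The fetch-decode-execute-writeback loop above can be sketched as a tiny software simulation. The three-field instruction tuples and the opcode names (LOAD, ADD, HALT) are invented for this sketch and do not correspond to any real instruction set.

```python
# A toy von Neumann machine: instructions live in "memory" as data, a program
# counter walks through them, and a register file receives the results.
MEMORY = [
    ("LOAD", 0, 5),        # R0 <- 5 (immediate)
    ("LOAD", 1, 7),        # R1 <- 7 (immediate)
    ("ADD",  2, (0, 1)),   # R2 <- R0 + R1
    ("HALT", None, None),
]

def run(program):
    registers = [0] * 4
    pc = 0                               # program counter
    while True:
        opcode, dst, src = program[pc]   # fetch + decode (tuple unpacking)
        pc += 1                          # PC advances past the fetched instruction
        if opcode == "LOAD":             # execute + writeback: immediate to register
            registers[dst] = src
        elif opcode == "ADD":            # execute: ALU add; writeback to dst
            a, b = src
            registers[dst] = registers[a] + registers[b]
        elif opcode == "HALT":
            return registers

print(run(MEMORY))  # R2 ends up holding 12
```

Because the program is just data in memory, "changing the software" means nothing more than writing different tuples into `MEMORY`, which is exactly the stored-program idea described above.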

Design and implementation 

Integer range 

The way a CPU represents numbers is a design choice that affects the most basic ways the device functions. Some early digital computers internally used an electrical model of the common decimal (base-10) numeral system to represent numbers, and a few rare computers used ternary systems. Nearly all modern CPUs represent numbers in binary form, with each digit represented by a two-valued physical quantity such as a "high" or "low" voltage level [Note 7].

MOS 6502 microprocessor in a dual in-line package, an extremely popular 8-bit design.

Related to number representation is the size and precision of the numbers a CPU can represent. In the case of a binary CPU, a bit refers to one significant place in the numbers the CPU deals with. The number of bits a CPU uses to represent numbers is often called its "word size," "bit width," "datapath width," or "integer precision" when dealing strictly with integers (as opposed to floating-point numbers). This number differs between architectures, and often between different parts of the very same CPU. For example, an 8-bit CPU deals with a range of numbers that can be represented by eight binary digits (each digit having two possible values, 0 or 1), that is, 2⁸ or 256 discrete values. In effect, integer precision sets a hardware limit on the range of integers the software run by the CPU can utilize. [Note 8]

Integer precision can also affect the number of memory locations the CPU can address. For example, if a binary CPU uses 32 bits to represent a memory address, and each memory address represents one octet (8 bits), the maximum quantity of memory the CPU can address is 2³² octets, or 4 GB. This is a very simple view of the CPU address space; in practice, many designs use more complex addressing methods, such as paging, to address more memory than their integer precision would allow with a flat address space.
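The address-space arithmetic in that example is straightforward to check. The sketch below assumes, as in the text, a byte-addressable machine where each address names one octet:

```python
# An n-bit address can name 2**n distinct locations; with one octet per
# location, that is 2**n bytes of addressable memory.
def addressable_bytes(address_bits: int) -> int:
    return 2 ** address_bits

print(addressable_bytes(16))  # 65536 bytes (64 KiB), typical of 8-bit-era CPUs
print(addressable_bytes(32))  # 4294967296 bytes: the 4 GB limit from the text
```

Paging sidesteps this limit by translating addresses, so the physical memory reachable can exceed what a flat n-bit address would allow.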

Higher integer precision requires more structures to handle the additional digits, and therefore brings more complexity, size, power consumption, and expense. It is consequently not uncommon to see 4- or 8-bit microcontrollers used in modern applications, even though CPUs with much higher precision (16, 32, 64, even 128 bits) are available. Simpler microcontrollers are usually cheaper, use less power, and therefore dissipate less heat, all of which can be major design considerations for electronic devices. However, in higher-end applications, the benefits of the extra precision (most often the additional address space) are significant and frequently affect the design. To gain the advantages of both lower and higher bit widths, many CPUs are designed with different bit widths for different parts of the device. For example, the IBM System/370 used a CPU that was primarily 32-bit, but employed 128-bit precision inside its floating-point unit to achieve greater accuracy and range for floating-point numbers. [3] Many later CPU designs use a similar mixed bit width, especially when the processor is meant for general-purpose use and thus requires a reasonable balance of integer and floating-point capability.

Clock frequency 

A logic analyzer showing the timing and state of a synchronous digital system.

Main frequency = FSB × multiplier.

Most CPUs, and indeed most sequential logic devices, are synchronous in nature [Note 9]; that is, they are designed and operate on the assumption of a single synchronization signal. This signal, known as the clock signal, usually takes the form of a periodic square wave. By calculating the maximum time electrical signals need to propagate through the various branches of the CPU's many circuits, designers can select an appropriate period for the clock signal.

This period must be longer than the time it takes a signal to move along its worst-case (maximum-delay) path. By designing the entire CPU to move data on the rising and falling edges of the clock signal, the CPU can be simplified significantly, both from a design perspective and in terms of component count. At the same time, this carries the disadvantage that the whole CPU must wait on its slowest components, even though some portions of it are much faster. This limitation has largely been compensated for by various methods of increasing CPU parallelism (see below).

However, architectural improvements alone cannot cure all the ills of globally synchronous CPUs. For example, the clock signal is subject to the same electrical delays as any other signal, and higher clock rates in increasingly complex CPUs make it harder to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to distribute multiple identical clock signals, so that the delay of any single signal does not cause the whole CPU to malfunction. Another major issue is that as clock rates increase, so does the heat the CPU dissipates. The constantly changing clock signal causes many components to switch regardless of whether they are doing useful work at that moment, and in general a switching component uses more energy than one in a static state. Therefore, as clock rates increase, so does the need for more effective CPU cooling solutions.

One method of dealing with the switching of unneeded components is clock gating, which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore remains uncommon outside of very low-power designs [Note 10]. A more radical approach is to remove the global clock signal altogether. Doing so makes the design process considerably more complex in many ways, but asynchronous (or clockless) designs offer marked advantages in power consumption and heat dissipation. While rare, entire CPUs have been built without a global clock signal; two notable examples are the ARM-compliant AMULET and the MIPS R3000-compatible MiniMIPS. Rather than removing the clock signal entirely, some CPU designs allow portions of the device to be asynchronous, for example pairing an asynchronous ALU with superscalar pipelining to achieve some arithmetic performance gains. While it is not entirely clear whether fully asynchronous designs can perform at a level comparable to their synchronous counterparts, they do excel at simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them well suited to embedded computers. [5]

Parallelism

Illustration of the operation of a subscalar CPU. Note that it takes fifteen cycles to complete three instructions.

The CPU design described in the previous section can execute only one instruction at a time. This type of CPU is referred to as subscalar.

Such a CPU has one major drawback: inefficiency. Since only one instruction is executed at a time, the entire CPU must wait for that instruction to complete before proceeding to the next. As a result, the subscalar CPU gets "hung up" on instructions that take more than one clock cycle to complete. Even adding a second execution unit (see below) does not improve performance much; rather than one pathway being hung up, now two pathways are hung up and the number of unused transistors grows. This design, in which the CPU's execution resources operate on only one instruction at a time, can at best reach scalar performance (one instruction per clock cycle) but never surpass it. In practice, the performance is nearly always subscalar (less than one instruction per cycle).

Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that cause the CPU to behave less linearly and more in parallel. When referring to parallelism in CPUs, two terms are generally used to classify these design techniques: instruction-level parallelism (ILP), which seeks to increase the rate at which instructions are executed within a CPU (that is, to raise the utilization of on-die execution resources), and thread-level parallelism (TLP), whose purpose is to increase the number of threads (effectively individual programs) a CPU can execute simultaneously. Each methodology differs both in the way it is implemented and in the relative effectiveness with which it increases CPU performance for an application. [Note 11]

Instruction-level parallelism (ILP): instruction pipelining and superscalar architecture

Schematic diagram of a basic pipeline. In the best case, this kind of pipeline can sustain scalar performance.

One of the simplest methods of achieving increased parallelism is to begin the first steps of instruction fetching and decoding before the prior instruction has finished executing. This is the simplest form of a technique known as instruction pipelining, and it is used in almost all modern general-purpose CPUs. By breaking the execution pathway into discrete stages, pipelining allows more than one instruction to be in flight at any given time. Unlike earlier designs, a pipelined CPU does not wait for one instruction to completely exit the pipeline before beginning the next.

Pipelining does, however, introduce the possibility that the next operation needs the result of the previous, incomplete one. Such situations are often termed data dependency conflicts (hazards). To cope with this, additional care must be taken to check for such conditions and to delay a portion of the instruction pipeline when one occurs. Naturally, accomplishing this requires additional circuitry, so pipelined processors are more complex than subscalar ones (though not very significantly so). A pipelined processor can come very close to scalar performance, inhibited only by pipeline stalls (instructions spending more than one clock cycle in a stage).
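The cycle-count benefit of pipelining, and the cost of stalls, can be sketched with simple arithmetic. The four-stage depth, the instruction names, and the one-cycle stall penalty below are assumptions made for illustration, not a model of any particular processor.

```python
# Toy cycle count for a 4-stage pipeline (fetch, decode, execute, writeback).
STAGES = 4

def pipelined_cycles(instructions, stall_penalty=1):
    """Cycles for a list of (name, depends_on_previous) pairs."""
    # The first instruction must traverse the whole pipeline; each later
    # instruction normally completes one cycle after its predecessor.
    cycles = STAGES + (len(instructions) - 1)
    # Each dependency on the immediately preceding instruction adds a stall.
    cycles += sum(stall_penalty for _, dep in instructions[1:] if dep)
    return cycles

program = [("load", False), ("add", True), ("store", True)]
print(pipelined_cycles(program))  # 4 + 2 + 2 stalls = 8 cycles
print(STAGES * len(program))      # 12 cycles if each instruction ran alone
```

Even with both stalls, the pipelined count (8 cycles) beats the subscalar count (12 cycles), which is why nearly all modern CPUs pipeline.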

A simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per clock cycle can be completed.

Further improvement upon the idea of instruction pipelining led to a technique that decreases the idle time of CPU components even more. Designs said to be superscalar include a long instruction pipeline and multiple identical execution units. In a superscalar pipeline, multiple instructions are read and passed to a dispatcher, which decides whether or not the instructions can be executed in parallel (simultaneously) and distributes them to the available execution units. In general, the more instructions a superscalar CPU can dispatch simultaneously to waiting execution units, the more instructions will be completed in a given cycle.

Most of the difficulty in designing a superscalar CPU architecture lies in creating an effective dispatcher. The dispatcher must be able to quickly and correctly determine whether instructions can be executed in parallel, and distribute them so as to keep as many execution units busy as possible. This requires that the instruction pipeline be filled as often as possible, and it creates the need in superscalar architectures for significant amounts of CPU cache. It also makes hazard-avoiding techniques such as branch prediction, speculative execution, and out-of-order execution crucial to maintaining high levels of performance. By attempting to predict which branch (path) a conditional instruction will take, the CPU can minimize the number of times the whole pipeline must wait for a conditional instruction to complete. Speculative execution often provides modest performance increases by executing portions of code that may turn out not to be needed after a conditional operation completes. Out-of-order execution rearranges the order in which instructions are executed to reduce delays due to data dependencies.
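The branch prediction mentioned above can be illustrated with the textbook 2-bit saturating-counter scheme. The state encoding (0 to 3, predicting "taken" when the state is 2 or higher) is the standard classroom version, sketched here rather than taken from any specific CPU.

```python
# A 2-bit saturating-counter branch predictor. It takes two consecutive
# mispredictions to flip its prediction, so a single anomalous branch
# outcome does not disturb a stable pattern.
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start at "weakly taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool):
        # Move toward 3 on taken branches, toward 0 on not-taken ones.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A typical loop branch: taken eight times, then not taken once at loop exit.
history = [True] * 8 + [False]
p = TwoBitPredictor()
correct = 0
for actual in history:
    if p.predict() == actual:
        correct += 1
    p.update(actual)
print(correct, "of", len(history))  # 8 of 9: only the loop exit mispredicts
```

Each correct prediction lets the pipeline keep fetching down the predicted path instead of waiting for the branch to resolve.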

In the case where only a portion of the CPU is superscalar, the part that is not suffers a performance penalty due to scheduling stalls. The original Intel Pentium (P5) had two superscalar ALUs that could each accept one instruction per clock cycle, but its floating-point unit (FPU) could not. Thus the P5 was integer-superscalar but not floating-point-superscalar. Intel's successor to the Pentium architecture, the P6, added superscalar capability to its floating-point features, bringing a significant increase in floating-point instruction performance.

Both simple pipelining and superscalar design increase a CPU's ILP by allowing a single processor to complete instructions at rates surpassing one instruction per cycle [Note 12]. Most modern CPU designs are at least somewhat superscalar, and nearly all general-purpose CPUs designed in the last decade are superscalar. In later years, some of the emphasis in designing high-ILP computers has been moved out of the CPU's hardware and into its software interface, or ISA. The very long instruction word (VLIW) strategy causes some ILP to be handled directly by the software, reducing the work the CPU must perform to extract ILP and thereby reducing design complexity.

Thread-level parallelism (TLP): simultaneous thread execution

Another strategy commonly used to increase the parallelism of CPUs is to enable the CPU to execute multiple threads (programs) at the same time. In general terms, high-TLP CPUs have been in use much longer than high-ILP ones. Many of the designs pioneered by Cray during the 1970s and 1980s concentrated on TLP as their primary method of enabling enormous computing capability. In fact, TLP in the form of multithreading has been in use since as early as the 1950s (Smotherman 2005). In uniprocessor design, the two main methodologies used to achieve TLP are chip-level multiprocessing (CMP) and simultaneous multithreading (SMT). At a higher level, computers are built with multiple separate processors, often organized using symmetric multiprocessing (SMP) or non-uniform memory access (NUMA). [Note 13] These very different means all aim at the same goal: increasing the number of threads the CPU can run at one time.

The CMP and SMP methods of parallelism are similar to one another and are the most straightforward; conceptually, they involve little more than using two or more complete, independent CPUs. In the case of CMP, multiple processor "cores" are placed in the same package, sometimes on the very same integrated circuit [Note 14]. SMP, on the other hand, involves multiple packages. NUMA is somewhat like SMP but uses a non-uniform model of memory access. This is important for computers with many processors because, under SMP's shared-memory model, each processor's access to memory is quickly contended for, causing significant delays while CPUs wait for memory to become available. NUMA is a good choice in such cases, since it allows many CPUs to coexist in one computer while SMP-like behavior can still be achieved. SMT differs somewhat from the other forms of TLP in that it tries to duplicate as few portions of the CPU as possible. While TLP in spirit, its implementation actually more closely resembles superscalar design; indeed, it is most often used in superscalar processors, such as IBM's POWER5. Rather than duplicating the entire CPU, SMT duplicates only the parts needed for instruction fetching, decoding, and dispatching, along with structures like general-purpose registers. This lets an SMT CPU keep its execution units busy more often by providing them with instructions from two or more different software threads. Again, this is very similar to the ILP style of superscalar design, but it executes instructions from multiple threads simultaneously rather than executing multiple instructions from the same thread concurrently.

Data parallelism 

The processors discussed above are all some variety of scalar device [Note 15]. Vector processors, a less common type of CPU whose importance is nonetheless growing, deal with multiple pieces of data in the context of one instruction, in contrast to scalar processors, which deal with one piece of data per instruction. These two schemes of dealing with data are generally referred to as "single instruction, multiple data" (SIMD) and "single instruction, single data" (SISD), respectively. The great utility of a vector processor lies in optimizing tasks that perform the same operation over a large set of data, for example computing the sum or the dot product of large arrays of numbers. Classic examples include multimedia applications (images, video, and sound) and many types of scientific and engineering work. Whereas a scalar processor must complete the entire fetch-decode-execute-writeback process for each value in a data set, a vector processor can perform a single operation on a comparatively large set of data with one instruction. Of course, this is possible only when the application tends to require many steps that apply one operation to a large set of data.
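The SISD/SIMD contrast can be sketched in plain Python. The 4-lane "vector instruction" below is an illustrative stand-in for hardware SIMD, not a real ISA feature: the point is how many instruction issues are needed, not the arithmetic itself.

```python
def sisd_add(a, b):
    # Scalar style: one "instruction" (one fetch-decode-execute-writeback
    # pass) per element, so len(a) instruction issues in total.
    return [x + y for x, y in zip(a, b)]

def simd_add(a, b, lanes=4):
    # Vector style: one "instruction" per 4-element chunk; all four lanes
    # perform their addition in the same step, so len(a)/4 issues in total.
    out = []
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i+lanes], b[i:i+lanes]))
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
print(simd_add(a, b))  # [11, 22, 33, 44, 55, 66, 77, 88]
```

Both functions produce the same result; the difference a real vector unit exploits is that the chunked version amortizes one fetch/decode over four data elements.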

Most early vector processors, such as the Cray-1, were associated almost exclusively with scientific research and cryptography applications. However, as multimedia largely shifted to digital media, the need for some form of SIMD in general-purpose processors became significant. Shortly after floating-point units became commonplace in general-purpose processors, specifications for and implementations of SIMD execution units also began to appear. Some early SIMD specifications, such as Intel's MMX, were integer-only, which proved a significant impediment for software developers, since most applications that benefit from SIMD deal primarily with floating-point numbers. Progressively, these early designs were refined and remade into the common modern SIMD specifications. AMD introduced 3DNow!, an early floating-point SIMD instruction set that could produce four single-precision floating-point results per clock cycle, four times the rate of a typical x87 floating-point unit of the time. New specifications are usually associated with a particular ISA; some notable modern examples are Intel's SSE and the PowerPC-related AltiVec (also known as VMX). [Note 16]

AMD Opteron six-core processor

Multi-core 

A multi-core CPU is a single chip or package that contains two or more processor cores. An even number of cores is most common, and the cores generally share a second-level cache. Personal computers with dual-core, quad-core, or higher-count processors are quite common nowadays.

The first dual-core processor was the IBM POWER4. In 2012, IBM released the 8-core POWER7+ processor with 80 MB of L3 cache per chip.

Performance 

The performance or speed of a CPU depends on its clock frequency (usually given in hertz or gigahertz, i.e. Hz and GHz) and its instructions per cycle (IPC); together these determine its instructions per second (IPS). [6] However, IPS values represent "peak" execution rates on artificial instruction sequences, whereas realistic workloads consist of a mix of instructions and applications, some of which take longer to execute than others. The performance of the memory hierarchy also greatly affects processor performance. Engineers therefore use various standardized tests, commonly called "benchmarks," to measure CPU performance. Software such as SPECint attempts to simulate real-world environments by measuring various commonly used applications in an effort to obtain a realistic measure of CPU performance.
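The relation stated above is just multiplication: IPS = clock frequency × IPC. The 3.0 GHz clock and IPC of 4 in this sketch are made-up illustrative numbers, not measurements of any real processor.

```python
# Back-of-the-envelope peak throughput from the text's definition.
def instructions_per_second(clock_hz: float, ipc: float) -> float:
    return clock_hz * ipc

peak = instructions_per_second(3.0e9, 4)  # hypothetical 4-wide superscalar core
print(peak)  # 12000000000.0: a theoretical peak of 12 billion IPS
```

As the text notes, this is a peak figure; cache misses, branch mispredictions, and instruction mix push real IPC well below the machine's issue width.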

Processing performance is also improved by using multi-core processors, which essentially combine two or more individual processors (called cores in this sense) on one integrated circuit [7]. Ideally, a dual-core processor would be nearly twice as powerful as a single-core one. In practice, however, the gain is far smaller, often only about 50%, owing to imperfect software algorithms and implementation. Increasing the number of cores nonetheless increases the amount of work a computer can handle: the processor can now handle numerous asynchronous instructions and events, offloading work from an overwhelmed first core. Sometimes a second core will work on the same task as the adjacent core to prevent a crash.
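One standard way to reason about the sub-linear gains described above is Amdahl's law: speedup = 1 / ((1 − p) + p / n), where p is the fraction of the work that can be parallelized and n is the number of cores. The p = 2/3 value below is chosen purely to reproduce the article's "about 50%" dual-core figure, not derived from any measurement.

```python
# Amdahl's law: the serial fraction (1 - p) caps the achievable speedup
# no matter how many cores are added.
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

print(round(amdahl_speedup(2 / 3, 2), 2))  # 1.5: a ~50% gain from a second core
```

The same formula shows why adding cores hits diminishing returns: with p = 2/3, even infinitely many cores cannot exceed a 3× speedup.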
