Intel continues to follow the Tick-Tock development strategy it adopted in 2006. Under this strategy, the development cycle is split into two stages, “tick” and “tock”: in one year (the “tick”), processors move to a new manufacturing process, and a year later (the “tock”) a new architecture is released.
This year Intel delivered a “tock”: at CES 2011 the company introduced its Sandy Bridge processors. The manufacturer described the new architecture in detail, boasted of its achievements and announced three processor lines at once. Below we will examine and test them, but first let us recall how Intel's processors have developed over the last couple of years.
Yesterday
In 2008, the company introduced the successors to Core 2: the Core i7-9×0 processors based on the Nehalem architecture. With them, Intel began experimenting with integration, moving the memory controller from the motherboard onto the processor die.
In 2009, the Lynnfield line came out, a budget version of Nehalem. Here part of the chipset's northbridge moved under the processor lid as well: the PCIe 2.0 controller was placed alongside the memory controller.
In early 2010, the Clarkdale family appeared. With it, Intel moved from the 45 nm to the 32 nm manufacturing process and added a real graphics core to the processor. The graphics were made on a 45 nm process and sat on a separate die from the processor cores; they could barely run games from five years earlier, but the overall impression was positive.
Sandy Bridge is the logical development of Intel's ideas: numerous core improvements, faster graphics, and flexible power and overclocking management. In general, it is a traditional architectural update: it looks much the same, but it works better. There are almost no global changes. You cannot simply say "this got bigger, so it runs faster"; we need to go through every detail and dig into the fundamentals of how a computer works. To make the article easier to follow, we have prepared a number of explanatory sidebars, which we recommend consulting while reading the main text.
New memory
Let's start with the processor cores. The first change was made in the Front-End unit (see the sidebar "With German pedantry"): an L0 cache was added. It sits immediately after the decoders and records all micro-operations coming out of the Front-End unit. When starting work on a new task, the processor first checks the L0 cache for the necessary instructions. If they are there, it takes them directly from the cache. If not, it signals the Front-End unit and works according to the old scheme. The beauty of the approach is that when the required micro-operations are found, the Front-End unit is switched off. As a result, both time and energy are saved.
The L0 cache has no special placement algorithm: everything is simply recorded in sequence. But because the L0 cache stores up to 1,500 micro-operations, the hit rate, according to Intel, is about 80%.
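The L0 idea can be sketched as a lookup that sits in front of the decoders. This is a toy model under stated assumptions, not Intel's design: the 1,500-entry capacity comes from the text, while keying entries by instruction address, evicting the oldest entry first and the `decode` stand-in are hypothetical.

```python
from collections import OrderedDict

CAPACITY = 1500  # micro-op capacity quoted by Intel for the L0 cache

def decode(address):
    """Hypothetical stand-in for the decoders: one micro-op per address."""
    return f"uop@{address:#x}"

class UopCache:
    """Toy decoded micro-op cache: a hit skips the decoders entirely."""

    def __init__(self, capacity=CAPACITY):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> micro-op, oldest first
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        if address in self.entries:
            self.hits += 1  # decoders stay idle: time and power are saved
            return self.entries[address]
        self.misses += 1    # fall back to the old path through the decoders
        uop = decode(address)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # assumed policy: evict the oldest entry
        self.entries[address] = uop
        return uop

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# A hypothetical 10-instruction loop body executed 5 times.
cache = UopCache()
for _ in range(5):
    for addr in range(0x400000, 0x400000 + 10):
        cache.fetch(addr)
print(cache.hits, cache.misses, cache.hit_rate())  # 40 10 0.8
```

In this toy loop only the first pass goes through the decoders; every later pass is served from the cache.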
With German pedantry
The Front-End unit is also called the instruction fetch unit. It handles communication between the program and the processor: it loads the necessary data in time and tells the die what needs to be done.
Suppose you told the processor to multiply two numbers. The fetch unit loads the numbers from memory and translates the task into a language the processor understands: it deciphers what multiplication means and determines which execution units should perform the operation. It records these last couple of steps as micro-operations, which Sandy Bridge now stores in the L0 cache.
Clairvoyant
After the Front-End unit, Intel reworked the branch prediction unit (see the sidebar "On coffee grounds"): it increased the amount of memory and came up with a new scheme for recording successful predictions. In Nehalem each branch of the program was marked in the table with two bits (correct/wrong); now several branches can share one "correct" record. That is, the table is the same size, but it holds more useful information. Cache space is also used more economically. Previously, space in the L1 cache was allocated with a reserve; now, in Sandy Bridge, each address takes exactly as much space as it needs, and not a single byte is wasted.
All of this (the larger memory, the new recording method and dropping the fixed address size) should increase the percentage of successful predictions and reduce the number of pipeline flushes. That means the cores will less often compute unnecessary branches, and programs will run faster.
Out of turn
The Out-of-Order unit was reworked the most in Sandy Bridge (see the sidebar "The first went"): Intel added a physical register file, a kind of cabinet with drawers. Before entering the Out-of-Order unit, instructions put their data into it, and they take it back at the exit. This offloads the data bus, and instructions pass through the Out-of-Order unit more easily.
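The "cabinet with drawers" matches what is usually called register renaming: instructions carry small indices into the physical register file instead of copying the values themselves. A minimal sketch of that standard technique, not of Intel's exact design, assuming a toy file of eight physical registers and hypothetical register names; freeing drawers when instructions retire is omitted.

```python
class RenameUnit:
    """Toy register renaming onto a physical register file (PRF)."""

    def __init__(self, num_physical=8):
        self.prf = [0] * num_physical          # the "cabinet with drawers"
        self.free = list(range(num_physical))  # drawers not yet in use
        self.alias = {}                        # architectural name -> drawer index

    def write(self, arch_reg, value):
        """Allocate a fresh drawer for a new result and point the name at it."""
        slot = self.free.pop(0)
        self.prf[slot] = value
        self.alias[arch_reg] = slot
        return slot

    def read(self, arch_reg):
        """Only the index travels with the instruction; the value stays in the PRF."""
        return self.prf[self.alias[arch_reg]]

ru = RenameUnit()
ru.write("eax", 2)
ru.write("ebx", 3)
first = ru.alias["eax"]
ru.write("eax", ru.read("eax") * ru.read("ebx"))  # eax = eax * ebx gets a new drawer
print(ru.read("eax"), first, ru.alias["eax"])  # 6 0 2
```

The old value of `eax` is still sitting in drawer 0, so an older instruction that has not finished yet can keep reading it.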
The physical register file significantly sped up the out-of-order execution unit and allowed Intel to add support for AVX (Advanced Vector Extensions) instructions. The new set is meant to replace the older SSE (Streaming SIMD Extensions) instructions and expand the capabilities of the SIMD units. Unlike SSE, AVX instructions work with 256-bit registers instead of 128-bit ones, so a single instruction can process twice as much data at once. In other words, programmers will have an easier time, and graphics, audio and video players, editors and the like will run faster. For now only a few programs support AVX, but Intel has high hopes for the technology and promotes it in every possible way.
Interestingly, introducing AVX did not require redesigning the SIMD units: they still work with 128-bit registers, and for AVX calculations two of them simply join forces. The same goes for the L1 data cache that supplies the registers. As in Nehalem, in Sandy Bridge this cache is connected to the execution units by two buses. In the old architecture one bus handled sending data and the other receiving it. Now both work in both directions: when 256 bits of data must be moved at once, they are combined and deliver the required amount of data to the SIMD units.
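The "two 128-bit halves join forces" idea can be modeled in a few lines: a 256-bit add over eight 32-bit floats is carried out as two 4-lane adds whose results are glued back together. Python lists stand in for registers and `simd_add_128` is a hypothetical model of one SIMD unit; this illustrates the principle, not the actual hardware datapath.

```python
LANES_128 = 4  # a 128-bit register holds four 32-bit floats

def simd_add_128(a, b):
    """Hypothetical 128-bit SIMD unit: adds four lanes in one step."""
    assert len(a) == len(b) == LANES_128
    return [x + y for x, y in zip(a, b)]

def avx_add_256(a, b):
    """A 256-bit (eight-lane) add built from two 128-bit units working together."""
    assert len(a) == len(b) == 2 * LANES_128
    low = simd_add_128(a[:LANES_128], b[:LANES_128])
    high = simd_add_128(a[LANES_128:], b[LANES_128:])
    return low + high  # recombine the halves into one 256-bit result

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0] * 8
print(avx_add_256(a, b))
# [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
```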
That concludes the changes in the processor cores. As you can see, there is no revolution: something was added here, something improved there, and together it all runs faster. Now let's see what Intel came up with for the rest of the processor.
On coffee grounds
The branch prediction unit is one of the most important parts of a modern processor. It keeps the Front-End unit from sitting idle. The predictor guesses which memory cell and which instructions the program will request next, and has the processor compute them in advance.
Take a car assembly line, for example. Each time a body is finished, the workers install seats in it. The prediction unit picks up such patterns and, while the car is still being assembled, orders the seats and issues instructions to the workers. Of course, sometimes the car needs heated seats rather than plain ones, and then the time spent ordering and installing is wasted. But compared with what is saved on correct orders, such a mistake is a trifle.
To make mistakes less often, the branch predictor keeps two tables: one stores information about where the seats come from, the other the possible instructions. The predictor marks each forecast with two bits (correct/wrong), then checks the tables and derives patterns: say, that heating is needed only for every third car, while seats are needed every time. The longer the predictor watches a program, the less often it is wrong.
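Two bits per branch are most commonly used as a saturating counter: a single wrong guess does not immediately flip the prediction. The sketch below uses that textbook scheme as an assumption; it is not Intel's actual predictor, and the table size and branch address are hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating-counter branch predictor.

    Counter states 0 and 1 predict "not taken"; 2 and 3 predict "taken".
    """

    def __init__(self, table_size=1024):
        self.table_size = table_size
        self.counters = [1] * table_size  # start weakly "not taken"

    def _index(self, address):
        return address % self.table_size

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

# A loop branch taken 9 times, then not taken once on exit, repeated 3 times.
pattern = ([True] * 9 + [False]) * 3
bp = TwoBitPredictor()
correct = 0
for taken in pattern:
    correct += bp.predict(0x40) == taken
    bp.update(0x40, taken)
print(f"{correct}/{len(pattern)} correct")  # 26/30 correct
```

The first pass costs two mistakes while the predictor learns; after that, only the loop exit is mispredicted, which echoes the point that the predictor improves the longer it watches.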
Graphics
The graphics core in the Clarkdale processors caused mixed feelings. On the one hand, a processor and a video card in one package is cool: a triumph of technology. On the other hand, in a gaming computer with a discrete video card the built-in core is as useful as a fifth leg on a dog, and you still have to pay for it.
Sandy Bridge is no different: the new processors come with a graphics core by default. Only now it is made on the 32 nm process and sits on the same die as the processor cores.
Intel introduced two graphics processors, HD Graphics 2000 and HD Graphics 3000. They differ in the number of streaming cores (execution units): the younger version has six, the elder twelve. Unfortunately, HD Graphics is hard to compare with anything, since it is too different from NVIDIA and AMD cards. There is no video memory; the L3 cache is used instead. The cores work only with fixed functions, so there is no GPGPU. Perhaps the only understandable characteristic here is the clock frequency. The new graphics run at 650 or 850 MHz and can boost to 1100, 1250 or 1350 MHz.
Although Intel claims the new graphics core is twice as fast as the one in Clarkdale, the future of HD Graphics is uncertain. It will not compete with discrete video cards; it would lose. There is no DirectX 11 support, only DirectX 10.1. Sandy Bridge will not reach office computers any time soon. Perhaps the only place where HD Graphics will really fit is laptops. Mobile processors are equipped with HD Graphics 3000, the top version, which means you can play games even on a weak laptop.
The first went
The Out-of-Order unit (the out-of-order execution unit) makes sure that operations are performed as soon as the necessary data is available, so the processor does not stand idle.
Imagine a supermarket. You have taken a bottle of water and want to pay for it. You go to the checkout, and there is a man ahead of you with a cart filled to the top. And you wait while he unloads all his products, picks up chewing gum, sweets... In short, you waste time. But if a special person, the Out-of-Order unit, stood nearby, he would tell you: "Go ahead, pay for your water." The checkout would not stand idle, and you would not lose your time.
The same goes for programs. The Out-of-Order unit figures out what can be computed right now and what should be postponed. It does not care whether an instruction comes from the beginning or the middle of the program. The main thing is to keep the execution units busy; the results can be collected and put back in the right order later.
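The checkout rule boils down to: issue any instruction whose inputs are ready, regardless of program order, and put the results back in order later. A minimal sketch, assuming a hypothetical four-instruction program, one issue per cycle, a three-cycle load and one-cycle arithmetic; real schedulers track far more state (and in-order retirement is not modeled).

```python
import math
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int  # cycles until the result is available

def schedule(program, in_order=False):
    """Return the cycle at which each instruction issues, one issue per cycle.

    Out-of-order mode picks the oldest instruction whose inputs are ready;
    in-order mode may only look at the oldest instruction still waiting.
    """
    # Values produced inside the program are unavailable until their producer runs.
    ready_at = {ins.dest: math.inf for ins in program}
    pending = list(program)
    issue_cycle = {}
    cycle = 0
    while pending:
        candidates = pending[:1] if in_order else list(pending)
        for ins in candidates:
            if all(ready_at.get(reg, 0) <= cycle for reg in ins.srcs):
                issue_cycle[ins.name] = cycle
                ready_at[ins.dest] = cycle + ins.latency
                pending.remove(ins)
                break  # at most one instruction issues per cycle
        cycle += 1
    return issue_cycle

program = [
    Instr("load", "r1", (), 3),           # slow memory load
    Instr("mul", "r2", ("r1", "r1"), 1),  # needs the loaded value
    Instr("add", "r3", ("r4", "r5"), 1),  # independent work
    Instr("sub", "r6", ("r4", "r5"), 1),  # independent work
]
print(schedule(program, in_order=True))  # {'load': 0, 'mul': 3, 'add': 4, 'sub': 5}
print(schedule(program))                 # {'load': 0, 'add': 1, 'sub': 2, 'mul': 3}
```

Like the shopper with one bottle of water, `add` and `sub` slip past the stalled `mul`, and the whole batch finishes two cycles earlier.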
Agent Smith
Besides the graphics core, the northbridge has also moved under the processor lid. It is now called the System Agent rather than the Uncore, as in Lynnfield and Clarkdale. It includes a dual-channel memory controller, a PCIe 2.0 controller, the FDI bus for outputting the image from the graphics core, and the power control chip (Power Control Unit).
There are few changes. The memory controller has learned to work with DDR3 modules at up to 1333 MHz and, via the multiplier, can overclock them to 2133 MHz. The FDI bus now supports DisplayPort 1.2 and HDMI 1.4, so you can watch Blu-ray 3D. The PCIe 2.0 controller stays the same: 16 lanes are available, which can be split into two PCIe x8 slots. The PCU can flexibly regulate the frequency and power of both the processor and the graphics core, maintain thermal balance and save energy in every possible way.
Third Ring
To bring so many components together, Intel had to abandon crossbar buses and come up with something new. The solution was borrowed from the Nehalem-EX server processors, which use a ring bus. Its operating principle can be compared to a ring road. Data moves continuously along the road and makes stops in the right places: at the processor cores, the System Agent, the L3 cache, the graphics core and so on. The beauty of it is that the number of stops is not limited; additional stops can be added as needed.
In Sandy Bridge, the bus is divided into four rings: data, requests, snoop (state monitoring) and acknowledgement. Each can carry 32 bytes per clock and runs at the processor frequency. Peak throughput is 96 GB/s. Unfortunately, when the cores idle and their frequency drops, the bus speed drops too, which can hurt the graphics processor. The same applies to the L3 cache.
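One way to arrive at the 96 GB/s figure is the ring width times its clock: 32 bytes per cycle at 3 GHz. Because the ring runs at the core frequency, the same arithmetic shows how bandwidth falls when the cores slow down at idle. The 3 GHz and 1.6 GHz values are assumptions chosen for illustration.

```python
BYTES_PER_CLOCK = 32  # width of each ring, from the text

def ring_bandwidth_gb_s(core_freq_ghz, bytes_per_clock=BYTES_PER_CLOCK):
    """Peak bandwidth of one ring: bytes per clock times clock rate (GHz gives GB/s)."""
    return bytes_per_clock * core_freq_ghz

print(ring_bandwidth_gb_s(3.0))  # 96.0 (cores at an assumed 3 GHz)
print(ring_bandwidth_gb_s(1.6))  # 51.2 (cores throttled to an assumed 1.6 GHz)
```

The L3 cache, which also runs at core frequency, follows the same proportional drop.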
One for two
How do you make a processor compute faster? Yes, you can introduce all kinds of predictors, out-of-order execution and the like, but speed still depends on the time it takes to perform one operation.
For a long time, performance was increased by raising the clock frequency, that is, the number of operations performed per second. This method is effective but leads to a dead end: speed cannot be raised forever. So a new approach was invented: parallel computing.
Another example. You are given a calculator and asked to add three pairs of numbers. You take the first pair, add it and write down the result, then the second, then the third. The method works, but it is slow. To speed up the process, you can call two more people, give each a pair of numbers and have everyone add at the same time. That way, you complete the whole task in one step. The main thing is to make sure the "add" command and the numbers are available to all participants at the same time.
This way of increasing performance is called SIMD (Single Instruction, Multiple Data). Several computing units are placed in the processor core, and special instructions (SSE, AVX) make them work in parallel. The numbers they operate on are stored in special memory cells called registers.
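The calculator example in code: the scalar version performs one addition per step, while the SIMD version applies a single "add" to every pair at once. Python gives no direct access to SSE or AVX, so the lane-wise function below only models the idea, and the step counter is a hypothetical bookkeeping device.

```python
def scalar_add(pairs):
    """One person with a calculator: one addition per step."""
    results, steps = [], 0
    for x, y in pairs:
        results.append(x + y)
        steps += 1
    return results, steps

def simd_add(pairs):
    """Single Instruction, Multiple Data: one 'add' command applied to every lane."""
    xs, ys = zip(*pairs)                       # load both operands into "registers"
    results = [x + y for x, y in zip(xs, ys)]  # every lane computed by one instruction
    return results, 1                          # modeled as a single step

pairs = [(1, 2), (3, 4), (5, 6)]
print(scalar_add(pairs))  # ([3, 7, 11], 3)
print(simd_add(pairs))    # ([3, 7, 11], 1)
```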
Remember
In Sandy Bridge, the L3 cache was renamed the Last Level Cache (LLC) and divided into slices, one per processor core on the die. Each slice is controlled independently and can be put into sleep mode to save energy. System components reach the LLC over the ring bus, while the processor cores connect to it directly. The System Agent, which tracks the distribution of free space, is responsible for filling the L3 cache and dividing it between the processor cores and the graphics core.
Like the ring bus, the L3 cache runs at the processor frequency. Under maximum load its throughput soars; at idle it drops, and in that case the graphics core suffers again.
Big idea
Of course, Sandy Bridge is not only about changes and improvements. Intel would not be Intel if it did not come up with something special. We are talking about Quick Sync, also known as the Intel Media Engine, a built-in engine for working with video.
We all know that Full HD video is the processor's worst enemy: it loads it to 100% and slows down the entire system. NVIDIA and AMD took clever advantage of this: the stream processors of their video cards took over decoding and offloaded the CPU. The same went for transcoding video into another format: the processor spent an hour and a half on the job, while the video card coped in 20-30 minutes. As a result, the prevailing opinion became: if you want to work with video, buy a video card.
Gas!
Turbo Boost technology first appeared in the Nehalem processors (Core i7-9×0). Its principle is simple: if some of the cores are idle, the frequency of the active ones rises automatically. This works because idle cores consume almost no power, which leaves headroom under the thermal design power (TDP) limit. That headroom is used to raise the operating frequency.
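The original Turbo Boost can be sketched as a lookup: the fewer cores are active, the more TDP headroom there is and the more frequency steps ("bins") the active cores receive. The base frequency, bin size and bin table below are hypothetical values for illustration, not the specification of any particular Core i7-9×0.

```python
BASE_MHZ = 2666  # hypothetical base frequency
BIN_MHZ = 133    # hypothetical size of one Turbo Boost step ("bin")
# Hypothetical table: number of active cores -> extra bins allowed by the TDP headroom
TURBO_BINS = {4: 1, 3: 1, 2: 2, 1: 2}

def turbo_frequency(active_cores):
    """Frequency of the active cores; fewer active cores leave more TDP headroom."""
    if active_cores not in TURBO_BINS:
        raise ValueError("expected 1 to 4 active cores")
    return BASE_MHZ + TURBO_BINS[active_cores] * BIN_MHZ

for cores in (4, 1):
    print(cores, turbo_frequency(cores))
# 4 2799
# 1 2932
```

A gain of one or two steps like this is consistent with the modest 200-300 MHz increases the first Turbo Boost delivered.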
Intel decided to fight this. Dedicated units with hard-wired decoding for MPEG-2, H.264 (AVC), VC-1 and other popular formats were built into the Sandy Bridge graphics engine. Motion processing, deinterlacing and color correction were assigned to the streaming cores (those very execution units). And voilà! The processor learned to play back up to five Full HD streams without loading the main cores. True, it rounds the playback rate to 24 frames per second, which on large TVs can make motion look jerky, but these are trifles.
Thanks to the huge decoding speed, Sandy Bridge converts films into another format almost on the fly.
The only problem is that Quick Sync works only when the integrated graphics core is active. When the image is output through a discrete video card, the media engine is unavailable. Let's hope Intel fixes this.
Overclocked
Let's talk separately about overclocking. As usual, there are two options: automatic and manual through the BIOS. Let's start with the first, namely Turbo Boost technology, which was updated to version 2.0 in Sandy Bridge.
In the first Turbo Boost, the processor cores accelerated depending on the TDP headroom. The speed could be raised by only 200-300 MHz, which was hardly noticeable. Turbo Boost 2.0 works on a new principle. It monitors the current temperature of the processor and, as soon as there is some thermal headroom, raises the frequency by 400-500 MHz. The die runs in this mode until it heats up, and then returns to its standard speed. Then everything repeats: acceleration, heating, frequency drop. According to Intel, a series of short-term speed boosts gives a greater performance increase than constant operation within the TDP limit.
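The boost, heat, slow down cycle behaves like a simple thermal control loop with hysteresis. A minimal simulation, assuming a hypothetical linear heating and cooling model, hypothetical temperature thresholds and base clock, and a +400 MHz boost from the range quoted above; the real Turbo Boost 2.0 power and temperature budgets are not modeled.

```python
BASE_MHZ = 3300      # hypothetical base frequency
BOOST_MHZ = 400      # boost step, from the 400-500 MHz range in the text
T_LIMIT = 90.0       # hypothetical temperature at which boosting stops
T_RESUME = 80.0      # hypothetical temperature at which boosting may resume
HEAT_PER_STEP = 4.0  # hypothetical heating per time step while boosted
COOL_PER_STEP = 2.0  # hypothetical cooling per time step at base speed

def simulate(steps, temp=70.0):
    """Return (frequency, temperature) for each time step of the boost/cool cycle."""
    boosted = True
    trace = []
    for _ in range(steps):
        if boosted and temp >= T_LIMIT:
            boosted = False  # too hot: drop back to the standard speed
        elif not boosted and temp <= T_RESUME:
            boosted = True   # cooled down: boost again
        freq = BASE_MHZ + (BOOST_MHZ if boosted else 0)
        temp += HEAT_PER_STEP if boosted else -COOL_PER_STEP
        trace.append((freq, temp))
    return trace

trace = simulate(20)
boost_share = sum(f > BASE_MHZ for f, _ in trace) / len(trace)
print(f"boosted {boost_share:.0%} of the time")  # boosted 45% of the time
```

The gap between the two thresholds keeps the chip from flipping between modes on every step; without it the frequency would oscillate constantly around the limit.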
Ciphers
Intel did not change the names of its processor lines, keeping the familiar Core i3, i5 and i7 for Sandy Bridge. It is easy to tell old chips from new ones: the first generation of processors has a three-digit model number (for example, Intel Core i5-661), and the second has a four-digit one (Intel Core i5-2500).
True, some confusion remains, because desktop and mobile models share a single numbering scheme. Say, a desktop processor numbered "2300" belongs to the Core i5 line, while the mobile 2310M is a Core i3.
Otherwise, the model numbers of the new Sandy Bridge chips are easy to read. A name such as Core i5-2500K consists of four parts:
1 – Series:
- i7: top processors; support all Intel technologies, have four cores and 8 MB of L3 cache;
- i5: mid-range; dual- or quad-core, lack support for Hyper-Threading, Virtualization Technology and Trusted Execution, and have 3 or 6 MB of L3 cache;
- i3: the entry-level Sandy Bridge series; dual-core only, with 3 MB of L3 cache.
2 – The first digit indicates the second generation of the Core i series; it is the same for all Sandy Bridge processors.
3 – The remaining digits indirectly indicate the processor's position in the series: the higher the number, the faster the processor. They do not affect supported technologies.
4 – Suffix:
- K: processor with an unlocked multiplier;
- M: mobile processor;
- S: processor with power consumption reduced to 65 W;
- T: processor with power consumption reduced to 45/35 W, the most economical and slowest in the series.
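The marking rules above are regular enough to decode mechanically. A minimal sketch, assuming names written like "Core i5-2500K"; the regular expression and `decode_model` are hypothetical helpers, and the descriptions are copied from the list above (they describe the desktop lines).

```python
import re

SERIES = {
    "i7": "top series: four cores, 8 MB L3",
    "i5": "mid-range: two or four cores, 3 or 6 MB L3",
    "i3": "entry level: two cores, 3 MB L3",
}
SUFFIXES = {
    "": "standard model",
    "K": "unlocked multiplier",
    "M": "mobile",
    "S": "reduced power (65 W)",
    "T": "reduced power (45/35 W)",
}
# Series, generation digit, remaining model digits, optional suffix.
PATTERN = re.compile(r"Core (i[357])-(\d)(\d{3})([KMST]?)$")

def decode_model(name):
    """Split a Sandy Bridge model name into the four parts described above."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"not a four-digit Core i name: {name!r}")
    series, generation, model, suffix = m.groups()
    return {
        "series": SERIES[series],
        "generation": int(generation),  # 2 for every Sandy Bridge processor
        "model": int(model),            # higher number = faster within the series
        "suffix": SUFFIXES[suffix],
    }

print(decode_model("Core i5-2500K"))
```

A first-generation name such as "Core i5-661" has only three digits and is rejected by the pattern.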
Mourning
With manual overclocking, things are not so rosy. As you know, the processor frequency is the product of the bus speed (BCLK) and the core multiplier. Accordingly, you can overclock the processor either by raising BCLK or by changing the multiplier. In previous Intel processors, the multiplier was unlocked only in the expensive Extreme Edition models. So everyone overclocked via the bus, and the results were always excellent: even the entry-level Core i7 easily went from 2.66 to 4 GHz. With Sandy Bridge you can forget about that: Intel has blocked bus overclocking.
The company tied all system components to a single clock generator (BCLK), so now, when you overclock the processor cores, the frequencies of the integrated graphics, ring bus, L3 cache, memory controller, PCIe bus and so on rise as well. None of these are designed for overclocking, and at the slightest attempt to raise the frequency the system resets its settings. There is no way around it.
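Frequency is simply BCLK times a ratio, and in Sandy Bridge the other clock domains are derived from the same BCLK. A minimal sketch, assuming a 100 MHz BCLK and hypothetical per-domain ratios (only the 33× core multiplier of the Core i5-2500K at 3.3 GHz is a real figure); it shows why a small BCLK change drags the PCIe bus along, while a multiplier change moves only the cores.

```python
# Hypothetical ratios: only the 33x core multiplier (Core i5-2500K, 3.3 GHz) is a real figure.
RATIOS = {"cores": 33, "graphics": 11, "pcie": 1}

def clocks(bclk_mhz, core_multiplier=None):
    """Every domain runs at BCLK times its ratio; only the core ratio is adjustable here."""
    ratios = dict(RATIOS)
    if core_multiplier is not None:
        ratios["cores"] = core_multiplier
    return {domain: bclk_mhz * r for domain, r in ratios.items()}

print(clocks(100))                      # {'cores': 3300, 'graphics': 1100, 'pcie': 100}
print(clocks(105))                      # {'cores': 3465, 'graphics': 1155, 'pcie': 105}
print(clocks(100, core_multiplier=40))  # {'cores': 4000, 'graphics': 1100, 'pcie': 100}
```

The same formula explains the classic Nehalem overclock mentioned above, taking the Core i7-920 as the "entry-level Core i7": 133 MHz × 20 ≈ 2.66 GHz, raised to 200 MHz × 20 = 4 GHz by changing BCLK alone.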
Now the only option is the multiplier. Intel has already released processors designed for overclocking and sells them for 700-900 rubles more than the standard models.
Intel presents
At CES 2011 Intel introduced 29 processors based on the new architecture: fifteen for laptops and fourteen for desktop computers. Intel did not create new lines and kept the familiar Core i3, i5 and i7. Only the marking has changed: the model number now has four digits instead of three (see the sidebar on model numbers). The cheapest desktop Sandy Bridge processor, the Core i3-2100, sells for 4,000 rubles; the most expensive, the Core i7-2600K, for 12,000 rubles. There are no 1,000-euro chips. In the upper price range, Intel kept the older Nehalem on the X58 Express chipset.
In addition to the standard models, Intel introduced two processors with an unlocked multiplier, the Core i7-2600K and the Core i5-2500K. For some reason these are the only desktop chips with the top-end HD Graphics 3000, although what overclockers would do with it is unclear. Equally unclear is why Intel came up with a new socket for Sandy Bridge. What was wrong with LGA1156 is not really explained; users are simply offered a new motherboard with LGA1155.
Continue
Two chipsets are currently on sale: the H67 Express and the P67 Express. Compared with the 5-series there are few changes: there are now 14 USB 2.0 ports instead of 12, two SATA Rev. 3.0 ports have appeared, and PCI support has disappeared. Everything else is the same: LAN and 8 PCIe lanes.
The P67 Express chipset is aimed at gaming systems. It does not work with the graphics built into the processor, but it can split the 16 PCIe 2.0 lanes into two PCIe x8 slots. The H67 Express cannot do that, but it has the FDI bus and DisplayPort, HDMI, DVI and VGA video outputs.
Processors can be overclocked only on the P67 Express; the H67 does not recognize the unlocked multiplier. Why is unclear, since only the "K" chips come with the top-end HD Graphics, which the P67 cannot use.
Intel is expected to resolve all these absurdities in the Z68 Express chipset, information about which recently leaked online. It will be a copy of the H67, but with overclocking support and two PCIe x8 slots.
* * *
On paper, Sandy Bridge looks promising. The large number of improvements should provide a tangible performance increase. The integrated graphics core is an excellent solution for office computers and laptops. Intel's answer to AMD and NVIDIA in video processing is simply beyond praise. In short, there is nothing revolutionary in Sandy Bridge, but the number and quality of the refinements are impressive.
The picture is spoiled only by Intel's strange policy. Why do overclocker processors get HD Graphics 3000? Was there really a need to change the socket? Why does the H67 Express not recognize the unlocked multiplier, and why can it not work with two video cards? All these little things spoil the overall impression, but we will not jump to conclusions: tests come first.