Revision as of 22:20, 10 January 2010 (Sun) by Adrenalin (no edit summary)
Revision as of 02:33, 4 March 2010 (Thu) by Adrenalin (edit summary: →New instructions: removed untranslated portion)
Line 15:

Several of these instructions are implemented on Penryn by its single-cycle shuffle engine. (A shuffle is an operation that rearranges the bytes within a register.)

~~===SSE4.1===~~
~~{| class="wikitable"~~
|-
~~! Instruction~~
~~! Description~~
|-
~~| MPSADBW~~
| Compute eight offset sums of absolute differences (i.e. |x<sub>0</sub>-y<sub>0</sub>|+|x<sub>1</sub>-y<sub>1</sub>|+|x<sub>2</sub>-y<sub>2</sub>|+|x<sub>3</sub>-y<sub>3</sub>|, |x<sub>0</sub>-y<sub>1</sub>|+|x<sub>1</sub>-y<sub>2</sub>|+|x<sub>2</sub>-y<sub>3</sub>|+|x<sub>3</sub>-y<sub>4</sub>|, ...); this operation is important for motion estimation in modern [[HDTV]] [[codec]]s and allows an 8x8 block difference to be computed in fewer than seven cycles.<ref>[http://softwarecommunity.intel.com/articles/eng/1246.htm Motion Estimation with Intel Streaming SIMD Extensions 4 (Intel SSE4)], Intel.</ref> One bit of a three-bit immediate operand indicates whether y<sub>0</sub> .. y<sub>10</sub> or y<sub>4</sub> .. y<sub>14</sub> should be used from the destination operand; the other two bits select whether x<sub>0</sub>..x<sub>3</sub>, x<sub>4</sub>..x<sub>7</sub>, x<sub>8</sub>..x<sub>11</sub> or x<sub>12</sub>..x<sub>15</sub> should be used from the source.
|-
~~| PHMINPOSUW~~
~~| Sets the bottom unsigned 16-bit word of the destination to the smallest unsigned 16-bit word in the source, and the next-from-bottom word to the index of that word in the source.~~
|-
~~| PMULDQ~~
~~| Packed signed multiplication on two sets of 2 out of 4 packed integers, the 1st and 3rd of each packed 4, giving 2 packed 64-bit results.~~
|-
~~| PMULLD~~
~~| Packed signed multiplication: 4 packed pairs of 32-bit integers are multiplied to give 4 packed 32-bit results.~~
|-
~~| DPPS, DPPD~~
| Dot product for AOS (Array of Structs) data. This takes an immediate operand consisting of four bits (two for DPPD) that select which entries of the input to multiply and accumulate, and another four bits (two for DPPD) that select whether to put 0 or the dot product in the corresponding field of the output.
|-
~~| BLENDPS, BLENDPD, BLENDVPS, BLENDVPD, PBLENDVB, PBLENDW~~
~~| Conditional copying of elements from one location to another, based (for the non-V forms) on the bits in an immediate operand, and (for the V forms) on the bits in register XMM0.~~
|-
~~| PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD~~
~~| Packed minimum/maximum for different integer operand types.~~
|-
~~| ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD~~
~~| Round values in a floating-point register to integers, using one of four rounding modes specified by an immediate operand.~~
|-
~~| INSERTPS, PINSRB, PINSRD/PINSRQ, EXTRACTPS, PEXTRB, PEXTRW, PEXTRD/PEXTRQ~~
| The INSERTPS and PINSR instructions read 8, 16 or 32 bits from an x86 register or memory location and insert them into the field of the destination register selected by an immediate operand. EXTRACTPS and PEXTR read a field from the source register and store it in an x86 register or memory location. For example, PEXTRD eax, xmm0, 1; EXTRACTPS [addr+4*eax], xmm1, 1 stores field 1 of xmm1 at the address indexed by field 1 of xmm0 (fields are numbered from 0).
|-
~~|PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ~~
~~| Packed sign/zero extension to wider types.~~
|-
~~| PTEST~~
| Similar to the TEST instruction in that it sets flags from a bitwise AND of its operands: ZF is set if (destination AND source) is all zeros, and CF is set if ((NOT destination) AND source) is all zeros.
|-
~~| PCMPEQQ~~
~~| Quadword (64-bit) compare for equality.~~
|-
~~| PACKUSDW~~
~~| Convert signed DWORDs into unsigned WORDs with saturation.~~
|-
~~| MOVNTDQA~~
~~| Efficient read from a write-combining memory area into an SSE register; this is useful for retrieving results from peripherals attached to the memory bus.~~
|}

~~===SSE4.2===~~
~~{| class="wikitable"~~
|-
~~! Instruction~~
~~! Description~~
|-
~~| CRC32~~
| Accumulate a [[CRC32]]C value using the polynomial 0x11EDC6F41 (or, without the high-order bit, 0x1EDC6F41).<ref>[http://softwarecommunity.intel.com/isn/Downloads/Intel%20SSE4%20Programming%20Reference.pdf Intel SSE4 Programming Reference] p. 61. See also [http://www.rfc-editor.org/rfc/rfc3385.txt RFC 3385] for discussion of the CRC32C polynomial.</ref>
|-
~~| PCMPESTRI~~
~~| Packed Compare Explicit Length Strings, Return Index~~
|-
~~| PCMPESTRM~~
~~| Packed Compare Explicit Length Strings, Return Mask~~
|-
~~| PCMPISTRI~~
~~| Packed Compare Implicit Length Strings, Return Index~~
|-
~~| PCMPISTRM~~
~~| Packed Compare Implicit Length Strings, Return Mask~~
|-
~~| PCMPGTQ~~
~~| Compare packed signed 64-bit data for greater than.~~
|-
~~| POPCNT~~
| [[Hamming weight|Population count]] (counts the number of bits set to 1), a bit-manipulation instruction. It shares its [[opcode]] with JMPE, the instruction used on [[Itanium]] CPUs to escape from [[IA-32]] mode. POPCNT may also be implemented on processors that do not support the other SSE4 instructions; a separate feature bit can be tested to confirm its presence.
|}

~~===SSE4a===~~
~~{| class="wikitable"~~
|-
~~! Instruction~~
~~! Description~~
|-
~~| LZCNT~~
| Leading zero count, a bit-manipulation instruction. LZCNT may also be implemented on processors that do not support the other SSE4 instructions; a separate feature bit can be tested to confirm its presence.
|-
~~| POPCNT~~
| [[Hamming weight|Population count]] (counts the number of bits set to 1), a bit-manipulation instruction. It shares its [[opcode]] with JMPE, the instruction used on [[Itanium]] CPUs to escape from [[IA-32]] mode. POPCNT may also be implemented on processors that do not support the other SSE4 instructions; a separate feature bit can be tested to confirm its presence.
|-
~~| EXTRQ/INSERTQ~~
~~| Combined mask-shift instructions.~~
|-
~~| MOVNTSD/MOVNTSS~~
~~| Scalar streaming store instructions.~~
|}

==See also==

SSE4: Difference between revisions