Single Core Parallelism: Harnessing the Power of SIMD Assembly Instructions
When things get this complicated, you know it’s usually Intel’s fault. Source: Pixabay
SIMD Assembly instructions let you manipulate batches of data in parallel, in a single core. I’ve said it once and I’ll say it again: programming is the closest thing we have to magic.
Shell commands are like small cantrips, and Python scripts are helpful little Tulpas. We even have our own Daemons! But whenever we need to squeeze out the last bit of performance, when we know a single misstep will slow the program down drastically... that’s when Assembly, the darkest of black magicks, comes in.
I haven’t been writing too often lately, as you probably didn’t notice (incidentally, if this is your first time reading me, welcome! Nice to meet you!). That’s because I had a pretty harsh exam last week, and I had to prepare a lot. The subject is called ‘Computer’s Organization II’, and it’s been a huge challenge keeping up to date with it. So I decided I would take one of the exercises I made while I practiced, and turn it into an article. That way I can break two scissors with one stone (killing birds is bad and you should feel bad).
Without delaying this further, let’s cut straight to the chase. As usual, the code is available at this GitHub project.
What are processor instructions?
All the code we write, be it in Python, Java, or C, is eventually interpreted or compiled into tiny, atomic (from a programmer’s perspective) instructions for our CPU(s). These instructions number in the thousands, and each of them does a very small thing, interacting directly with hardware.
As an example, an instruction may write a value into memory (that’s what variable assignment translates into), turn a bit on or off, or do a logical AND.
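To make that concrete, here’s a tiny C sketch where each statement maps to roughly one or two instructions. The function name `flags_demo` is made up for illustration, and the instructions named in the comments are what a compiler for x86-64 would typically emit, not a guarantee: the exact output depends on the compiler and its flags.

```c
#include <stdint.h>

/* Hypothetical demo: each statement is roughly one or two CPU instructions. */
uint8_t flags_demo(uint8_t *slot) {
    *slot = 0x0F;                          /* write a value into memory: typically a mov   */
    *slot = (uint8_t)(*slot | (1u << 7));  /* turn bit 7 on: an or with a mask   -> 0x8F   */
    *slot = (uint8_t)(*slot & ~1u);        /* turn bit 0 off: an and with a mask -> 0x8E   */
    return (uint8_t)(*slot & 0xF0);        /* logical AND, keeping only the high four bits */
}
```

Calling it on a `uint8_t` variable leaves `0x8E` in that variable and returns `0x80`.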
My PC has an Intel processor, and Intel’s x86-64 is also the architecture we learn about in class. So sorry to all my ARM-using readers, I won’t be inclusive enough today.
The language in which these instructions are written (which translates 1:1 to literal binary) is called Assembly.
Translating C to Assembly: Let’s become compilers for a while.
For this article we’ll be using a very small C function. Here’s its entire code:
This function takes as parameters a pointer to a stream of bytes (a char weighs a single byte), a have char and a want char. It’ll assume the stream ends in a 0 (and most likely crash with a segmentation fault if that’s not the case), and iterate over it byte by byte, replacing each instance of ‘have’ with ‘want’. As far as plain C goes, this is about as fast as it gets. And it’s faster than Python by a long shot (when I ran some benchmarks, the Python version of this function took two minutes for an input size that took 6 seconds in C).
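A minimal sketch of such a function, written from the description above. The name `replace_char` and the parameter names are assumptions, and the code in the repository may differ in its details.

```c
/* Replace every byte equal to `have` with `want` in a 0-terminated stream.
 * Assumes the stream really ends in a 0 byte; otherwise the loop reads past
 * the end of the buffer, which is undefined behavior (often a segfault). */
void replace_char(char *stream, char have, char want) {
    for (; *stream != 0; stream++) {
        if (*stream == have) {
            *stream = want;
        }
    }
}
```

For example, calling `replace_char(buf, 'a', 'o')` on a buffer holding "banana" turns it into "bonono".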
What does this function look like in Assembly language, after going through the compiler? It will probably be something like this:
Running that assembly function instead of the C version shouldn’t increase our performance. It may even lower it, since the compiler knows a few tricks we probably don’t, and does a few optimizations on this kind of code.
There’s an optimization it doesn’t usually use, though, and when it does, it rarely uses it to its full potential.
SIMD Instructions: Single Instruction, Multiple Data
Whenever we think of parallelism, we think of multicore processors, or even clusters. But what if we made a single core do many things at a time? That’s the idea the people at Intel had a few decades ago, and the world of image processing hasn’t been the same ever since.
You see, data is normally stored in our CPU’s general-purpose registers, like the ones we just used. Most of them are 64 bits in size, and thus can store a long, a pointer, or an int. Well, technically two ints, but that’s still not enough to be worth designing instructions that process them in parallel.
However, x86-64 processors also have even bigger registers available: the XMM registers, each 128 bits wide. That’s enough for 16 whole bytes!
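To get a feel for what 16 bytes per register buys us, here’s a rough sketch of the same byte replacement done 16 bytes at a time. It makes several assumptions. It uses the SSE2 intrinsics from `emmintrin.h` rather than hand-written Assembly. The name `replace_char_simd` is made up. It takes an explicit length instead of relying on the terminating 0, so it never reads past the buffer. It is one possible way to do this, not necessarily the approach taken in the repository. On machines without SSE2, the plain loop at the end handles everything.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics operating on 128-bit XMM registers */
#endif

/* Hypothetical SIMD variant: replace `have` with `want` in buf[0..len). */
void replace_char_simd(char *buf, size_t len, char have, char want) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i vhave = _mm_set1_epi8(have);  /* 16 copies of `have` */
    const __m128i vwant = _mm_set1_epi8(want);  /* 16 copies of `want` */
    for (; i + 16 <= len; i += 16) {
        /* Load 16 bytes at once (unaligned load). */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        /* Compare all 16 bytes in parallel: 0xFF where equal, 0x00 elsewhere. */
        __m128i mask = _mm_cmpeq_epi8(chunk, vhave);
        /* Take `want` where the mask is set, the original byte elsewhere. */
        __m128i out = _mm_or_si128(_mm_and_si128(mask, vwant),
                                   _mm_andnot_si128(mask, chunk));
        _mm_storeu_si128((__m128i *)(buf + i), out);
    }
#endif
    /* Scalar loop for the remaining bytes (or for all of them without SSE2). */
    for (; i < len; i++) {
        if (buf[i] == have) {
            buf[i] = want;
        }
    }
}
```

Note that there is no branch per byte inside the main loop: the comparison produces a mask, and the mask selects between the old and the new byte for all 16 positions at once.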