Tuesday, 24 May 2016

Reordering C Structure Alignment

Flow chart idea
The memory footprint of C programs can be reduced by manually repacking C structure declarations. This article is about reordering the structure members in decreasing lengths of variables.
Computer memory is word-addressable. For every instruction, the CPU fetches one word data from the memory. This behaviour affects how basic C data types will be aligned in memory. For example, every short (2B) will start on an address divisible by 2; similarly, int(4B) or long(8B) start on addresses divisible by 4 and 8, respectively. But char(1B) can start on any address. This happens due to self-alignment by the processor. Be it X86, ARM or PPC—all display this behaviour.
The following example will explain self-alignment and its advantages. In this example, the size of the pointer is assumed to be 4B. Here, we will use a user-defined struct data type.
struct ESTR
{
char c; /*1B*/
unsigned int* up; /*4B*/
short x; /*2B*/
int v; /*4B*/
char k; /*1B*/
};
You may think this structure will take 12B memory space. But if you print sizeof(struct ESTR) you will find the result to be 20B. This is because of self-alignment. This structure will be placed in memory in the following way:
fig 1
So, you can see all the variables are starting on word boundaries. It is good practice to explicitly mention the padding so that no places are left with junk values. With padding, the structure resembles the following:
struct ESTR
{
char c; /*1B*/
char pad1[3];
unsigned int* up; /*4B*/
short x; /*2B*/
char pad2[2];
int v; /*4B*/
char k; /*1B*/
char pad3[3];
};
Now our structure looks like what’s shown below:
fig 2
The best way to decrease the slope is by ‘reordering the structure’ members in decreasing lengths of variables. If we reorder the structure, we get the following:
struct ESTR
{
unsigned int* up; /*4B*/
int v; /*4B*/
short x; /*2B*/
char c; /*1B*/
char k; /*1B*/
};
Now, our structure looks like what’s shown below:
fig 3
You can see the structure has come down to 12B from 20B. But do not think that padding can be completely eliminated. In this specific example, it has happened with reordering. This may not happen always. But the right practice is to sort the members according to their size.
The disadvantage of reordering
Reordering may affect cache-locality. For example, in a structure, you refer to two variables very frequently. But because of reordering, they are placed very far in terms of word number. So when they are mapped in cache, they become part of different blocks which generate cache-miss and require very frequent cache block swapping. Hence, execution becomes slow.
‘Structure packing’ stops compilers from adding any pad but forces the use of a defined order. It can be achieved with ‘#pragma pack’ in C. Then our first structure alignment looks like what follows:
fig 4
This is called structure packing.
Now, if the CPU wants variable v or up it needs two word accesses for each word consisting of 4 bytes.
So it is the design, which will enable users to make a choice. In embedded systems, there is a performance versus memory size trade-off, because everybody expects faster responses but the size of the memory is fixed. Fragmentation may also not be handled in an embedded OS because of the size constraints. So, the designer has to be careful.

No comments:

Post a Comment