Monday, October 15, 2012

Linux Kernel: ARRAY_SIZE()

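In C, we usually get the number of elements in an array by dividing the total size of the array by the size of one element, i.e. sizeof(a) / sizeof(a[0]).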
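A minimal sketch of that calculation follows. The names NAIVE_ARRAY_SIZE and count_local_array are made up for this illustration and are not taken from the kernel.

```c
#include <stddef.h>

/* Naive element count: total bytes of the array divided by the bytes of
 * one element. NAIVE_ARRAY_SIZE is an illustrative name, not a kernel macro. */
#define NAIVE_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Returns the element count of a local 10-element array. */
static size_t count_local_array(void)
{
    int a[10];
    return NAIVE_ARRAY_SIZE(a);   /* (10 * sizeof(int)) / sizeof(int) */
}
```

Since sizeof is evaluated at compile time here, the macro costs nothing at run time.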


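But as Jserv points out in his article,
a macro like ARRAY_SIZE() is actually full of pitfalls...
A macro cannot do any type checking; it simply substitutes its argument and expands.
In C we often use pointers and arrays interchangeably,
so if we pass in a pointer to the array instead of the array itself, we get a wrong result.
Take an int array a with 10 elements and a pointer a_ptr that points to it.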


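If we pass the array a, the size is correctly reported as 10.
But if we pass a_ptr, the pointer to array a,
then because a pointer is 4 bytes on a 32-bit system (and assuming a 4-byte int),
the result becomes (4 / 4) = 1, not the 10 we wanted.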
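The behaviour described above can be reproduced with a sketch like the following. NAIVE_ARRAY_SIZE and the two helper functions are illustrative names. The result for the pointer depends on the platform, so the code makes no assumption about it beyond sizeof(int *) / sizeof(int).

```c
#include <stddef.h>

/* Illustrative macro without any type checking. */
#define NAIVE_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

static int a[10];
static int *a_ptr = a;   /* points to the first element of a */

/* Array argument: sizeof(a) covers all 10 elements. */
static size_t size_via_array(void)
{
    return NAIVE_ARRAY_SIZE(a);
}

/* Pointer argument: sizeof(a_ptr) is only the size of the pointer itself,
 * so this yields sizeof(int *) / sizeof(int), e.g. 4 / 4 on a typical
 * 32-bit system. */
static size_t size_via_pointer(void)
{
    return NAIVE_ARRAY_SIZE(a_ptr);
}
```

Recent compilers may warn about the second function (GCC has a -Wsizeof-pointer-div warning for this pattern), but the code still compiles and silently returns the wrong count.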

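Of course, as long as we are careful, this problem can be avoided.
But we often pass an array into a function through a pointer,
and in that situation it is easy to mistake the pointer for an array, hand it to ARRAY_SIZE(), and get a wrong result.
Once a program grows to a certain complexity, bugs like this are very likely to go unnoticed.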
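A small sketch of the function-call case, with hypothetical names (count_in_callee, count_from_caller), shows how the length is lost as soon as the array crosses a function boundary:

```c
#include <stddef.h>

/* Illustrative macro without any type checking. */
#define NAIVE_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* The caller's array arrives here as a plain pointer; its length is gone. */
static size_t count_in_callee(const int *buf)
{
    /* Computes sizeof(const int *) / sizeof(int), not the caller's length. */
    return NAIVE_ARRAY_SIZE(buf);
}

static size_t count_from_caller(void)
{
    int samples[10] = {0};
    return count_in_callee(samples);   /* samples decays to &samples[0] */
}
```

The usual remedy is to pass the element count as an extra parameter, computed where the array is still visible as an array.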

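For this reason, when Linux defines ARRAY_SIZE() it not only computes the element count as above,
but also adds a type check to make sure the argument is an array and not a pointer.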

In Linux, ARRAY_SIZE() is defined in include/linux/kernel.h.
Its definition is (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr)).


Note that the return value of __must_be_array() is added at the very end.

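The __must_be_array() macro checks whether the argument passed in is an array
(it is defined in include/linux/compiler-gcc.h).
Its definition is BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0])).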


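Here __must_be_array() makes its argument a "decay" by writing &(a)[0],
and passes the result as the second argument to the __same_type() macro.
The point of the decay is to turn an array into a pointer to its first element.
But if the argument a is a pointer that should not have been passed in, the result of the conversion still has the same pointer type as the original.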
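The effect of the decay can be observed directly with a small sketch. DECAYS_TO_SELF is a hypothetical helper name used only here; it compares the type of its argument with the type of &(a)[0]. It relies on GCC extensions (also supported by Clang) and uses the __typeof__ spelling so that it builds in strict ISO modes too.

```c
/* Hypothetical helper for illustration: 1 when typeof(a) and
 * typeof(&(a)[0]) are compatible, 0 otherwise. */
#define DECAYS_TO_SELF(a) \
    __builtin_types_compatible_p(__typeof__(a), __typeof__(&(a)[0]))

static int arr[10];
static int *ptr = arr;

/* int[10] versus int *: different types, so the result is 0. */
static int array_decays_to_self(void)
{
    return DECAYS_TO_SELF(arr);
}

/* int * versus int *: the same type, so the result is 1. */
static int pointer_decays_to_self(void)
{
    return DECAYS_TO_SELF(ptr);
}
```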

The return value of __same_type() is then passed to BUILD_BUG_ON_ZERO()
(see the previous post for an explanation of BUILD_BUG_ON_ZERO()).
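As a quick reminder, here is a sketch of BUILD_BUG_ON_ZERO() as I believe it was defined in kernels of that period; newer kernels may define it differently. It relies on GCC behaviour: a struct containing only a zero-width unnamed bit-field has size 0, while a negative bit-field width is a compile error.

```c
#include <stddef.h>

/* Kernel-style definition (reproduced from memory, treat as an assumption).
 * e == 0: bit-field width is 0, the struct is empty, sizeof gives 0 (GCC).
 * e != 0: bit-field width is -1, which the compiler rejects. */
#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))

/* Compiles and evaluates to 0 because the condition is false. */
static size_t zero_when_condition_false(void)
{
    return BUILD_BUG_ON_ZERO(0);
}

/* Uncommenting the function below should stop the build with a
 * "negative width in bit-field" style error:
 *
 * static size_t never_builds(void) { return BUILD_BUG_ON_ZERO(1); }
 */
```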

__same_type() is defined as __builtin_types_compatible_p(typeof(a), typeof(b))
(in include/linux/compiler.h).
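A sketch of __same_type() in use follows. The kernel spells the keyword typeof; this version uses __typeof__ so it also builds in strict ISO modes. The three helper functions are made up for the example.

```c
/* Kernel-style helper: 1 if the two expressions have compatible types. */
#define __same_type(a, b) \
    __builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/* int versus int: compatible. */
static int same_int_int(void)
{
    int x = 0, y = 0;
    return __same_type(x, y);
}

/* int versus long: not compatible, even where both have the same size. */
static int same_int_long(void)
{
    int x = 0;
    long y = 0;
    return __same_type(x, y);
}

/* int[4] versus int *: this is the comparison __must_be_array() makes. */
static int same_array_and_first_element_address(void)
{
    int t[4] = {0};
    return __same_type(t, &t[0]);
}
```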


Here we can see that __same_type() calls the GCC built-in function __builtin_types_compatible_p().
For the definition of __builtin_types_compatible_p(), see the GCC manual:
— Built-in Function: int __builtin_types_compatible_p (type1, type2)
You can use the built-in function __builtin_types_compatible_p to determine whether two types are the same.
This built-in function returns 1 if the unqualified versions of the types type1 and type2 (which are types, not expressions) are compatible, 0 otherwise. The result of this built-in function can be used in integer constant expressions.
This built-in function ignores top level qualifiers (e.g., const, volatile). For example, int is equivalent to const int.
The type int[] and int[5] are compatible. On the other hand, int and char * are not compatible, even if the size of their types, on the particular architecture are the same. Also, the amount of pointer indirection is taken into account when determining similarity. Consequently, short * is not similar to short **. Furthermore, two types that are typedefed are considered compatible if their underlying types are compatible.
An enum type is not considered to be compatible with another enum type even if both are compatible with the same integer type; this is what the C standard specifies. For example, enum {foo, bar} is not similar to enum {hot, dog}.
You would typically use this function in code whose execution varies depending on the arguments' types. For example:
          #define foo(x)                                                  \
            ({                                                           \
              typeof (x) tmp = (x);                                       \
              if (__builtin_types_compatible_p (typeof (x), long double)) \
                tmp = foo_long_double (tmp);                              \
              else if (__builtin_types_compatible_p (typeof (x), double)) \
                tmp = foo_double (tmp);                                   \
              else if (__builtin_types_compatible_p (typeof (x), float))  \
                tmp = foo_float (tmp);                                    \
              else                                                        \
                abort ();                                                 \
              tmp;                                                        \
            })
   
Note: This construct is only available for C.
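In other words, __builtin_types_compatible_p() checks whether the two types passed in, type1 and type2, are compatible:
if type1 and type2 are the same type, it returns 1;
if type1 and type2 are different types, it returns 0.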
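The cases listed in the manual excerpt can be checked directly by passing type names to the built-in. The function names below are illustrative only; the built-in itself needs GCC or a compatible compiler such as Clang.

```c
/* Top-level qualifiers are ignored: int and const int are compatible. */
static int int_vs_const_int(void)
{
    return __builtin_types_compatible_p(int, const int);
}

/* An array of unknown size is compatible with int[5]. */
static int unsized_vs_sized_array(void)
{
    return __builtin_types_compatible_p(int[], int[5]);
}

/* int and char * are not compatible, whatever their sizes are. */
static int int_vs_char_pointer(void)
{
    return __builtin_types_compatible_p(int, char *);
}

/* The level of pointer indirection matters: short * versus short **. */
static int pointer_vs_pointer_to_pointer(void)
{
    return __builtin_types_compatible_p(short *, short **);
}
```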

In addition, to obtain the type of an argument, another GCC extension is used here: typeof().
The definition of typeof() can likewise be found in the GCC manual:
6.6 Referring to a Type with typeof

Another way to refer to the type of an expression is with typeof. The syntax of using of this keyword looks like sizeof, but the construct acts semantically like a type name defined with typedef.
There are two ways of writing the argument to typeof: with an expression or with a type. Here is an example with an expression:
     typeof (x[0](1))
This assumes that x is an array of pointers to functions; the type described is that of the values of the functions.
Here is an example with a typename as the argument:
     typeof (int *)
Here the type described is that of pointers to int.
If you are writing a header file that must work when included in ISO C programs, write __typeof__ instead of typeof. See Alternate Keywords.
A typeof-construct can be used anywhere a typedef name could be used. For example, you can use it in a declaration, in a cast, or inside of sizeof or typeof.
The operand of typeof is evaluated for its side effects if and only if it is an expression of variably modified type or the name of such a type.
typeof is often useful in conjunction with the statements-within-expressions feature. Here is how the two together can be used to define a safe "maximum" macro that operates on any arithmetic type and evaluates each of its arguments exactly once:
     #define max(a,b) \
       ({ typeof (a) _a = (a); \
           typeof (b) _b = (b); \
         _a > _b ? _a : _b; })
The reason for using names that start with underscores for the local variables is to avoid conflicts with variable names that occur within the expressions that are substituted for a and b. Eventually we hope to design a new form of declaration syntax that allows you to declare variables whose scopes start only after their initializers; this will be a more reliable way to prevent such conflicts.
Some more examples of the use of typeof:
This declares y with the type of what x points to.
          typeof (*x) y;
   
This declares y as an array of such values.
          typeof (*x) y[4];
   
This declares y as an array of pointers to characters:
          typeof (typeof (char *)[4]) y;
   
It is equivalent to the following traditional C declaration:
          char *y[4];
   
To see the meaning of the declaration using typeof, and why it might be a useful way to write, rewrite it with these macros:
          #define pointer(T)  typeof(T *)
          #define array(T, N) typeof(T [N])
   
Now the declaration can be rewritten this way:
          array (pointer (char), 4) y;
   
Thus, array (pointer (char), 4) is the type of arrays of 4 pointers to char.
Compatibility Note: In addition to typeof, GCC 2 supported a more limited extension which permitted one to write
     typedef T = expr;
with the effect of declaring T to have the type of the expression expr. This extension does not work with GCC 3 (versions between 3.0 and 3.2 will crash; 3.2.1 and later give an error). Code which relies on it should be rewritten to use typeof:
     typedef typeof(expr) T;
This will work with all versions of GCC.
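typeof() yields the type of its argument,
so we can use it to declare a new variable whose type is exactly that of the argument, e.g. typeof(a) b; for an array int a[10].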


Here we declare an array b with exactly the same type as the array a,
so ARRAY_SIZE() reports 10 elements for both.
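A minimal sketch of that declaration is shown below. For brevity it uses an unchecked NAIVE_ARRAY_SIZE macro (an illustrative name) rather than the kernel's ARRAY_SIZE(), and the __typeof__ spelling of the keyword.

```c
#include <stddef.h>

/* Illustrative macro without the kernel's type check. */
#define NAIVE_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Element count of the original array. */
static size_t size_of_original(void)
{
    int a[10];
    return NAIVE_ARRAY_SIZE(a);
}

/* Element count of an array declared with the type of a. */
static size_t size_of_typeof_copy(void)
{
    int a[10];
    __typeof__(a) b;              /* b has type int[10], exactly like a */
    return NAIVE_ARRAY_SIZE(b);
}
```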

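Back to __same_type(): with __builtin_types_compatible_p() and typeof(),
we can tell whether the two arguments passed in have the same type.
If they do (the argument to ARRAY_SIZE() is a pointer that should not have been passed),
__builtin_types_compatible_p() returns 1, and passing that to BUILD_BUG_ON_ZERO() causes a compile error.
If they do not (the argument to ARRAY_SIZE() is a proper array),
__builtin_types_compatible_p() returns 0, and BUILD_BUG_ON_ZERO() then evaluates to 0.
Adding that 0 back in ARRAY_SIZE() does not change the original result.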

This way we find out at compile time whether the argument passed to ARRAY_SIZE() is a pointer passed by mistake,
and can fix it right there at build time...
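Putting the pieces together, the whole chain can be tried outside the kernel with a sketch like this. The macro bodies follow the kernel definitions of that period as far as I know, except that typeof is spelled __typeof__. Note that names starting with a double underscore are reserved for the implementation, so real user-space code should rename them.

```c
#include <stddef.h>

/* Breaks the build when e is non-zero, evaluates to 0 otherwise (GCC). */
#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))

/* 1 if the two expressions have compatible types. */
#define __same_type(a, b) \
    __builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/* Build error if a is a pointer, 0 if a is an array. */
#define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))

/* Element count plus the compile-time check, which contributes 0. */
#define ARRAY_SIZE(arr) \
    (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))

static int a[10];

/* Array argument: the check evaluates to 0 and the count is returned. */
static size_t checked_size_of_array(void)
{
    return ARRAY_SIZE(a);
}

/* Pointer argument: uncommenting these lines should fail to compile,
 * because __same_type() yields 1 and the bit-field width becomes -1.
 *
 * static int *a_ptr = a;
 * static size_t checked_size_of_pointer(void) { return ARRAY_SIZE(a_ptr); }
 */
```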

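The comments under Jserv's article also suggest some other approaches.
Their original intent was to avoid GCC extensions,
but in the end it turned out that typeof() is a GCC extension as well.
Still, the approach is worth a look.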

