References
http://baiy.cn/doc/cpp/inside_rtti.htm
https://stackoverflow.com/questions/6258559/what-is-the-vtt-for-a-class
https://zhuanlan.zhihu.com/p/41309205
https://stackoverflow.com/questions/6613870/gnu-gcc-g-why-does-it-generate-multiple-dtors
Virtual functions and virtual base classes
code
sizeof
With a virtual base class: sizeof of each class on its own
A 16 //vptr_a inta 8+8=16
B 32 //vptr_b intb vptr_a inta 8*4=32
C 32 //vptr_c intc vptr_a inta 8*4=32
D 48 //vptr_b intb vptr_c (intc intd) vptr_a inta 8*6=48
| vtable |
+----------+
| b |
+----------+
| vtable |
+----------+
| c |
+----------+
| d |
+----------+
| vtable |
+----------+
| a |
+----------+
Without a virtual base class: sizeof of each class on its own
A 16 //vptr_a inta 8+8=16
B 16 //vptr_a (inta intb) 8*2=16
C 16 //vptr_a (inta intc) 8*2=16
D 40 //vptr_a (inta intb) vptr_a (inta intc) intd 8*5=40
| vtable |
+----------+
| a |
+----------+
| b |
+----------+
| vtable |
+----------+
| a |
+----------+
| c |
+----------+
| d |
+----------+
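The two hierarchies measured above can be sketched as follows. This is a reconstruction, not the original listing: the member names (a, b, c, d) and virtual functions (v, w, x, y) are taken from the -fdump-class-hierarchy output below, while the `virt` / `plain` namespaces and the helper functions are hypothetical. The sizes quoted above are for 64-bit GCC and may differ on other ABIs, so the helpers check structure rather than exact byte counts.

```cpp
#include <cstddef>

namespace virt {   // diamond with a shared virtual base
struct A { int a; virtual void v() {} };
struct B : virtual A { int b; virtual void w() {} };
struct C : virtual A { int c; virtual void x() {} };
struct D : B, C { int d; virtual void y() {} };
}

namespace plain {  // same diamond without virtual inheritance
struct A { int a; virtual void v() {} };
struct B : A { int b; virtual void w() {} };
struct C : A { int c; virtual void x() {} };
struct D : B, C { int d; virtual void y() {} };
}

// True when both inheritance paths reach the same A subobject.
bool virtual_diamond_shares_a() {
    virt::D d;
    virt::A* via_b = static_cast<virt::B*>(&d);
    virt::A* via_c = static_cast<virt::C*>(&d);
    return via_b == via_c;
}

// With plain inheritance D contains two separate A subobjects.
bool plain_diamond_shares_a() {
    plain::D d;
    plain::A* via_b = static_cast<plain::B*>(&d);
    plain::A* via_c = static_cast<plain::C*>(&d);
    return via_b == via_c;
}

std::size_t size_of_virtual_d() { return sizeof(virt::D); }
std::size_t size_of_plain_d()   { return sizeof(plain::D); }
```

Printing `size_of_virtual_d()` and `size_of_plain_d()` on the target compiler is a quick way to compare against the 48 and 40 listed above.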
-fdump-class-hierarchy
Vtable for A
A::_ZTV1A: 3u entries
0 (int (*)(...))0
8 (int (*)(...))(& _ZTI1A)
16 (int (*)(...))A::v
Class A
size=16 align=8
base size=12 base align=8
A (0x0x7f0f37c70b40) 0
vptr=((& A::_ZTV1A) + 16u)
Vtable for B
B::_ZTV1B: 8u entries
0 16u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI1B)
24 (int (*)(...))B::w
32 0u
40 (int (*)(...))-16
48 (int (*)(...))(& _ZTI1B)
56 (int (*)(...))A::v
VTT for B
B::_ZTT1B: 2u entries
0 ((& B::_ZTV1B) + 24u)
8 ((& B::_ZTV1B) + 56u)
Class B
size=32 align=8
base size=12 base align=8
B (0x0x7f0f37cab5b0) 0
vptridx=0u vptr=((& B::_ZTV1B) + 24u)
A (0x0x7f0f37c70ba0) 16 virtual
vptridx=8u vbaseoffset=-24 vptr=((& B::_ZTV1B) + 56u)
Vtable for C
C::_ZTV1C: 8u entries
0 16u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI1C)
24 (int (*)(...))C::x
32 0u
40 (int (*)(...))-16
48 (int (*)(...))(& _ZTI1C)
56 (int (*)(...))A::v
VTT for C
C::_ZTT1C: 2u entries
0 ((& C::_ZTV1C) + 24u)
8 ((& C::_ZTV1C) + 56u)
Class C
size=32 align=8
base size=12 base align=8
C (0x0x7f0f37cab9c0) 0
vptridx=0u vptr=((& C::_ZTV1C) + 24u)
A (0x0x7f0f37c70c00) 16 virtual
vptridx=8u vbaseoffset=-24 vptr=((& C::_ZTV1C) + 56u)
Vtable for D
D::_ZTV1D: 13u entries
0 32u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI1D)
24 (int (*)(...))B::w
32 (int (*)(...))D::y
40 16u
48 (int (*)(...))-16
56 (int (*)(...))(& _ZTI1D)
64 (int (*)(...))C::x
72 0u
80 (int (*)(...))-32
88 (int (*)(...))(& _ZTI1D)
96 (int (*)(...))A::v
Construction vtable for B (0x0x7f0f37cabdd0 instance) in D
D::_ZTC1D0_1B: 8u entries
0 32u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI1B)
24 (int (*)(...))B::w
32 0u
40 (int (*)(...))-32
48 (int (*)(...))(& _ZTI1B)
56 (int (*)(...))A::v
Construction vtable for C (0x0x7f0f37cabe38 instance) in D
D::_ZTC1D16_1C: 8u entries
0 16u
8 (int (*)(...))0
16 (int (*)(...))(& _ZTI1C)
24 (int (*)(...))C::x
32 0u
40 (int (*)(...))-16
48 (int (*)(...))(& _ZTI1C)
56 (int (*)(...))A::v
VTT for D
D::_ZTT1D: 7u entries
0 ((& D::_ZTV1D) + 24u)
8 ((& D::_ZTC1D0_1B) + 24u)
16 ((& D::_ZTC1D0_1B) + 56u)
24 ((& D::_ZTC1D16_1C) + 24u)
32 ((& D::_ZTC1D16_1C) + 56u)
40 ((& D::_ZTV1D) + 96u)
48 ((& D::_ZTV1D) + 64u)
Class D
size=48 align=8
base size=32 base align=8
D (0x0x7f0f37a82a80) 0
vptridx=0u vptr=((& D::_ZTV1D) + 24u)
B (0x0x7f0f37cabdd0) 0
primary-for D (0x0x7f0f37a82a80)
subvttidx=8u
A (0x0x7f0f37c70c60) 32 virtual
vptridx=40u vbaseoffset=-24 vptr=((& D::_ZTV1D) + 96u)
C (0x0x7f0f37cabe38) 16
subvttidx=24u vptridx=48u vptr=((& D::_ZTV1D) + 64u)
A (0x0x7f0f37c70c60) alternative-path
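This output is messy and hard to analyze; remembering the command is enough. (GCC 8 and later renamed the option to -fdump-lang-class.)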
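Subobject integrity
The data members of a base-class subobject are not packed together with the derived class's own data members; alignment padding sits between them. The padding keeps the base-class subobject intact inside the derived object.
Without the padding, copying a BB into the BB part of a CC would overwrite CC's own data member (c).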
1 | class AA{ |
In my test, however, the members were packed together; I do not know exactly why.
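A possible explanation, offered as an assumption about the test rather than a confirmed diagnosis: the Itanium C++ ABI that GCC follows lets a derived class place its members in the tail padding of a base class unless that base is POD for layout purposes. Either way, compiler-generated copy assignment is required to leave the derived part untouched. A minimal sketch, with hypothetical member names:

```cpp
#include <cstddef>

// Hypothetical three-level chain; the member names are assumptions.
struct AA { int a; char a1; };
struct BB : AA { char b; };
struct CC : BB { char c; };

// Copy a standalone BB into the BB subobject of a CC and
// report whether CC's own member survived the assignment.
bool cc_member_survives_bb_copy() {
    CC cc;
    cc.a = 1; cc.a1 = 'x'; cc.b = 'y'; cc.c = 'z';
    BB other;
    other.a = 2; other.a1 = 'p'; other.b = 'q';
    static_cast<BB&>(cc) = other;   // assigns only the BB part
    return cc.c == 'z' && cc.b == 'q' && cc.a == 2;
}

// Sizes to print when checking how a given compiler packs the chain.
std::size_t size_of_bb() { return sizeof(BB); }
std::size_t size_of_cc() { return sizeof(CC); }
```

Comparing `size_of_bb()` and `size_of_cc()` shows whether the compiler reused the padding. Making AA's members private is one way to test whether the POD rule is what changes the layout.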
Pointers to data members
1 | class AA { |
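Some compilers add 1 to the offset produced by an expression such as &AA::a, so that a pointer to data member (AA::*p) can distinguish between:
- a pointer that points to no member
- a pointer that points to the first member
GCC clearly does not do this, yet it can still tell the two apart.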
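GCC can still tell them apart because the Itanium C++ ABI represents a pointer to data member as a plain offset and reserves -1 for the null value, so &AA::a can be 0 without ambiguity. The sketch below reads the raw representation. `raw_bits` is a hypothetical helper, and the `static_assert` encodes the assumption that the pointer is one `ptrdiff_t` wide, which holds on Itanium-ABI targets but not everywhere.

```cpp
#include <cstddef>
#include <cstring>

struct AA { int a; int b; };

// Raw bits of a data-member pointer, read as a ptrdiff_t.
// Assumes the pointer is stored as a plain offset of that size
// (Itanium C++ ABI); other ABIs may use a different representation.
std::ptrdiff_t raw_bits(int AA::* p) {
    static_assert(sizeof(int AA::*) == sizeof(std::ptrdiff_t),
                  "representation assumption does not hold here");
    std::ptrdiff_t bits;
    std::memcpy(&bits, &p, sizeof bits);
    return bits;
}

// The language guarantees this regardless of representation.
bool null_differs_from_first_member() {
    int AA::* none = nullptr;
    int AA::* first = &AA::a;
    return none != first;
}

// Ordinary use of a data-member pointer.
int read_through(const AA& obj, int AA::* p) { return obj.*p; }
```

Printing `raw_bits(&AA::a)`, `raw_bits(&AA::b)` and `raw_bits(nullptr)` shows the encoding the compiler uses.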
Pointers to member functions
1 | class AA { |
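The role of the this pointer:
Mainly, it locates the address of the derived-class object.
base2 * pb = new derived;
After the new, pb = &derived + sizeof(base1); // decided by the compiler at compile time
Here pb->func() calls derived's func, but it could just as well call base2's func (imagine the object type being chosen at random). So the target can only be decided at run time. The compiler cannot compute a fixed result at compile time, so it routes the call through an address that can vary, for example: (*pb->vptr[4])(this + pb->vptr[2].offset)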
//derived-->table
vptr_base1
0 rtti
1 base_offset
2 top_offset //0
3 ~derived()
4 func
vptr_base2
0 rtti
1 base_offset
2 top_offset //-20 (assumed value)
3 ~derived()
4 func
//base2-->table
vptr
0 rtti
1 base_offset
2 top_offset //0
3 ~base2()
4 func
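- If pb = new base2, then top_offset = 0, this is the address of the base2 object, and delete acts on &base2.
- If pb = new derived, then pb (this) = &derived + sizeof(base1), computed by the compiler. The value of (this + pb->vptr[2].offset) is determined at run time; here offset = -20, which yields the address of the derived object, so delete acts on &derived.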
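The two cases can be observed from portable code. This is a minimal sketch: the class bodies are hypothetical stand-ins for base1, base2 and derived, and `probe` / `inspect` are names invented for the demonstration.

```cpp
struct base1 { int x = 1; virtual ~base1() {} };
struct base2 {
    int y = 2;
    virtual ~base2() {}
    virtual int func() { return 2; }
};
struct derived : base1, base2 {
    int z = 3;
    int func() override { return 3; }
};

struct probe {
    bool pb_is_shifted;   // pb does not equal the address of the derived object
    bool top_recovered;   // run-time lookup gets back to the start of the object
    int  func_result;     // which func() the call reached
};

probe inspect() {
    derived* pd = new derived;
    base2* pb = pd;       // the compiler adds the base2 offset here
    probe r;
    r.pb_is_shifted = static_cast<void*>(pb) != static_cast<void*>(pd);
    // dynamic_cast<void*> yields the address of the most-derived object;
    // on the Itanium ABI it is found through the offset-to-top in the vtable.
    r.top_recovered = dynamic_cast<void*>(pb) == static_cast<void*>(pd);
    r.func_result = pb->func();   // dispatches to derived::func
    delete pb;                    // virtual destructor: destroys and frees the whole derived
    return r;
}
```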
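Summary:
A derived class can be assigned to a base class (upcast) because a set of mechanisms guarantees it: this-pointer adjustment, base-pointer adjustment, and RTTI, in short the whole vtable.
Precisely because so many mechanisms make the upcast work, a base pointer may point to either a base object or a derived object. That uncertainty is why a downcast cannot be resolved at compile time and has to be checked at run time. The compiler fixes the vptr and vtable when the object is constructed, and the memory that this+offset refers to varies with the vtable of whatever object the base pointer points to.
The code below shows that for a downcast too, the this-pointer adjustment made during the conversion locates the start address of the object: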
1 | class AA { |
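An upcast is performed at compile time, because the object layout is known. The this-pointer adjustment is needed only through a pointer or reference, where the dynamic type is ambiguous.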
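A small sketch of the downcast cases, using hypothetical base1 / base2 / derived classes: static_cast applies the fixed offset the compiler knows, while dynamic_cast consults the run-time type first.

```cpp
struct base1 { int x = 1; virtual ~base1() {} };
struct base2 { int y = 2; virtual ~base2() {} };
struct derived : base1, base2 { int z = 3; };

// static_cast trusts the programmer and subtracts the known base2 offset,
// landing on the first byte of the derived object.
bool static_downcast_finds_start() {
    derived d;
    base2* pb = &d;
    return static_cast<derived*>(pb) == &d;
}

// dynamic_cast checks the dynamic type and returns nullptr
// when the object is not actually a derived.
bool dynamic_downcast_rejects_plain_base() {
    base2 plain;
    base2* pb = &plain;
    return dynamic_cast<derived*>(pb) == nullptr;
}

bool dynamic_downcast_accepts_derived() {
    derived d;
    base2* pb = &d;
    return dynamic_cast<derived*>(pb) == &d;
}
```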
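The role of the base pointer
The base pointer exists only because of virtual base classes:
A virtual base is placed at the very end of the object, in a region whose position varies, so the compiler cannot compute its offset directly from sizeof. It introduces a base-pointer offset (vbase_offset) for that purpose. Whereas the this-pointer adjustment finds the start of the object, the base pointer finds the virtual-base subobject at the end of the object. That is all there is to it.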
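The varying position can be measured directly. In this sketch (hypothetical members; `a_offset_from` is a helper invented here) the same B* to A* conversion produces a different byte offset depending on whether the B stands alone or lives inside a D, which is why the offset has to be looked up at run time.

```cpp
#include <cstddef>

struct A { int a = 0; virtual ~A() {} };
struct B : virtual A { int b = 0; };
struct C : virtual A { int c = 0; };
struct D : B, C { int d = 0; };

// Byte distance from a B subobject to its virtual base A.
// The B* -> A* conversion reads the offset from the object's vtable
// (on the Itanium ABI), so one function serves every kind of B.
std::ptrdiff_t a_offset_from(B* pb) {
    A* pa = pb;
    return reinterpret_cast<char*>(pa) - reinterpret_cast<char*>(pb);
}

std::ptrdiff_t offset_in_standalone_b() {
    B b;
    return a_offset_from(&b);
}

std::ptrdiff_t offset_in_b_inside_d() {
    D d;
    return a_offset_from(static_cast<B*>(&d));
}
```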
thunk
1 | class A { |
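C overrides w, and both the vptr_b and vptr_a tables have an entry for w(). A thunk is a small function that performs the conversion: it first adjusts the this pointer and then calls C::w() above.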
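The effect of the thunk is visible without looking at assembly. In this sketch (class bodies are hypothetical) the override returns the `this` it receives. A call through a B*, which points into the middle of the object, still arrives with `this` set to the start of the C object.

```cpp
struct A {
    int a = 0;
    virtual ~A() {}
    virtual const void* w() { return this; }
};
struct B {
    int b = 0;
    virtual ~B() {}
    virtual const void* w() { return this; }
};
struct C : A, B {
    int c = 0;
    // One final overrider for both A::w and B::w; reports the C address it sees.
    const void* w() override { return this; }
};

bool call_through_b_sees_whole_object() {
    C obj;
    B* pb = &obj;                // points at the B subobject inside obj
    const void* seen = pb->w();  // the B-side vtable slot adjusts this, then runs C::w
    return seen == static_cast<const void*>(&obj);
}

bool b_pointer_is_offset() {
    C obj;
    B* pb = &obj;
    return static_cast<void*>(pb) != static_cast<void*>(&obj);
}
```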
VTT (virtual table table)
To avoid a second-hand retelling, the original text is pasted here. Working through it is well worth the effort:
https://stackoverflow.com/questions/6258559/what-is-the-vtt-for-a-class
PART2:
Construction/Destruction in the Presence of Multiple Inheritance
How is the above object constructed in memory when the object itself is constructed? And how do we ensure that a partially-constructed object (and its vtable) are safe for constructors to operate on?
Fortunately, it’s all handled very carefully for us. Say we’re constructing a new object of type D (through, for example, new D). First, the memory for the object is allocated in the heap and a pointer returned. D’s constructor is invoked, but before doing any D-specific construction it calls A’s constructor on the object (after adjusting the this pointer, of course!). A’s constructor fills in the A part of the D object as if it were an instance of A.
d --> +----------+
| |
+----------+
| |
+----------+
| |
+----------+
| | +-----------------------+
+----------+ | 0 (top_offset) |
| | +-----------------------+
+----------+ | ptr to typeinfo for A |
| vtable |-----> +-----------------------+
+----------+ | A::v() |
| a | +-----------------------+
+----------+
Control is returned to D’s constructor, which invokes B’s constructor. (Pointer adjustment isn’t needed here.) When B’s constructor is done, the object looks like this:
B-in-D
+-----------------------+
| 20 (vbase_offset) |
+-----------------------+
| 0 (top_offset) |
+-----------------------+
d --> +----------+ | ptr to typeinfo for B |
| vtable |------> +-----------------------+
+----------+ | B::w() |
| b | +-----------------------+
+----------+ | 0 (vbase_offset) |
| | +-----------------------+
+----------+ | -20 (top_offset) |
| | +-----------------------+
+----------+ | ptr to typeinfo for B |
| | +--> +-----------------------+
+----------+ | | A::v() |
| vtable |---+ +-----------------------+
+----------+
| a |
+----------+
But wait… B’s constructor modified the A part of the object by changing its vtable pointer! How did it know to distinguish this kind of B-in-D from a B-in-something-else (or a standalone B for that matter)? Simple. The virtual table table told it to do this. This structure, abbreviated VTT, is a table of vtables used in construction. In our case, the VTT for D looks like this:
B-in-D
+-----------------------+
| 20 (vbase_offset) |
VTT for D +-----------------------+
+-------------------+ | 0 (top_offset) |
| vtable for D |-------------+ +-----------------------+
+-------------------+ | | ptr to typeinfo for B |
| vtable for B-in-D |-------------|----------> +-----------------------+
+-------------------+ | | B::w() |
| vtable for B-in-D |-------------|--------+ +-----------------------+
+-------------------+ | | | 0 (vbase_offset) |
| vtable for C-in-D |-------------|-----+ | +-----------------------+
+-------------------+ | | | | -20 (top_offset) |
| vtable for C-in-D |-------------|--+ | | +-----------------------+
+-------------------+ | | | | | ptr to typeinfo for B |
| vtable for D |----------+ | | | +-> +-----------------------+
+-------------------+ | | | | | A::v() |
| vtable for D |-------+ | | | | +-----------------------+
+-------------------+ | | | | |
| | | | | C-in-D
| | | | | +-----------------------+
| | | | | | 12 (vbase_offset) |
| | | | | +-----------------------+
| | | | | | 0 (top_offset) |
| | | | | +-----------------------+
| | | | | | ptr to typeinfo for C |
| | | | +----> +-----------------------+
| | | | | C::x() |
| | | | +-----------------------+
| | | | | 0 (vbase_offset) |
| | | | +-----------------------+
| | | | | -12 (top_offset) |
| | | | +-----------------------+
| | | | | ptr to typeinfo for C |
| | | +-------> +-----------------------+
| | | | A::v() |
| | | +-----------------------+
| | |
| | | D
| | | +-----------------------+
| | | | 20 (vbase_offset) |
| | | +-----------------------+
| | | | 0 (top_offset) |
| | | +-----------------------+
| | | | ptr to typeinfo for D |
| | +----------> +-----------------------+
| | | B::w() |
| | +-----------------------+
| | | D::y() |
| | +-----------------------+
| | | 12 (vbase_offset) |
| | +-----------------------+
| | | -8 (top_offset) |
| | +-----------------------+
| | | ptr to typeinfo for D |
+----------------> +-----------------------+
| | C::x() |
| +-----------------------+
| | 0 (vbase_offset) |
| +-----------------------+
| | -20 (top_offset) |
| +-----------------------+
| | ptr to typeinfo for D |
+-------------> +-----------------------+
| A::v() |
+-----------------------+
D’s constructor passes a pointer into D’s VTT to B’s constructor (in this case, it passes in the address of the first B-in-D entry). And, indeed, the vtable that was used for the object layout above is a special vtable used just for the construction of B-in-D.
Control is returned to the D constructor, and it calls the C constructor (with a VTT address parameter pointing to the “C-in-D+12” entry). When C’s constructor is done with the object it looks like this:
B-in-D
+-----------------------+
| 20 (vbase_offset) |
+-----------------------+
| 0 (top_offset) |
+-----------------------+
| ptr to typeinfo for B |
+---------------------------------> +-----------------------+
| | B::w() |
| +-----------------------+
| C-in-D | 0 (vbase_offset) |
| +-----------------------+ +-----------------------+
d --> +----------+ | | 12 (vbase_offset) | | -20 (top_offset) |
| vtable |--+ +-----------------------+ +-----------------------+
+----------+ | 0 (top_offset) | | ptr to typeinfo for B |
| b | +-----------------------+ +-----------------------+
+----------+ | ptr to typeinfo for C | | A::v() |
| vtable |--------> +-----------------------+ +-----------------------+
+----------+ | C::x() |
| c | +-----------------------+
+----------+ | 0 (vbase_offset) |
| | +-----------------------+
+----------+ | -12 (top_offset) |
| vtable |--+ +-----------------------+
+----------+ | | ptr to typeinfo for C |
| a | +-----> +-----------------------+
+----------+ | A::v() |
+-----------------------+
As you see, C’s constructor again modified the embedded A’s vtable pointer. The embedded C and A objects are now using the special construction C-in-D vtable, and the embedded B object is using the special construction B-in-D vtable. Finally, D’s constructor finishes the job and we end up with the same diagram as before:
+-----------------------+
| 20 (vbase_offset) |
+-----------------------+
| 0 (top_offset) |
+-----------------------+
| ptr to typeinfo for D |
+----------> +-----------------------+
d --> +----------+ | | B::w() |
| vtable |----+ +-----------------------+
+----------+ | D::y() |
| b | +-----------------------+
+----------+ | 12 (vbase_offset) |
| vtable |---------+ +-----------------------+
+----------+ | | -8 (top_offset) |
| c | | +-----------------------+
+----------+ | | ptr to typeinfo for D |
| d | +-----> +-----------------------+
+----------+ | C::x() |
| vtable |----+ +-----------------------+
+----------+ | | 0 (vbase_offset) |
| a | | +-----------------------+
+----------+ | | -20 (top_offset) |
| +-----------------------+
| | ptr to typeinfo for D |
+----------> +-----------------------+
| A::v() |
+-----------------------+
Destruction occurs in the same fashion but in reverse. D’s destructor is invoked. After the user’s destruction code runs, the destructor calls C’s destructor and directs it to use the relevant portion of D’s VTT. C’s destructor manipulates the vtable pointers in the same way it did during construction; that is, the relevant vtable pointers now point into the C-in-D construction vtable. Then it runs the user’s destruction code for C and returns control to D’s destructor, which next invokes B’s destructor with a reference into D’s VTT. B’s destructor sets up the relevant portions of the object to refer into the B-in-D construction vtable. It runs the user’s destruction code for B and returns control to D’s destructor, which finally invokes A’s destructor. A’s destructor changes the vtable for the A portion of the object to refer into the vtable for A. Finally, control returns to D’s destructor and destruction of the object is complete. The memory once used by the object is returned to the system.
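Brief summary:
For an ordinary derived class, assigning to a base-class object is just an offset adjustment: the base-class part inside the derived object can be used directly as if it were a standalone base object. For a derived class with a virtual base, however, the virtual base sits at the very end of the object, so the layout of the intermediate classes (B, C) inside D differs from their standalone layout. Offsets for object assignment can be found through the base pointer, but which vtable should a constructor install? When B or C is being constructed, should it use the standalone vtable of B or C, or the B-in-D / C-in-D vtable? The VTT stores the vptrs and vtables needed to handle each case. (Normally one table is enough for a derived class; here three tables are needed, with 7 vptr entries: 2+2+3.)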
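The language-level consequence of construction vtables is easy to check: while B's constructor runs inside a D, virtual calls behave as if the object were only a B. A minimal sketch with hypothetical class bodies (`seen_in_ctor` and `call_v` are names invented for the demonstration):

```cpp
#include <string>

struct A {
    virtual ~A() {}
    virtual const char* v() const { return "A::v"; }
};

struct B : virtual A {
    std::string seen_in_ctor;
    B() { seen_in_ctor = call_v(); }   // virtual call made during B's construction
    const char* v() const override { return "B::v"; }
    const char* call_v() const { return v(); }
};

struct C : virtual A { int c = 0; };

struct D : B, C {
    const char* v() const override { return "D::v"; }
};

// What B's constructor observed while the enclosing D was still being built.
std::string seen_while_constructing_b_in_d() {
    D d;
    return d.seen_in_ctor;
}

// The same call once D is fully constructed.
std::string seen_after_construction() {
    D d;
    return d.call_v();
}
```

On the Itanium ABI this behaviour comes from the B-in-D construction vtable that D's constructor hands to B's constructor through the VTT. The standard mandates only the observable result.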
in-charge not-in-charge in-charge-delete
Continuing the article above:
Now, in fact, the story is somewhat more complicated. Have you ever seen those “in-charge” and “not-in-charge” constructor and destructor specifications in GCC-produced warning and error messages or in GCC-produced binaries? Well, the fact is that there can be two constructor implementations and up to three destructor implementations.
An “in-charge” (or complete object) constructor is one that constructs virtual bases, and a “not-in-charge” (or base object) constructor is one that does not. Consider our above example. If a B is constructed, its constructor needs to call A’s constructor to construct it. Similarly, C’s constructor needs to construct A. However, if B and C are constructed as part of a construction of a D, their constructors should not construct A, because A is a virtual base and D’s constructor will take care of constructing it exactly once for the instance of D. Consider the cases:
If you do a new A, A’s “in-charge” constructor is invoked to construct A. When you do a new B, B’s “in-charge” constructor is invoked. It will call the “not-in-charge” constructor for A.
new C is similar to new B.
A new D invokes D’s “in-charge” constructor. We walked through this example. D’s “in-charge” constructor calls the “not-in-charge” versions of A’s, B’s, and C’s constructors (in that order).
An “in-charge” destructor is the analogue of an “in-charge” constructor: it takes charge of destructing virtual bases. Similarly, a “not-in-charge” destructor is generated. But there’s a third one as well. An “in-charge deleting” destructor is one that deallocates the storage as well as destructing the object. So when is one called in preference to the other?
Well, there are two kinds of objects that can be destructed—those allocated on the stack, and those allocated in the heap. Consider this code (given our diamond hierarchy with virtual-inheritance from before):
D d;            // allocates a D on the stack and constructs it
D *pd = new D;  // allocates a D in the heap and constructs it
/* … */
delete pd;      // calls “in-charge deleting” destructor for D
return;         // calls “in-charge” destructor for stack-allocated D
We see that the actual delete operator isn’t invoked by the code doing the delete, but rather by the in-charge deleting destructor for the object being deleted. Why do it this way? Why not have the caller call the in-charge destructor, then delete the object? Then you’d have only two copies of destructor implementations instead of three…
Well, the compiler could do such a thing, but it would be more complicated for other reasons. Consider this code (assuming a virtual destructor, which you always use, right?…right?!?):
D *pd = new D;  // allocates a D in the heap and constructs it
C *pc = pd;     // we have a pointer-to-C that points to our heap-allocated D
/* … */
delete pc;      // call destructor thunk through vtable, but what about delete?
If you didn’t have an “in-charge deleting” variety of D’s destructor, then the delete operation would need to adjust the pointer just like the destructor thunk does. Remember, the C object is embedded in a D, and so our pointer-to-C above is adjusted to point into the middle of our D object. We can’t just delete this pointer, since it isn’t the pointer that was returned by malloc() when we constructed it.
So, if we didn’t have an in-charge deleting destructor, we’d have to have thunks to the delete operator (and represent them in our vtables), or something else similar.
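That the deleting destructor hands the original block back to the allocator can be checked with a class-specific operator delete. In this sketch `last_freed` is hypothetical bookkeeping added for the demonstration; the hierarchy mirrors the diamond above with made-up members.

```cpp
#include <cstdint>
#include <new>

struct A { int a = 0; virtual ~A() {} };
struct B : virtual A { int b = 0; };
struct C : virtual A { int c = 0; };

struct D : B, C {
    int d = 0;
    static std::uintptr_t last_freed;      // address most recently passed to operator delete
    static void operator delete(void* p) { // class-specific deallocation function
        last_freed = reinterpret_cast<std::uintptr_t>(p);
        ::operator delete(p);
    }
};
std::uintptr_t D::last_freed = 0;

bool delete_through_c_frees_original_block() {
    D* pd = new D;
    const std::uintptr_t original = reinterpret_cast<std::uintptr_t>(pd);
    C* pc = pd;    // adjusted to the C subobject inside the D
    delete pc;     // the virtual destructor selects D's deallocation function
    return D::last_freed == original;
}
```

Although the delete expression only sees a C*, the address that reaches `operator delete` is the start of the D, which is the job the article assigns to the in-charge deleting destructor.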
Thunks, Virtual and Non-Virtual
This section not written yet.
Multiple Inheritance with Virtual Methods on One Side
Okay. One last exercise. What if we have a diamond inheritance hierarchy with virtual inheritance, as before, but only have virtual methods along one side of it? So:
class A {
public:
int a;
};
class B : public virtual A {
public:
int b;
virtual void w();
};
class C : public virtual A {
public:
int c;
};
class D : public B, public C {
public:
int d;
virtual void y();
};
In this case the object layout is the following:
+-----------------------+
| 20 (vbase_offset) |
+-----------------------+
| 0 (top_offset) |
+-----------------------+
| ptr to typeinfo for D |
+----------> +-----------------------+
d --> +----------+ | | B::w() |
| vtable |----+ +-----------------------+
+----------+ | D::y() |
| b | +-----------------------+
+----------+ | 12 (vbase_offset) |
| vtable |---------+ +-----------------------+
+----------+ | | -8 (top_offset) |
| c | | +-----------------------+
+----------+ | | ptr to typeinfo for D |
| d | +-----> +-----------------------+
+----------+
| a |
+----------+
So you can see the C subobject, which has no virtual methods, still has a vtable (albeit empty). Indeed, all instances of C have an empty vtable.
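A quick way to see this hidden pointer from portable code is to compare C against a struct holding the same data. This sketch uses hypothetical members; `Flat` is a comparison type invented here. C is not polymorphic in the language sense, yet it is larger than its two ints because it needs a way to locate its virtual base.

```cpp
#include <type_traits>

struct A { int a; };
struct C : virtual A { int c; };   // no virtual functions anywhere
struct Flat { int a; int c; };     // same data, no inheritance

// C has no virtual functions, so the type trait reports false...
constexpr bool c_is_polymorphic = std::is_polymorphic<C>::value;

// ...but the object still carries extra bookkeeping for the virtual base
// (a vptr on the Itanium ABI).
constexpr bool c_is_bigger_than_its_data = sizeof(C) > sizeof(Flat);
```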
Thanks, Morgan Deters!!
Plus this one:
https://stackoverflow.com/questions/6613870/gnu-gcc-g-why-does-it-generate-multiple-dtors
First, the purposes of these functions are described in the Itanium C++ ABI; see definitions under “base object destructor”, “complete object destructor”, and “deleting destructor”. The mapping to mangled names is given in 5.1.4.
Basically:
- D2 is the “base object destructor”. It destroys the object itself, as well as data members and non-virtual base classes.
- D1 is the “complete object destructor”. It additionally destroys virtual base classes.
- D0 is the “deleting object destructor”. It does everything the complete object destructor does, plus it calls operator delete to actually free the memory.
If you have no virtual base classes, D2 and D1 are identical; GCC will, on sufficient optimization levels, actually alias the symbols to the same code for both.
in-charge == complete object dtor
not-in-charge == base object dtor
in-charge-delete == deleting object dtor
//-C or --demangle: decode (demangle) low-level symbol names into user-level names.