[angr Source Code Analysis] 4. Executing an IRSB

Recap and the Unimportant Parts

As mentioned in the previous post, symbolic execution of a VEX IRSB is kicked off by self.handle_vex_block():

# venv/lib/python3.8/site-packages/angr/engines/vex/heavy/heavy.py
class HeavyVEXMixin(SuccessorsMixin, ClaripyDataMixin, SimStateStorageMixin, VEXMixin, VEXLifter):
    def process_successors(self, ...):
        ... ...
        while True:
            if irsb is None:
                # Lift the IRSB
                irsb = self.lift_vex(addr=addr, state=self.state, ...)
            ... ...
            try:
                # Run symbolic execution on it
                self.handle_vex_block(irsb)
            except errors.SimReliftException as e:

At this point, however, self is not a plain HeavyVEXMixin instance but a UberEngine instance (see the previous post for why). UberEngine inherits from the following classes:

from .vex import HeavyVEXMixin, TrackActionsMixin, SimInspectMixin, HeavyResilienceMixin, SuperFastpathMixin

... ...

# The default execution engine
# You may remove unused mixins from this default engine to speed up execution
class UberEngine(
    SimEngineFailure,
    SimEngineSyscall,
    HooksMixin,
    SimEngineUnicorn,
    SuperFastpathMixin,  # SuperFastpathMixin(VEXSlicingMixin)
    TrackActionsMixin,
    SimInspectMixin,
    HeavyResilienceMixin,
    SootMixin,
    HeavyVEXMixin
):
    pass

Let's single-step through the code and see which functions self.handle_vex_block() actually calls.
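As a complement to single-stepping, the classes that take part in such a call can also be listed by walking the MRO and keeping only those that define the method themselves. The sketch below uses toy stand-in classes (Core, Slicing, Fastpath, Inspect, Quiet and Engine are made up for illustration, not angr's real classes), and the helper definers() is hypothetical. Presumably the same helper could be pointed at UberEngine, but that is an assumption I have not verified here.

```python
def definers(cls, method_name):
    """Return the names of classes in cls's MRO that define method_name themselves."""
    return [c.__name__ for c in cls.__mro__ if method_name in vars(c)]


# Toy stand-ins, NOT angr's real classes.
class Core:
    def handle_vex_block(self, irsb):
        return ["Core"]


class Slicing(Core):
    def handle_vex_block(self, irsb):
        return ["Slicing"] + super().handle_vex_block(irsb)


class Fastpath(Slicing):
    def handle_vex_block(self, irsb):
        return ["Fastpath"] + super().handle_vex_block(irsb)


class Inspect(Core):
    def handle_vex_block(self, irsb):
        return ["Inspect"] + super().handle_vex_block(irsb)


class Quiet(Core):
    pass  # only inherits handle_vex_block, so definers() does not list it


class Engine(Fastpath, Inspect, Quiet):
    pass


# Classes that define the method, in MRO order.
print(definers(Engine, "handle_vex_block"))
# The order in which the cooperative super() chain actually runs.
print(Engine().handle_vex_block(None))
```

In this toy the two lists agree, because every override forwards to super(); a class that did not forward would cut the chain short.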

1. SuperFastpathMixin

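The o.SUPER_FASTPATH option is not set here, so the body of the if is skipped and the call simply falls through to super(). I'm honestly not sure what this class is for; according to its docstring, it executes only the last four instructions of a block.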

class SuperFastpathMixin(VEXSlicingMixin):
    """
    This mixin implements the superfastpath execution mode, which skips all but the last four instructions.
    """
    def handle_vex_block(self, irsb):
        # This option makes us only execute the last four instructions
        if o.SUPER_FASTPATH in self.state.options:
            imark_counter = 0
            for i in range(len(irsb.statements) - 1, -1, -1):
                if type(irsb.statements[i]) is pyvex.IRStmt.IMark:
                    imark_counter += 1
                if imark_counter >= 4:
                    self._skip_stmts = max(self._skip_stmts, i)
                    break

        super().handle_vex_block(irsb)
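To make the backward scan concrete, here is a minimal sketch of the same loop over a plain list of strings. The function name superfastpath_skip_index and the toy statement list are made up for illustration, and I am assuming that _skip_stmts means "statements before this index are not executed", which the excerpt above does not confirm.

```python
def superfastpath_skip_index(statements, keep_instructions=4):
    """Scan backwards and return the index of the IMark that starts the
    last `keep_instructions` instructions, or 0 if the block is shorter."""
    imark_counter = 0
    for i in range(len(statements) - 1, -1, -1):
        if statements[i] == "IMark":
            imark_counter += 1
        if imark_counter >= keep_instructions:
            return i
    return 0


# Toy block with five instructions; each "IMark" starts a new instruction.
block = ["IMark", "a", "IMark", "b", "c", "IMark", "d", "IMark", "e", "IMark", "f"]

skip = superfastpath_skip_index(block)
print(skip)          # index of the 4th IMark counted from the end
print(block[skip:])  # the statements that would still be executed
```

Here the first instruction (indices 0 and 1) falls before the returned index, so only the last four instructions remain.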

2. VEXSlicingMixin

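SuperFastpathMixin inherits from this class, which also defines handle_vex_block. I don't fully understand this one either: does VEX need to slice the basic block? Judging from the code, it appears to let the engine execute only a selected subset of the statements in a block.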

class VEXSlicingMixin(VEXMixin):
    ... ...
    def handle_vex_block(self, irsb):
        self.__no_exit_sliced = not self._check_vex_slice(DEFAULT_STATEMENT) and \
                                not any(self._check_vex_slice(stmt_idx) \
                                        for stmt_idx, stmt in enumerate(irsb.statements) \
                                        if stmt.tag == 'Ist_Exit')
        super().handle_vex_block(irsb)

3. TrackActionsMixin

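Judging by the name, this mixin records information about the operations performed (actions). It also inherits from HeavyVEXMixin. Indirectly inheriting from the same parent more than once is not a problem, because Python's method resolution order (MRO) places each class in the chain exactly once.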

class TrackActionsMixin(HeavyVEXMixin):
    ... ...
    def handle_vex_block(self, irsb):
        self.__tmp_deps = {}
        super().handle_vex_block(irsb)
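Two Python details are at work in this snippet: a base class reached through several paths still appears only once in the MRO, and an attribute such as self.__tmp_deps is name-mangled with the defining class's name. The sketch below shows both with toy classes (Heavy, Track, Other and Engine are stand-ins invented for this example, not angr code).

```python
class Heavy:
    def __init__(self):
        self.base_calls = 0

    def handle_vex_block(self, irsb):
        self.base_calls += 1  # count how often the shared base runs


class Track(Heavy):
    def handle_vex_block(self, irsb):
        self.__tmp_deps = {}  # stored on the instance as _Track__tmp_deps
        super().handle_vex_block(irsb)


class Other(Heavy):
    def handle_vex_block(self, irsb):
        super().handle_vex_block(irsb)


class Engine(Track, Other, Heavy):
    pass


engine = Engine()
engine.handle_vex_block(None)

print([c.__name__ for c in Engine.__mro__])  # Heavy is listed once
print(engine.base_calls)                     # the shared base ran once
print(sorted(vars(engine)))                  # shows the mangled attribute name
```

The mangling means each mixin's double-underscore attributes cannot collide with those of another mixin, which is convenient when many classes share one self.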

4. SimInspectMixin

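I'm starting to get a bit exasperated: this inheritance hierarchy makes it hard to tell which part matters. This mixin calls state._inspect, which is backed by the SimInspector plugin registered on the state. According to its documentation, the plugin provides breakpoints for instrumenting execution. It sounds useful and I'll look into how to use it later; for now we can ignore it.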

SimInspector – The breakpoint interface, used to instrument execution. For usage information, look here:
https://docs.angr.io/core-concepts/simulation#breakpoints

class SimInspectMixin(VEXMixin):
    # open question: what should be done about the BP_AFTER breakpoints in cases where the engine uses exceptional control flow?
    ... ...
    def handle_vex_block(self, irsb):
        self.state._inspect('irsb', BP_BEFORE, address=irsb.addr)
        super().handle_vex_block(irsb)
        self.state._inspect('instruction', BP_AFTER)
        self.state._inspect('irsb', BP_AFTER, address=irsb.addr)

The Important Part: VEXMixin

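Finally, super().handle_vex_block(irsb) lands in VEXMixin. The code is shown below: a for loop walks over every statement in the IRSB and handles them one by one, and the block's default exit is handled after the loop.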

class VEXMixin(SimEngineBase):
    ... ...
    def handle_vex_block(self, irsb: pyvex.IRSB):
        self.irsb = irsb
        self.tmps = [None]*self.irsb.tyenv.types_used

        for stmt_idx, stmt in enumerate(irsb.statements):
            self.stmt_idx = stmt_idx
            self._handle_vex_stmt(stmt)
        self.stmt_idx = DEFAULT_STATEMENT
        self._handle_vex_defaultexit(irsb.next, irsb.jumpkind)

The key question is how each stmt gets executed.

pyvex.stmt.IRStmt: each statement in an IRSB

Every stmt is a pyvex.stmt.IRStmt object. IRStmt has many subclasses that describe the different kinds of stmt in more detail, for example:

class IRStmt(VEXObject):
    """
    IR statements in VEX represents operations with side-effects.
    """
    ... ...

class NoOp(IRStmt):
    """
    A no-operation statement. It is usually the result of an IR optimization.
    """
    ... ...

class IMark(IRStmt):
    """
    An instruction mark. It marks the start of the statements that represent a single machine instruction (the end of
    those statements is marked by the next IMark or the end of the IRSB). Contains the address and length of the
    instruction.
    """
    ... ...

class Put(IRStmt):
    """
    Write to a guest register, at a fixed offset in the guest state.
    """

For our example, the basic block is lifted into the following IRSB:

[Figure: the IRSB lifted from the example basic block]

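As you can see, each machine instruction corresponds to several stmts. An IMark performs no computation of its own; it only marks where a machine instruction begins (and carries that instruction's address and length), which is what lets you map VEX stmts back to machine instructions.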
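The mapping from machine instructions to stmts can be sketched by grouping statements under the most recent IMark. The example below uses a made-up list of (kind, payload) tuples; real pyvex statements are objects rather than tuples, and the addresses and statement strings here are purely illustrative. The helper group_by_imark is hypothetical as well.

```python
def group_by_imark(statements):
    """Map each instruction address to the statements that follow its IMark."""
    groups = {}
    current_addr = None
    for kind, payload in statements:
        if kind == "IMark":
            # A new machine instruction starts here.
            current_addr = payload
            groups[current_addr] = []
        elif current_addr is not None:
            groups[current_addr].append(payload)
    return groups


# Hypothetical statement list, for illustration only.
toy_irsb = [
    ("IMark", 0x401000),
    ("WrTmp", "t0 = GET:I64(rbp)"),
    ("Put", "PUT(rsp) = t1"),
    ("IMark", 0x401004),
    ("Put", "PUT(rax) = t2"),
]

for addr, stmts in group_by_imark(toy_irsb).items():
    print(hex(addr), stmts)
```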

I've heard that addresses sometimes fail to line up during angr's symbolic execution. Given that IMarks exist, why would that happen?

_handle_vex_stmt(): executing each statement

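Next, let's look at self._handle_vex_stmt(stmt). As before, the call passes through a series of _handle_vex_stmt() overrides that we don't care about. I won't go through them in detail; the call chain below ends at VEXMixin, which is where we pick up again.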

VEXSlicingMixin -> SimInspectMixin -> VEXMixin

VEXMixin looks up the handler that matches the stmt's type (its tag, via tag_int) and calls it.

class VEXMixin(SimEngineBase):
    ... ...
    def _handle_vex_stmt(self, stmt: pyvex.stmt.IRStmt):
        handler = self._vex_stmt_handlers[stmt.tag_int]
        handler(stmt)

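The handler table self._vex_stmt_handlers is initialized in __init_handlers(). The approach is simple and direct: for every statement class in pyvex.stmt, look up the attribute named _handle_vex_stmt_<ClassName> on the current self and store the bound method in a list indexed by tag_int. Expression handlers, named _handle_vex_expr_<ClassName>, are collected the same way. Because the lookup goes through self, any class that UberEngine inherits from can implement or override a stmt handler, and that implementation is the one that ends up being called.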

class VEXMixin(SimEngineBase):
    ... ...
    def __init_handlers(self):
        self._vex_expr_handlers = [None]*pyvex.expr.tag_count
        self._vex_stmt_handlers = [None]*pyvex.stmt.tag_count
        for name, cls in vars(pyvex.expr).items():
            if isinstance(cls, type) and issubclass(cls, pyvex.expr.IRExpr) and cls is not pyvex.expr.IRExpr:
                self._vex_expr_handlers[cls.tag_int] = getattr(self, '_handle_vex_expr_' + name)
        for name, cls in vars(pyvex.stmt).items():
            if isinstance(cls, type) and issubclass(cls, pyvex.stmt.IRStmt) and cls is not pyvex.stmt.IRStmt:
                self._vex_stmt_handlers[cls.tag_int] = getattr(self, '_handle_vex_stmt_' + name)
        assert None not in self._vex_expr_handlers
        assert None not in self._vex_stmt_handlers
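The getattr-based table can be reproduced in a few lines, which makes it easier to see why a mixin's override wins. This is a toy sketch under stated assumptions: the classes Put and Exit, the STMT_CLASSES mapping, and the _handle_stmt_* names are simplified stand-ins invented here, not pyvex or angr code.

```python
class Stmt:
    tag_int = None


class Put(Stmt):
    tag_int = 0


class Exit(Stmt):
    tag_int = 1


# Stand-in for iterating over vars(pyvex.stmt).
STMT_CLASSES = {"Put": Put, "Exit": Exit}


class ToyVEXMixin:
    def __init__(self):
        # One slot per statement kind, filled by name lookup on self.
        self._stmt_handlers = [None] * len(STMT_CLASSES)
        for name, cls in STMT_CLASSES.items():
            self._stmt_handlers[cls.tag_int] = getattr(self, "_handle_stmt_" + name)
        assert None not in self._stmt_handlers

    def _handle_stmt(self, stmt):
        return self._stmt_handlers[stmt.tag_int](stmt)

    def _handle_stmt_Put(self, stmt):
        return "base Put"

    def _handle_stmt_Exit(self, stmt):
        return "base Exit"


class LoggingMixin(ToyVEXMixin):
    def _handle_stmt_Put(self, stmt):
        # Override found via getattr(self, ...), then defers to the base.
        return "logged " + super()._handle_stmt_Put(stmt)


class Engine(LoggingMixin, ToyVEXMixin):
    pass


engine = Engine()
print(engine._handle_stmt(Put()))
print(engine._handle_stmt(Exit()))
```

Since getattr resolves through the instance's MRO, the table stores LoggingMixin's bound method for Put and the base method for Exit, without the base class knowing which mixins exist.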

VEXMixin itself already defines handlers for the various kinds of stmt and expression. Two of the expression handlers look like this:

class VEXMixin(SimEngineBase):
    ... ...

    def _handle_vex_expr_RdTmp(self, expr: pyvex.expr.RdTmp):
        return self._perform_vex_expr_RdTmp(expr.tmp)
    def _perform_vex_expr_RdTmp(self, tmp):
        return self.tmps[tmp]

    def _handle_vex_expr_Get(self, expr: pyvex.expr.Get):
        return self._perform_vex_expr_Get(
            self._handle_vex_const(pyvex.const.U32(expr.offset)),
            expr.ty)
    def _perform_vex_expr_Get(self, offset, ty, **kwargs):
        return NotImplemented
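The snippet above splits each operation into a _handle_* method that unpacks the expression and a _perform_* method that does the work, with _perform_vex_expr_Get left returning NotImplemented. My reading is that a subclass is expected to supply the real register access. The toy sketch below illustrates that pattern only; the class names, the dict used as a register file, and the offset value are all assumptions made for the example, not how angr stores registers.

```python
class ToyBase:
    def __init__(self):
        self.tmps = {}

    def _handle_expr_RdTmp(self, tmp):
        # Temporaries live in the engine itself.
        return self.tmps[tmp]

    def _handle_expr_Get(self, offset):
        # Unpack the expression, then delegate the actual read.
        return self._perform_expr_Get(offset)

    def _perform_expr_Get(self, offset):
        return NotImplemented  # the base class has no register storage


class ToyRegisterBackend(ToyBase):
    def __init__(self, registers):
        super().__init__()
        self.registers = registers  # hypothetical: offset -> value

    def _perform_expr_Get(self, offset):
        return self.registers[offset]


base = ToyBase()
backend = ToyRegisterBackend({16: 0x1234})  # 16 is just an illustrative offset
backend.tmps[0] = "t0-value"

print(base._handle_expr_Get(16))
print(hex(backend._handle_expr_Get(16)))
print(backend._handle_expr_RdTmp(0))
```

Only the _perform_* method had to change; the _handle_* layer stays the same for every backend.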

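So far this only locates the entry point for executing VEX statements. What I really want to know is how executing a stmt is reflected in memory and registers, how constraints are recorded, and how VEX IR variables map to machine registers.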