[Tools] A Simple Guide to Using Pintools

Amjac

2021-06-30

Tools

pintools

0. Introduction

Pin is a dynamic binary instrumentation tool from Intel.

1. Download and Installation

Download Pin 3.18 from the official download page (Pin - A Binary Instrumentation Tool - Downloads, intel.com).

$ wget https://software.intel.com/sites/landingpage/pintool/downloads/pin-3.18-98332-gaebd7b1e6-gcc-linux.tar.gz

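Pin is not open source. The download already contains the executables, so it can be used directly after unpacking.

2. Getting Started

Pin ships with a set of example tools in the source/tools/ManualExamples directory, and it also comes with a user guide.

Basics (to be expanded)

Pin and Pintools

Think of Pin as a JIT compiler whose input is not bytecode but a regular executable. Pin intercepts the execution of the first instruction of the executable and generates ("compiles") new code for the instruction sequence starting at that instruction, then transfers control to the generated code. Whenever execution leaves the generated sequence, Pin regains control and compiles the next one.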

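Conceptually, instrumentation consists of two parts:

instrumentation: decides where code is inserted and what code is inserted
analysis: the code that is executed at the insertion points

A Pintool can be seen as a plugin that modifies the code generation process inside Pin.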

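A Pintool registers instrumentation callback routines with Pin, and Pin calls them whenever new code needs to be generated. An instrumentation callback inspects the code about to be generated, examines its static properties, and decides whether and where to inject calls to analysis functions.

Analysis functions gather data about the application. Pin makes sure that the integer and floating-point register state is saved and restored as necessary, and it allows arguments to be passed to the functions.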
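
The split between instrumentation and analysis is easy to mix up, so here is a minimal standalone sketch of the idea. It does not use the Pin API at all: ToyInstruction, ToyStats, toyInstrument, toyDocount and runToyProgram are hypothetical names invented for this model. The only point it tries to show is that the instrumentation routine runs once per static instruction (when the code is first generated), while the analysis function it attached runs on every execution.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Counters that let us tell the two phases apart.
struct ToyStats {
    int instrumentationRuns = 0; // how often the instrumentation routine ran
    int analysisRuns = 0;        // how often the analysis function ran
};

// Hypothetical stand-in for one static instruction of the application.
struct ToyInstruction {
    bool seen = false; // has code been "generated" for it yet?
    std::vector<std::function<void(ToyStats&)>> analysisCalls;
};

// Analysis function: gathers data each time the instruction executes.
inline void toyDocount(ToyStats& stats) { ++stats.analysisRuns; }

// Instrumentation routine: decides where to insert which analysis call.
inline void toyInstrument(ToyInstruction& ins, ToyStats& stats) {
    ++stats.instrumentationRuns;
    ins.analysisCalls.push_back(toyDocount);
}

// Runs the whole program `iterations` times, like a loop in the application.
inline ToyStats runToyProgram(std::vector<ToyInstruction>& program, int iterations) {
    assert(iterations >= 0);
    ToyStats stats;
    for (int i = 0; i < iterations; ++i) {
        for (ToyInstruction& ins : program) {
            if (!ins.seen) {            // first encounter: generate code
                toyInstrument(ins, stats);
                ins.seen = true;
            }
            for (auto& call : ins.analysisCalls) call(stats);
            // ... the original instruction itself would execute here ...
        }
    }
    return stats;
}
```

With three instructions executed four times, this model runs the instrumentation routine three times and the analysis function twelve times. Real Pin caches generated code in a similar spirit, but the details (code cache eviction, traces) are more involved than this toy.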

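A Pintool can also register notification callback routines for events such as thread creation or forking. These callbacks are generally used to gather data or to initialize or clean up the tool.

Because a Pintool works like a plugin, it must run in the same address space as Pin and the executable being instrumented. The Pintool therefore has access to all of the executable's data, and it shares file descriptors and other process information with the executable.

Pin and the Pintool control the program starting with its very first instruction. For executables compiled with shared libraries, this means that the execution of the dynamic loader and of all shared libraries is visible to the Pintool.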

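Instrumentation Granularity

There are four instrumentation modes:

Trace Instrumentation : inspects and instruments the executable one trace at a time.

Definition of a trace

Traces usually begin at the target of a taken branch and end with an unconditional branch, including calls and returns. (At first this sounds like a basic block, but a trace only ends at an unconditional branch, so it can span several basic blocks.)

Pin guarantees that a trace is only entered at the top, but it may contain multiple exits. (If a branch jumps into the middle of a trace, Pin builds a new trace that starts at the branch target.)

Pin breaks a trace into basic blocks (BBLs). Making an analysis call for every instruction can be too expensive, so the granularity can be relaxed to one call per basic block. To do this, register the instrumentation routine with the TRACE_AddInstrumentFunction() API and walk the BBLs of each trace.

Because Pin discovers the control flow of the executable dynamically as it runs, the basic blocks seen by Pin may differ from the ones a static analysis tool reports.
Instruction Instrumentation : inspects and instruments the executable one instruction at a time.
Image Instrumentation : inspects and instruments an entire image when it is first loaded.

A Pintool can walk the sections, SEC, of the image, the routines, RTN, of a section, and the instructions, INS of a routine. Instrumentation can be inserted so that it is executed before or after a routine is executed, or before or after an instruction is executed.

Put simply, an image is organized as sections, which contain routines, which contain instructions, and a tool can walk that hierarchy when the image is loaded.
Routine Instrumentation : inspects and instruments the routines of an image when the image is first loaded. Analysis code can be made to run before or after a routine executes.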
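
To make the trace and basic block terminology from the trace definition more concrete, here is a small standalone model. It is not Pin code: the Kind enum and splitTraceIntoBbls are hypothetical names, and the model rests on two simplifying assumptions taken from the description above, namely that every branch closes a basic block and that the trace stops at the first unconditional branch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds, just enough to model control flow.
enum class Kind { Plain, ConditionalBranch, UnconditionalBranch };

// Returns the number of instructions in each basic block of one trace.
// `code` is the straight-line instruction sequence starting at the trace head.
inline std::vector<std::size_t> splitTraceIntoBbls(const std::vector<Kind>& code) {
    std::vector<std::size_t> bblSizes;
    std::size_t current = 0;
    for (Kind k : code) {
        ++current;
        if (k == Kind::Plain) continue;
        assert(current > 0);          // a block always contains its branch
        bblSizes.push_back(current);  // any branch ends the current block
        current = 0;
        if (k == Kind::UnconditionalBranch) return bblSizes; // trace ends here
    }
    if (current > 0) bblSizes.push_back(current); // trailing block, no branch
    return bblSizes;
}
```

For the sequence plain, plain, conditional branch, plain, unconditional branch, plain, the model yields one trace with two basic blocks of sizes 3 and 2. The conditional branch is an extra exit from the trace, and the final instruction is not part of this trace at all.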

Building the Example Tools

To build all examples in a directory for intel64 architecture:

$ cd source/tools/ManualExamples
$ make all TARGET=intel64

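Simple Instruction Count (Instruction Instrumentation)

This tool inserts a call to docount() before every instruction. When the program exits, the count is saved to inscount.out.

Here is the instruction count applied to ls: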

$ ../../../pin -t obj-intel64/inscount0.so -- /bin/ls
Makefile          atrace.o     imageload.out  itrace      proccount
Makefile.example  imageload    inscount0      itrace.o    proccount.o
atrace            imageload.o  inscount0.o    itrace.out
$ cat inscount.out
Count 422838
$

VOID Instruction(INS ins, VOID *v)
{
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

int main(int argc, char **argv)
{
    ... ...
    INS_AddInstrumentFunction(Instruction, 0);
    ... ...
}

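INS_AddInstrumentFunction(Instruction, 0); registers Instruction() as the instrumentation function at instruction granularity. Inside that function the tool decides which analysis code to run for which instructions.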

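Instruction Address Trace (Instruction Instrumentation)

The previous example (the simple instruction count) did not pass any arguments to docount. This example shows how to pass arguments to an analysis function.

It prints the address of every instruction that is executed.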

$ ../../../pin -t obj-intel64/itrace.so -- /bin/ls

VOID printip(VOID *ip) { fprintf(trace, "%p\n", ip); }


VOID Instruction(INS ins, VOID *v)
{
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END);
}

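So INS_InsertCall() takes a variable number of arguments rather than having a fixed signature: after the analysis function pointer comes a list of IARG_* entries describing what to pass, and the list must end with IARG_END. Here IARG_INST_PTR passes the address of the instrumented instruction to printip.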
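
The calls in the examples pass a list of IARG_* tags that is closed by IARG_END, and some tags are followed by a value (IARG_UINT32 and IARG_MEMORYOP_EA appear that way in the later examples). The following standalone sketch shows how such a sentinel-terminated variadic argument list can be walked in C++. It is only an illustration of the calling convention: ToyArg, its enumerators and describeArgs are hypothetical and have nothing to do with how Pin processes its arguments internally.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>

// Hypothetical argument tags, loosely modelled on the IARG_* idea.
enum ToyArg {
    TOY_ARG_INST_PTR, // takes no extra value
    TOY_ARG_UINT32,   // followed by one unsigned int value
    TOY_ARG_END       // sentinel that closes the list
};

// Walks a tag list terminated by TOY_ARG_END and describes what it found.
inline std::string describeArgs(int firstTag, ...) {
    std::string out;
    va_list ap;
    va_start(ap, firstTag);
    for (int tag = firstTag; tag != TOY_ARG_END; tag = va_arg(ap, int)) {
        if (tag == TOY_ARG_INST_PTR) {
            out += "ip;";
        } else if (tag == TOY_ARG_UINT32) {
            // This tag owns the next variadic slot.
            out += "u32=" + std::to_string(va_arg(ap, unsigned int)) + ";";
        }
    }
    va_end(ap);
    return out;
}
```

Calling describeArgs(TOY_ARG_INST_PTR, TOY_ARG_UINT32, 7u, TOY_ARG_END) returns "ip;u32=7;". Forgetting the sentinel makes the loop read past the supplied arguments, which is presumably why every call in the examples ends with IARG_END.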

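Memory Reference Trace (Instruction Instrumentation)

The previous two examples instrument every instruction. Sometimes a tool only wants to instrument one class of instructions, such as memory operations or branches. Pin provides APIs for classifying and inspecting instructions, which are described in its API documentation.

This example shows how to instrument selectively after examining each instruction. The tool generates a trace of all memory addresses referenced by the program.

Only instructions that read or write memory are instrumented. INS_InsertPredicatedCall() is used instead of INS_InsertCall(), so the analysis call is skipped whenever the instruction's predicate is false.

For example, a REP-prefixed instruction repeats a number of times, and once its repeat count is zero it does no work, so its predicate is false. The same applies to a conditional move whose condition is not met. Ordinary instructions are not predicated: their predicate is always true because they are always actually executed.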
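
The difference between a plain and a predicated call can be illustrated with a small standalone model. It does not use Pin: ToyRepStore and traceWrites are hypothetical, and for brevity the model records one entry per instruction instead of one per REP iteration. The assumption being modelled is only that the predicate of a REP-prefixed store is false when its count is zero.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a REP-prefixed store: it writes `count` elements
// starting at `base`. A count of zero means the predicate is false.
struct ToyRepStore {
    std::uint64_t base;
    std::uint32_t count;
};

// Returns the addresses an address-tracing tool would record.
// predicated == false: the analysis call runs whenever the instruction is reached.
// predicated == true : the analysis call runs only if the predicate is true.
inline std::vector<std::uint64_t> traceWrites(const std::vector<ToyRepStore>& stream,
                                              bool predicated) {
    std::vector<std::uint64_t> recorded;
    for (const ToyRepStore& ins : stream) {
        const bool predicateTrue = ins.count > 0;
        if (predicated && !predicateTrue) continue; // nothing is written, skip
        recorded.push_back(ins.base);               // stand-in for RecordMemWrite
    }
    return recorded;
}
```

In this model a store with count zero still shows up in the unpredicated trace even though it touches no memory, while the predicated trace leaves it out. That spurious entry is the kind of noise the predicated call is meant to avoid.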

$ ../../../pin -t obj-intel64/pinatrace.so -- /bin/ls
Makefile          atrace.o    imageload.o    inscount0.o  itrace.out
Makefile.example  atrace.out  imageload.out  itrace       proccount
atrace            imageload   inscount0      itrace.o     proccount.o
$ head pinatrace.out
0x40001ee0: R 0xbfffe798
0x40001efd: W 0xbfffe7d4
0x40001f09: W 0xbfffe7d8
0x40001f20: W 0xbfffe864
0x40001f20: W 0xbfffe868
0x40001f20: W 0xbfffe86c
0x40001f20: W 0xbfffe870
0x40001f20: W 0xbfffe874
0x40001f20: W 0xbfffe878
0x40001f20: W 0xbfffe87c
$

VOID Instruction(INS ins, VOID *v)
{
    // Instruments memory accesses using a predicated call, i.e.
    // the instrumentation is called iff the instruction will actually be executed.
    //
    // On the IA-32 and Intel(R) 64 architectures conditional moves and REP 
    // prefixed instructions appear as predicated instructions in Pin.
    UINT32 memOperands = INS_MemoryOperandCount(ins);

    // Iterate over each memory operand of the instruction.
    for (UINT32 memOp = 0; memOp < memOperands; memOp++)
    {
        if (INS_MemoryOperandIsRead(ins, memOp))
        {
            INS_InsertPredicatedCall(
                ins, IPOINT_BEFORE, (AFUNPTR)RecordMemRead,
                IARG_INST_PTR,
                IARG_MEMORYOP_EA, memOp,
                IARG_END);
        }
        // Note that in some architectures a single memory operand can be 
        // both read and written (for instance incl (%eax) on IA-32)
        // In that case we instrument it once for read and once for write.
        if (INS_MemoryOperandIsWritten(ins, memOp))
        {
            INS_InsertPredicatedCall(
                ins, IPOINT_BEFORE, (AFUNPTR)RecordMemWrite,
                IARG_INST_PTR,
                IARG_MEMORYOP_EA, memOp,
                IARG_END);
        }
    }
}

UINT32 LEVEL_CORE::INS_MemoryOperandCount(INS ins)

returns : the number of memory operands of the instruction ins.

BOOL LEVEL_CORE::INS_MemoryOperandIsWritten(INS ins, UINT32 memopIdx)

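returns : TRUE if the memory operand memopIdx is written by the instruction. INS_MemoryOperandIsRead(), used in the code above, is the counterpart for reads.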

Detecting Image Loads and Unloads (Image Instrumentation)

$ ../../../pin -t obj-intel64/imageload.so -- /bin/ls
Makefile          atrace.o    imageload.o    inscount0.o  proccount
Makefile.example  atrace.out  imageload.out  itrace       proccount.o
atrace            imageload   inscount0      itrace.o     trace.out
$ cat imageload.out
Loading /bin/ls
Loading /lib/ld-linux.so.2
Loading /lib/libtermcap.so.2
Loading /lib/i686/libc.so.6
Unloading /bin/ls
Unloading /lib/ld-linux.so.2
Unloading /lib/libtermcap.so.2
Unloading /lib/i686/libc.so.6
$


VOID ImageLoad(IMG img, VOID *v)
{
    TraceFile << "Loading " << IMG_Name(img) << ", Image id = " << IMG_Id(img) << endl;
}

// Pin calls this function every time a new img is unloaded
// You can't instrument an image that is about to be unloaded
VOID ImageUnload(IMG img, VOID *v)
{
    TraceFile << "Unloading " << IMG_Name(img) << endl;
}

int main(int argc, char *argv[])
{
    ... ...
    // Register ImageLoad to be called when an image is loaded
    IMG_AddInstrumentFunction(ImageLoad, 0);

    // Register ImageUnload to be called when an image is unloaded
    IMG_AddUnloadFunction(ImageUnload, 0);
    ... ...
}

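More Efficient Instruction Count (Trace Instrumentation)

The earlier instruction counter (the simple instruction count) is inefficient because it runs analysis code on every single instruction. This version works at BBL granularity instead: the analysis code runs once each time a basic block is executed and adds the number of instructions in that block to the counter.

The example is located at source/tools/ManualExamples/inscount1.cpp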

VOID Trace(TRACE trace, VOID *v)
{
    // Visit every basic block  in the trace
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
    {
        // Insert a call to docount before every bbl, passing the number of instructions
        BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount, IARG_UINT32, BBL_NumIns(bbl), IARG_END);
    }
}

int main(int argc, char *argv[])
{
    ... ...
    // Register Trace to be called to instrument traces
    TRACE_AddInstrumentFunction(Trace, 0);
    ... ...
}
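
To see why counting per basic block is cheaper, here is a standalone comparison that does not depend on Pin. CountResult, countPerInstruction and countPerBbl are hypothetical names, and the input is simply the list of basic block sizes in the order a program executed them. Both strategies should report the same instruction total, but the number of analysis calls differs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CountResult {
    std::uint64_t instructions = 0;  // the value the tool would report
    std::uint64_t analysisCalls = 0; // how many times docount-like code ran
};

// Per-instruction strategy: one analysis call before every instruction.
inline CountResult countPerInstruction(const std::vector<std::uint32_t>& executedBbls) {
    CountResult r;
    for (std::uint32_t size : executedBbls) {
        for (std::uint32_t i = 0; i < size; ++i) {
            ++r.instructions;   // like docount() taking no arguments
            ++r.analysisCalls;
        }
    }
    return r;
}

// Per-BBL strategy: one analysis call per executed block, passing its size.
inline CountResult countPerBbl(const std::vector<std::uint32_t>& executedBbls) {
    CountResult r;
    for (std::uint32_t size : executedBbls) {
        r.instructions += size; // like docount(number of instructions in the bbl)
        ++r.analysisCalls;
    }
    return r;
}
```

For executed block sizes 5, 3, 5, 3, 2 both functions report 18 instructions, but the per-instruction version makes 18 analysis calls and the per-BBL version only 5. The saving grows with the average block size.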

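Procedure Instruction Count (Routine Instrumentation)

This example collects per-routine information: how many times each routine is called and how many instructions it executes.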

$ ../../../pin -t obj-intel64/proccount.so -- /bin/grep proccount.cpp Makefile
proccount_SOURCES = proccount.cpp
$ head proccount.out
              Procedure           Image            Address        Calls Instructions
                  _fini       libc.so.6         0x40144d00            1           21
__deregister_frame_info       libc.so.6         0x40143f60            2           70
  __register_frame_info       libc.so.6         0x40143df0            2           62
              fde_merge       libc.so.6         0x40143870            0            8
            __init_misc       libc.so.6         0x40115824            1           85
            __getclktck       libc.so.6         0x401157f4            0            2
                 munmap       libc.so.6         0x40112ca0            1            9
                   mmap       libc.so.6         0x40112bb0            1           23
            getpagesize       libc.so.6         0x4010f934            2           26
$