[Clang] How to Write a Checker in 24 Hours

Amjac

2022-11-13

Tools

Clang

Original slides: https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf

Other references:

https://clang-analyzer.llvm.org/checker_dev_manual.html

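0. Introduction

The Clang Static Analyzer is a bug-finding tool. It is extensible: users can plug in their own analysis code, called checkers. This post walks through how to write one.

The tool provides free, fast code auditing and can catch bugs early in development.

I. Motivation

There is a limit to what the compiler can find. Consider the following code: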

void workAndLog (bool WriteToLog) {
	int LogHandle;
	int ErrorId;

	if (WriteToLog)
		LogHandle = getHandle() ;

	ErrorId = work();
	if (!WriteToLog)
		logIt(LogHandle, ErrorId);  // may use an uninitialized value
}

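When WriteToLog is false, LogHandle is never initialized, yet it is passed straight to logIt(). This is the kind of error a compiler typically does not catch.

That is why we need static analysis, which:

explores every possible path through the program (a path-sensitive, context-sensitive analysis whose cost is exponential, but bounded);
produces precise results;
can find more kinds of bugs (use-after-free, memory leaks, and so on).

Here is another example: checking that an opened file is eventually closed on every path: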

void writeCharToLog(char *Data) {
	FILE *F = fopen("mylog.txt", "w");

	if (F != NULL) {

		if (!Data)
			return;  // the file is not closed

		fputc(*Data, F);
		fclose(F);
	}

	return;
}

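Clearly, if the file is opened successfully and the Data pointer is null, the function returns without closing the file.

II. Symbolic Execution

Symbolic execution resembles concrete execution, except that it uses symbolic values to explore every possible path through the program. Along the way it collects the constraints on the symbolic values for each path, and uses those constraints to decide whether the path is feasible.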
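To make the idea of path constraints concrete, here is a small toy model of the workAndLog example from earlier. It is only a sketch under my own assumptions: PathState, assume, Result and exploreWorkAndLog are hypothetical names, and the real analyzer uses a far more general constraint manager. The sketch tries all four syntactic combinations of the two if statements, drops the ones whose constraints contradict each other, and records the feasible path on which LogHandle is read without being initialized.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Toy path state: constraints on boolean symbols plus the one fact we track.
struct PathState {
  std::map<std::string, bool> Constraints; // symbol name -> assumed value
  bool LogHandleInitialized = false;
};

// Records the constraint "Sym == Value" in S. Returns false if it contradicts
// an earlier constraint, which means the path is infeasible.
bool assume(PathState &S, const std::string &Sym, bool Value) {
  auto It = S.Constraints.find(Sym);
  if (It != S.Constraints.end())
    return It->second == Value;
  S.Constraints[Sym] = Value;
  return true;
}

struct Result {
  int FeasiblePaths = 0;
  std::vector<PathState> UninitializedUses;
};

// Explores workAndLog: the first 'if' tests WriteToLog, the second tests
// !WriteToLog and then reads LogHandle.
Result exploreWorkAndLog() {
  Result R;
  for (bool TakeFirstIf : {true, false}) {
    PathState S1;
    if (!assume(S1, "WriteToLog", TakeFirstIf))
      continue;
    if (TakeFirstIf)
      S1.LogHandleInitialized = true; // LogHandle = getHandle();
    for (bool TakeSecondIf : {true, false}) {
      PathState S2 = S1;
      // Taking the second 'if' means WriteToLog is false on this path.
      if (!assume(S2, "WriteToLog", !TakeSecondIf))
        continue; // contradicts the first branch: infeasible path
      ++R.FeasiblePaths;
      if (TakeSecondIf && !S2.LogHandleInitialized)
        R.UninitializedUses.push_back(S2); // logIt() reads LogHandle
    }
  }
  return R;
}
```

Calling exploreWorkAndLog() finds two feasible paths out of four, and one of them (the one constrained to WriteToLog == false) uses LogHandle uninitialized.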

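Checkers participate in the construction of the exploded graph, and can stop further exploration of a path by creating sink nodes. A checker is essentially a visitor:

checkPreStmt(const ReturnStmt *S, CheckerContext &C) const: called before a return statement is executed;
checkPostCall(const CallEvent &Call, CheckerContext &C) const: called after a function call has completed;
checkBind(SVal L, SVal R, const Stmt *S, CheckerContext &C) const: called when a value is bound to a location as a result of processing a statement.

III. Writing an Example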

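The states a file stream moves through as it is opened and closed are shown in the figure below:

[Figure: state diagram for opening and closing a file stream; image not available]

There are two error scenarios:

once a file has been closed, it must not be accessed again;
if a file is open, it must eventually be closed.

Next we implement checks for both scenarios.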
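Before turning to the real checker, the two rules can be sketched as a tiny state machine that replays the events of a single path. Everything here is a hypothetical toy (Event, StreamKind, checkPath, and the "dead" pseudo-event standing for "nothing refers to this stream any more"); like the checker built below, it only treats a second fclose as the "access after close" case.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class StreamKind { Opened, Closed };

// One step on a path. Op is "fopen", "fclose" or "dead"; Stream is a toy id.
struct Event {
  std::string Op;
  int Stream;
};

// Replays the events of one path and returns the diagnostics it triggers.
std::vector<std::string> checkPath(const std::vector<Event> &Path) {
  std::map<int, StreamKind> Tracked;
  std::vector<std::string> Diags;
  for (const Event &E : Path) {
    auto It = Tracked.find(E.Stream);
    if (E.Op == "fopen") {
      Tracked[E.Stream] = StreamKind::Opened;
    } else if (E.Op == "fclose") {
      if (It != Tracked.end() && It->second == StreamKind::Closed) {
        Diags.push_back("double close");
        break; // acts like a sink: stop exploring this path
      }
      Tracked[E.Stream] = StreamKind::Closed;
    } else if (E.Op == "dead") {
      if (It != Tracked.end()) {
        if (It->second == StreamKind::Opened)
          Diags.push_back("leak");
        Tracked.erase(It); // stop tracking the dead stream
      }
    }
  }
  return Diags;
}
```

A path of fopen, fclose, dead yields no diagnostics; fopen, fclose, fclose yields "double close"; fopen followed directly by dead yields "leak".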

1. Define the state of the file stream

struct StreamState {
private:
  enum Kind { Opened, Closed } K;
  StreamState(Kind InK) : K(InK) { }

public:
  bool isOpened() const { return K == Opened; }
  bool isClosed() const { return K == Closed; }

  static StreamState getOpened() { return StreamState(Opened); }
  static StreamState getClosed() { return StreamState(Closed); }

  bool operator==(const StreamState &X) const {
    return K == X.K;
  }
  void Profile(llvm::FoldingSetNodeID &ID) const {
    ID.AddInteger(K);
  }
};

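The member K records the current state of the stream (opened or closed).

What is the Profile function for? It adds the state's identity (here, K) to an llvm::FoldingSetNodeID, which the analyzer's uniqued, immutable data structures use to hash and compare stored values.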
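One way to picture what a Profile method buys you is a uniquing table keyed by the profile. The sketch below is a toy under my own assumptions: ToyNodeID, ToyStreamState and ToyUniquer are hypothetical stand-ins, and the real llvm::FoldingSetNodeID and the containers built on it are considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a node ID: it just collects integers.
struct ToyNodeID {
  std::vector<unsigned> Bits;
  void AddInteger(unsigned I) { Bits.push_back(I); }
};

struct ToyStreamState {
  enum Kind { Opened, Closed } K;
  // Same shape as StreamState::Profile: describe the value to the ID.
  void Profile(ToyNodeID &ID) const { ID.AddInteger(K); }
};

// Uniquing table: values with the same profile share one slot.
class ToyUniquer {
  std::map<std::vector<unsigned>, std::size_t> Slots;

public:
  template <typename T> std::size_t intern(const T &V) {
    ToyNodeID ID;
    V.Profile(ID);
    // insert() keeps the existing slot if this profile was seen before.
    auto Ins = Slots.insert({ID.Bits, Slots.size()});
    return Ins.first->second;
  }
  std::size_t size() const { return Slots.size(); }
};
```

Interning two Opened states gives back the same slot, while a Closed state gets a slot of its own, so the table ends up with two entries.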

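The checker's state is part of the ProgramState.

State = State->set<StreamMap>(FileDesc, StreamState::getOpened());
const StreamState *SS = State->get<StreamMap>(FileDesc);

This was hard to follow at first, so I looked it up in the official documentation:

The analyzer core performs symbolic execution, trying to explore every feasible path. The explored paths are represented by an ExplodedGraph object, and each node of the graph is an ExplodedNode. An ExplodedNode consists of a ProgramPoint and a ProgramState.

ProgramPoint is the location in the CFG that the state corresponds to (it also records why the node was created);

ProgramState is the abstract state of the program. It consists of the Environment (a mapping from source-code expressions to symbolic values), the Store (a mapping from memory locations to symbolic values), and the GenericDataMap (constraints on symbolic values).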

… …

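Checkers often need to keep some information as execution proceeds, and this information can be attached to the ProgramState. (This is much like writing an S2E plugin, where implementing PluginState gives you a slot on each State for recording extra information.) If a checker needs to record custom data in the ProgramState, it adds an entry to it with the macros below.

Following that lead, I found two macro definitions: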

/* in clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h */

/// Declares a program state trait for type \p Type called \p Name, and
/// introduce a type named \c NameTy.
/// The macro should not be used inside namespaces.
#define REGISTER_TRAIT_WITH_PROGRAMSTATE(Name, Type)                           \
  namespace {                                                                  \
  class Name {};                                                               \
  using Name##Ty = Type;                                                       \
  }                                                                            \
  namespace clang {                                                            \
  namespace ento {                                                             \
  template <>                                                                  \
  struct ProgramStateTrait<Name> : public ProgramStatePartialTrait<Name##Ty> { \
    static void *GDMIndex() {                                                  \
      static int Index;                                                        \
      return &Index;                                                           \
    }                                                                          \
  };                                                                           \
  }                                                                            \
  }

...  ...

/// The macro should not be used inside namespaces, or for traits that must
  /// be accessible from more than one translation unit.
  #define REGISTER_MAP_WITH_PROGRAMSTATE(Name, Key, Value) \
    REGISTER_TRAIT_WITH_PROGRAMSTATE(Name, \
                                     CLANG_ENTO_PROGRAMSTATE_MAP(Key, Value))

With that, the sample code makes sense:

/* in SimpleStreamChecker.cpp */

/// The state of the checker is a map from tracked stream symbols to their
/// state. Let's store it in the ProgramState.
REGISTER_MAP_WITH_PROGRAMSTATE(StreamMap, SymbolRef, StreamState)

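This defines a class named StreamMap together with the type of its data, StreamMapTy: a map whose entries are pairs with a SymbolRef key and a StreamState value. SymbolRef is really just a pointer to SymExpr and denotes a symbolic value.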
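The macro pattern itself (an empty tag class, a NameTy alias, and the address of a static variable used as a unique key) can be tried out in isolation. The following is a hypothetical miniature, not the real header: ToyTrait, TOY_MAP, TOY_REGISTER_TRAIT and TOY_REGISTER_MAP are made-up names, and std::map stands in for the analyzer's immutable map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Primary template, specialized once per registered trait.
template <typename Tag> struct ToyTrait;

// Wrapping the map type in a macro keeps its comma from splitting arguments.
#define TOY_MAP(Key, Value) std::map<Key, Value>

// Declares an empty tag class Name, an alias NameTy for the stored type, and
// a per-trait key: the address of a function-local static is unique.
#define TOY_REGISTER_TRAIT(Name, Type)                                         \
  class Name {};                                                               \
  using Name##Ty = Type;                                                       \
  template <> struct ToyTrait<Name> {                                          \
    using data_type = Name##Ty;                                                \
    static void *Index() {                                                     \
      static int Tag;                                                          \
      return &Tag;                                                             \
    }                                                                          \
  };

#define TOY_REGISTER_MAP(Name, Key, Value)                                     \
  TOY_REGISTER_TRAIT(Name, TOY_MAP(Key, Value))

// Two registrations at namespace scope, as the real macro requires.
TOY_REGISTER_MAP(StreamMap, int, std::string)
TOY_REGISTER_MAP(OtherMap, int, int)
```

After these two registrations, StreamMapTy is a usable map type, and ToyTrait<StreamMap>::Index() and ToyTrait<OtherMap>::Index() return different addresses. That is roughly how one generic data map can hold entries for several checkers without them colliding.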

/* in clang/include/clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h */
using SymbolRef = const SymExpr *;
... ...
/// Symbolic value. These values used to capture symbolic execution of
/// the program.
class SymExpr : public llvm::FoldingSetNode {
  virtual void anchor();

Now back to the example from the PDF:

SymbolRef FileDesc = Call.getReturnValue().getAsSymbol();
ProgramStateRef State = C.getState();
State = State->set<StreamMap>(FileDesc, StreamState::getOpened());

In short, the return value of the call is taken as a symbol and recorded in the ProgramState as opened.
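The pattern State = State->set<...>(...) is worth a second look: the state is immutable, so set hands back a new state and the result has to be assigned back. Below is a minimal sketch of that behavior, assuming a hypothetical ToyState that simply copies a std::map on every update (the real ProgramState shares structure between versions instead of copying).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class ToyState;
using ToyStateRef = std::shared_ptr<const ToyState>;

// Hypothetical immutable state: updates never modify the receiver.
class ToyState {
  std::map<int, std::string> StreamMap; // toy symbol id -> "opened"/"closed"

public:
  // Returns a new state with the binding added; *this stays untouched.
  ToyStateRef set(int Sym, const std::string &Value) const {
    auto Next = std::make_shared<ToyState>(*this);
    Next->StreamMap[Sym] = Value;
    return Next;
  }

  // Returns null when the symbol is not tracked.
  const std::string *get(int Sym) const {
    auto It = StreamMap.find(Sym);
    return It == StreamMap.end() ? nullptr : &It->second;
  }
};
```

If S1 = S0->set(7, "opened"), then S1->get(7) points at "opened" while S0->get(7) is still null. Forgetting to assign the result back to State would silently lose the update.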

2. Check at fopen

Here the SimpleStreamChecker class implements checkPostCall. PostCall means the callback runs after the function call has completed.

void SimpleStreamChecker::checkPostCall(const CallEvent &Call,
                                        CheckerContext &C) const {
  if (!Call.isGlobalCFunction("fopen"))
    return;

  if (!OpenFn.matches(Call))  // what does this check?
    return;

  // Get the symbolic value corresponding to the file handle.
  SymbolRef FileDesc = Call.getReturnValue().getAsSymbol();
  if (!FileDesc)
    return;  // no symbol for the return value (not the same as fopen failing)

  // Generate the next transition (an edge in the exploded graph).
  ProgramStateRef State = C.getState();
  State = State->set<StreamMap>(FileDesc, StreamState::getOpened());
  C.addTransition(State); // adds the new node to the exploded graph
}

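There is also CallDescription OpenFn, CloseFn;. Judging from the SimpleStreamChecker constructor, these are used to match function names. This may come in handy later for precisely filtering a particular method of a particular class, so I will leave it as a topic for a future post.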
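As a rough mental model of that kind of matching, here is a toy that compares a callee name and, optionally, an argument count. ToyCall and ToyCallDescription are hypothetical and much simpler than the real CallDescription, whose constructor arguments I have not verified here.

```cpp
#include <cassert>
#include <string>

// Toy view of a call site: the callee's name and how many arguments it got.
struct ToyCall {
  std::string Callee;
  int NumArgs;
};

// Hypothetical, simplified description of a function to match: a name plus a
// required argument count, where -1 means "any number of arguments".
struct ToyCallDescription {
  std::string Name;
  int RequiredArgs;

  bool matches(const ToyCall &Call) const {
    if (Call.Callee != Name)
      return false;
    return RequiredArgs < 0 || Call.NumArgs == RequiredArgs;
  }
};
```

With ToyCallDescription CloseFn{"fclose", 1}, a one-argument call to fclose matches, while a two-argument fclose or any call to fopen does not.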

3. Check at fclose and report the error

void SimpleStreamChecker::checkPreCall(const CallEvent &Call,
                                       CheckerContext &C) const {
  if (!Call.isGlobalCFunction("fclose"))
    return;

  if (!CloseFn.matches(Call))
    return;

  // Get the symbolic value corresponding to the file handle.
  SymbolRef FileDesc = Call.getArgSVal(0).getAsSymbol();
  if (!FileDesc)
    return;

  // Check if the stream has already been closed.
  ProgramStateRef State = C.getState();
  const StreamState *SS = State->get<StreamMap>(FileDesc);
  if (SS && SS->isClosed()) {
    reportDoubleClose(FileDesc, Call, C);
    return;
  }

  // Generate the next transition, in which the stream is closed.
  State = State->set<StreamMap>(FileDesc, StreamState::getClosed());
  C.addTransition(State);
}

Clearly, if the file is already closed when fclose is called, a double-close error is reported.

void SimpleStreamChecker::reportDoubleClose(SymbolRef FileDescSym,
                                            const CallEvent &Call,
                                            CheckerContext &C) const {
  // We reached a bug, stop exploring the path here by generating a sink.
  ExplodedNode *ErrNode = C.generateErrorNode();
  // If we've already reached this node on another path, return.
  if (!ErrNode)
    return;

  // Generate the report.
  auto R = std::make_unique<PathSensitiveBugReport>(
      *DoubleCloseBugType, "Closing a previously closed file stream", ErrNode);
  R->addRange(Call.getSourceRange());
  R->markInteresting(FileDescSym);
  C.emitReport(std::move(R));
}

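Here, generating a sink node has been replaced by generating an error node.

4. Check for resource leaks

This introduces a new concept: a dead symbol, that is, a symbol that will never be referenced again on the current path. The checker is notified when a symbol becomes dead.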
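A simplified way to think about deadness: a tracked symbol is dead once no live variable refers to it. The sketch below encodes just that, with hypothetical names (findDeadSymbols, integer symbol ids, a map from variable names to symbols); the analyzer's actual liveness computation is more elaborate.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Returns the tracked symbols that no live variable refers to any more.
std::vector<int> findDeadSymbols(const std::map<std::string, int> &LiveBindings,
                                 const std::set<int> &TrackedSymbols) {
  // Collect every symbol still reachable from a live variable.
  std::set<int> Reachable;
  for (const auto &Binding : LiveBindings)
    Reachable.insert(Binding.second);

  // Anything tracked but unreachable is dead.
  std::vector<int> Dead;
  for (int Sym : TrackedSymbols)
    if (!Reachable.count(Sym))
      Dead.push_back(Sym);
  return Dead;
}
```

If the variable F is bound to symbol 1 and the tracked set is {1, 2}, only symbol 2 is dead; once F goes out of scope and its binding disappears, both are.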

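checkDeadSymbols is not called by any other function in the checker; it is a Checker callback (check::DeadSymbols) that the analyzer core invokes.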

void SimpleStreamChecker::checkDeadSymbols(SymbolReaper &SymReaper,
                                           CheckerContext &C) const {
  ProgramStateRef State = C.getState();
  SymbolVector LeakedStreams;
  StreamMapTy TrackedStreams = State->get<StreamMap>();
  for (StreamMapTy::iterator I = TrackedStreams.begin(),
                             E = TrackedStreams.end(); I != E; ++I) {
    SymbolRef Sym = I->first;
    bool IsSymDead = SymReaper.isDead(Sym);

    // Collect leaked symbols.
    if (isLeaked(Sym, I->second, IsSymDead, State))
      LeakedStreams.push_back(Sym);

    // Remove the dead symbol from the streams map.
    if (IsSymDead)
      State = State->remove<StreamMap>(Sym);
  }

  ExplodedNode *N = C.generateNonFatalErrorNode(State);
  if (!N)
    return;
  reportLeaks(LeakedStreams, C, N);
}

The isLeaked function is as follows:

static bool isLeaked(SymbolRef Sym, const StreamState &SS,
                     bool IsSymDead, ProgramStateRef State) {
  if (IsSymDead && SS.isOpened()) {
    // If a symbol is NULL, assume that fopen failed on this path.
    // A symbol should only be considered leaked if it is non-null.
    ConstraintManager &CMgr = State->getConstraintManager();
    ConditionTruthVal OpenFailed = CMgr.isNull(State, Sym);
    return !OpenFailed.isConstrainedTrue();
  }
  return false;
}

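If the symbol is dead and the file is still in the opened state, why not report a leak right away? Because the file pointer Sym may have been constrained to NULL, which means fopen failed on this path. The leak is reported only if Sym is not known to be NULL.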
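The decision hinges on a three-valued answer: the constraint solver may know the pointer is NULL, know it is not, or not know either way. A minimal sketch of the same logic, using hypothetical Truth and Kind enums in place of ConditionTruthVal and StreamState:

```cpp
#include <cassert>

// Hypothetical three-valued answer to "is this symbol NULL?".
enum class Truth { True, False, Unknown };

enum class Kind { Opened, Closed };

// Mirrors the decision in isLeaked: a dead, still-open stream is a leak
// unless the symbol is known to be NULL (fopen failed on this path).
bool toyIsLeaked(bool IsSymDead, Kind K, Truth IsNull) {
  if (IsSymDead && K == Kind::Opened)
    return IsNull != Truth::True;
  return false;
}
```

Note that an Unknown answer still counts as a leak: toyIsLeaked(true, Kind::Opened, Truth::Unknown) returns true, and only Truth::True suppresses the report.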

void SimpleStreamChecker::reportLeaks(ArrayRef<SymbolRef> LeakedStreams,
                                      CheckerContext &C,
                                      ExplodedNode *ErrNode) const {
  // Attach bug reports to the leak node.
  // TODO: Identify the leaked file descriptor.
  for (SymbolRef LeakedStream : LeakedStreams) {
    auto R = std::make_unique<PathSensitiveBugReport>(
        *LeakBugType, "Opened file is never closed; potential resource leak",
        ErrNode);
    R->markInteresting(LeakedStream);
    C.emitReport(std::move(R));
  }
}

IV. Testing the Example

This checker already ships with clang (at least as of clang version 16.0.0). First, write a test file:

/* in /tmp/test_leak.c */
#include <stdio.h>
void writeCharToLog(char *Data) {
  FILE *F = fopen("mylog.txt", "w");
  if (F != NULL) {
    if (!Data)
      return;
    fclose(F);
  }
  return;
}

Then run the analyzer:

clang -cc1 -analyze -analyzer-checker=alpha.unix.SimpleStream -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/ /tmp/test_leak.c

stdio.h could not be found, so I added the include directory by hand.

/tmp/test_leak.c:6:7: warning: Opened file is never closed; potential resource leak [alpha.unix.SimpleStream]
      return;
      ^~~~~~
22 warnings generated.

The warning for test_leak.c shows up as expected. It is preceded by many other warnings, though, which come from stdio.h.