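Preface

The code analyzed here is datadog-agent at commit c9ddf6e854276cfc91aabe76415a203e17de5355 (December 9, 2021).

datadog-agent product architecture

Among the conference slides published for this product, I found an introduction deck titled "Runtime Security Monitoring with eBPF". Besides the product architecture diagram, it summarizes how, in today's fast-changing infrastructure, runtime security monitoring can collect application-level and container-level context in order to detect security events.

[Figure: datadog-agent architecture diagram]

Engineering perspective

invoke is a Python task-execution tool. This project depends on it heavily, so it must be installed beforehand.

Build managed by Python

invoke agent.build is the command the official documentation gives for building the project. invoke --list shows every task available in the current project; it reads the Python scripts under the tasks directory and collects the functions exposed as tasks.
This article focuses on the eBPF-related functionality, so only the eBPF-related commands are listed: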

cfc4n@vmubuntu:~/project/datadog-agent$ invoke --list
Available tasks:

... 
system-probe.build                             Build the system_probe
system-probe.clang-format                      Format C code using clang-format
system-probe.clang-tidy                        Lint C code using clang-tidy
system-probe.generate-cgo-types
system-probe.generate-runtime-files
system-probe.nettop                            Build and run the `nettop` utility for testing
system-probe.object-files                      object_files builds the eBPF object files
system-probe.test                              Run tests on eBPF parts
...

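From reading the source code and related material, the core eBPF sources are generated by the system-probe.build command and compiled into eBPF bytecode (.o files). This command calls the build function in tasks/system_probe.py, which in turn calls build_object_files to produce the eBPF bytecode for every probe. By business purpose, these probe objects fall into two groups: network and runtime.

Compiling the network probes

The build_network_ebpf_files function invokes clang to compile the four files below from the pkg/network/ebpf/c/prebuilt directory, producing a .bc and a .o file for each: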

dns.c
http.c
offset-guess.c
tracer.c

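Compiling the runtime probes

Generating runtime-security.c by merging sources

The generate_runtime_files function invokes the go command to merge sources into runtime-security.c.
The merge is done through go generate with the linux_bpf build tag: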

go generate -mod=mod -tags linux_bpf ./pkg/collector/corechecks/ebpf/probe/oom_kill.go
go generate -mod=mod -tags linux_bpf ./pkg/collector/corechecks/ebpf/probe/tcp_queue_length.go
go generate -mod=mod -tags linux_bpf ./pkg/network/http/compile.go
go generate -mod=mod -tags linux_bpf ./pkg/network/tracer/compile.go
go generate -mod=mod -tags linux_bpf ./pkg/network/tracer/connection/kprobe/compile.go
go generate -mod=mod -tags linux_bpf ./pkg/security/probe/compile.go

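At the top of each of these Go files there is a go:generate directive that runs pkg/ebpf/include_headers.go (relative to the repository root) to merge the relevant files and save the result under pkg/ebpf/bytecode/build/runtime/. The generated C files are: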

conntrack.c
http.c
oom-kill.c
runtime-security.c
tcp-queue-length.c
tracer.c

At the same time, the corresponding .go files are generated under pkg/ebpf/bytecode/runtime/.

Compiling and linking the eBPF bytecode

bindata_files.extend(build_security_ebpf_files(ctx, build_dir=build_dir, parallel_build=parallel_build))

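Line 613 of system_probe.py calls build_security_ebpf_files, which runs the external clang command to produce a .bc (LLVM bitcode) file and then runs llc to turn that bitcode into the .o bytecode object.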

clang -D__KERNEL__ -DCONFIG_64BIT -D__BPF_TRACING__ -DKBUILD_MODNAME='"ddsysprobe"' -Wno-unused-value -Wno-pointer-sign -Wno-compare-distinct-pointer-types -Wunused -Wall -Werror -include ./pkg/ebpf/c/asm_goto_workaround.h -O2 -emit-llvm -fno-stack-protector -fno-color-diagnostics -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-jump-tables -I./pkg/ebpf/c -I./pkg/security/ebpf/c -DUSE_SYSCALL_WRAPPER=0 -c './pkg/security/ebpf/c/prebuilt/probe.c' -o './pkg/ebpf/bytecode/build/runtime-security.bc'
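The two-stage build (clang to bitcode, then llc to an object file) can be sketched as a small Go helper that only assembles the command lines. This is an illustration, not the project's build code: bpfBuildCommands is a hypothetical name, and the llc options shown (-march=bpf, -filetype=obj) are standard LLVM flags assumed here because the article does not quote the exact llc invocation.

```go
package main

import (
	"fmt"
	"strings"
)

// bpfBuildCommands returns the two command lines of a clang + llc eBPF
// build: clang emits LLVM bitcode (.bc), then llc lowers the bitcode to a
// BPF object file (.o). Extra compiler flags are passed through unchanged.
func bpfBuildCommands(src, bc, obj string, cflags []string) [][]string {
	clang := []string{"clang"}
	clang = append(clang, cflags...)
	clang = append(clang, "-O2", "-emit-llvm", "-c", src, "-o", bc)

	// Assumed llc options: target the BPF backend and write an object file.
	llc := []string{"llc", "-march=bpf", "-filetype=obj", "-o", obj, bc}
	return [][]string{clang, llc}
}

func main() {
	cmds := bpfBuildCommands(
		"./pkg/security/ebpf/c/prebuilt/probe.c",
		"./pkg/ebpf/bytecode/build/runtime-security.bc",
		"./pkg/ebpf/bytecode/build/runtime-security.o",
		[]string{"-D__KERNEL__", "-DUSE_SYSCALL_WRAPPER=0"},
	)
	for _, c := range cmds {
		fmt.Println(strings.Join(c, " "))
	}
}
```

Keeping the commands as argument slices rather than one shell string avoids the quoting problems that show up when a logged command line is copied by hand.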

The bundle_files function then packs the object files generated above into Go source with go-bindata:

go run github.com/shuLhan/go-bindata/cmd/go-bindata -tags 'ebpf_bindata' -split -pkg bindata -prefix 'pkg/.*/' -modtime 1 -o './pkg/ebpf/bytecode/bindata' './pkg/ebpf/bytecode/build/runtime-security.o' './pkg/ebpf/bytecode/build/runtime-security-syscall-wrapper.o'

Compiling the probe module

go build -mod=mod -v -a -tags "kubeapiserver python zk ec2 npm apm consul orchestrator systemd process jetson cri zlib containerd jmx podman clusterchecks netcgo secrets docker gce etcd kubelet linux_bpf" -o ./bin/system-probe/system-probe -gcflags="" -ldflags="-X github.com/DataDog/datadog-agent/pkg/version.Commit=c9ddf6e85 -X github.com/DataDog/datadog-agent/pkg/version.AgentVersion=7.34.0-devel+git.124.c9ddf6e -X github.com/DataDog/datadog-agent/pkg/serializer.AgentPayloadVersion=v5.0.4 -X github.com/DataDog/datadog-agent/pkg/config.ForceDefaultPython=true -X github.com/DataDog/datadog-agent/pkg/config.DefaultPython=3 " github.com/DataDog/datadog-agent/cmd/system-probe

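The build arguments show that the core probe functionality lives in the cmd/system-probe package, which is built into a standalone binary and runs as its own process. In other words, all of the eBPF-related functionality analyzed here is in this package.

datadog probe hook points

The cmd/system-probe package contains five modules: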

NetworkTracerModule ModuleName = "network_tracer"
OOMKillProbeModule ModuleName = "oom_kill_probe"
TCPQueueLengthTracerModule ModuleName = "tcp_queue_length_tracer"
SecurityRuntimeModule ModuleName = "security_runtime"
ProcessModule ModuleName = "process"

Meanwhile, the build above produces only six eBPF bytecode files:

dns.o / dns-debug.o
http.o / http-debug.o
offset-guess.o / offset-guess-debug.o
runtime-security.o
runtime-security-syscall-wrapper.o
tracer.o / tracer-debug.o

The file names do not map one-to-one onto the module names, so how are they loaded?

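NetworkTracerModule

Loading the eBPF bytecode

When the network tracing module is initialized through tracer.NewTracer, it checks whether runtime compilation is enabled in the configuration. If it is, runtime.Tracer.Compile is called to compile from source. Presumably CO-RE is not implemented yet, so a compile has to happen once on every host.

If runtime compilation is not enabled, netebpf.ReadOffsetBPFModule loads the prebuilt offset-guess.o, or offset-guess-debug.o when debug mode is on.

Module manager initialization

When newManager initializes the manager.Manager struct, it sets up the list of eBPF maps and the list of eBPF probes.

probe

The probe list covers TCP and UDP events such as connection setup, data transmission, and connection close (both entry and return), supports IPv4 and IPv6, and adds extra hook points for Linux kernel 4.7 and later.
From a security standpoint, the data is enough for threat modeling, including detection of network backdoors.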

// pkg/network/ebpf/probes/probes.go
// InetCskListenStop traces the inet_csk_listen_stop system call (called for both ipv4 and ipv6)
InetCskListenStop ProbeName = "kprobe/inet_csk_listen_stop"

// TCPv6Connect traces the v6 connect() system call
TCPv6Connect ProbeName = "kprobe/tcp_v6_connect"
// TCPv6ConnectReturn traces the return value for the v6 connect() system call
TCPv6ConnectReturn ProbeName = "kretprobe/tcp_v6_connect"

// TCPSendMsg traces the tcp_sendmsg() system call
TCPSendMsg ProbeName = "kprobe/tcp_sendmsg"

// TCPSendMsgPre410 traces the tcp_sendmsg() system call on kernels prior to 4.1.0. This is created because
// we need to load a different kprobe implementation
TCPSendMsgPre410 ProbeName = "kprobe/tcp_sendmsg/pre_4_1_0"

// TCPSendMsgReturn traces the return value for the tcp_sendmsg() system call
// XXX: This is only used for telemetry for now to count the number of errors returned
// by the tcp_sendmsg func (so we can have a # of tcp sent bytes we miscounted)
TCPSendMsgReturn ProbeName = "kretprobe/tcp_sendmsg"

// TCPGetSockOpt traces the tcp_getsockopt() kernel function
// This probe is used for offset guessing only
TCPGetSockOpt ProbeName = "kprobe/tcp_getsockopt"

// SockGetSockOpt traces the sock_common_getsockopt() kernel function
// This probe is used for offset guessing only
SockGetSockOpt ProbeName = "kprobe/sock_common_getsockopt"

// TCPSetState traces the tcp_set_state() kernel function
TCPSetState ProbeName = "kprobe/tcp_set_state"

// TCPCleanupRBuf traces the tcp_cleanup_rbuf() system call
TCPCleanupRBuf ProbeName = "kprobe/tcp_cleanup_rbuf"
// TCPClose traces the tcp_close() system call
TCPClose ProbeName = "kprobe/tcp_close"
// TCPCloseReturn traces the return of tcp_close() system call
TCPCloseReturn ProbeName = "kretprobe/tcp_close"

// We use the following two probes for UDP sends
IPMakeSkb        ProbeName = "kprobe/ip_make_skb"
IP6MakeSkb       ProbeName = "kprobe/ip6_make_skb"
IP6MakeSkbPre470 ProbeName = "kprobe/ip6_make_skb/pre_4_7_0"

// UDPRecvMsg traces the udp_recvmsg() system call
UDPRecvMsg ProbeName = "kprobe/udp_recvmsg"
// UDPRecvMsgPre410 traces the udp_recvmsg() system call on kernels prior to 4.1.0
UDPRecvMsgPre410 ProbeName = "kprobe/udp_recvmsg/pre_4_1_0"
// UDPRecvMsgReturn traces the return value for the udp_recvmsg() system call
UDPRecvMsgReturn ProbeName = "kretprobe/udp_recvmsg"

// UDPDestroySock traces the udp_destroy_sock() function
UDPDestroySock ProbeName = "kprobe/udp_destroy_sock"
// UDPDestroySockrReturn traces the return of the udp_destroy_sock() system call
UDPDestroySockReturn ProbeName = "kretprobe/udp_destroy_sock"

// TCPRetransmit traces the return value for the tcp_retransmit_skb() system call
TCPRetransmit       ProbeName = "kprobe/tcp_retransmit_skb"
TCPRetransmitPre470 ProbeName = "kprobe/tcp_retransmit_skb/pre_4_7_0"

// InetCskAcceptReturn traces the return value for the inet_csk_accept syscall
InetCskAcceptReturn ProbeName = "kretprobe/inet_csk_accept"

// InetBind is the kprobe of the bind() syscall for IPv4
InetBind ProbeName = "kprobe/inet_bind"
// Inet6Bind is the kprobe of the bind() syscall for IPv6
Inet6Bind ProbeName = "kprobe/inet6_bind"

// InetBind is the kretprobe of the bind() syscall for IPv4
InetBindRet ProbeName = "kretprobe/inet_bind"
// Inet6Bind is the kretprobe of the bind() syscall for IPv6
Inet6BindRet ProbeName = "kretprobe/inet6_bind"

// SocketDnsFilter is the socket probe for dns
SocketDnsFilter ProbeName = "socket/dns_filter"

// SockMapFdReturn maps a file descriptor to a kernel sock
SockMapFdReturn ProbeName = "kretprobe/sockfd_lookup_light"

// IPRouteOutputFlow is the kprobe of an ip_route_output_flow call
IPRouteOutputFlow ProbeName = "kprobe/ip_route_output_flow"
// IPRouteOutputFlow is the kretprobe of an ip_route_output_flow call
IPRouteOutputFlowReturn ProbeName = "kretprobe/ip_route_output_flow"

// ConntrackHashInsert is the probe for new conntrack entries
ConntrackHashInsert ProbeName = "kprobe/__nf_conntrack_hash_insert"

// SockFDLookup is the kprobe used for mapping socket FDs to kernel sock structs
SockFDLookup ProbeName = "kprobe/sockfd_lookup_light"

// SockFDLookupRet is the kretprobe used for mapping socket FDs to kernel sock structs
SockFDLookupRet ProbeName = "kretprobe/sockfd_lookup_light"

// DoSendfile is the kprobe used to trace traffic via SENDFILE(2) syscall
DoSendfile ProbeName = "kprobe/do_sendfile"

// DoSendfileRet is the kretprobe used to trace traffic via SENDFILE(2) syscall
DoSendfileRet ProbeName = "kretprobe/do_sendfile"

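maps

The maps set up at initialization fall into two categories. The first is BPF_MAP_TYPE_PERF_EVENT_ARRAY, which has a single member: ConnCloseEventMap BPFMapName = "conn_close_event". All the others are maps of other types.

For ConnCloseEventMap, the perfHandlerTCP function handles two kinds of events, delivered on DataChannel and LostChannel. The data eventually gets sent to each network client.

For the maps that are not perf event arrays, dumpMapsHandler handles them uniformly. The implementation is crude: it simply prints their contents.

The maps this module uses to exchange data between kernel space and user space are: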

// pkg/network/ebpf/probes/probes.go
// BPFMapName stores the name of the BPF maps storing statistics and other info
type BPFMapName string

const (
    ConnMap               BPFMapName = "conn_stats"
    TcpStatsMap           BPFMapName = "tcp_stats"
    ConnCloseEventMap     BPFMapName = "conn_close_event"
    TracerStatusMap       BPFMapName = "tracer_status"
    PortBindingsMap       BPFMapName = "port_bindings"
    UdpPortBindingsMap    BPFMapName = "udp_port_bindings"
    TelemetryMap          BPFMapName = "telemetry"
    ConnCloseBatchMap     BPFMapName = "conn_close_batch"
    ConntrackMap          BPFMapName = "conntrack"
    ConntrackTelemetryMap BPFMapName = "conntrack_telemetry"
    SockFDLookupArgsMap   BPFMapName = "sockfd_lookup_args"
    DoSendfileArgsMap     BPFMapName = "do_sendfile_args"
    SockByPidFDMap        BPFMapName = "sock_by_pid_fd"
    PidFDBySockMap        BPFMapName = "pid_fd_by_sock"
)

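As you can see, the module provides not only network data but also the ID of the process that generated the traffic, so security threat models can correlate network activity with process data more easily.

probe points

After the eBPF bytecode has been read, the module reads its probe point information. The enabled probe points include TCP connection setup and UDP sends. On kernels newer than 4.7, the function matching ^ip6_make_skb$ is hooked. IPv4 and IPv6 are enabled according to the configuration.

datadog-agent uses its in-house DataDog/ebpf library, an eBPF management layer built on top of the system calls. It implements section loading, probe loading and filtering, reading from event maps, and more.

Once the bytecode is loaded, probe initialization starts:

Configure the activated probes
Rewrite eBPF constants
Edit the map specs
Edit the program maps
Load pinned maps and programs into the kernel
Load the eBPF programs into the kernel through the verifier

Reading eBPF maps

The network tracing module reads three kinds of maps: TCP connection-close events, the TCP connection map, and the TCP stats map.
pkg/network/tracer/connection/kprobe/tracer.go uses the newMapIterator function to iterate over an eBPF map.

After the eBPF programs are loaded into the kernel, the following steps run:

Iterate over the PerfMaps and start a goroutine for each to read perf buffer events, consuming the data through the previously configured LostHandler and DataHandler callbacks.
Attach the probes so the kernel starts producing events and sending them to the maps.
Check the probe selectors
Update the routing rules for the module's maps
Update the routing rules for the module's programs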

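HTTP interface

Besides routine data such as the total number of current connections, the HTTP interface also serves debugging data: a dump of the module's current eBPF data.

When the HTTP server starts, it also enables the metrics modules selected in the configuration and loads the corresponding object files: tracer.o/tracer-debug.o, http.o/http-debug.o, dns.o/dns-debug.o, and offset-guess.o/offset-guess-debug.o.

OOMKillProbe module

Judging by its name, this module hooks the Linux OOM (Out Of Memory) killer to collect the affected process names, counts, and similar information. Its functionality is fairly simple.

Loading the eBPF bytecode

Loading a prebuilt .o is not supported; the program is always compiled at runtime.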

compiledOutput, err := runtime.OomKill.Compile(cfg, nil)

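The compilation goes through the same shared wrapper function as before, so it is not covered again.

Module manager initialization

Likewise, there are few maps and probes here: just the oom_stats map and the kprobe/oom_kill_process probe. (A gripe from the author: the map name here is a hard-coded string, unlike in the NetworkTracer module, which is very inconsistent. Together with the various bizarre practices I ran into while setting things up for this analysis, it is simply baffling. This code is a modern-day "shit mountain".)

Reading the data

OOM data is exposed to clients over the HTTP interface. When the endpoint is called, OOMKillProbe.GetAndFlush reads every entry in the oom_stats map, clears the map, and returns the result in the HTTP response.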
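The read-then-clear behaviour described for GetAndFlush can be sketched with an ordinary Go map standing in for the kernel-side oom_stats map. Everything here is hypothetical and simplified: the oomStats fields, the statsStore type, and the record helper are illustrative and do not reflect the agent's actual value layout or API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// oomStats is a hypothetical, simplified record; the real map value in
// datadog-agent has its own layout.
type oomStats struct {
	Pid   uint32
	Comm  string
	Count uint32
}

// statsStore stands in for the kernel-side oom_stats map.
type statsStore struct {
	mu      sync.Mutex
	entries map[uint32]oomStats
}

func newStatsStore() *statsStore {
	return &statsStore{entries: make(map[uint32]oomStats)}
}

// record simulates the kernel probe bumping the counter for a process.
func (s *statsStore) record(pid uint32, comm string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e := s.entries[pid]
	e.Pid, e.Comm = pid, comm
	e.Count++
	s.entries[pid] = e
}

// getAndFlush returns every entry (sorted by pid) and empties the store,
// so each OOM kill is reported to a caller at most once.
func (s *statsStore) getAndFlush() []oomStats {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]oomStats, 0, len(s.entries))
	for pid, e := range s.entries {
		out = append(out, e)
		delete(s.entries, pid) // deleting while ranging is safe in Go
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Pid < out[j].Pid })
	return out
}

func main() {
	store := newStatsStore()
	store.record(100, "java")
	store.record(100, "java")
	store.record(200, "redis-server")
	fmt.Println(store.getAndFlush())
	fmt.Println(len(store.getAndFlush()))
}
```

Flushing on read keeps the map small and makes each HTTP response a delta since the previous call, at the cost that only one consumer can see a given entry.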

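Process module

Judging by its name, this module exposes process data, but the code shows that no eBPF probe hooks are involved. The HTTP endpoint it provides simply takes the PIDs from the HTTP request and walks /proc/{PID}/ to collect the corresponding process data.

SecurityRuntime module

Module manager initialization

When the runCmd command of the system-probe package starts, it registers and starts every module in the package. The call sequence is:

run(...)
->StartSystemProbe()
->api.StartServer(cfg)
->module.Register(cfg, mux, modules.All) // here module is the package name
->module.Register(router) // here module is a module instance, created by module, err := factory.Fn(cfg)

The Register method of each probe module is called to initialize and run the module. For the SecurityRuntime module, this is at line 71 of pkg/security/module/module.go: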

// Register the runtime security agent module
func (m *Module) Register(_ *module.Router) error {
    if err := m.Init(); err != nil {
        return err
    }

    return m.Start()
}
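The factory-then-Register flow above can be sketched in a self-contained form. This is a simplified model, not the agent's code: the Module interface, factory struct, demoModule, and registerAll are hypothetical names, and continuing with the remaining modules after one fails is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Module is the minimal interface this sketch assumes.
type Module interface {
	Register() error
}

// factory pairs a module name with its constructor (hypothetical shape).
type factory struct {
	name string
	fn   func() (Module, error)
}

// demoModule mimics the Init-then-Start pattern of Register.
type demoModule struct {
	initErr error
}

func (m *demoModule) Init() error  { return m.initErr }
func (m *demoModule) Start() error { return nil }

// Register initializes the module and starts it only if Init succeeded.
func (m *demoModule) Register() error {
	if err := m.Init(); err != nil {
		return err
	}
	return m.Start()
}

// registerAll builds every module from its factory and registers it.
// It returns the names that started and the errors of those that did not.
func registerAll(factories []factory) (started []string, failed map[string]error) {
	failed = make(map[string]error)
	for _, f := range factories {
		m, err := f.fn()
		if err == nil {
			err = m.Register()
		}
		if err != nil {
			failed[f.name] = err
			continue
		}
		started = append(started, f.name)
	}
	return started, failed
}

// demoFactories returns one healthy module and one that fails in Init.
func demoFactories() []factory {
	return []factory{
		{name: "network_tracer", fn: func() (Module, error) {
			return &demoModule{}, nil
		}},
		{name: "security_runtime", fn: func() (Module, error) {
			return &demoModule{initErr: errors.New("failed to init probe")}, nil
		}},
	}
}

func main() {
	started, failed := registerAll(demoFactories())
	fmt.Println("started:", started)
	for name, err := range failed {
		fmt.Printf("failed: %s: %v\n", name, err)
	}
}
```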

Loading the eBPF bytecode

The Init method initializes the eBPF probe:

// initialize the eBPF manager and load the programs and maps in the kernel. At this stage, the probes are not
// running yet.
if err := m.probe.Init(m.statsdClient); err != nil {
    return errors.Wrap(err, "failed to init probe")
}

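USE_SYSCALL_WRAPPER

As the first step of module initialization, /proc/kallsyms is read to find the concrete kernel symbol that implements the open syscall. On the author's AMD64 Ubuntu 21.04 machine, that symbol is __x64_sys_open.

If the symbol does not start with SyS_ or sys_, the useSyscallWrapper variable is set to true.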
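A minimal sketch of this detection, assuming the kallsyms text has already been read into a string. The helper names (findSyscallSymbol, needsSyscallWrapper) and the list of candidate prefixes are assumptions for illustration; the agent's own lookup may differ.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findSyscallSymbol scans kallsyms-formatted text ("address type name")
// and returns the first symbol that exactly matches one of the assumed
// naming schemes for the given syscall.
func findSyscallSymbol(kallsyms, name string) (string, bool) {
	candidates := []string{
		"sys_" + name,
		"SyS_" + name,
		"__x64_sys_" + name,
		"__arm64_sys_" + name,
	}
	sc := bufio.NewScanner(strings.NewReader(kallsyms))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 {
			continue
		}
		for _, c := range candidates {
			if fields[2] == c {
				return c, true
			}
		}
	}
	return "", false
}

// needsSyscallWrapper reports whether the symbol uses an arch-specific
// wrapper name, i.e. it does not start with the classic sys_/SyS_ prefix.
func needsSyscallWrapper(symbol string) bool {
	return !strings.HasPrefix(symbol, "sys_") && !strings.HasPrefix(symbol, "SyS_")
}

func main() {
	// Sample lines in /proc/kallsyms format; on a real host the content
	// would come from reading /proc/kallsyms.
	sample := "ffffffff812d5a10 T do_sys_open\n" +
		"ffffffff812d5b40 T __x64_sys_open\n"
	if sym, ok := findSyscallSymbol(sample, "open"); ok {
		fmt.Println(sym, needsSyscallWrapper(sym))
	}
}
```

The prefix check matters because a symbol such as __x64_sys_open still contains the substring sys_, so a plain substring test would never select the wrapper variant.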

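Compile or load

Whether the probe is compiled at runtime or loaded from a prebuilt .o file depends on whether runtime compilation is enabled in the configuration.

When compiling, runtime.RuntimeSecurity.Compile is called, and -DUSE_SYSCALL_WRAPPER=1 is added to the clang cflags. The compilation mechanism is the same as in the network module, so it is skipped here.

When loading a prebuilt object, the value of useSyscallWrapper selects between runtime-security.o and runtime-security-syscall-wrapper.o.

Author's note: there is very little information online about this parameter. See "C library system-call wrappers, or the lack thereof" and "glibc syscall wrapper 内部实现" (the internal implementation of the glibc syscall wrapper).

Loading works the same way as in the network module: everything goes through Manager.InitWithOptions in the datadog/ebpf-manager package, so it is not repeated here.

datadog/ebpf-manager is a wrapper around the github.com/cilium/ebpf package.

Handling perfMap events

As in the network module, the handlers are assigned when the module's managerOptions are set up:

// pkg/security/probe/probe.go, line 160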
p.perfMap.PerfMapOptions = manager.PerfMapOptions{
    DataHandler: p.reOrderer.HandleEvent,
    LostHandler: p.handleLostEvents,
}

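p.reOrderer.HandleEvent adds metric statistics while handling perfMap data and then calls p.handleEvent, which is defined in pkg/security/probe/probe.go.

One-way writes

In kernel space, the events produced by every probe hook point are written to the events perfMap.

Regular map management

For exchanging data between user space and kernel space, datadog-agent also implements two-way reads and writes on top of eBPF maps: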

concurrent_syscalls
exec_count_bb
exec_count_fb
noisy_processes_fb
proc_cache
pid_cache

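Kernel-side reads and writes

These maps are written in kernel space and used for lookups and deletions in user space. For example, when a process is created and the SEC("kprobe/do_dentry_open") hook fires, the kernel-side handle_exec_event function first checks the proc_cache map for an entry belonging to the current process's parent and then stores the current process's information in the proc_cache map.

User-side reads

In user space, the pid_cache map can be looked up by PID to obtain a cookie, which is then used to query the proc_cache map for the process entry.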
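The two-step lookup (PID to cookie, cookie to process entry) can be sketched with plain Go maps in place of the eBPF maps. The procEntry fields and the caches type are hypothetical; only the map names pid_cache and proc_cache come from the text above.

```go
package main

import "fmt"

// procEntry is a hypothetical, simplified process cache entry.
type procEntry struct {
	Comm string
	PPid uint32
}

// caches models the two maps: pid_cache maps a PID to a cookie, and
// proc_cache maps that cookie to the process entry.
type caches struct {
	pidCache  map[uint32]uint32
	procCache map[uint32]procEntry
}

// resolve performs the two-step lookup: PID -> cookie -> entry.
// It returns false if either step misses.
func (c *caches) resolve(pid uint32) (procEntry, bool) {
	cookie, ok := c.pidCache[pid]
	if !ok {
		return procEntry{}, false
	}
	entry, ok := c.procCache[cookie]
	return entry, ok
}

func main() {
	c := &caches{
		pidCache:  map[uint32]uint32{4242: 7},
		procCache: map[uint32]procEntry{7: {Comm: "bash", PPid: 1}},
	}
	if e, ok := c.resolve(4242); ok {
		fmt.Printf("pid 4242 -> %s (ppid %d)\n", e.Comm, e.PPid)
	}
	_, ok := c.resolve(1)
	fmt.Println("pid 1 found:", ok)
}
```

The indirection through a cookie lets several PIDs (for example forked children that have not yet called exec) share one process entry instead of duplicating it.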

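Event data structures

In the SecurityRuntime module, communication between kernel space and user space over eBPF maps uses a single wrapping Event struct. On the kernel eBPF side the counterpart is struct syscall_cache_t, which contains a union of structs for the different event types.

Parsing and dispatching events

On the user-space Go side, one large Event struct is used to parse the data sent from the kernel. event.UnmarshalBinary reads the header, sets the event type, and then, based on that type, hands the payload to the matching field of Event, whose own decoder parses the message body. For example, Event.Chmod.UnmarshalBinary decodes the byte stream for a chmod event. The Go struct is defined as follows: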

// pkg/security/secl/model/model.go line 118
type Event struct {
    ID           string    `field:"-"`
    Type         uint64    `field:"-"`
    TimestampRaw uint64    `field:"-"`
    Timestamp    time.Time `field:"timestamp"` // Timestamp of the event

    ProcessContext   ProcessContext   `field:"process" event:"*"`
    SpanContext      SpanContext      `field:"-"`
    ContainerContext ContainerContext `field:"container"`

    Chmod       ChmodEvent    `field:"chmod" event:"chmod"`             // [7.27] [File] A file’s permissions were changed
    Chown       ChownEvent    `field:"chown" event:"chown"`             // [7.27] [File] A file’s owner was changed
    Open        OpenEvent     `field:"open" event:"open"`               // [7.27] [File] A file was opened
    Mkdir       MkdirEvent    `field:"mkdir" event:"mkdir"`             // [7.27] [File] A directory was created
    Rmdir       RmdirEvent    `field:"rmdir" event:"rmdir"`             // [7.27] [File] A directory was removed
    Rename      RenameEvent   `field:"rename" event:"rename"`           // [7.27] [File] A file/directory was renamed
    Unlink      UnlinkEvent   `field:"unlink" event:"unlink"`           // [7.27] [File] A file was deleted
    Utimes      UtimesEvent   `field:"utimes" event:"utimes"`           // [7.27] [File] Change file access/modification times
    Link        LinkEvent     `field:"link" event:"link"`               // [7.27] [File] Create a new name/alias for a file
    SetXAttr    SetXAttrEvent `field:"setxattr" event:"setxattr"`       // [7.27] [File] Set extended attributes
    RemoveXAttr SetXAttrEvent `field:"removexattr" event:"removexattr"` // [7.27] [File] Remove extended attributes
    Exec        ExecEvent     `field:"exec" event:"exec"`               // [7.27] [Process] A process was executed or forked

    SetUID SetuidEvent `field:"setuid" event:"setuid"` // [7.27] [Process] A process changed its effective uid
    SetGID SetgidEvent `field:"setgid" event:"setgid"` // [7.27] [Process] A process changed its effective gid
    Capset CapsetEvent `field:"capset" event:"capset"` // [7.27] [Process] A process changed its capacity set

    SELinux SELinuxEvent `field:"selinux" event:"selinux"` // [7.30] [Kernel] An SELinux operation was run
    BPF     BPFEvent     `field:"bpf" event:"bpf"`         // [7.33] [Kernel] A BPF command was executed

    Mount            MountEvent            `field:"-"`
    Umount           UmountEvent           `field:"-"`
    InvalidateDentry InvalidateDentryEvent `field:"-"`
    ArgsEnvs         ArgsEnvsEvent         `field:"-"`
    MountReleased    MountReleasedEvent    `field:"-"`
}

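This design handles many event types in a uniform way, but the event types are tightly coupled: adding or removing one affects the upper-layer code, which is a serious problem from a design-pattern standpoint. Instantiating such a large struct also costs a fair amount of memory, with a lot of redundancy. The design here is poor.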
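The header-then-payload decoding described for Event can be illustrated with a reduced example. The wire layout used here (8-byte timestamp, 8-byte type, 4-byte mode, little-endian), the type constants, and all names are assumptions chosen for the sketch; the real kernel structures are larger and laid out differently.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical event type identifiers and header size.
const (
	eventTypeChmod uint64 = 1
	eventTypeMkdir uint64 = 2
	headerSize            = 16
)

var errShortBuffer = errors.New("buffer too short")

type chmodEvent struct{ Mode uint32 }
type mkdirEvent struct{ Mode uint32 }

func (e *chmodEvent) unmarshal(b []byte) error {
	if len(b) < 4 {
		return errShortBuffer
	}
	e.Mode = binary.LittleEndian.Uint32(b)
	return nil
}

func (e *mkdirEvent) unmarshal(b []byte) error {
	if len(b) < 4 {
		return errShortBuffer
	}
	e.Mode = binary.LittleEndian.Uint32(b)
	return nil
}

// event mirrors the "one big struct" approach: a shared header plus one
// field per event type, of which only one is filled per message.
type event struct {
	TimestampRaw uint64
	Type         uint64
	Chmod        chmodEvent
	Mkdir        mkdirEvent
}

// unmarshal reads the header, then dispatches the payload by type.
func (ev *event) unmarshal(b []byte) error {
	if len(b) < headerSize {
		return errShortBuffer
	}
	ev.TimestampRaw = binary.LittleEndian.Uint64(b[0:8])
	ev.Type = binary.LittleEndian.Uint64(b[8:16])
	payload := b[headerSize:]
	switch ev.Type {
	case eventTypeChmod:
		return ev.Chmod.unmarshal(payload)
	case eventTypeMkdir:
		return ev.Mkdir.unmarshal(payload)
	default:
		return fmt.Errorf("unsupported event type %d", ev.Type)
	}
}

// encode builds a sample message in the assumed layout.
func encode(ts, typ uint64, mode uint32) []byte {
	b := make([]byte, headerSize+4)
	binary.LittleEndian.PutUint64(b[0:8], ts)
	binary.LittleEndian.PutUint64(b[8:16], typ)
	binary.LittleEndian.PutUint32(b[16:20], mode)
	return b
}

func main() {
	var ev event
	if err := ev.unmarshal(encode(42, eventTypeChmod, 0o644)); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("type=%d mode=%o\n", ev.Type, ev.Chmod.Mode)
}
```

The switch is where the coupling criticized above shows up: every new event type needs a new field on the struct and a new case in the dispatcher.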

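When handling events sent from the kernel, process events get extra treatment beyond parsing: ProcessResolver.AddExecEntry is called to cache the process creation data, which makes it easy for the HTTP interface to return the full list of processes on the host.

Statistics and reconciliation

While handleEvent processes an event, perfBufferMonitor.CountEvent counts it by event type, time, count, size, and CPU ID. These counters are used to reconcile event completeness, and indirectly to observe the current event volume, which serves as a rough indicator of how busy the system is.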
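One way to picture this kind of accounting is a set of counters keyed by event type and CPU, plus a per-CPU lost-event counter. The sketch below is an illustration only: bufferMonitor and its methods are hypothetical, and the completeness ratio (received divided by received plus lost) is a metric chosen for the example rather than something the article attributes to the agent.

```go
package main

import (
	"fmt"
	"sync"
)

// counterKey identifies one (event type, CPU) bucket.
type counterKey struct {
	eventType string
	cpu       int
}

// counter accumulates the number of events and their total size.
type counter struct {
	events uint64
	bytes  uint64
}

// bufferMonitor is a hypothetical stand-in for a perf buffer monitor.
type bufferMonitor struct {
	mu       sync.Mutex
	received map[counterKey]counter
	lost     map[int]uint64
}

func newBufferMonitor() *bufferMonitor {
	return &bufferMonitor{
		received: make(map[counterKey]counter),
		lost:     make(map[int]uint64),
	}
}

// countEvent records one received event of the given size.
func (m *bufferMonitor) countEvent(eventType string, cpu, size int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	k := counterKey{eventType: eventType, cpu: cpu}
	c := m.received[k]
	c.events++
	c.bytes += uint64(size)
	m.received[k] = c
}

// countLost records n events the kernel reported as dropped on a CPU.
func (m *bufferMonitor) countLost(cpu int, n uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.lost[cpu] += n
}

// completeness returns received / (received + lost), or 1 when idle.
func (m *bufferMonitor) completeness() float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	var got, lost uint64
	for _, c := range m.received {
		got += c.events
	}
	for _, n := range m.lost {
		lost += n
	}
	if got+lost == 0 {
		return 1
	}
	return float64(got) / float64(got+lost)
}

func main() {
	m := newBufferMonitor()
	m.countEvent("open", 0, 128)
	m.countEvent("open", 1, 128)
	m.countEvent("exec", 0, 512)
	m.countLost(1, 1)
	fmt.Printf("completeness: %.2f\n", m.completeness())
}
```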

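Event risk evaluation

Once the Event struct's fields have been fixed up, *Probe.DispatchEvent dispatches the event to the rule engine for evaluation against the security model.
The rule engine is set up in NewRuleSet: according to the configuration, a different set of rules is selected for each event type, and each is evaluated separately. If no rules exist for an event type, the event is let through directly.

The module's HandleEvent calls ruleSet.Evaluate to evaluate the event. An object pool is used to manage memory when many events are processed. The eventRuleBuckets field of ruleSet is the module's complete collection of rules across all event types.

The bucket for each event type holds multiple rules (rules []*Rule). During evaluation, all of these rules are iterated and checked.

Full rule match

If a rule matches, the match result, the matched rule, and the event are sent to all of the rule's listeners (for example an alerting system or a correlated-event processor).

Partial rule match

If no rule matches, a partial match is attempted for each configured field.
If the partial match hits, true is returned. (However, the author found no code that checks the return value of Evaluate.)
If it does not hit, the event is about to be discarded: the event and rule information are sent to all of the rule's listeners and to the remote server, so the rules can be tuned.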
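The bucket-per-event-type evaluation and the listener notification on a match can be sketched as follows. This is a simplified model with hypothetical types (event, rule, ruleSet); it covers only the full-match path and represents rules as Go closures instead of compiled expressions.

```go
package main

import "fmt"

// event is a hypothetical, flattened event: a type plus string fields.
type event struct {
	Type   string
	Fields map[string]string
}

// rule pairs an identifier with a predicate over events.
type rule struct {
	ID    string
	Match func(event) bool
}

// ruleSet groups rules into buckets keyed by event type and keeps the
// listeners that are notified on every match.
type ruleSet struct {
	buckets   map[string][]rule
	listeners []func(ruleID string, ev event)
}

// evaluate runs every rule in the bucket for the event's type, notifies
// the listeners for each match, and reports whether anything matched.
// Event types without a bucket are passed through (no match).
func (rs *ruleSet) evaluate(ev event) bool {
	matched := false
	for _, r := range rs.buckets[ev.Type] {
		if r.Match(ev) {
			matched = true
			for _, notify := range rs.listeners {
				notify(r.ID, ev)
			}
		}
	}
	return matched
}

func main() {
	rs := &ruleSet{
		buckets: map[string][]rule{
			"open": {{
				ID: "passwd_non_root",
				Match: func(ev event) bool {
					return ev.Fields["open.filename"] == "/etc/passwd" &&
						ev.Fields["process.uid"] != "0"
				},
			}},
		},
	}
	rs.listeners = append(rs.listeners, func(id string, ev event) {
		fmt.Printf("rule %s matched a %s event\n", id, ev.Type)
	})

	ev := event{Type: "open", Fields: map[string]string{
		"open.filename": "/etc/passwd",
		"process.uid":   "1000",
	}}
	fmt.Println("matched:", rs.evaluate(ev))
}
```

Bucketing by event type means an event is only checked against rules that can possibly apply to it, which keeps the per-event cost proportional to the bucket size rather than to the whole rule set.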

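Risk detection rule engine

For the implementation, datadog built its own AST-based expression parser, located under pkg/security/secl/compiler. The expressions it recognizes are boolean expressions with a Go-like syntax.

Rules are written against the fields of the sub-events of the Event struct shown above. For example: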

rules := []string{
        `open.filename == "/etc/passwd" && process.uid != 0`,
        `(open.filename =~ "/sbin/*" || open.filename =~ "/usr/sbin/*") && process.uid != 0 && open.flags & O_CREAT > 0`,
        `(open.filename =~ "/var/run/*") && open.flags & O_CREAT > 0 && process.uid != 0`,
        `(mkdir.filename =~ "/var/run/*") && process.uid != 0`,
    }

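The rule strings are intuitive and easy to understand.

The project ships a default rule template at runtime/default.policy in the repository.

Reporting event results

After the rule engine has made its decision, the result is sent asynchronously to the remote data center regardless of the outcome.
The reported data includes the complete raw data sent from the kernel, plus the local rule engine's match information and verdict. This lets the remote data center run big-data analysis, build models, and validate and adjust detection rules to improve recall and precision.

TCPQueueLength module

Like the OOMKillProbe module, this module only supports compiling the bytecode locally and cannot load a prebuilt .o file. Functionally it is concerned with the sizes of TCP sends and receives and has nothing to do with security. Its loading mechanism is the same as the other modules', so it is not analyzed further.

Product perspective summary

datadog-agent supports many platforms, including Windows, Linux, IoT, and Android, with varying levels of support on each. This summary is written with eBPF-capable Linux in mind.

Choosing map types by business need

The product uses *PerfMap, backed by BPF_MAP_TYPE_PERF_EVENT_ARRAY, for workloads with strict ordering requirements: after events are read, they are sorted by CPU timestamp. Where ordering does not matter, other types such as BPF_MAP_TYPE_ARRAY, BPF_MAP_TYPE_LRU_HASH, and BPF_MAP_TYPE_HASH are used.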
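Because a perf event array has one ring buffer per CPU, events read from different CPUs can arrive out of order and have to be merged by timestamp. The batch sketch below shows the idea only; the record type and mergeByTimestamp are hypothetical, and a production reorderer would work on a stream with a bounded delay window rather than sorting a complete batch.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical perf event with the CPU it was read from
// and the kernel timestamp attached to it.
type record struct {
	CPU       int
	Timestamp uint64
	Data      string
}

// mergeByTimestamp flattens the per-CPU batches and orders the result by
// timestamp. The stable sort keeps the read order of equal timestamps.
func mergeByTimestamp(perCPU [][]record) []record {
	var all []record
	for _, batch := range perCPU {
		all = append(all, batch...)
	}
	sort.SliceStable(all, func(i, j int) bool {
		return all[i].Timestamp < all[j].Timestamp
	})
	return all
}

func main() {
	perCPU := [][]record{
		{{CPU: 0, Timestamp: 10, Data: "fork"}, {CPU: 0, Timestamp: 30, Data: "open"}},
		{{CPU: 1, Timestamp: 20, Data: "exec"}},
	}
	for _, r := range mergeByTimestamp(perCPU) {
		fmt.Printf("t=%d cpu=%d %s\n", r.Timestamp, r.CPU, r.Data)
	}
}
```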

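Rich set of event types

The bytecode files of the various probes cover event detection and reporting for the system runtime, the network, and the kernel, including:

Files
1. Permission change
2. Owner change
3. Open
4. Directory creation
5. Rename
6. Symlink
7. File time (utimes)
8. Alias (link)
9. Set attribute
10. Remove attribute
Processes
1. Process creation
2. UID change
3. GID change
4. Capability set change
Kernel
1. SELinux operations
2. BPF calls
System
1. Directory mount
2. Directory unmount
3. ARGS/ENVS setup
4. OOM events
Network events (IPv4/IPv6)
1. TCP: connection setup, data send, connection close
2. UDP: connection setup, data send, connection close
3. Socket: connection setup, data send, connection close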

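Rich set of fields

For example, a process event includes:

Process attributes
Process file system attributes
Process launch arguments
Environment variables
Parent process information (incomplete)

The other events are not listed one by one; see the event structs in pkg/security/secl/model/model.go.

Merging information across events

The module caches some events internally to handle correlations between multiple events; the cache also supplements the data available to the local rule engine. An example is setting environment variables first and then running a process creation command.

Remote delivery of configuration and detection rules

Configuration can be reloaded remotely to update network rules and security event detection rules.

Multi-layer risk evaluation

Local evaluation
1. Full match
2. Partial match
Remote evaluation and correction
Local evaluation handles threat scenarios faster and reduces the processing load on the remote data center. The remote data center performs offline analysis, validates detections, discovers new threat events, and refines the detection rules. New rules are quickly applied locally, which improves detection efficiency.

Event data management

Flexible registration of event receivers
Event data monitoring (alerting happens on the remote server)
Event data statistics and reconciliation

Support for multiple Linux versions

CO-RE is not supported. The agent compiles locally and depends on a local build environment, which makes it a poor fit for HIDS deployments in large enterprises.

At the author's company, for example, the kernel version distribution is fairly clear and the number of distinct versions is small, so it is more practical to precompile the bytecode for each version and have the HIDS agent load the file that matches the running kernel.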

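Summary

datadog's system-probe module is built on eBPF and implements network management, runtime security, and system state awareness, serving both operations and security use cases. The core modules are the network and runtime security ones, so this article focused on the technical implementation and business characteristics of those two.

Strengths:

These were covered in the product perspective section above. It is a security product with fairly complete functionality and data: the data is sufficient for modeling many kinds of intrusions, there are multiple layers of security analysis, event reporting and monitoring/alerting are solid, and it supports data statistics and reconciliation.

Weaknesses:

The product focuses on collecting and detecting security data and lacks active defense features. Among the existing features, there is no sign of rate limiting or resource usage control.

On engineering quality there is a lot to complain about: odd package paths, mixed Go/C compilation across dozens of Go packages, variables modified across package boundaries, go generate and go build invoked all over the place, a mix of C/C++/Go/Python, numerous custom go build tags, heavy hard-coding, tight coupling in places, and poor extensibility. This is probably what a "shitcode" mountain looks like after years of accumulation by several generations of programmers.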

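References

CFC4N's blog is created by CFC4N and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license. Based on work at https://www.cnxct.com. When reposting, please credit the source: datadog的eBPF安全检测机制分析 (An analysis of datadog's eBPF security detection mechanism).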
