Python for Algorithmic Trading (complete edition)

Source: annas-archive.org/md5/12b1d91f5aadf32df062ebaa2f4cc769
Translator: 飞龙
License: CC BY-NC-SA 4.0

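Dataism says that the universe consists of data flows, and the value of any phenomenon or entity is determined by its contribution to data processing... Dataism thereby collapses the barrier between animals [humans] and machines, and expects electronic algorithms to eventually decipher and outperform biochemical algorithms.¹
Yuval Noah Harari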

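Finding the right algorithm to trade automatically and successfully in financial markets is the holy grail of finance. Not too long ago, algorithmic trading was only available to institutional players with deep pockets and large amounts of assets under management. Recent developments in open source, open data, cloud compute, and cloud storage, as well as in online trading platforms, have leveled the playing field for smaller institutions and individual traders. It is now possible to get started in this fascinating discipline with only a typical notebook or desktop computer and a reliable internet connection.

Nowadays, Python and its ecosystem of powerful packages are the technology platform of choice for algorithmic trading. Python allows you to do efficient data analytics (with pandas, for example), to apply machine learning to stock market prediction (with scikit-learn, for example), or even to make use of Google's deep learning technology (with TensorFlow).

This book is about Python for algorithmic trading, primarily in the context of alpha generating strategies (see Chapter 1). A book at the intersection of two vast and exciting fields can hardly cover all topics of relevance. It can, however, cover a range of important meta topics in depth.

These topics include: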

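Financial data

Financial data is at the core of every algorithmic trading project. Python and packages like NumPy and pandas do a great job of handling and working with structured financial data of any kind (end-of-day, intraday, high frequency).

Backtesting

There should be no automated algorithmic trading without rigorous testing of the trading strategy to be deployed. The book covers trading strategies based on simple moving averages, momentum, mean reversion, and machine/deep learning-based prediction.

Real-time data

Algorithmic trading requires dealing with real-time data, online algorithms based on it, and visualization in real time. The book provides an introduction to socket programming with ZeroMQ and to streaming visualization.

Online platforms

No trading can take place without a trading platform. The book covers two popular electronic trading platforms: Oanda and FXCM.

Automation

The beauty of algorithmic trading, as well as some of its major challenges, results from the automation of the trading operation. The book shows how to deploy Python in the cloud and how to set up an environment appropriate for automated algorithmic trading.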

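The book offers a unique learning experience with the following features and benefits:

Coverage of relevant topics

This is the only book covering such a breadth and depth of relevant topics in Python for algorithmic trading (see the following).

Self-contained code base

The book is accompanied by a Git repository with all code in a self-contained, executable form. The repository is available on the Quant Platform.

Real trading as the goal

The coverage of two different online trading platforms puts the reader in a position to start both paper and live trading efficiently. To this end, the book equips the reader with relevant, practical, and valuable background knowledge.

Do-it-yourself and self-paced approach

Since the material and the code are self-contained and only rely on standard Python packages, the reader has full knowledge of and full control over what is going on, how to use the code examples, how to change them, and so on. There is no need to rely on third-party platforms, for instance, to do the backtesting or to connect to the trading platforms. With this book, the reader can do all this at a convenient pace and has every single line of code at hand.

User forum

Although the reader should be able to follow along seamlessly, the author and The Python Quants are there to help. The reader can post questions and comments in the user forum on the Quant Platform at any time (accounts are free).

Online/video training (paid subscription)

The Python Quants offer comprehensive online training programs that make use of the contents presented in the book and add further content, covering important topics such as financial data science, artificial intelligence in finance, Python for Excel and databases, and additional Python tools and skills.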

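The following is a brief overview of the topics and contents presented in each chapter.

Chapter 1, Python and Algorithmic Trading

The first chapter is an introduction to the topic of algorithmic trading, that is, the automated trading of financial instruments based on computer algorithms. It discusses fundamental notions in this context and also addresses, among other things, the expected prerequisites for reading the book.

Chapter 2, Python Infrastructure

This chapter lays the technical foundations for all subsequent chapters in that it shows how to set up a proper Python environment. It mainly uses conda as a package and environment manager. It also illustrates Python deployment via Docker containers and in the cloud.

Chapter 3, Working with Financial Data

Financial time series data is central to every algorithmic trading project. This chapter shows you how to retrieve financial data from different public and proprietary data sources. It also demonstrates how to store financial time series data efficiently with Python.

Chapter 4, Mastering Vectorized Backtesting

Vectorization is a powerful approach in numerical computation in general and for financial analytics in particular. This chapter introduces vectorization with NumPy and pandas and applies it to the backtesting of SMA-based, momentum, and mean-reversion strategies.

Chapter 5, Predicting Market Movements with Machine Learning

This chapter is dedicated to generating market predictions with machine learning and deep learning approaches. By mainly relying on past return observations as features, it presents approaches for predicting tomorrow's market direction with Python packages such as Keras, in combination with TensorFlow, and scikit-learn.

Chapter 6, Building Classes for Event-Based Backtesting

While vectorized backtesting has advantages when it comes to conciseness of code and performance, it is limited with regard to the representation of certain market features and trading strategies. Event-based backtesting, technically implemented with object-oriented programming, allows for a more granular and more realistic modeling of such features. This chapter presents and explains in detail a base class as well as two classes for the backtesting of long-only and long-short trading strategies.

Chapter 7, Working with Real-Time Data and Sockets

Having to cope with real-time or streaming data is a reality even for the ambitious individual algorithmic trader. The tool of choice is socket programming, for which this chapter introduces ZeroMQ as a lightweight and scalable technology. The chapter also illustrates how to use Plotly to create nice-looking, interactive streaming plots.

Chapter 8, CFD Trading with Oanda

Oanda is a foreign exchange (forex, FX) and contracts for difference (CFD) trading platform that offers a broad set of tradable instruments, such as those based on foreign exchange pairs, stock indices, commodities, or rates instruments (benchmark bonds). This chapter provides guidance on how to implement automated algorithmic trading strategies with Oanda, making use of the Python wrapper package tpqoa.

Chapter 9, FX Trading with FXCM

FXCM is another forex and CFD trading platform that has recently released a modern RESTful API for algorithmic trading. Available instruments span multiple asset classes, such as forex, stock indices, or commodities. A Python wrapper package makes algorithmic trading based on Python code rather convenient and efficient (http://fxcmpy.tpq.io).

Chapter 10, Automating Trading Operations

This chapter deals with capital management, risk analysis and management, as well as with typical tasks in the technical automation of algorithmic trading operations. It covers, for instance, the Kelly criterion for capital allocation and leverage in detail.

Appendix A, Python, NumPy, matplotlib, pandas

The appendix provides a concise introduction to the most important Python, NumPy, and pandas topics in the context of the material presented in the main chapters. It represents a starting point from which one can add to one's own Python knowledge over time.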

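Figure P-1 shows the layers related to algorithmic trading that the chapters cover, from the bottom to the top. It necessarily starts with the Python infrastructure (Chapter 2) and adds financial data (Chapter 3) as well as strategy and vectorized backtesting code (Chapters 4 and 5). Up to that point, data sets are used and manipulated as a whole. Event-based backtesting introduces, for the first time, the idea that data in the real world arrives incrementally (Chapter 6). It is the bridge that leads to the connecting code layer that covers socket communication and real-time data handling (Chapter 7). On top of that, trading platforms and their APIs are required to be able to place orders (Chapters 8 and 9). Finally, important aspects of automation and deployment are covered (Chapter 10). In that sense, the main chapters of the book relate to the layers seen in Figure P-1, which provide a natural sequence for the topics to be covered.

Figure P-1. The layers of Python for algorithmic trading

This book is for students, academics, and practitioners who want to apply Python in the fascinating field of algorithmic trading. The book assumes that the reader has, at least on a fundamental level, background knowledge in both Python programming and financial trading. For reference and review, Appendix A introduces important Python, NumPy, matplotlib, and pandas topics. The following are good references for gaining a sound understanding of the Python topics important for this book. Most readers will benefit from having at least access to Hilpisch (2018) for reference. With regard to the machine and deep learning approaches applied to algorithmic trading, Hilpisch (2020) provides a wealth of background information and a larger number of specific examples. Background information about Python as applied to finance, financial data science, and artificial intelligence can be found in the following books: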

Hilpisch, Yves. 2018. Python for Finance: Mastering Data-Driven Finance. 2nd ed. Sebastopol: O’Reilly.
⸻. 2020. Artificial Intelligence in Finance: A Python-Based Guide. Sebastopol: O’Reilly.
McKinney, Wes. 2017. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. 2nd ed. Sebastopol: O’Reilly.
Ramalho, Luciano. 2021. Fluent Python: Clear, Concise, and Effective Programming. 2nd ed. Sebastopol: O’Reilly.
VanderPlas, Jake. 2016. Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol: O’Reilly.

Background information about algorithmic trading can be found in these books:

Chan, Ernest. 2009. Quantitative Trading: How to Build Your Own Algorithmic Trading Business. Hoboken et al: John Wiley & Sons.
Chan, Ernest. 2013. Algorithmic Trading: Winning Strategies and Their Rationale. Hoboken et al: John Wiley & Sons.
Kissel, Robert. 2013. The Science of Algorithmic Trading and Portfolio Management. Amsterdam et al: Elsevier/Academic Press.
Narang, Rishi. 2013. Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading. Hoboken et al: John Wiley & Sons.

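Enjoy your journey through the algorithmic trading world with Python, and get in touch by emailing py4at@tpq.io if you have questions or comments.

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.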

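You can access and execute the code that accompanies the book on the Quant Platform at https://py4at.pqp.io, for which only a free registration is required.

If you have a technical question or a problem using the code examples, please send an email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you are reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

For more than 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, visit http://oreilly.com.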

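Please address comments and questions concerning this book to the publisher:

O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/py4at.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit http://oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://youtube.com/oreillymedia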

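I want to thank the technical reviewers (Hugh Brown, McKlayne Marshall, Ramanathan Ramakrishnamoorthy, and Prem Jebaseelan), who provided helpful comments that led to many improvements of the book's content.

As usual, a special thank you goes to Michael Schwed, who supports me in all technical matters, simple and highly complex, with his broad and in-depth technology know-how.

Delegates of the Certificate Programs in Python for Computational Finance and Algorithmic Trading also helped improve this book. Their ongoing feedback has enabled me to weed out errors and mistakes and to refine the code and notebooks used in our online training classes and now, finally, in this book.

I would also like to thank the whole team at O'Reilly Media (especially Michelle Smith, Michele Cronin, Victoria DeRose, and Danny Elfanbaum) for making it all happen and helping me refine the book in so many ways.

Of course, all remaining errors are mine alone.

Furthermore, I would like to thank the team at Refinitiv, in particular Jason Ramchandani, for providing ongoing support and access to financial data. The major data files used throughout the book and made available to the readers were received in one way or another from Refinitiv's data APIs.

To my family with love. I dedicate this book to my father Adolf, whose support for me and our family now spans almost five decades.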

¹ Harari, Yuval Noah. 2015. Homo Deus: A Brief History of Tomorrow. London: Harvill Secker.

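At Goldman [Sachs] the number of people engaged in trading shares has fallen from a peak of 600 in 2000 to just two today.¹
The Economist

This chapter provides background information for, and an overview of, the topics covered in this book. Although Python for algorithmic trading is a niche at the intersection of Python programming and finance, it is a fast-growing one that touches on such diverse topics as Python deployment, interactive financial analytics, machine and deep learning, object-oriented programming, socket communication, visualization of streaming data, and trading platforms.

For a quick refresher on important Python topics, read Appendix A first.

The Python programming language originated in 1991 with the first release by Guido van Rossum of a version labeled 0.9.0. In 1994, version 1.0 followed. However, it took almost two decades for Python to establish itself as a major programming language and technology platform in the financial industry. Of course, there were early adopters, mainly hedge funds, but widespread adoption probably started only around 2011.

One major obstacle to the adoption of Python in the financial industry has been the fact that the default Python version, called CPython, is an interpreted, high-level language. Numerical algorithms in general, and financial algorithms in particular, are quite often implemented based on (nested) loop structures. While compiled, low-level languages like C or C++ are really fast at executing such loops, Python, which relies on interpretation instead of compilation, is generally quite slow at doing so. As a consequence, pure Python proved too slow for many real-world financial applications, such as option pricing or risk management.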

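Python versus Pseudo-Code

Although Python was never specifically targeted at the scientific and financial communities, many people from these fields nevertheless liked the beauty and conciseness of its syntax. Not too long ago, it was generally considered good tradition to explain a (financial) algorithm and at the same time present some pseudo-code as an intermediate step toward its proper technological implementation. Many felt that, with Python, the pseudo-code step would not be necessary anymore. And they were proven mostly correct.

Consider, for instance, the Euler discretization of the geometric Brownian motion, as in Equation 1-1.

Equation 1-1. Euler discretization of geometric Brownian motion

$S_{T} = S_{0} \exp \left( \left( r - 0.5 \sigma^{2} \right) T + \sigma z \sqrt{T} \right)$

For decades, the LaTeX markup language and compiler have been the gold standard for authoring scientific documents containing mathematical formulae. In many ways, LaTeX syntax is similar to or already like pseudo-code when, for example, laying out equations. In this particular case, the LaTeX version looks like this:

S_T = S_0 \exp((r - 0.5 \sigma^2) T + \sigma z \sqrt{T})

In Python, this translates to executable code, given respective variable definitions, that is also really close to the financial formula as well as to the LaTeX representation: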

S_T = S_0 * exp((r - 0.5 * sigma ** 2) * T + sigma * z * sqrt(T))

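However, the speed issue remains. Such a difference equation, as a numerical approximation of the respective stochastic differential equation, is generally used to price derivatives by Monte Carlo simulation or to do risk analysis and management based on simulation.² These tasks in turn can require millions of simulations that need to be finished in due time, often in almost real time or at least near time. Python, as an interpreted high-level programming language, was never designed to be fast enough to tackle such computationally demanding tasks.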
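To make the link between Equation 1-1 and Monte Carlo valuation concrete, the following is a minimal sketch that estimates the value of a European call option in pure Python. The strike `K`, the seed, the number of paths, and the function name `mc_call_value` are illustrative assumptions and do not come from the text; the other parameter values are chosen for illustration only.

```python
import random
from math import exp, sqrt


def mc_call_value(S0, K, r, sigma, T, n_paths, seed=100):
    """Estimate a European call value by simulating S_T as in Equation 1-1."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    drift = (r - 0.5 * sigma ** 2) * T
    vol = sigma * sqrt(T)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0, 1)  # standard normally distributed random number
        ST = S0 * exp(drift + vol * z)  # simulated end-of-period value
        payoff_sum += max(ST - K, 0.0)  # payoff of the call at maturity
    # discount the average payoff back to today
    return exp(-r * T) * payoff_sum / n_paths


# hypothetical inputs: strike K=105 is an assumption for this sketch
value = mc_call_value(S0=100, K=105, r=0.05, sigma=0.2, T=1.0, n_paths=100_000)
print(f"Estimated call value: {value:.2f}")
```

The loop runs entirely at the Python level, which is exactly the pattern that becomes slow once millions of paths are needed.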

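NumPy and Vectorization

In 2006, version 1.0 of the NumPy Python package was released by Travis Oliphant. NumPy stands for numerical Python, suggesting that it targets scenarios that are numerically demanding. The base Python interpreter tries to be as general as possible in many areas, which often leads to quite a bit of overhead at run-time.³ NumPy, on the other hand, uses specialization as its major approach to avoid overhead and to be as good and as fast as possible in certain application scenarios.

The major class of NumPy is the regular array object, called the ndarray object for n-dimensional array. Its size is fixed once it is created (the elements themselves can still be overwritten in place), and it can only hold a single data type, called dtype. This specialization allows for the implementation of concise and fast code. One central approach in this context is vectorization. Basically, this approach avoids looping on the Python level and delegates the looping to specialized NumPy code, generally implemented in C and therefore rather fast.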
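As a small illustration of these properties, the following sketch shows the single dtype of an ndarray object and a vectorized operation next to its pure Python counterpart. The variable names are chosen for this example only.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)  # all elements share one dtype
print(a.dtype)

# vectorized: the loop over the elements runs in compiled NumPy code
squares = a ** 2

# pure Python counterpart: the loop runs at the interpreter level
squares_loop = [x ** 2 for x in range(5)]

# elements can be overwritten in place; size and dtype stay the same
a[0] = 10

# mixing types does not produce a mixed array: the int is upcast to float
mixed = np.array([1, 2.5])
print(mixed.dtype)
```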

Consider the simulation of 1,000,000 end-of-period values $S_{T}$ according to Equation 1-1 with pure Python. The major part of the following code is a for loop with 1,000,000 iterations:

    In [1]: %%time
            import random
            from math import exp, sqrt

            S0 = 100  # (1)
            r = 0.05  # (2)
            T = 1.0  # (3)
            sigma = 0.2  # (4)

            values = []  # (5)

            for _ in range(1000000):  # (6)
                ST = S0 * exp((r - 0.5 * sigma ** 2) * T +
                              sigma * random.gauss(0, 1) * sqrt(T))  # (7)
                values.append(ST)  # (8)
            CPU times: user 1.13 s, sys: 21.7 ms, total: 1.15 s
            Wall time: 1.15 s

(1) The initial index level.

(2) The constant short rate.

(3) The time horizon in year fractions.

(4) The constant volatility factor.

(5) An empty list object to collect the simulated values.

(6) The main for loop.

(7) The simulation of a single end-of-period value.

(8) Appends the simulated value to the list object.

With NumPy, you can avoid looping on the Python level completely by using vectorization. The code is much more concise, more readable, and faster by a factor of about eight:

    In [2]: %%time
            import numpy as np

            S0 = 100
            r = 0.05
            T = 1.0
            sigma = 0.2

            ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T +
                             sigma * np.random.standard_normal(1000000) *
                             np.sqrt(T))  # (1)
            CPU times: user 375 ms, sys: 82.6 ms, total: 458 ms
            Wall time: 160 ms

(1) This single line of NumPy code simulates all the values and stores them in an ndarray object.

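Vectorization is a powerful concept for writing concise, easy-to-read, and easy-to-maintain code in finance and algorithmic trading. With NumPy, vectorized code does not only make code more concise, but it can also speed up code execution considerably (by a factor of about eight in the Monte Carlo simulation, for example).

It is safe to say that NumPy has contributed significantly to the success of Python in science and finance. Many other popular Python packages from the so-called scientific Python stack build on NumPy as an efficient, performant data structure to store and handle numerical data. In fact, NumPy is an outgrowth of the SciPy package project, which provides a wealth of functionality frequently needed in science. The SciPy project recognized the need for a more powerful numerical data structure and consolidated older projects in this area, such as Numeric and NumArray, into a new, unifying one in the form of NumPy.

In algorithmic trading, a Monte Carlo simulation might not be the most important use case for a programming language. However, if you enter the algorithmic trading space, the management of larger, or even big, financial time series data sets is a very important use case. Just think of the backtesting of (intraday) trading strategies or the processing of tick data streams during trading hours. This is where the pandas data analysis package comes into play.

pandas and the DataFrame Class

Development of pandas began in 2008 by Wes McKinney, who back then was working at AQR Capital Management, a big hedge fund operating out of Greenwich, Connecticut. As for any other hedge fund, working with time series data is of paramount importance for AQR Capital Management, but back then Python did not provide any kind of appealing support for this type of data. Wes's idea was to create a package that mimics the capabilities of the R statistical language (http://r-project.org) in this area. This is reflected, for example, in the name of the major class DataFrame, whose counterpart in R is called data.frame. Not being considered close enough to the core business of money management, AQR Capital Management open sourced the pandas project in 2009, which marks the beginning of a major success story in open source-based data and financial analytics.

Partly due to pandas, Python has become a major force in data and financial analytics. Many people who adopt Python, coming from diverse other languages, cite pandas as a major reason for their decision. In combination with open data sources like Quandl, pandas even allows students to do sophisticated financial analytics with the lowest barriers of entry ever: a regular notebook computer with an internet connection suffices.

Assume an algorithmic trader is interested in trading Bitcoin, the cryptocurrency with the largest market capitalization. A first step might be to retrieve data about the historical exchange rate in USD. Using Quandl data and pandas, such a task is accomplished in less than a minute. Figure 1-1 shows the plot that results from the following Python code, which is (omitting some plotting style-related parameterizations) only four lines. Although pandas is not explicitly imported, the Quandl Python wrapper package by default returns a DataFrame object that is then used to add a simple moving average (SMA) of 100 days, as well as to visualize the raw data alongside the SMA: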

    In [3]: %matplotlib inline
            from pylab import mpl, plt  # (1)
            # in matplotlib >= 3.6 this style is named 'seaborn-v0_8'
            plt.style.use('seaborn')  # (1)
            mpl.rcParams['savefig.dpi'] = 300  # (1)
            mpl.rcParams['font.family'] = 'serif'  # (1)

    In [4]: import configparser  # (2)
            c = configparser.ConfigParser()  # (2)
            c.read('../pyalgo.cfg')  # (2)
    Out[4]: ['../pyalgo.cfg']

    In [5]: import quandl as q  # (3)
            q.ApiConfig.api_key = c['quandl']['api_key']  # (3)
            d = q.get('BCHAIN/MKPRU')  # (4)
            d['SMA'] = d['Value'].rolling(100).mean()  # (5)
            d.loc['2013-1-1':].plot(title='BTC/USD exchange rate',
                                    figsize=(10, 6));  # (6)

(1) Imports and configures the plotting package.

(2) Imports the configparser module and reads the credentials.

(3) Imports the Quandl Python wrapper package and provides the API key.

(4) Retrieves daily data for the Bitcoin exchange rate and returns a pandas DataFrame object with a single column.

(5) Calculates the SMA for 100 days in vectorized fashion.

(6) Selects data from the 1st of January 2013 on and plots it.
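The contents of the credentials file are not shown in the text. The following sketch assumes a layout with a `[quandl]` section and an `api_key` option, which is what the access pattern `c['quandl']['api_key']` implies; the key value is a placeholder. To stay self-contained, the sketch parses the configuration from a string instead of from disk.

```python
import configparser

# assumed file content; the real pyalgo.cfg is not shown in the text
cfg_text = """
[quandl]
api_key = YOUR_API_KEY
"""

c = configparser.ConfigParser()
# c.read('../pyalgo.cfg') would read the same structure from a file
c.read_string(cfg_text)

api_key = c['quandl']['api_key']
print(api_key)
```

Keeping credentials in a separate configuration file avoids hard-coding them in notebooks or scripts that may be shared.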

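Obviously, NumPy and pandas contribute measurably to the success of Python in finance. However, the Python ecosystem has much more to offer in the form of additional Python packages that solve rather fundamental problems and sometimes specialized ones. This book will make use of packages for data retrieval and storage (for example, PyTables, TsTables, SQLite) and for machine and deep learning (for example, scikit-learn, TensorFlow), to name just two categories. Along the way, we will also implement classes and modules that will make any algorithmic trading project more efficient. However, the main packages used throughout will be NumPy and pandas.

Figure 1-1. Historical Bitcoin exchange rate in USD from the beginning of 2013 until mid-2020

While NumPy provides the basic data structure to store numerical data and work with it, pandas brings powerful time series management capabilities to the table. It also does a great job of wrapping functionality from other packages into an easy-to-use API. The Bitcoin example just described shows that a single method call on a DataFrame object is enough to generate a plot with two financial time series visualized. Like NumPy, pandas allows for rather concise, vectorized code that is also generally executed quite fast due to heavy use of compiled code under the hood.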
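The rolling-window calculation from the Bitcoin example can be tried without an API key. The following sketch applies the same `rolling(100).mean()` call to a synthetic price series; the random walk, the seed, and the dates are assumptions made for illustration and are not real market data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(100)
index = pd.date_range('2020-01-01', periods=250, freq='D')

# synthetic price path (random walk in log space), not real market data
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(index))))
d = pd.DataFrame({'Value': prices}, index=index)

# vectorized 100-day simple moving average, as in the Bitcoin example
d['SMA'] = d['Value'].rolling(100).mean()

# the first 99 rows have no complete window and are therefore NaN
print(d['SMA'].isna().sum())
print(d.tail())
```

Calling `d.plot()` on the resulting object would draw both columns in one figure, which mirrors the single method call mentioned above.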

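The term algorithmic trading is neither uniquely nor universally defined. On a rather basic level, it refers to the trading of financial instruments based on some formal algorithm. An algorithm is a set of operations (mathematical, technical) to be conducted in a certain sequence to achieve a certain goal. For example, there are mathematical algorithms to solve a Rubik's Cube.⁴ Such an algorithm can solve the problem at hand via a step-by-step procedure, often perfectly. Another example is algorithms for finding the root(s) of an equation if it (they) exist(s) at all. In that sense, the objective of a mathematical algorithm is often well specified, and an optimal solution is often expected.

But what about the objective of financial trading algorithms? This question is not that easy to answer in general. It might help to step back for a moment and consider general motives for trading. In Dorn et al. (2008) we read:

Trading in financial markets is an important economic activity. Trades are necessary to get into and out of the market, to put unneeded cash into the market, and to convert back into cash when the money is wanted. They are also needed to move money around within the market, to exchange one asset for another, to manage risk, and to exploit information about future price movements.

The view expressed here is more technical than economic in nature, focusing mainly on the process itself and only partly on why people initiate trades in the first place. For our purposes, a nonexhaustive list of financial trading motives of people and financial institutions managing money of their own or for others includes the following: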

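Beta trading

Earning market risk premia by investing in, for instance, exchange traded funds (ETFs) that replicate the performance of the S&P 500.

Alpha generation

Earning risk premia independent of the market by, for example, selling short stocks listed in the S&P 500 or ETFs on the S&P 500.

Static hedging

Hedging against market risks by buying, for example, out-of-the-money put options on the S&P 500.

Dynamic hedging

Hedging against market risks affecting options on the S&P 500 by, for example, dynamically trading futures on the S&P 500 and appropriate cash, money market, or rate instruments.

Asset-liability management

Trading S&P 500 stocks and ETFs to be able to cover liabilities resulting from, for example, writing life insurance policies.

Market making

Providing, for example, liquidity to options on the S&P 500 by buying and selling options at different bid and ask prices.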

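All these types of trades can be implemented by a discretionary approach, with human traders making decisions mainly on their own, as well as based on algorithms that support the human trader or even replace them completely in the decision-making process. In this context, the computerization of financial trading of course plays an important role. While in the beginning of financial trading, floor trading with a large group of people shouting at each other ("open outcry") was the only way of executing trades, computerization and the advent of the internet and web technologies have revolutionized trading in the financial industry. The quotation at the beginning of this chapter illustrates this impressively in terms of the number of people actively engaged in trading shares at Goldman Sachs in 2000 and in 2016. It is a trend that was foreseen 25 years ago, as Solomon and Corso (1991) point out:

Computers have revolutionized the trading of securities, and the stock market is currently in the midst of a dynamic transformation. It is clear that the market of the future will not resemble the markets of the past.
Technology has made it possible for information regarding stock prices to be sent all over the world in seconds. Presently, computers route orders and execute small trades directly from the brokerage firm's terminal to the exchange. Computers now link together various stock exchanges, a practice which is helping to create a single global market for the trading of securities. The continuing improvements in technology will make it possible to execute trades globally by electronic trading systems.

Interestingly, one of the oldest and most widely used algorithms is found in the dynamic hedging of options. It dates back to the publication of the seminal papers on the pricing of European options by Black and Scholes (1973) and Merton (1973). The algorithm, called delta hedging, was available long before computerized and electronic trading even started. Delta hedging as a trading algorithm shows how to hedge away all market risks in a simplified, perfect, continuous model world. In the real world, with transaction costs, discrete trading, imperfectly liquid markets, and other frictions ("imperfections"), the algorithm has proven, somewhat surprisingly, to be useful and robust as well. It might not allow one to perfectly hedge away the market risks affecting options, but it is useful in getting close to the ideal and is therefore still used on a large scale in the financial industry.⁵

This book focuses on algorithmic trading in the context of alpha generating strategies. Although there are more sophisticated definitions of alpha, for the purposes of this book, alpha is seen as the difference between a trading strategy's return over some period of time and the return of the benchmark (single stock, index, cryptocurrency, and so on). For example, if the S&P 500 returns 10% in 2018 and an algorithmic strategy returns 12%, then alpha is +2 percentage points. If the strategy returns 7%, then alpha is -3 percentage points. In general, such numbers are not adjusted for risk, and other risk characteristics, such as the maximal drawdown (period), are usually considered to be of second-order importance, if at all.

This book focuses on alpha generating strategies, or strategies that try to generate positive returns (above a benchmark) independent of the market's performance. In this book, alpha is defined (in the simplest way) as the excess return of a strategy over the benchmark financial instrument's performance.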
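The simple definition of alpha used here can be written down directly. The following sketch reproduces the two numerical examples from the text; the function name `alpha` is chosen for this example, and returns are expected in percent.

```python
def alpha(strategy_return, benchmark_return):
    """Alpha as the plain return difference, in percentage points.

    No risk adjustment is applied, in line with the simple definition above.
    """
    return strategy_return - benchmark_return


print(alpha(12.0, 10.0))  # strategy returns 12%, benchmark returns 10%
print(alpha(7.0, 10.0))   # strategy returns 7%, benchmark returns 10%
```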

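There are other areas where trading-related algorithms play an important role. One is the high frequency trading (HFT) space, where speed is typically the discipline in which players compete.⁶ The motives for HFT are diverse, but market making and alpha generation probably play a prominent role. Another one is trade execution, where algorithms are deployed to optimally execute certain nonstandard trades. Motives in this area might include the execution (at the best possible prices) of large orders or the execution of an order with as little market and price impact as possible. A more subtle motive might be to disguise an order by executing it on a number of different exchanges.

An important question remains to be addressed: is there any advantage to using algorithms for trading instead of human research, experience, and discretion? This question can hardly be answered in any generality. For sure, there are human traders and portfolio managers who have earned, on average, more than their benchmark for investors over longer periods of time. The paramount example in this regard is Warren Buffett. On the other hand, statistical analyses show that the majority of active portfolio managers rarely beat relevant benchmarks consistently. Referring to the year 2015, Adam Shell writes:

Last year, for example, when the Standard & Poor's 500-stock index posted a paltry total return of 1.4% with dividends included, 66% of "actively managed" large-company stock funds posted smaller returns than the index... The longer-term outlook is just as gloomy, with 84% of large-cap funds generating lower returns than the S&P 500 in the latest five year period and 82% falling shy in the past 10 years, the study found.⁷

In an empirical study published in December 2016, Harvey et al. write:

We analyze and contrast the performance of discretionary and systematic hedge funds. Systematic funds use strategies that are rules-based, with little or no daily intervention by humans... We find that, for the period 1996-2014, systematic equity managers underperform their discretionary counterparts in terms of unadjusted (raw) returns, but that after adjusting for exposures to well-known risk factors, the risk-adjusted performance is similar. In the case of macro, systematic funds outperform discretionary funds, both on an unadjusted and risk-adjusted basis.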

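Table 1-1 reproduces the major quantitative findings of the study by Harvey et al. (2016).⁸ In the table, factors include traditional ones (equity, bonds, and so on), dynamic ones (value, momentum, and so on), and volatility (buying at-the-money puts and calls). The adjusted return appraisal ratio divides alpha by the adjusted return volatility. For more details and background, see the original study.

The study's results illustrate that systematic ("algorithmic") macro hedge funds perform best as a category, both in unadjusted and risk-adjusted terms. They generate an annualized alpha of 4.85% over the period studied. These are hedge funds implementing strategies that are typically global, are cross-asset, and often involve political and macroeconomic elements. Systematic equity hedge funds beat their discretionary counterparts only on the basis of the adjusted return appraisal ratio (0.35 versus 0.25).

Table 1-1. Annualized performance of hedge fund categories

	Systematic macro	Discretionary macro	Systematic equity	Discretionary equity
Return average	5.01%	2.86%	2.88%	4.09%
Return attributed to factors	0.15%	1.28%	1.77%	2.86%
Adj. return average (alpha)	4.85%	1.57%	1.11%	1.22%
Adj. return volatility	10.93%	5.10%	3.18%	4.79%
Adj. return appraisal ratio	0.44	0.31	0.35	0.25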
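Since the appraisal ratio is defined as alpha divided by the adjusted return volatility, the last row of the table can be recomputed from the two rows above it. The sketch below does this with the published figures. The systematic macro volatility is taken as 10.93%, which is the value consistent with the reported ratio of 0.44 (4.85 / 0.44 is roughly 11); the dictionary names are chosen for this example.

```python
# annualized figures in percent, as reported in the table
alpha = {
    'systematic macro': 4.85,
    'discretionary macro': 1.57,
    'systematic equity': 1.11,
    'discretionary equity': 1.22,
}
volatility = {
    'systematic macro': 10.93,  # value consistent with the reported ratio of 0.44
    'discretionary macro': 5.10,
    'systematic equity': 3.18,
    'discretionary equity': 4.79,
}

# appraisal ratio = alpha / adjusted return volatility
ratios = {name: alpha[name] / volatility[name] for name in alpha}

for name, ratio in ratios.items():
    print(f'{name:22s} {ratio:.2f}')
```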

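Compared to the S&P 500, hedge fund performance overall was quite meager for the year 2017. While the S&P 500 index returned 21.8%, hedge funds only returned 8.5% to investors (see this article on Investopedia). This illustrates how hard it is, even with multimillion dollar budgets for research and technology, to generate alpha.

Python is used in many corners of the financial industry but has become particularly popular in the algorithmic trading space. There are a few good reasons for this:

Data analytics capabilities

A major requirement for every algorithmic trading project is the ability to manage and process financial data efficiently. Python, in combination with packages like NumPy and pandas, makes life easier in this regard for every algorithmic trader than most other programming languages do.

Handling of modern APIs

Modern online trading platforms like the ones from FXCM and Oanda offer RESTful application programming interfaces (APIs) and socket (streaming) APIs to access historical and live data. Python is in general well suited to efficiently interact with such APIs.

Dedicated packages

In addition to the standard data analytics packages, there are multiple packages available that are dedicated to the algorithmic trading space, such as PyAlgoTrade and Zipline for the backtesting of trading strategies and Pyfolio for performing portfolio and risk analysis.

Vendor sponsored packages

More and more vendors in the space release open source Python packages to facilitate access to their offerings. Among them are online trading platforms like Oanda, as well as the leading data providers like Bloomberg and Refinitiv.

Dedicated platforms

Quantopian, for example, offers a standardized backtesting environment as a web-based platform where the language of choice is Python and where people can exchange ideas with like-minded others via different social network features. From its founding until 2020, Quantopian has attracted more than 300,000 users.

Buy- and sell-side adoption

More and more institutional players have adopted Python to streamline development efforts in their trading departments. This, in turn, requires more and more staff proficient in Python, which makes learning Python a worthwhile investment.

Education, training, and books

Prerequisites for the widespread adoption of a technology or programming language are academic and professional education and training programs in combination with specialized books and other resources. The Python ecosystem has seen a tremendous growth in such offerings recently, educating and training more and more people in the use of Python for finance. This can be expected to reinforce the trend of Python adoption in the algorithmic trading space.

In summary, it is rather safe to say that Python already plays an important role in algorithmic trading and seems to have strong momentum to become even more important in the future. It is therefore a good choice for anyone trying to enter the space, be it as an ambitious "retail" trader or as a professional employed by a leading financial institution engaged in systematic trading.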

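The focus of this book is on Python as a programming language for algorithmic trading. The book assumes that the reader already has some experience with Python and popular Python packages used for data analytics. Good introductory books are, for example, Hilpisch (2018), McKinney (2017), and VanderPlas (2016), which can all be consulted to build a solid foundation in Python for data analysis and finance. The reader is also expected to have some experience with typical tools used for interactive analytics with Python, such as IPython, to which VanderPlas (2016) also provides an introduction.

This book presents and explains Python code that is applied to the topics at hand, like backtesting trading strategies or working with streaming data. It cannot provide a thorough introduction to all packages used in different places. It tries, however, to highlight those capabilities of the packages that are central to the exposition (such as vectorization with NumPy).

The book also cannot provide a thorough introduction to and overview of all financial and operational aspects relevant for algorithmic trading. The approach instead focuses on the use of Python to build the necessary infrastructure for automated algorithmic trading systems. Of course, the majority of examples used are taken from the algorithmic trading space. However, when dealing with, say, momentum or mean-reversion strategies, they are more or less simply used without providing (statistical) verification or an in-depth discussion of their intricacies. Whenever it seems appropriate, references are given that point the reader to sources that address issues left open during the exposition.

All in all, this book is written for readers who have some experience with both Python and (algorithmic) trading. For such a reader, the book is a practical guide to the creation of automated trading systems using Python and additional packages.

This book uses a number of Python programming approaches (for example, object oriented programming) and packages (for example, scikit-learn) that cannot be explained in detail. The focus is on applying these approaches and packages to different steps in an algorithmic trading process. It is therefore recommended that those who do not yet have enough Python (for finance) experience additionally consult more introductory Python texts.

Throughout this book, four different algorithmic trading strategies are used as examples. They are introduced briefly in the following sections and in some more detail in Chapter 4. All these trading strategies can be classified as mainly alpha seeking strategies, since their main objective is to generate positive, above-market returns independent of the market direction. Canonical examples throughout the book, when it comes to financial instruments traded, are a stock index, a single stock, or a cryptocurrency (denominated in a fiat currency). The book does not cover strategies involving multiple financial instruments at the same time (pair trading strategies, strategies based on baskets, and so on). It also covers only strategies whose trading signals are derived from structured, financial time series data and not, for instance, from unstructured data sources like news or social media feeds. This keeps the discussions and the Python implementations concise and easier to understand, in line with the approach (discussed earlier) of focusing on Python for algorithmic trading.⁹

The remainder of this chapter gives a quick overview of the four trading strategies used in this book.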

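Simple Moving Averages

The first type of trading strategy relies on simple moving averages (SMAs) to generate trading signals and market positionings. These trading strategies have been popularized by so-called technical analysts or chartists. The basic idea is that a shorter-term SMA being higher in value than a longer-term SMA signals a long market position, and the opposite scenario signals a neutral or short market position.

Momentum

The basic idea behind momentum strategies is that a financial instrument is assumed to perform in accordance with its recent performance for some additional time. For example, when a stock index has seen a negative return on average over the last five days, it is assumed that its performance will be negative tomorrow, as well.

Mean Reversion

In mean-reversion strategies, a financial instrument is assumed to revert to some mean or trend level if it is currently far enough away from such a level. For example, assume that a stock trades 10 USD under its 100 days SMA level. It is then expected that the stock price will return to its SMA level sometime soon.

Machine and Deep Learning

With machine and deep learning algorithms, one generally takes a more black box approach to predicting market movements. For simplicity and reproducibility, the examples in this book mainly rely on historical return observations as features to train machine and deep learning algorithms to predict stock market movements.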
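The four ideas above can be sketched as vectorized signals on a single price series. The following is a minimal illustration on synthetic data, not the implementation used later in the book: the window lengths (42 and 252 days for the SMAs), the 10 USD threshold, the number of lags, and all variable names are assumptions chosen for this example, and no backtest or transaction cost logic is included.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(100)
# synthetic daily prices (random walk in log space), for illustration only
price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))
returns = np.log(price / price.shift(1))  # daily log returns

# SMA crossover: long (+1) if the shorter SMA is above the longer one,
# short (-1) otherwise; neutral (0) while the longer SMA is undefined
sma_short = price.rolling(42).mean()
sma_long = price.rolling(252).mean()
sma_position = pd.Series(np.where(sma_short > sma_long, 1, -1),
                         index=price.index)
sma_position[sma_long.isna()] = 0

# momentum: follow the sign of the average return over the last five days
momentum_position = np.sign(returns.rolling(5).mean())

# mean reversion: bet on a return to the 100-day SMA once the price
# is more than `threshold` away from it
distance = price - price.rolling(100).mean()
threshold = 10.0
mr_position = pd.Series(0, index=price.index)
mr_position[distance > threshold] = -1  # far above the SMA: go short
mr_position[distance < -threshold] = 1  # far below the SMA: go long

# machine learning: lagged returns as features, next direction as label
lags = 3
features = pd.concat(
    {f'lag_{i}': returns.shift(i) for i in range(1, lags + 1)}, axis=1
).dropna()
direction = np.sign(returns.loc[features.index])

print(sma_position.value_counts())
print(features.head())
```

The `features` and `direction` objects have the shape that a classifier, for example one from scikit-learn, would typically expect as inputs and labels.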

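This book does not introduce algorithmic trading in a systematic fashion. Since the focus lies on applying Python in this fascinating field, readers not familiar with algorithmic trading should consult dedicated resources on the topic, some of which are cited in this chapter and the chapters that follow. But be aware that the algorithmic trading world in general is secretive and that almost everyone who is successful is naturally reluctant to share their secrets in order to protect their sources of success (that is, their alpha).

Python is already a force in finance in general and is on its way to becoming a major force in algorithmic trading. There are a number of good reasons to use Python for algorithmic trading, among them the powerful ecosystem of packages that allows for efficient data analysis or the handling of modern APIs. There are also a number of good reasons to learn Python for algorithmic trading, chief among them the fact that some of the biggest buy- and sell-side institutions make heavy use of Python in their trading operations and constantly look for seasoned Python professionals.

This book focuses on applying Python to the different disciplines in algorithmic trading, like backtesting trading strategies or interacting with online trading platforms. It cannot replace a thorough introduction to Python itself nor to trading in general. However, it systematically combines these two fascinating worlds to provide a valuable source for the generation of alpha in today's competitive financial and cryptocurrency markets.

Books and papers cited in this chapter: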

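Black, Fischer, and Myron Scholes. 1973. "The Pricing of Options and Corporate Liabilities." Journal of Political Economy 81 (3): 638-659.
Chan, Ernest. 2013. Algorithmic Trading: Winning Strategies and Their Rationale. Hoboken et al: John Wiley & Sons.
Dorn, Anne, Daniel Dorn, and Paul Sengmueller. 2008. "Why Do People Trade?" Journal of Applied Finance (Fall/Winter): 37-50.
Harvey, Campbell, Sandy Rattray, Andrew Sinclair, and Otto Van Hemert. 2016. "Man vs. Machine: Comparing Discretionary and Systematic Hedge Fund Performance." The Journal of Portfolio Management White Paper, Man Group.
Hilpisch, Yves. 2015. Derivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration and Hedging. Wiley Finance. Resources under http://dawp.tpq.io.
⸻. 2018. Python for Finance: Mastering Data-Driven Finance. 2nd ed. Sebastopol: O'Reilly. Resources under https://py4fi.pqp.io.
⸻. 2020. Artificial Intelligence in Finance: A Python-Based Guide. Sebastopol: O'Reilly. Resources under https://aiif.pqp.io.
Kissel, Robert. 2013. The Science of Algorithmic Trading and Portfolio Management. Amsterdam et al: Elsevier/Academic Press.
Lewis, Michael. 2015. Flash Boys: Cracking the Money Code. New York, London: W.W. Norton & Company.
McKinney, Wes. 2017. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. 2nd ed. Sebastopol: O'Reilly.
Merton, Robert. 1973. "Theory of Rational Option Pricing." Bell Journal of Economics and Management Science 4: 141-183.
Narang, Rishi. 2013. Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading. Hoboken et al: John Wiley & Sons.
Solomon, Lewis, and Louise Corso. 1991. "The Impact of Technology on the Trading of Securities: The Emerging Global Market and the Implications for Regulation." The John Marshall Law Review 24 (2): 299-338.
VanderPlas, Jake. 2016. Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol: O'Reilly.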

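¹ "Too Squid to Fail." The Economist, 29 October 2016.

² For details, see Hilpisch (2018, ch. 12).

³ For example, list objects are not only mutable, which means that they can be changed in size, but they can also contain almost any other kind of Python object, like int, float, tuple objects or list objects themselves.

⁴ See The Mathematics of the Rubik's Cube or Algorithms for Solving Rubik's Cube.

⁵ See Hilpisch (2015) for a detailed analysis of delta hedging strategies for European and American options using Python.

⁶ See the book by Lewis (2015) for a non-technical introduction to HFT.

⁷ Source: "66% of Fund Managers Can't Match S&P Results." USA Today, March 14, 2016.

⁸ Annualized performance (above the short-term interest rate) and risk measures for hedge fund categories comprising a total of 9,000 hedge funds over the period from June 1996 to December 2014.

⁹ See the book by Kissel (2013) for an overview of topics related to algorithmic trading, the book by Chan (2013) for an in-depth discussion of momentum and mean-reversion strategies, or the book by Narang (2013) for a coverage of quantitative and HFT trading in general.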

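In building a house, there is the problem of the selection of wood.
It is essential that the carpenter's aim be to carry equipment that will cut well and, when he has time, to sharpen that equipment.
Miyamoto Musashi (The Book of Five Rings)

For someone new to Python, Python deployment might seem anything but straightforward. The same holds true for the wealth of libraries and packages that can be installed optionally. First of all, there is not only one Python. Python comes in many different flavors, like CPython, Jython, IronPython, or PyPy. Then there is still the divide between Python 2.7 and the 3.x world. This chapter focuses on CPython, the most popular version of the Python programming language, and on version 3.8.

Even when focusing on CPython 3.8 (henceforth just "Python"), deployment is made difficult by a number of factors:

The interpreter (a standard CPython installation) only comes with the so-called standard library (covering, for example, typical mathematical functions).
Optional Python packages need to be installed separately, and there are hundreds of them.
Compiling ("building") such non-standard packages on your own can be tricky due to dependencies and operating system-specific requirements.
Taking care of such dependencies and of version consistency over time (maintenance) is often tedious and time consuming.
Updates and upgrades for certain packages might make it necessary to recompile a multitude of other packages.
Changing or replacing one package might cause trouble in (many) other places.
Migrating from one Python version to another at some later point might amplify all the preceding issues.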

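Fortunately, there are tools and strategies available that help with the Python deployment issue. This chapter covers the following types of technologies that help with Python deployment:

Package manager

Package managers like pip or conda help with the installing, updating, and removing of Python packages. They also help with version consistency of different packages.

Virtual environment manager

A virtual environment manager like virtualenv or conda allows one to manage multiple Python installations in parallel (for example, to have both a Python 2.7 and a 3.8 installation on a single machine or to test the most recent development version of a fancy Python package without risk).¹

Container

Docker containers represent complete file systems containing all the pieces of a system needed to run certain software, such as code, runtime, or system tools. For example, you can run an Ubuntu 20.04 operating system with a Python 3.8 installation and the respective Python code in a Docker container hosted on a machine running Mac OS or Windows 10. Such a containerized environment can then also be deployed later in the cloud without any major changes.

Cloud instance

Deploying Python code for financial applications generally requires high availability, security, and performance. These requirements can typically be met only by the use of professional compute and storage infrastructure that is nowadays available at attractive conditions in the form of fairly small to really large and powerful cloud instances. One benefit of a cloud instance (virtual server) compared to a dedicated server rented longer term is that users generally get charged only for the hours of actual usage. Another advantage is that such cloud instances are available literally in a minute or two if needed, which helps with agile development and scalability.

The structure of this chapter is as follows. "Conda as a Package Manager" introduces conda as a package manager for Python. "Conda as a Virtual Environment Manager" focuses on conda capabilities for virtual environment management. "Using Docker Containers" gives a brief overview of Docker as a containerization technology and focuses on the building of an Ubuntu-based container with a Python 3.8 installation. "Using Cloud Instances" shows how to deploy Python and Jupyter Lab, a powerful, browser-based tool suite for Python development and deployment, in the cloud.

The goal of this chapter is to have a proper Python installation with the most important tools, as well as numerical, data analysis, and visualization packages, available on a professional infrastructure. This combination then serves as the backbone for implementing and deploying the Python code in later chapters, be it interactive financial analytics code or code in the form of scripts and modules.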

Although conda can be installed alone, an efficient way of doing it is via Miniconda, a minimal Python distribution that includes conda as a package and virtual environment manager.

Installing Miniconda

You can download the different versions of Miniconda on the Miniconda page. In what follows, the Python 3.8 64-bit version is assumed, which is available for Linux, Windows, and Mac OS. The main example in this sub-section is a session in an Ubuntu-based Docker container, which downloads the Linux 64-bit installer via wget and then installs Miniconda. The code as shown should work (with maybe minor modifications) on any other Linux-based or Mac OS-based machine, as well:²

    $ docker run -ti -h pyalgo -p 11111:11111 ubuntu:latest /bin/bash

    root@pyalgo:/# apt-get update; apt-get upgrade -y
    ...
    root@pyalgo:/# apt-get install -y gcc wget
    ...
    root@pyalgo:/# cd root
    root@pyalgo:~# wget \
    > https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    > -O miniconda.sh
    ...
    HTTP request sent, awaiting response... 200 OK
    Length: 93052469 (89M) [application/x-sh]
    Saving to: 'miniconda.sh'

    miniconda.sh 100%[============>] 88.74M 1.60MB/s in 2m 15s

    2020-08-25 11:01:54 (3.08 MB/s) - 'miniconda.sh' saved [93052469/93052469]

    root@pyalgo:~# bash miniconda.sh

    Welcome to Miniconda3 py38_4.8.3

    In order to continue the installation process, please review the license
    agreement.
    Please, press ENTER to continue
    >>>

Simply pressing the ENTER key starts the installation process. After reviewing the license agreement, approve the terms by answering yes:

    ...
    Last updated February 25, 2020

    Do you accept the license terms? [yes|no]
    [no] >>> yes

    Miniconda3 will now be installed into this location:
    /root/miniconda3

      - Press ENTER to confirm the location
      - Press CTRL-C to abort the installation
      - Or specify a different location below

    [/root/miniconda3] >>>
    PREFIX=/root/miniconda3
    Unpacking payload ...
    Collecting package metadata (current_repodata.json): done
    Solving environment: done

    ## Package Plan ##

      environment location: /root/miniconda3
    ...
      python pkgs/main/linux-64::python-3.8.3-hcff3b4d_0
    ...
    Preparing transaction: done
    Executing transaction: done
    installation finished.

After you have agreed to the licensing terms and have confirmed the install location, you should allow Miniconda to prepend the new Miniconda install location to the PATH environment variable by answering yes once again:

    Do you wish the installer to initialize Miniconda3
    by running conda init? [yes|no]
    [no] >>> yes
    ...
    no change     /root/miniconda3/etc/profile.d/conda.csh
    modified      /root/.bashrc

    ==> For changes to take effect, close and re-open your current shell. <==

    If you'd prefer that conda's base environment not be activated on startup,
       set the auto_activate_base parameter to false:

    conda config --set auto_activate_base false

    Thank you for installing Miniconda3!
    root@pyalgo:~#

After that, you might want to update conda since the Miniconda installer is in general not as regularly updated as conda itself:

    root@pyalgo:~# export PATH="/root/miniconda3/bin/:$PATH"
    root@pyalgo:~# conda update -y conda
    ...
    root@pyalgo:~# echo ". /root/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
    root@pyalgo:~# bash
    (base) root@pyalgo:~#

After this rather simple installation procedure, there are now both a basic Python installation and conda available. The basic Python installation already comes with some nice batteries included, like the SQLite3 database engine. You might try out whether you can start Python in a new shell instance or after appending the relevant path to the respective environment variable (as done in the preceding example):

    (base) root@pyalgo:~# python
    Python 3.8.3 (default, May 19 2020, 18:47:26)
    [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print('Hello Python for Algorithmic Trading World.')
    Hello Python for Algorithmic Trading World.
    >>> exit()
    (base) root@pyalgo:~#
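As a quick check that the SQLite3 database engine is indeed usable from the basic Python installation, the following sketch stores and reads back a few rows with the `sqlite3` module from the standard library. The table layout, the symbol `DEMO`, and the prices are made up for this example.

```python
import sqlite3

# in-memory database: nothing is written to disk
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE prices (date TEXT, symbol TEXT, close REAL)')

# made-up rows for illustration, not real market data
rows = [('2020-08-24', 'DEMO', 100.0), ('2020-08-25', 'DEMO', 101.5)]
con.executemany('INSERT INTO prices VALUES (?, ?, ?)', rows)
con.commit()

result = con.execute(
    'SELECT date, close FROM prices ORDER BY date').fetchall()
print(result)
con.close()
```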

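Basic Conda Operations

conda can be used to efficiently handle, among other things, the installation, updating, and removal of Python packages. The following list provides an overview of the major functions:

Installing Python x.x

conda install python=x.x

Updating Python

conda update python

Installing a package

conda install $PACKAGE_NAME

Updating a package

conda update $PACKAGE_NAME

Removing a package

conda remove $PACKAGE_NAME

Updating conda itself

conda update conda

Searching for packages

conda search $SEARCH_TERM

Listing installed packages

conda list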

Given these capabilities, installing, for example, NumPy (as one of the most important packages of the so-called scientific stack) is a single command only. When the installation takes place on a machine with an Intel processor, the procedure automatically installs the Intel Math Kernel Library mkl, which speeds up numerical operations not only for NumPy on Intel machines but also for a few other scientific Python packages:³

    (base) root@pyalgo:~# conda install numpy
    Collecting package metadata (current_repodata.json): done
    Solving environment: done

    ## Package Plan ##

      environment location: /root/miniconda3

      added / updated specs:
        - numpy

    The following packages will be downloaded:

        package                    |            build
        ---------------------------|-----------------
        blas-1.0                   |              mkl           6 KB
        intel-openmp-2020.1        |              217         780 KB
        mkl-2020.1                 |              217       129.0 MB
        mkl-service-2.3.0          |   py38he904b0f_0          62 KB
        mkl_fft-1.1.0              |   py38h23d657b_0         150 KB
        mkl_random-1.1.1           |   py38h0573a6f_0         341 KB
        numpy-1.19.1               |   py38hbc911f0_0          21 KB
        numpy-base-1.19.1          |   py38hfa32c7d_0         4.2 MB
        ------------------------------------------------------------
                                               Total:       134.5 MB

    The following NEW packages will be INSTALLED:

      blas           pkgs/main/linux-64::blas-1.0-mkl
      intel-openmp   pkgs/main/linux-64::intel-openmp-2020.1-217
      mkl            pkgs/main/linux-64::mkl-2020.1-217
      mkl-service    pkgs/main/linux-64::mkl-service-2.3.0-py38he904b0f_0
      mkl_fft        pkgs/main/linux-64::mkl_fft-1.1.0-py38h23d657b_0
      mkl_random     pkgs/main/linux-64::mkl_random-1.1.1-py38h0573a6f_0
      numpy          pkgs/main/linux-64::numpy-1.19.1-py38hbc911f0_0
      numpy-base     pkgs/main/linux-64::numpy-base-1.19.1-py38hfa32c7d_0

    Proceed ([y]/n)? y

    Downloading and Extracting Packages
    numpy-base-1.19.1    | 4.2 MB    | ############################## | 100%
    blas-1.0             | 6 KB      | ############################## | 100%
    mkl_fft-1.1.0        | 150 KB    | ############################## | 100%
    mkl-service-2.3.0    | 62 KB     | ############################## | 100%
    numpy-1.19.1         | 21 KB     | ############################## | 100%
    mkl-2020.1           | 129.0 MB  | ############################## | 100%
    mkl_random-1.1.1     | 341 KB    | ############################## | 100%
    intel-openmp-2020.1  | 780 KB    | ############################## | 100%
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    (base) root@pyalgo:~#

Multiple packages can also be installed at once. The -y flag indicates that all (potential) questions shall be answered with yes:

    (base) root@pyalgo:~# conda install -y ipython matplotlib pandas \
    > pytables scikit-learn scipy
    ...
    Collecting package metadata (current_repodata.json): done
    Solving environment: done

    ## Package Plan ##

      environment location: /root/miniconda3

      added / updated specs:
        - ipython
        - matplotlib
        - pandas
        - pytables
        - scikit-learn
        - scipy

    The following packages will be downloaded:

        package                    |            build
        ---------------------------|-----------------
        backcall-0.2.0             |             py_0          15 KB
        ...
        zstd-1.4.5                 |       h9ceee32_0         619 KB
        ------------------------------------------------------------
                                               Total:       144.9 MB

    The following NEW packages will be INSTALLED:

      backcall   pkgs/main/noarch::backcall-0.2.0-py_0
      blosc      pkgs/main/linux-64::blosc-1.20.0-hd408876_0
      ...
      zstd       pkgs/main/linux-64::zstd-1.4.5-h9ceee32_0

    Downloading and Extracting Packages
    glib-2.65.0          | 2.9 MB    | ############################## | 100%
    ...
    snappy-1.1.8         | 40 KB     | ############################## | 100%
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    (base) root@pyalgo:~#

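After this installation procedure, some of the most important libraries for financial analytics are available in addition to the standard ones:

IPython

An improved interactive Python shell

matplotlib

The standard plotting library for Python

NumPy

Efficient handling of numerical arrays

pandas

Management of tabular data, like financial time series data

PyTables

A Python wrapper for the HDF5 library

scikit-learn

A package for machine learning and related tasks

SciPy

A collection of scientific classes and functions

This provides a basic tool set for data analysis in general and financial analytics in particular. The next example uses IPython and draws a set of pseudo-random numbers with NumPy:

    (base) root@pyalgo:~# ipython
    Python 3.8.3 (default, May 19 2020, 18:47:26)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: import numpy as np

    In [2]: np.random.seed(100)

    In [3]: np.random.standard_normal((5, 4))
    Out[3]:
    array([[-1.74976547,  0.3426804 ,  1.1530358 , -0.25243604],
           [ 0.98132079,  0.51421884,  0.22117967, -1.07004333],
           [-0.18949583,  0.25500144, -0.45802699,  0.43516349],
           [-0.58359505,  0.81684707,  0.67272081, -0.10441114],
           [-0.53128038,  1.02973269, -0.43813562, -1.11831825]])

    In [4]: exit
    (base) root@pyalgo:~#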
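The call to `np.random.seed(100)` in the session fixes the state of the pseudo-random number generator, which is why the numbers shown can be reproduced. The following sketch makes this explicit by drawing the same set of numbers twice. The last lines use an explicit `Generator` object, an alternative NumPy interface that is not used in the session above and that produces a different (but equally repeatable) sequence.

```python
import numpy as np

np.random.seed(100)
first = np.random.standard_normal((5, 4))

np.random.seed(100)  # resetting the seed restarts the same sequence
second = np.random.standard_normal((5, 4))

print(np.array_equal(first, second))

# alternative interface: an explicit generator instead of global state
rng = np.random.default_rng(100)
sample = rng.standard_normal((5, 4))
print(sample.shape)
```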

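Executing conda list shows which packages are installed:

    (base) root@pyalgo:~# conda list
    # packages in environment at /root/miniconda3:
    #
    # Name                    Version                   Build  Channel
    _libgcc_mutex             0.1                        main
    backcall                  0.2.0                      py_0
    blas                      1.0                         mkl
    blosc                     1.20.0               hd408876_0
    ...
    zlib                      1.2.11               h7b6447c_3
    zstd                      1.4.5                h9ceee32_0
    (base) root@pyalgo:~#

In case a package is not needed anymore, it is efficiently removed with conda remove:

    (base) root@pyalgo:~# conda remove matplotlib
    Collecting package metadata (repodata.json): done
    Solving environment: done

    ## Package Plan ##

      environment location: /root/miniconda3

      removed specs:
        - matplotlib

    The following packages will be REMOVED:

      cycler-0.10.0-py38_0
      ...
      tornado-6.0.4-py38h7b6447c_1

    Proceed ([y]/n)? y

    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    (base) root@pyalgo:~#

conda as a package manager is already quite useful. However, its full power only becomes evident when adding virtual environment management to the mix.

conda as a package manager makes installing, updating, and removing Python packages a pleasant experience. There is no need to take care of building and compiling packages on your own, which can be tricky sometimes given the list of dependencies a package specifies and given the specifics to be considered on different operating systems.

Having installed Miniconda with conda included provides a default Python installation depending on what version of Miniconda has been chosen. The virtual environment management capabilities of conda allow one, for example, to add to a Python 3.8 default installation a completely separated installation of Python 2.7.x. To this end, conda offers the following functionality: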

Creating a virtual environment

conda create --name $ENVIRONMENT_NAME

Activating an environment

conda activate $ENVIRONMENT_NAME

Deactivating an environment

conda deactivate

Removing an environment

conda env remove --name $ENVIRONMENT_NAME

Exporting to an environment file

conda env export > $FILE_NAME

Creating an environment from a file

conda env create -f $FILE_NAME

Listing all environments

conda info --envs

As a simple illustration, the example code that follows creates an environment called py27, installs IPython, and executes a line of Python 2.7.x code. Although the support for Python 2.7 has ended, the example illustrates how legacy Python 2.7 code can easily be executed and tested:

    (base) root@pyalgo:~# conda create --name py27 python=2.7
    Collecting package metadata (current_repodata.json): done
    Solving environment: failed with repodata from current_repodata.json,
    will retry with next repodata source.
    Collecting package metadata (repodata.json): done
    Solving environment: done

    ## Package Plan ##

      environment location: /root/miniconda3/envs/py27

      added / updated specs:
        - python=2.7

    The following packages will be downloaded:

        package                    |            build
        ---------------------------|-----------------
        certifi-2019.11.28         |           py27_0         153 KB
        pip-19.3.1                 |           py27_0         1.7 MB
        python-2.7.18              |       h15b4118_1         9.9 MB
        setuptools-44.0.0          |           py27_0         512 KB
        wheel-0.33.6               |           py27_0          42 KB
        ------------------------------------------------------------
                                               Total:        12.2 MB

    The following NEW packages will be INSTALLED:

      _libgcc_mutex     pkgs/main/linux-64::_libgcc_mutex-0.1-main
      ca-certificates   pkgs/main/linux-64::ca-certificates-2020.6.24-0
      ...
      zlib              pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3

    Proceed ([y]/n)? y

    Downloading and Extracting Packages
    certifi-2019.11.28   | 153 KB    | ############################### | 100%
    python-2.7.18        | 9.9 MB    | ############################### | 100%
    pip-19.3.1           | 1.7 MB    | ############################### | 100%
    setuptools-44.0.0    | 512 KB    | ############################### | 100%
    wheel-0.33.6         | 42 KB     | ############################### | 100%
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use
    #
    #     $ conda activate py27
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate

    (base) root@pyalgo:~#

Notice how the prompt changes to include (py27) after the environment is activated:

    (base) root@pyalgo:~# conda activate py27
    (py27) root@pyalgo:~# pip install ipython
    DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020.
    ...
    Executing transaction: done
    (py27) root@pyalgo:~#

Finally, this allows one to use IPython with Python 2.7 syntax:

    (py27) root@pyalgo:~# ipython
    Python 2.7.18 |Anaconda, Inc.| (default, Apr 23 2020, 22:42:48)
    Type "copyright", "credits" or "license" for more information.

    IPython 5.10.0 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.

    In [1]: print "Hello Python for Algorithmic Trading World."
    Hello Python for Algorithmic Trading World.

    In [2]: exit
    (py27) root@pyalgo:~#

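As this example demonstrates, conda as a virtual environment manager allows one to install different Python versions alongside each other on the same machine. It also allows one to install different versions of certain packages. The default Python installation is not influenced by such a procedure, nor are other environments that might exist on the same machine. All available environments can be shown via conda info --envs or, equivalently, conda env list:

    (py27) root@pyalgo:~# conda env list
    # conda environments:
    #
    base                     /root/miniconda3
    py27                  *  /root/miniconda3/envs/py27

    (py27) root@pyalgo:~#

Sometimes it is necessary to share environment information with others or to use environment information on multiple machines. To this end, one can export the list of installed packages to a file with conda env export. By default, however, this only works properly for the same operating system, because the build versions are specified in the resulting yaml file. They can be removed with the --no-builds flag so that only the package versions are specified:

    (py27) root@pyalgo:~# conda deactivate
    (base) root@pyalgo:~# conda env export --no-builds > base.yml
    (base) root@pyalgo:~# cat base.yml
    name: base
    channels:
      - defaults
    dependencies:
      - _libgcc_mutex=0.1
      - backcall=0.2.0
      - blas=1.0
      - blosc=1.20.0
      ...
      - zlib=1.2.11
      - zstd=1.4.5
    prefix: /root/miniconda3
    (base) root@pyalgo:~#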
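To make the role of the build versions concrete, the following minimal sketch shows what dropping them amounts to for a single dependency entry. It assumes the name=version=build layout that an export without --no-builds writes; the helper strip_build() is hypothetical and not part of conda.

```python
def strip_build(spec: str) -> str:
    """Drop the build string from a spec of the form name=version=build."""
    parts = spec.split("=")
    # Keep name and version only; a spec without a build stays unchanged.
    return "=".join(parts[:2])


# Example entries; the first two use build strings from the conda output above.
specs = ["python=2.7.18=h15b4118_1", "pip=19.3.1=py27_0", "zlib=1.2.11"]
portable = [strip_build(s) for s in specs]
print(portable)
```

In practice, the --no-builds flag does this for the whole file, so a helper like this is only useful for understanding or for cleaning up an existing export.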

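Often, virtual environments, which technically are not much more than a certain (sub)folder structure, are created to do some quick tests. In such a case, an environment is easily removed (after deactivation) via conda env remove:

    (base) root@pyalgo:~# conda env remove -n py27

    Remove all packages in environment /root/miniconda3/envs/py27:

    (base) root@pyalgo:~#

This concludes the overview of conda as a virtual environment manager.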
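When several environments exist on one machine, it can help to confirm from within Python which interpreter is actually running. The following is a small sketch for Python 3; it assumes that conda activate sets the CONDA_DEFAULT_ENV environment variable, and the function name describe_interpreter() is purely illustrative.

```python
import os
import sys


def describe_interpreter() -> dict:
    """Collect basic facts about the Python interpreter running this code."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "prefix": sys.prefix,  # folder of the active installation/environment
        # Set by `conda activate`; None if no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_interpreter()
print(info)
```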

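conda not only helps with managing packages, but it is also a virtual environment manager for Python. It simplifies the creation of different Python environments, allowing one to have multiple versions of Python and optional packages available on the same machine without them influencing each other in any way. conda also allows one to export environment information to easily replicate it on multiple machines or to share it with others.

Docker containers have taken the IT world by storm (see Docker). Although the technology is still relatively young, it has established itself as one of the standards for the efficient development and deployment of almost any kind of software application.

For our purposes, it suffices to think of a Docker container as a separated ("containerized") file system that includes an operating system (for example, Ubuntu 20.04 LTS for server), a (Python) runtime, additional system and development tools, and further (Python) libraries and packages as needed. Such a Docker container might run on a local machine with Windows 10 Professional 64 Bit or on a cloud instance with a Linux operating system, for instance.

This section does not go into the exciting details of Docker containers. Rather, it gives a concise illustration of what the Docker technology can do in the context of Python deployment.⁵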

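Docker Images and Containers

Before moving on to the illustration, two fundamental terms need to be distinguished when talking about Docker. The first is a Docker image, which can be compared to a Python class. The second is a Docker container, which can be compared to an instance of the respective Python class.

On a more technical level, you will find the following definition for a Docker image in the Docker glossary:

Docker images are the basis of containers. An image is an ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime. An image typically contains a union of layered filesystems stacked on top of each other. An image does not have state and it never changes.

Similarly, you will find the following definition for a Docker container in the Docker glossary, which makes the analogy to Python classes and instances of such classes transparent:

A container is a runtime instance of a Docker image.
A Docker container consists of
a Docker image
an execution environment
a standard set of instructions
The concept is borrowed from shipping containers, which define a standard to ship goods globally. Docker defines a standard to ship software.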
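The class and instance analogy can be spelled out in a few lines of Python. This is a toy model only: the classes Image and Container are made up for illustration and have nothing to do with Docker's actual API. It merely shows that one stateless recipe can back several independent runtime instances.

```python
class Image:
    """Stands in for a Docker image: a fixed recipe without runtime state."""

    def __init__(self, tag: str, command: str):
        self.tag = tag
        self.command = command

    def run(self) -> "Container":
        # Every call creates a new, independent runtime instance.
        return Container(self)


class Container:
    """Stands in for a Docker container: a running instance of an image."""

    def __init__(self, image: Image):
        self.image = image
        self.history = []  # state lives in the container, not in the image

    def execute(self, line: str) -> None:
        self.history.append(line)


image = Image("pyalgo:basic", "ipython")
c1, c2 = image.run(), image.run()
c1.execute("import numpy as np")
print(len(c1.history), len(c2.history))
```

Both containers share the same image, but work done in the first one leaves the second one and the image itself untouched.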

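Depending on the operating system, the installation of Docker is somewhat different. That is why this section does not go into the respective details. More information and further links are found on the Get Docker page.

Building an Ubuntu and Python Docker Image

This subsection illustrates the building of a Docker image based on the latest version of Ubuntu that includes Miniconda, as well as a few important Python packages. In addition, it does some Linux housekeeping by updating the Linux packages index, upgrading packages if required, and installing certain additional system tools. To this end, two scripts are needed. One is a Bash script that does all the work on the Linux level.⁶ The other is a so-called Dockerfile, which controls the building procedure for the image itself.

The Bash script in Example 2-1, which does the installing, consists of three major parts. The first part handles the Linux housekeeping. The second part installs Miniconda, while the third part installs optional Python packages. There are also more detailed comments inline:

Example 2-1. Script installing Python and optional packages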

    #!/bin/bash
    #
    # Script to Install
    # Linux System Tools and
    # Basic Python Components
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    # GENERAL LINUX
    apt-get update  # updates the package index cache
    apt-get upgrade -y  # updates packages
    # installs system tools
    apt-get install -y bzip2 gcc git  # system tools
    apt-get install -y htop screen vim wget  # system tools
    apt-get upgrade -y bash  # upgrades bash if necessary
    apt-get clean  # cleans up the package index cache

    # INSTALL MINICONDA
    # downloads Miniconda
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O \
      Miniconda.sh
    bash Miniconda.sh -b  # installs it
    rm -rf Miniconda.sh  # removes the installer
    export PATH="/root/miniconda3/bin:$PATH"  # prepends the new path

    # INSTALL PYTHON LIBRARIES
    conda install -y pandas  # installs pandas
    conda install -y ipython  # installs IPython shell

    # CUSTOMIZATION
    cd /root/
    wget http://hilpisch.com/.vimrc  # Vim configuration

The Dockerfile in Example 2-2 uses the Bash script in Example 2-1 to build a new Docker image. It also has its major parts commented inline:

Example 2-2. Dockerfile to build the image

    #
    # Building a Docker Image with
    # the Latest Ubuntu Version and
    # Basic Python Install
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #

    # latest Ubuntu version
    FROM ubuntu:latest

    # information about maintainer
    MAINTAINER yves

    # add the bash script
    ADD install.sh /

    # change rights for the script
    RUN chmod u+x /install.sh

    # run the bash script
    RUN /install.sh

    # prepend the new path
    ENV PATH /root/miniconda3/bin:$PATH

    # execute IPython when container is run
    CMD ["ipython"]

If these two files are in a single folder and Docker is installed, then the building of the new Docker image is straightforward. Here, the tag pyalgo:basic is used for the image. This tag is needed to reference the image, for example, when running a container based on it:

    (base) pro:Docker yves$ docker build -t pyalgo:basic .
    Sending build context to Docker daemon  4.096kB
    Step 1/7 : FROM ubuntu:latest
     ---> 4e2eef94cd6b
    Step 2/7 : MAINTAINER yves
     ---> Running in 859db5550d82
    Removing intermediate container 859db5550d82
     ---> 40adf11b689f
    Step 3/7 : ADD install.sh /
     ---> 34cd9dc267e0
    Step 4/7 : RUN chmod u+x /install.sh
     ---> Running in 08ce2f46541b
    Removing intermediate container 08ce2f46541b
     ---> 88c0adc82cb0
    Step 5/7 : RUN /install.sh
     ---> Running in 112e70510c5b
    ...
    Removing intermediate container 112e70510c5b
     ---> 314dc8ec5b48
    Step 6/7 : ENV PATH /root/miniconda3/bin:$PATH
     ---> Running in 82497aea20bd
    Removing intermediate container 82497aea20bd
     ---> 5364f494f4b4
    Step 7/7 : CMD ["ipython"]
     ---> Running in ff434d5a3c1b
    Removing intermediate container ff434d5a3c1b
     ---> a0bb86daf9ad
    Successfully built a0bb86daf9ad
    Successfully tagged pyalgo:basic
    (base) pro:Docker yves$

Existing Docker images can be listed via docker images. The new image should be on top of the list:

    (base) pro:Docker yves$ docker images
    REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
    pyalgo       basic    a0bb86daf9ad   2 minutes ago   1.79GB
    ubuntu       latest   4e2eef94cd6b   5 days ago      73.9MB
    (base) pro:Docker yves$

Having built the pyalgo:basic image successfully allows one to run a respective Docker container with docker run. The parameter combination -ti is needed for interactive processes running within a Docker container, like a shell process of IPython (see the Docker Run Reference page):

    (base) pro:Docker yves$ docker run -ti pyalgo:basic
    Python 3.8.3 (default, May 19 2020, 18:47:26)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: import numpy as np
    In [2]: np.random.seed(100)
    In [3]: a = np.random.standard_normal((5, 3))
    In [4]: import pandas as pd
    In [5]: df = pd.DataFrame(a, columns=['a', 'b', 'c'])
    In [6]: df
    Out[6]:
              a         b         c
    0 -1.749765  0.342680  1.153036
    1 -0.252436  0.981321  0.514219
    2  0.221180 -1.070043 -0.189496
    3  0.255001 -0.458027  0.435163
    4 -0.583595  0.816847  0.672721

Exiting IPython will exit the container as well, since it is the only application running within the container. However, you can detach from a container via the following:

Ctrl+p --> Ctrl+q

After having detached from the container, the docker ps command shows the running container (and maybe other currently running containers):

    (base) pro:Docker yves$ docker ps
    CONTAINER ID  IMAGE         COMMAND    CREATED             ...  NAMES
    e93c4cbd8ea8  pyalgo:basic  "ipython"  About a minute ago       jolly_rubin
    (base) pro:Docker yves$

Attaching to the Docker container is accomplished by docker attach $CONTAINER_ID. Notice that a few letters of the CONTAINER ID are enough:

    (base) pro:Docker yves$ docker attach e93c
    In [7]: df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5 entries, 0 to 4
    Data columns (total 3 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       5 non-null      float64
     1   b       5 non-null      float64
     2   c       5 non-null      float64
    dtypes: float64(3)
    memory usage: 248.0 bytes

The exit command terminates IPython and therewith stops the Docker container, as well. It can be removed by docker rm:

    In [8]: exit
    (base) pro:Docker yves$ docker rm e93c
    e93c
    (base) pro:Docker yves$

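Similarly, the Docker image pyalgo:basic can be removed via docker rmi if it is not needed any longer. While containers are relatively lightweight, single images might consume quite a bit of storage. In the case of the pyalgo:basic image, the size is close to 2 GB. That is why you might want to regularly clean up the list of Docker images:

    (base) pro:Docker yves$ docker rmi a0bb86
    Untagged: pyalgo:basic
    Deleted: sha256:a0bb86daf9adfd0ddf65312ce6c1b068100448152f2ced5d0b9b5adef5788d88
    ...
    Deleted: sha256:40adf11b689fc778297c36d4b232c59fedda8c631b4271672cc86f505710502d
    (base) pro:Docker yves$

Of course, there is much more to say about Docker containers and their benefits in certain application scenarios. For the purposes of this book, they provide a modern approach to deploying Python, to doing Python development in a completely separate (containerized) environment, and to shipping code for algorithmic trading.

If you are not yet using Docker containers, you should consider starting to use them. They provide a number of benefits when it comes to Python deployment and development efforts, not only when working locally but also, in particular, when working with remote cloud instances and servers deploying code for algorithmic trading.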

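This section shows how to set up a full-fledged Python infrastructure on a DigitalOcean cloud instance. There are many other cloud providers out there, among them Amazon Web Services (AWS) as the leading provider. However, DigitalOcean is well known for its simplicity and relatively low rates for smaller cloud instances, which it calls Droplets. The smallest Droplet, which is generally sufficient for exploration and development purposes, only costs 5 USD per month or 0.007 USD per hour. Usage is charged by the hour so that one can (for example) easily spin up a Droplet for two hours, destroy it, and get charged just 0.014 USD.⁷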
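The hourly billing arithmetic can be checked with a few lines of Python. The two rates are taken from the text; that the hourly charges stop accumulating once the monthly price is reached is an assumption made for this sketch, and the function droplet_cost() is illustrative only.

```python
from decimal import Decimal

HOURLY_RATE = Decimal("0.007")  # USD per hour, smallest Droplet (from the text)
MONTHLY_PRICE = Decimal("5")    # USD per month (from the text)


def droplet_cost(hours: int) -> Decimal:
    """Cost in USD of running the smallest Droplet for a number of hours.

    Assumption: charges are capped at the monthly price.
    """
    return min(HOURLY_RATE * hours, MONTHLY_PRICE)


print(droplet_cost(2))        # the two-hour example from the text
print(droplet_cost(24 * 31))  # a full 31-day month hits the assumed cap
```

Decimal is used instead of float so that amounts such as 0.014 are represented exactly.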

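The goal of this section is to set up a Droplet on DigitalOcean that has a Python 3.8 installation plus typically needed packages (such as NumPy and pandas) in combination with a password-protected and Secure Sockets Layer (SSL)-encrypted Jupyter Lab server installation.⁸ As a web-based tool suite, Jupyter Lab provides several tools that can be used via a regular browser:

Jupyter Notebook

This is one of the most popular (if not the most popular) browser-based, interactive development environments, featuring a selection of different language kernels such as Python, R, and Julia.

Python console

This is an IPython-based console that has a graphical user interface with a look and feel different from that of the standard, terminal-based implementation.

Terminal

This is a system shell implementation accessible via the browser that allows not only for all typical system administration tasks, but also for the usage of helpful tools such as Vim for code editing or git for version control.

Editor

Another major tool is a browser-based text file editor with syntax highlighting for many different programming languages and file types, as well as typical text/code editing capabilities.

File manager

Jupyter Lab also provides a full-fledged file manager that allows for typical file operations, such as uploading, downloading, and renaming.

Having Jupyter Lab installed on a Droplet allows one to do Python development and deployment via the browser, circumventing the need to log in to the cloud instance via Secure Shell (SSH) access.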

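To accomplish the goal of this section, several scripts are needed:

Server setup script

This script orchestrates all steps necessary, such as copying other files to the Droplet and running them on the Droplet.

Python and Jupyter installation script

This script installs Python, additional packages, and Jupyter Lab, and starts the Jupyter Lab server.

Jupyter Notebook configuration file

This file is for the configuration of the Jupyter Lab server, for example, with regard to password protection.

RSA public and private key files

These two files are needed for the SSL encryption of the communication with the Jupyter Lab server.

The following sections work backwards through this list of files because, although the setup script is executed first, the other files need to have been created beforehand.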

RSA Public and Private Keys

In order to accomplish a secure connection to the Jupyter Lab server via an arbitrary browser, an SSL certificate consisting of RSA public and private keys (see the RSA Wikipedia page) is needed. In general, one would expect that such a certificate comes from a so-called Certificate Authority (CA). For the purposes of this book, however, a self-generated certificate is "good enough."⁹ A popular tool to generate RSA key pairs is OpenSSL. The brief interactive session to follow generates a certificate appropriate for use with a Jupyter Lab server (see the Jupyter Notebook docs):

    (base) pro:cloud yves$ openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    > -keyout mykey.key -out mycert.pem
    Generating a RSA private key
    .......+++++.....++++++++++
    writing new private key to 'mykey.key'
    -----
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank.
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [AU]:DE
    State or Province Name (full name) [Some-State]:Saarland
    Locality Name (e.g., city) []:Voelklingen
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:TPQ GmbH
    Organizational Unit Name (e.g., section) []:Algorithmic Trading
    Common Name (e.g., server FQDN or YOUR name) []:Jupyter Lab
    Email Address []:pyalgo@tpq.io
    (base) pro:cloud yves$

The two files mykey.key and mycert.pem need to be copied to the Droplet and need to be referenced by the Jupyter Notebook configuration file. This file is presented next.

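Jupyter Notebook Configuration File

A public Jupyter Lab server can be deployed securely, as explained in the Jupyter Notebook docs. Among other things, Jupyter Lab shall be password protected. To this end, there is a password hash code-generating function called passwd() available in the notebook.auth subpackage. The following code generates a password hash code with jupyter being the password itself:

    In [1]: from notebook.auth import passwd
    In [2]: passwd('jupyter')
    Out[2]: 'sha1:da3a3dfc0445:052235bb76e56450b38d27e41a85a136c3bf9cd7'
    In [3]: exit

This hash code needs to be placed in the Jupyter Notebook configuration file as presented in Example 2-3. The configuration file assumes that the RSA key files have been copied on the Droplet to the /root/.jupyter/ folder.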
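The three colon-separated parts of such a hash code read as algorithm, salt, and digest. The sketch below illustrates the general idea of a salted hash using the standard library only. It assumes that the digest is taken over the password followed by the salt; whether passwd() works in exactly this way is an assumption, so real hash codes should always be generated with passwd() itself. The functions make_hash() and check_hash() are illustrative names.

```python
import hashlib
import secrets


def make_hash(password: str, algorithm: str = "sha1") -> str:
    """Return 'algorithm:salt:digest' for a password and a random salt."""
    salt = secrets.token_hex(6)  # 12 hex characters
    digest = hashlib.new(algorithm, (password + salt).encode("utf-8")).hexdigest()
    return ":".join((algorithm, salt, digest))


def check_hash(hashed: str, password: str) -> bool:
    """Recompute the digest with the stored salt and compare the results."""
    algorithm, salt, digest = hashed.split(":", 2)
    candidate = hashlib.new(algorithm, (password + salt).encode("utf-8")).hexdigest()
    return secrets.compare_digest(candidate, digest)


hashed = make_hash("jupyter")
print(hashed.split(":")[0], check_hash(hashed, "jupyter"), check_hash(hashed, "wrong"))
```

Because the salt is random, two calls with the same password yield different hash codes, which is why the hash code in the configuration file needs to be replaced whenever a new one is generated.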

Example 2-3. Jupyter Notebook configuration file

    #
    # Jupyter Notebook Configuration File
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    # SSL ENCRYPTION
    # replace the following file names (and files used) by your choice/files
    c.NotebookApp.certfile = u'/root/.jupyter/mycert.pem'
    c.NotebookApp.keyfile = u'/root/.jupyter/mykey.key'

    # IP ADDRESS AND PORT
    # set ip to '*' to bind on all IP addresses of the cloud instance
    c.NotebookApp.ip = '0.0.0.0'
    # it is a good idea to set a known, fixed default port for server access
    c.NotebookApp.port = 8888

    # PASSWORD PROTECTION
    # here: 'jupyter' as password
    # replace the hash code with the one for your password
    c.NotebookApp.password = \
        'sha1:da3a3dfc0445:052235bb76e56450b38d27e41a85a136c3bf9cd7'

    # NO BROWSER OPTION
    # prevent Jupyter from trying to open a browser
    c.NotebookApp.open_browser = False

    # ROOT ACCESS
    # allow Jupyter to run from root user
    c.NotebookApp.allow_root = True

The next step is to make sure that Python and Jupyter Lab get installed on the Droplet.

Deploying Jupyter Lab in the cloud leads to a number of security issues since it is a full-fledged development environment accessible via a web browser. It is therefore of paramount importance to use the security measures that a Jupyter Lab server provides by default, like password protection and SSL encryption. But this is just the beginning, and further security measures might be advisable depending on what exactly is done on the cloud instance.

Installation Script for Python and Jupyter Lab

The Bash script to install Python and Jupyter Lab is similar to the one presented in the section "Using Docker Containers" to install Python via Miniconda in a Docker container. However, the script in Example 2-4 needs to start the Jupyter Lab server, as well. All major parts and lines of code are commented inline.

Example 2-4. Bash script to install Python and to run the Jupyter Notebook server

    #!/bin/bash
    #
    # Script to Install
    # Linux System Tools and Basic Python Components
    # as well as to
    # Start Jupyter Lab Server
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    # GENERAL LINUX
    apt-get update  # updates the package index cache
    apt-get upgrade -y  # updates packages
    # install system tools
    apt-get install -y build-essential git  # system tools
    apt-get install -y screen htop vim wget  # system tools
    apt-get upgrade -y bash  # upgrades bash if necessary
    apt-get clean  # cleans up the package index cache

    # INSTALLING MINICONDA
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
      -O Miniconda.sh
    bash Miniconda.sh -b  # installs Miniconda
    rm -rf Miniconda.sh  # removes the installer
    # prepends the new path for current session
    export PATH="/root/miniconda3/bin:$PATH"
    # prepends the new path in the shell configuration
    cat >> ~/.profile <<EOF
    export PATH="/root/miniconda3/bin:$PATH"
    EOF

    # INSTALLING PYTHON LIBRARIES
    conda install -y jupyter  # interactive data analytics in the browser
    conda install -y jupyterlab  # Jupyter Lab environment
    conda install -y numpy  # numerical computing package
    conda install -y pytables  # wrapper for HDF5 binary storage
    conda install -y pandas  # data analysis package
    conda install -y scipy  # scientific computations package
    conda install -y matplotlib  # standard plotting library
    conda install -y seaborn  # statistical plotting library
    conda install -y quandl  # wrapper for Quandl data API
    conda install -y scikit-learn  # machine learning library
    conda install -y openpyxl  # package for Excel interaction
    conda install -y xlrd xlwt  # packages for Excel interaction
    conda install -y pyyaml  # package to manage yaml files

    pip install --upgrade pip  # upgrading the package manager
    pip install q  # logging and debugging
    pip install plotly  # interactive D3.js plots
    pip install cufflinks  # combining plotly with pandas
    pip install tensorflow  # deep learning library
    pip install keras  # deep learning library
    pip install eikon  # Python wrapper for the Refinitiv Eikon Data API
    # Python wrapper for Oanda API
    pip install git+https://github.com/yhilpisch/tpqoa

    # COPYING FILES AND CREATING DIRECTORIES
    mkdir -p /root/.jupyter/custom
    wget http://hilpisch.com/custom.css
    mv custom.css /root/.jupyter/custom
    mv /root/jupyter_notebook_config.py /root/.jupyter/
    mv /root/mycert.pem /root/.jupyter
    mv /root/mykey.key /root/.jupyter
    mkdir /root/notebook
    cd /root/notebook

    # STARTING JUPYTER LAB
    jupyter lab &

This script needs to be copied to the Droplet and needs to be started by the orchestration script, as described in the next subsection.

Script to Orchestrate the Droplet Setup

The second Bash script, which sets up the Droplet, is the shortest one (see Example 2-5). It mainly copies all the other files to the Droplet, for which the respective IP address is expected as a parameter. In the final line, it starts the install.sh Bash script, which in turn does the installation itself and starts the Jupyter Lab server.

Example 2-5. Bash script to set up the Droplet

    #!/bin/bash
    #
    # Setting up a DigitalOcean Droplet
    # with Basic Python Stack
    # and Jupyter Notebook
    #
    # Python for Algorithmic Trading
    # (c) Dr Yves J Hilpisch
    # The Python Quants GmbH
    #

    # IP ADDRESS FROM PARAMETER
    MASTER_IP=$1

    # COPYING THE FILES
    scp install.sh root@${MASTER_IP}:
    scp mycert.pem mykey.key jupyter_notebook_config.py root@${MASTER_IP}:

    # EXECUTING THE INSTALLATION SCRIPT
    ssh root@${MASTER_IP} bash /root/install.sh
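For readers who prefer to drive the setup from Python, the same three steps can be expressed as argument lists. The sketch below only builds and prints the commands (a dry run) and does not contact any server; the function setup_commands() is a hypothetical helper, not part of the book's code base.

```python
import shlex


def setup_commands(ip: str) -> list:
    """Build the scp/ssh commands of the setup script as argument lists."""
    target = "root@{}:".format(ip)
    return [
        ["scp", "install.sh", target],
        ["scp", "mycert.pem", "mykey.key", "jupyter_notebook_config.py", target],
        ["ssh", "root@{}".format(ip), "bash", "/root/install.sh"],
    ]


# Dry run: print the commands instead of executing them.
for cmd in setup_commands("134.122.74.144"):
    print(shlex.join(cmd))
```

To actually execute the steps, each list could be passed to subprocess.run(cmd, check=True), which stops at the first command that fails.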

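Everything is now in place to give the setup code a try. On DigitalOcean, create a new Droplet with options similar to these:

Operating system

Ubuntu 20.04 LTS x64 (the newest version available at the time of this writing)

Size

Two core, 2GB, 60GB SSD (standard Droplet)

Data center region

Frankfurt (since your author lives in Germany)

SSH key

Add a (new) SSH key for password-less login¹⁰

Droplet name

Prespecified name or something like pyalgo

Finally, clicking on the Create button initiates the Droplet creation process, which generally takes about one minute. The major outcome of the setup procedure is the IP address, which might be, for instance, 134.122.74.144 when you have chosen Frankfurt as your data center location. Setting up the Droplet now is as easy as what follows:

(base) pro:cloud yves$ bash setup.sh 134.122.74.144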

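The resulting process, however, might take a couple of minutes. It is finished when there is a message from the Jupyter Lab server saying something like the following:

    [I 12:02:50.190 LabApp] Serving notebooks from local directory: /root/notebook
    [I 12:02:50.190 LabApp] Jupyter Notebook 6.1.1 is running at:
    [I 12:02:50.190 LabApp] https://pyalgo:8888/

In any current browser, visiting the following address accesses the running Jupyter Notebook server (note the https protocol):

https://134.122.74.144:8888

After maybe adding a security exception, the Jupyter Notebook login screen prompting for a password (in our case, jupyter) should appear. Everything is now ready to start Python development in the browser via Jupyter Lab, via the IPython-based console, and via a terminal window or the text file editor. Other file management capabilities like file upload, deletion of files, or creation of folders are also available.

Cloud instances, like those from DigitalOcean, and Jupyter Lab (powered by the Jupyter Notebook server) are a powerful combination for the Python developer and algorithmic trading practitioner to work on and to make use of professional compute and storage infrastructure. Professional cloud and data center providers make sure that your (virtual) machines are physically secure and highly available. Using cloud instances also keeps the exploration and development phase at rather low costs since usage is generally charged by the hour without the need to enter long-term agreements.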

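Python is the programming language and technology platform of choice, not only for this book but also for almost every leading financial institution. However, Python deployment can be tricky at best and sometimes even tedious and nerve-wracking. Fortunately, technologies are available today (almost all of which are younger than ten years) that help with the deployment issue. The open source software conda helps with both Python package and virtual environment management. Docker containers go even further in that complete file systems and runtime environments can be easily created in a technically shielded "sandbox," or the container. Going even one step further, cloud providers like DigitalOcean offer compute and storage capacity in professionally managed and secured data centers within minutes and billed by the hour. This in combination with a Python 3.8 installation and a secure Jupyter Notebook/Lab server installation provides a professional environment for Python development and deployment in the context of Python for algorithmic trading projects.

For Python package management, consult the following resources:

For virtual environment management, consult the following resources:

Information about Docker containers can be found, among other places, at the Docker home page, as well as in the following:

Matthias, Karl, and Sean Kane. 2018. Docker: Up and Running. 2nd ed. Sebastopol: O’Reilly.

Robbins (2016) provides a concise introduction to and overview of the Bash scripting language:

Robbins, Arnold. 2016. Bash Pocket Reference. 2nd ed. Sebastopol: O’Reilly.

How to run a public Jupyter Notebook/Lab server securely is explained in the Jupyter Notebook docs. There is also JupyterHub available, which allows the management of multiple users for a Jupyter Notebook server (see JupyterHub).

To sign up on DigitalOcean with a 10 USD starting balance in your new account, visit http://bit.ly/do_sign_up. This pays for two months of usage for the smallest Droplet.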

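¹ A recent project called pipenv combines the capabilities of the package manager pip with those of the virtual environment manager virtualenv. See https://github.com/pypa/pipenv.

² On Windows, you can also run the exact same commands in a Docker container (see https://oreil.ly/GndRR). Working on Windows directly requires some adjustments. See, for example, the book by Matthias and Kane (2018) for further details on Docker usage.

³ Installing the meta package nomkl, such as in conda install numpy nomkl, avoids the automatic installation and usage of mkl and related other packages.

⁴ In the official documentation, you will find the following explanation: "Python Virtual Environments allow Python packages to be installed in an isolated location for a particular application, rather than being installed globally." See the Creating Virtual Environments page.

⁵ See Matthias and Kane (2018) for a comprehensive introduction to the Docker technology.

⁶ Consult the book by Robbins (2016) for a concise introduction to and a quick overview of Bash scripting. Also see GNU Bash.

⁷ For those who do not yet have an account with a cloud provider, on http://bit.ly/do_sign_up, new users get a starting credit of 10 USD for DigitalOcean.

⁸ Technically, Jupyter Lab is an extension of Jupyter Notebook. Both expressions are, however, sometimes used interchangeably.

⁹ With such a self-generated certificate, you might need to add a security exception when prompted by the browser. On Mac OS you might even explicitly register the certificate as trustworthy.

¹⁰ If you need assistance, visit either How To Use SSH Keys with DigitalOcean Droplets or How To Use SSH Keys with PuTTY on DigitalOcean Droplets (Windows users).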

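Clearly, data beats algorithms. Without comprehensive data, you tend to get non-comprehensive predictions.
Rob Thomas (2016)

In algorithmic trading, one generally has to deal with four types of data, as illustrated in Table 3-1. Although it simplifies the financial data world, distinguishing data along the pairs historical versus real-time and structured versus unstructured often proves useful in technical settings.

Table 3-1. Types of financial data (examples)

	Structured	Unstructured
Historical	End-of-day closing prices	Financial news articles
Real-time	Bid/ask prices for FX	Posts on Twitter

This book is mainly concerned with structured data (numerical, tabular data) of both historical and real-time types. This chapter in particular focuses on historical, structured data, like end-of-day closing values for the SAP SE stock traded at the Frankfurt Stock Exchange. However, this category also subsumes intraday data, such as 1-minute-bar data for the Apple, Inc. stock traded at the NASDAQ stock exchange. The processing of real-time, structured data is covered in Chapter 7.

An algorithmic trading project typically starts with a trading idea or hypothesis that needs to be (back)tested based on historical financial data. This is the context for this chapter, the plan for which is as follows. "Reading Financial Data from Different Sources" uses pandas to read data from different file- and web-based sources. "Working with Open Data Sources" introduces Quandl as a popular open data source platform. "Eikon Data API" introduces the Python wrapper for the Refinitiv Eikon Data API. Finally, "Storing Financial Data Efficiently" briefly shows how to store historical, structured data efficiently with pandas based on the HDF5 binary storage format.

The goal for this chapter is to have available financial data in a format with which the backtesting of trading ideas and hypotheses can be implemented effectively. The three major themes are the importing of data, the handling of the data, and the storage of it. This and subsequent chapters assume a Python 3.8 installation with Python packages installed as explained in detail in Chapter 2. For the time being, it is not yet relevant on which infrastructure exactly this Python environment is provided. For more details on efficient input-output operations with Python, see Hilpisch (2018, ch. 9).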

This section makes heavy use of the capabilities of pandas, the popular data analysis package for Python (see pandas home page). pandas comprehensively supports the three main tasks this chapter is concerned with: reading data, handling data, and storing data. One of its strengths is the reading of data from different types of sources, as the remainder of this section illustrates.

The Data Set

In this section, we work with a fairly small data set for the Apple Inc. stock price (with symbol AAPL and Reuters Instrument Code or RIC AAPL.O) as retrieved from the Eikon Data API for April 2020.

Since such historical financial data has been stored in a CSV file on disk, pure Python can be used to read and print its content:

    In [1]: fn = '../data/AAPL.csv'  # (1)
    In [2]: with open(fn, 'r') as f:  # (1)
                for _ in range(5):  # (2)
                    print(f.readline(), end='')  # (3)
    Date,HIGH,CLOSE,LOW,OPEN,COUNT,VOLUME
    2020-04-01,248.72,240.91,239.13,246.5,460606.0,44054638.0
    2020-04-02,245.15,244.93,236.9,240.34,380294.0,41483493.0
    2020-04-03,245.7,241.41,238.9741,242.8,293699.0,32470017.0
    2020-04-06,263.11,262.47,249.38,250.9,486681.0,50455071.0

(1) Opens the file on disk (adjust path and filename if necessary).

(2) Sets up a for loop with five iterations.

(3) Prints the first five lines of the opened CSV file.

This approach allows for simple inspection of the data. One learns that there is a header line and that the single data points per row represent Date, HIGH, CLOSE, LOW, OPEN, COUNT, and VOLUME, respectively. However, the data is not yet available in memory for further usage with Python.

Reading from a CSV File with Python

To work with data stored as a CSV file, the file needs to be parsed and the data needs to be stored in a Python data structure. Python has a built-in module called csv that supports the reading of data from a CSV file. The first approach yields a list object containing other list objects with the data from the file:

    In [3]: import csv  # (1)
    In [4]: csv_reader = csv.reader(open(fn, 'r'))  # (2)
    In [5]: data = list(csv_reader)  # (3)
    In [6]: data[:5]  # (4)
    Out[6]: [['Date', 'HIGH', 'CLOSE', 'LOW', 'OPEN', 'COUNT', 'VOLUME'],
             ['2020-04-01', '248.72', '240.91', '239.13', '246.5',
              '460606.0', '44054638.0'],
             ['2020-04-02', '245.15', '244.93', '236.9', '240.34',
              '380294.0', '41483493.0'],
             ['2020-04-03', '245.7', '241.41', '238.9741', '242.8',
              '293699.0', '32470017.0'],
             ['2020-04-06', '263.11', '262.47', '249.38', '250.9',
              '486681.0', '50455071.0']]

(1) Imports the csv module.

(2) Instantiates a csv.reader iterator object.

(3) Passes the iterator to list(), which adds every single line from the CSV file as a list object to the resulting list object.

(4) Prints out the first five elements of the list object.

Working with such a nested list object (for the calculation of the average closing price, for example) is possible in principle but not really efficient or intuitive. Using a csv.DictReader iterator object instead of the standard csv.reader object makes such tasks a bit more manageable. Every row of data in the CSV file (apart from the header row) is then imported as a dict object so that single values can be accessed via the respective key:

    In [7]: csv_reader = csv.DictReader(open(fn, 'r'))  # (1)
    In [8]: data = list(csv_reader)
    In [9]: data[:3]
    Out[9]: [{'Date': '2020-04-01',
              'HIGH': '248.72',
              'CLOSE': '240.91',
              'LOW': '239.13',
              'OPEN': '246.5',
              'COUNT': '460606.0',
              'VOLUME': '44054638.0'},
             {'Date': '2020-04-02',
              'HIGH': '245.15',
              'CLOSE': '244.93',
              'LOW': '236.9',
              'OPEN': '240.34',
              'COUNT': '380294.0',
              'VOLUME': '41483493.0'},
             {'Date': '2020-04-03',
              'HIGH': '245.7',
              'CLOSE': '241.41',
              'LOW': '238.9741',
              'OPEN': '242.8',
              'COUNT': '293699.0',
              'VOLUME': '32470017.0'}]

(1) Here, the csv.DictReader iterator object is instantiated, which reads every data row into a dict object, given the information in the header row.

Based on the single dict objects, aggregations are now somewhat easier to accomplish. However, one still cannot speak of a convenient way of calculating the mean of the Apple closing stock price when inspecting the respective Python code:

    In [10]: sum([float(l['CLOSE']) for l in data]) / len(data)  # (1)
    Out[10]: 272.38619047619045

(1) First, a list object is generated via a list comprehension with all closing values; second, the sum is taken over all these values; third, the resulting sum is divided by the number of closing values.
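The three steps can be made a little more readable with the standard library alone. The following sketch is self-contained: instead of the file on disk, it parses the first rows of the data set as printed earlier in this section from an in-memory string, so the resulting mean covers only these four rows and is not the monthly figure shown above.

```python
import csv
import io
import statistics

# First rows of the AAPL data set as printed earlier in this section.
raw = """Date,HIGH,CLOSE,LOW,OPEN,COUNT,VOLUME
2020-04-01,248.72,240.91,239.13,246.5,460606.0,44054638.0
2020-04-02,245.15,244.93,236.9,240.34,380294.0,41483493.0
2020-04-03,245.7,241.41,238.9741,242.8,293699.0,32470017.0
2020-04-06,263.11,262.47,249.38,250.9,486681.0,50455071.0
"""

# The with block closes the file-like object after reading.
with io.StringIO(raw) as f:
    closes = [float(row["CLOSE"]) for row in csv.DictReader(f)]

mean_close = statistics.mean(closes)
print(len(closes), round(mean_close, 2))
```

With a real file, io.StringIO(raw) would be replaced by open(fn, 'r', newline=''), which has the side benefit that the file gets closed properly.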

This is one of the major reasons why pandas has gained such popularity in the Python community. It makes the importing of data and the handling of, for example, financial time series data sets more convenient (and also often considerably faster) than pure Python.

Reading from a CSV File with pandas

From this point on, this section uses pandas to work with the Apple stock price data set. The major function used is read_csv(), which allows for a number of customizations via different parameters (see the read_csv() API reference). read_csv() yields as a result of the data reading procedure a DataFrame object, which is the central means of storing (tabular) data with pandas. The DataFrame class has many powerful methods that are particularly helpful in financial applications (see the DataFrame API reference):

    In [11]: import pandas as pd  # (1)
    In [12]: data = pd.read_csv(fn, index_col=0, parse_dates=True)  # (2)
    In [13]: data.info()  # (3)
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 21 entries, 2020-04-01 to 2020-04-30
    Data columns (total 6 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   HIGH    21 non-null     float64
     1   CLOSE   21 non-null     float64
     2   LOW     21 non-null     float64
     3   OPEN    21 non-null     float64
     4   COUNT   21 non-null     float64
     5   VOLUME  21 non-null     float64
    dtypes: float64(6)
    memory usage: 1.1 KB
    In [14]: data.tail()  # (4)
    Out[14]:
                  HIGH   CLOSE     LOW    OPEN     COUNT      VOLUME
    Date
    2020-04-24  283.01  282.97  277.00  277.20  306176.0  31627183.0
    2020-04-27  284.54  283.17  279.95  281.80  300771.0  29271893.0
    2020-04-28  285.83  278.58  278.20  285.08  285384.0  28001187.0
    2020-04-29  289.67  287.73  283.89  284.73  324890.0  34320204.0
    2020-04-30  294.53  293.80  288.35  289.96  471129.0  45765968.0

(1) The pandas package is imported.

(2) This imports the data from the CSV file, indicating that the first column shall be treated as the index column and letting the entries in that column be interpreted as date-time information.

(3) This method call prints out meta information regarding the resulting DataFrame object.

(4) The data.tail() method prints out by default the five most recent data rows.

Calculating the mean of the Apple stock closing values now is only a single method call:

    In [15]: data['CLOSE'].mean()
    Out[15]: 272.38619047619056

Chapter 4 introduces more functionality of pandas for the handling of financial data. For details on working with pandas and the powerful DataFrame class, also refer to the official pandas Documentation page and to the book McKinney (2017).

Although the Python standard library provides capabilities to read data from CSV files, pandas in general significantly simplifies and speeds up such operations. An additional benefit is that the data analysis capabilities of pandas are immediately available since read_csv() returns a DataFrame object.

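Exporting to Excel and JSON

pandas also excels at exporting data stored in DataFrame objects when this data needs to be shared in a non-Python specific format. Apart from being able to export to CSV files, pandas also allows one to do the export in the form of Excel spreadsheet files as well as JSON files, both of which are popular data exchange formats in the financial industry. Such an exporting procedure typically needs a single method call only:

    In [16]: data.to_excel('data/aapl.xls', 'AAPL')  # (1)
    In [17]: data.to_json('data/aapl.json')  # (2)
    In [18]: ls -n data/
    total 24
    -rw-r--r--  1 501  20  3067 Aug 25 11:47 aapl.json
    -rw-r--r--  1 501  20  5632 Aug 25 11:47 aapl.xls

(1) Exports the data to an Excel spreadsheet file on disk.

(2) Exports the data to a JSON file on disk.

In particular when it comes to the interaction with Excel spreadsheet files, there are more elegant ways than just doing a data dump to a new file. xlwings, for example, is a powerful Python package that allows for an efficient and intelligent interaction between Python and Excel (visit the xlwings home page).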

Reading from Excel and JSON

Now that the data is also available in the form of an Excel spreadsheet file and a JSON data file, pandas can read data from these sources, as well. The approach is as straightforward as with CSV files:

    In [19]: data_copy_1 = pd.read_excel('data/aapl.xls', 'AAPL',
                                         index_col=0)  # (1)
    In [20]: data_copy_1.head()  # (2)
    Out[20]:
                  HIGH   CLOSE       LOW    OPEN   COUNT    VOLUME
    Date
    2020-04-01  248.72  240.91  239.1300  246.50  460606  44054638
    2020-04-02  245.15  244.93  236.9000  240.34  380294  41483493
    2020-04-03  245.70  241.41  238.9741  242.80  293699  32470017
    2020-04-06  263.11  262.47  249.3800  250.90  486681  50455071
    2020-04-07  271.70  259.43  259.0000  270.80  467375  50721831
    In [21]: data_copy_2 = pd.read_json('data/aapl.json')  # (3)
    In [22]: data_copy_2.head()  # (4)
    Out[22]:
                  HIGH   CLOSE       LOW    OPEN   COUNT    VOLUME
    2020-04-01  248.72  240.91  239.1300  246.50  460606  44054638
    2020-04-02  245.15  244.93  236.9000  240.34  380294  41483493
    2020-04-03  245.70  241.41  238.9741  242.80  293699  32470017
    2020-04-06  263.11  262.47  249.3800  250.90  486681  50455071
    2020-04-07  271.70  259.43  259.0000  270.80  467375  50721831
    In [23]: !rm data/*

(1) This reads the data from the Excel spreadsheet file to a new DataFrame object.

(2) The first five rows of the first in-memory copy of the data are printed.

(3) This reads the data from the JSON file to yet another DataFrame object.

(4) This then prints the first five rows of the second in-memory copy of the data.

pandas proves useful for reading and writing financial data from and to different types of data files. Often the reading might be tricky due to nonstandard storage formats (like a ";" instead of a "," as separator), but pandas generally provides the right set of parameter combinations to cope with such cases. Although all examples in this section use a small data set only, one can expect high-performance input-output operations from pandas in the most important scenarios when the data sets are much larger.
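As an illustration of such a nonstandard format, the following sketch reads data that uses ";" as the separator and "," as the decimal mark. The two data rows are a made-up variant of the Apple data set, held in memory so that the example is self-contained; sep and decimal are regular read_csv() parameters.

```python
import io

import pandas as pd

# Hypothetical export with ';' as separator and ',' as decimal mark.
raw = """Date;HIGH;CLOSE
2020-04-01;248,72;240,91
2020-04-02;245,15;244,93
"""

data = pd.read_csv(io.StringIO(raw), sep=";", decimal=",",
                   index_col=0, parse_dates=True)
print(data.dtypes)
print(data["CLOSE"].mean())
```

Without decimal=",", the price columns would be read as strings (dtype object) and numerical operations such as mean() would not work as intended.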

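To a great extent, the attractiveness of the Python ecosystem stems from the fact that almost all packages available are open source and can be used for free. Financial analytics in general and algorithmic trading in particular, however, cannot live with open source software and algorithms alone; data also plays a vital role, as the quotation at the beginning of the chapter emphasizes. The previous section uses a small data set from a commercial data source. While there have been helpful open (financial) data sources available for some years (such as the ones provided by Yahoo! Finance or Google Finance), there are not too many left at the time of this writing in 2020. One of the more obvious reasons for this trend might be the ever-changing terms of data licensing agreements.

The one notable exception for the purposes of this book is Quandl, a platform that aggregates a large number of open, as well as premium (i.e., to-be-paid-for) data sources. The data is provided via a unified API for which a Python wrapper package is available.

The Python wrapper package for the Quandl data API (see the Python wrapper page on Quandl and the GitHub page of the package) is installed with conda through conda install quandl. The first example shows how to retrieve historical average prices for the BTC/USD exchange rate since the introduction of Bitcoin as a cryptocurrency. With Quandl, requests always expect a combination of the database and the specific data set desired (in the example, BCHAIN and MKPRU). Such information can generally be looked up on the Quandl platform. For the example, the relevant page on Quandl is BCHAIN/MKPRU.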

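By default, the quandl package returns a pandas DataFrame object. In the example, the Value column is also presented in annualized fashion (that is, with year-end values). Note that the number shown for 2020 is the last available value in the data set (the data retrieved in the example ends on 2020-08-26) and not necessarily the year-end value.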

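While a large part of the data sets on the Quandl platform are free, some of the free data sets require an API key. Such a key is required after a certain limit of free API calls too. Every user obtains such a key by signing up for a free Quandl account on the Quandl sign up page. Data requests requiring an API key expect the key to be provided as the parameter api_key. In the example, the API key (which is found on the account settings page) is stored as a string in a configuration file, from which it is read via the configparser module and passed on as config['quandl']['api_key']: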

    In [24]: import configparser
             config = configparser.ConfigParser()
             config.read('../pyalgo.cfg')
    Out[24]: ['../pyalgo.cfg']
    In [25]: import quandl as q  # (1)
    In [26]: data = q.get('BCHAIN/MKPRU',
                          api_key=config['quandl']['api_key'])  # (2)
    In [27]: data.info()
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 4254 entries, 2009-01-03 to 2020-08-26
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   Value   4254 non-null   float64
    dtypes: float64(1)
    memory usage: 66.5 KB
    In [28]: data['Value'].resample('A').last()  # (3)
    Out[28]: Date
             2009-12-31        0.000000
             2010-12-31        0.299999
             2011-12-31        4.995000
             2012-12-31       13.590000
             2013-12-31      731.000000
             2014-12-31      317.400000
             2015-12-31      428.000000
             2016-12-31      952.150000
             2017-12-31    13215.574000
             2018-12-31     3832.921667
             2019-12-31     7385.360000
             2020-12-31    11763.930000
             Freq: A-DEC, Name: Value, dtype: float64

(1) Imports the Python wrapper package for Quandl.

(2) Reads historical data for the BTC/USD exchange rate.

(3) Selects the Value column, resamples it (from the originally daily values to yearly values), and defines the last available observation to be the relevant one.
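What the yearly resampling with last() does can be illustrated with a tiny, made-up series (the values are not real prices). The sketch uses groupby() on the calendar year instead of resample(), a deliberately swapped-in technique that yields the same per-year selection and makes the logic explicit, although the resulting index holds plain years rather than year-end dates.

```python
import pandas as pd

# Made-up daily values spanning two calendar years (not real prices).
index = pd.to_datetime(["2019-12-30", "2019-12-31", "2020-01-02", "2020-05-15"])
values = pd.Series([1.0, 2.0, 3.0, 4.0], index=index, name="Value")

# Group by calendar year and keep the last available observation per year.
year_end = values.groupby(values.index.year).last()
print(year_end)
```

For the incomplete year 2020, the result is the value from 2020-05-15, which shows why the figure for the current year is not necessarily a year-end value.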

Quandl also provides, for example, diverse data sets for single stocks, like end-of-day stock prices, stock fundamentals, or data sets related to options traded on a certain stock:

    In [29]: data = q.get('FSE/SAP_X', start_date='2018-1-1',
                          end_date='2020-05-01',
                          api_key=config['quandl']['api_key'])
    In [30]: data.info()
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 579 entries, 2018-01-02 to 2020-04-30
    Data columns (total 10 columns):
     #   Column                 Non-Null Count  Dtype
    ---  ------                 --------------  -----
     0   Open                   257 non-null    float64
     1   High                   579 non-null    float64
     2   Low                    579 non-null    float64
     3   Close                  579 non-null    float64
     4   Change                 0 non-null      object
     5   Traded Volume          533 non-null    float64
     6   Turnover               533 non-null    float64
     7   Last Price of the Day  0 non-null      object
     8   Daily Traded Units     0 non-null      object
     9   Daily Turnover         0 non-null      object
    dtypes: float64(6), object(4)
    memory usage: 49.8+ KB

The API key can also be configured permanently with the Python wrapper via the following:

q.ApiConfig.api_key = 'YOUR_API_KEY'
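Keeping the key in a configuration file, as done at the beginning of this section, avoids hard-coding it in scripts. The sketch below shows what such a file might look like and how configparser reads it. The file content is hypothetical; only the section name quandl and the option name api_key follow from the expression config['quandl']['api_key'] used earlier.

```python
import configparser

# Hypothetical file content; section and option names mirror
# config['quandl']['api_key'] as used in this section.
cfg_text = """
[quandl]
api_key = YOUR_API_KEY
"""

config = configparser.ConfigParser()
config.read_string(cfg_text)  # config.read(path) does the same for a file on disk

api_key = config["quandl"]["api_key"]
print(api_key)
```

Such a file should be kept out of version control so that the key is not shared by accident.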

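The Quandl platform also offers premium data sets for which a subscription or fee is required. Most of these data sets offer free samples. The example retrieves option implied volatilities for the Microsoft Corp. stock. The free sample data set is quite large, with about 1,000 rows (1,006 in the output that follows) and many columns (only a subset is shown). The last lines of code display the 30, 60, and 90 days implied volatility values for the five most recent days available: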

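    In [31]: q.ApiConfig.api_key = config['quandl']['api_key']
    In [32]: vol = q.get('VOL/MSFT')
    In [33]: vol.iloc[:, :10].info()
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 1006 entries, 2015-01-02 to 2018-12-31
    Data columns (total 10 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   Hv10    1006 non-null   float64
     1   Hv20    1006 non-null   float64
     2   Hv30    1006 non-null   float64
     3   Hv60    1006 non-null   float64
     4   Hv90    1006 non-null   float64
     5   Hv120   1006 non-null   float64
     6   Hv150   1006 non-null   float64
     7   Hv180   1006 non-null   float64
     8   Phv10   1006 non-null   float64
     9   Phv20   1006 non-null   float64
    dtypes: float64(10)
    memory usage: 86.5 KB
    In [34]: vol[['IvMean30', 'IvMean60', 'IvMean90']].tail()
    Out[34]:
                IvMean30  IvMean60  IvMean90
    Date
    2018-12-24    0.4310    0.4112    0.3829
    2018-12-26    0.4059    0.3844    0.3587
    2018-12-27    0.3918    0.3879    0.3618
    2018-12-28    0.3940    0.3736    0.3482
    2018-12-31    0.3760    0.3519    0.3310

This concludes the overview of the Python wrapper package quandl for the Quandl data API. The Quandl platform and service is growing rapidly and proves to be a valuable source for financial data in an algorithmic trading context.

Open source software is a trend that started many years ago. It has lowered the barriers to entry in many areas and also in algorithmic trading. A new, reinforcing trend in this regard is open data sources. In some cases, such as with Quandl, they even provide high quality data sets. It cannot be expected that open data will completely replace professional data subscriptions any time soon, but they represent a valuable means to get started with algorithmic trading in a cost efficient manner.

Open data sources are a blessing for algorithmic traders wanting to get started in the space and wanting to be able to quickly test hypotheses and ideas based on real financial data sets. Sooner or later, however, open data sets will not suffice anymore to satisfy the requirements of more ambitious traders and professionals.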

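Refinitiv is one of the biggest financial data and news providers in the world. Its current desktop flagship product is Eikon, which is the equivalent to the Terminal by Bloomberg, the major competitor in the data services field. Figure 3-1 shows a screenshot of Eikon in the browser-based version. Eikon provides access to petabytes of data via a single access point.

Figure 3-1. Browser version of Eikon terminal

Recently, Refinitiv has streamlined its API landscape and has released a Python wrapper package, called eikon, for the Eikon data API, which is installed via pip install eikon. If you have a subscription to the Refinitiv Eikon data services, you can use the Python package to programmatically retrieve historical data, as well as streaming structured and unstructured data, from the unified API. A technical prerequisite is that a local desktop application is running that provides a desktop API session. The latest such desktop application at the time of this writing is called Workspace (see Figure 3-2).

If you are an Eikon subscriber and have an account for the Developer Community pages, you will find an overview of the Python Eikon Scripting Library under Quick Start.

Figure 3-2. Workspace application with desktop API services

In order to use the Eikon Data API, the Eikon app_key needs to be set. You get it via the App Key Generator (APPKEY) application in either Eikon or Workspace: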

In [35]: import eikon as ek ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [36]: ek.set_app_key(config['eikon']['app_key']) ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)In [37]: help(ek) ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png) Help on package eikon: NAME eikon - # coding: utf-8 PACKAGE CONTENTS Profile data_grid eikonError json_requests news_request streaming_session (package) symbology time_series tools SUBMODULES cache desktop_session istream_callback itemstream session stream stream_connection streamingprice streamingprice_callback streamingprices VERSION 1.1.5 FILE /Users/yves/Python/envs/py38/lib/python3.8/site-packages/eikon/__init__ .py

Imports the eikon package as ek.

Sets the app_key.

Shows the help text for the main module.
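The `config` object used in `ek.set_app_key(config['eikon']['app_key'])` behaves like a mapping of sections to key-value pairs. The following is a minimal sketch of how such an object could be built with `configparser` from the standard library. The section layout and the placeholder key are assumptions for illustration, not real credentials:

```python
import configparser

# Hypothetical content of a credentials file; in practice this would
# live in a file on disk that is kept out of version control.
CFG_TEXT = """
[eikon]
app_key = YOUR_EIKON_APP_KEY
"""

config = configparser.ConfigParser()
config.read_string(CFG_TEXT)  # config.read('some_file.cfg') for a real file

# Same access pattern as in the session above.
app_key = config['eikon']['app_key']
print(app_key)
```

Keeping the key in a separate configuration file means notebooks and scripts can be shared without exposing it.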

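Retrieving Historical Structured Data

Retrieving historical financial time series data is as straightforward as with the other wrappers used before: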

In [39]: symbols = ['AAPL.O', 'MSFT.O', 'GOOG.O'] ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [40]: data = ek.get_timeseries(symbols, ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) start_date='2020-01-01', ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png) end_date='2020-05-01', ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png) interval='daily', ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png) fields=['*']) ![6](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/6.png)In [41]: data.keys() ![7](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/7.png)Out[41]: MultiIndex([('AAPL.O', 'HIGH'), ('AAPL.O', 'CLOSE'), ('AAPL.O', 'LOW'), ('AAPL.O', 'OPEN'), ('AAPL.O', 'COUNT'), ('AAPL.O', 'VOLUME'), ('MSFT.O', 'HIGH'), ('MSFT.O', 'CLOSE'), ('MSFT.O', 'LOW'), ('MSFT.O', 'OPEN'), ('MSFT.O', 'COUNT'), ('MSFT.O', 'VOLUME'), ('GOOG.O', 'HIGH'), ('GOOG.O', 'CLOSE'), ('GOOG.O', 'LOW'), ('GOOG.O', 'OPEN'), ('GOOG.O', 'COUNT'), ('GOOG.O', 'VOLUME')], )In [42]: type(data['AAPL.O']) ![8](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/8.png)Out[42]: pandas.core.frame.DataFrameIn [43]: data['AAPL.O'].info() ![9](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/9.png) <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 84 entries, 2020-01-02 to 2020-05-01 Data columns (total 6 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 HIGH 84 non-null float64 1 CLOSE 84 non-null float64 2 LOW 84 non-null float64 3 OPEN 84 non-null float64 4 COUNT 84 non-null Int64 5 VOLUME 84 non-null Int64 dtypes: Int64(2), float64(4) memory usage: 4.8 KBIn [44]: data['AAPL.O'].tail() 
![10](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/10.png)Out[44]: HIGH CLOSE LOW OPEN COUNT VOLUME Date 2020-04-27 284.54 283.17 279.95 281.80 300771 29271893 2020-04-28 285.83 278.58 278.20 285.08 285384 28001187 2020-04-29 289.67 287.73 283.89 284.73 324890 34320204 2020-04-30 294.53 293.80 288.35 289.96 471129 45765968 2020-05-01 299.00 289.07 285.85 286.25 558319 60154175

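Defines a few symbols as a list object.

The central line of code, which retrieves the data for the symbols in the list…

…for the given start date and…

…the given end date.

The time interval is here chosen to be daily.

All fields are requested.

The function get_timeseries() returns a multi-index DataFrame object.

The values corresponding to each level are regular DataFrame objects.

This provides an overview of the data stored in the DataFrame object.

The final five rows of the data are shown.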
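The two-level column structure can be reproduced without an Eikon subscription. The sketch below builds a small synthetic DataFrame with (symbol, field) columns, mirroring the layout of the output above, and shows two common ways of slicing it. The numbers are made up and only the column layout is taken from the session:

```python
import numpy as np
import pandas as pd

symbols = ['AAPL.O', 'MSFT.O']
fields = ['HIGH', 'CLOSE', 'LOW', 'OPEN']

# Two-level column index: (symbol, field), as in the output above.
columns = pd.MultiIndex.from_product([symbols, fields])
index = pd.date_range('2020-01-02', periods=3, freq='B')

# Synthetic numbers only; they are not market data.
values = np.arange(len(index) * len(columns), dtype=float).reshape(
    len(index), len(columns))
data = pd.DataFrame(values, index=index, columns=columns)

aapl = data['AAPL.O']                       # regular DataFrame for one symbol
closes = data.xs('CLOSE', axis=1, level=1)  # CLOSE column for every symbol

print(type(aapl).__name__, list(aapl.columns))
print(list(closes.columns))
```

Selecting by the first level gives all fields for one symbol, while `xs()` on the second level gives one field across all symbols, which is often the more convenient shape for backtesting.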

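The beauty of working with a professional data service API becomes evident when one wishes to work with multiple symbols and, in particular, with a different granularity of the financial data (that is, other time intervals):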

In [45]: %%time data = ek.get_timeseries(symbols, ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) start_date='2020-08-14', ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) end_date='2020-08-15', ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png) interval='minute', ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png) fields='*') CPU times: user 58.2 ms, sys: 3.16 ms, total: 61.4 ms Wall time: 2.02 sIn [46]: print(data['GOOG.O'].loc['2020-08-14 16:00:00': '2020-08-14 16:04:00']) ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png) HIGH LOW OPEN CLOSE COUNT VOLUME Date 2020-08-14 16:00:00 1510.7439 1509.220 1509.940 1510.5239 48 1362 2020-08-14 16:01:00 1511.2900 1509.980 1510.500 1511.2900 52 1002 2020-08-14 16:02:00 1513.0000 1510.964 1510.964 1512.8600 72 1762 2020-08-14 16:03:00 1513.6499 1512.160 1512.990 1513.2300 108 4534 2020-08-14 16:04:00 1513.6500 1511.540 1513.418 1512.7100 40 1364In [47]: for sym in symbols: print('\n' + sym + '\n', data[sym].iloc[-300:-295]) ![6](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/6.png) AAPL.O HIGH LOW OPEN CLOSE COUNT VOLUME Date 2020-08-14 19:01:00 457.1699 456.6300 457.14 456.83 1457 104693 2020-08-14 19:02:00 456.9399 456.4255 456.81 456.45 1178 79740 2020-08-14 19:03:00 456.8199 456.4402 456.45 456.67 908 68517 2020-08-14 19:04:00 456.9800 456.6100 456.67 456.97 665 53649 2020-08-14 19:05:00 457.1900 456.9300 456.98 457.00 679 49636 MSFT.O HIGH LOW OPEN CLOSE COUNT VOLUME Date 2020-08-14 19:01:00 208.6300 208.5083 208.5500 208.5674 333 21368 2020-08-14 19:02:00 208.5750 208.3550 208.5501 208.3600 513 37270 2020-08-14 19:03:00 208.4923 208.3000 208.3600 208.4000 303 23903 2020-08-14 19:04:00 208.4200 208.3301 208.3901 208.4099 222 15861 2020-08-14 19:05:00 208.4699 208.3600 208.3920 208.4069 235 9569 
GOOG.O HIGH LOW OPEN CLOSE COUNT VOLUME Date 2020-08-14 19:01:00 1510.42 1509.3288 1509.5100 1509.8550 47 1577 2020-08-14 19:02:00 1510.30 1508.8000 1509.7559 1508.8647 71 2950 2020-08-14 19:03:00 1510.21 1508.7200 1508.7200 1509.8100 33 603 2020-08-14 19:04:00 1510.21 1508.7200 1509.8800 1509.8299 41 934 2020-08-14 19:05:00 1510.21 1508.7300 1509.5500 1509.6600 30 445

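Data is retrieved for all symbols at once.

The time interval…

…is drastically shortened.

The function call retrieves minute bars for the symbols.

Prints five rows from the Google, LLC, data set.

Prints five rows from every DataFrame object.

The preceding code illustrates how convenient it is to retrieve historical financial time series data from the Eikon API with Python. By default, the function get_timeseries() provides the following options for the interval parameter: tick, minute, hour, daily, weekly, monthly, quarterly, and yearly. This gives all the flexibility needed in an algorithmic trading context, particularly when combined with the resampling capabilities of pandas, as shown in the following code: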

In [48]: %%time data = ek.get_timeseries(symbols[0], start_date='2020-08-14 15:00:00', ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) end_date='2020-08-14 15:30:00', ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) interval='tick', ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png) fields=['*']) CPU times: user 257 ms, sys: 17.3 ms, total: 274 ms Wall time: 2.31 sIn [49]: data.info() ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png) <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 47346 entries, 2020-08-14 15:00:00.019000 to 2020-08-14 15:29:59.987000 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 VALUE 47311 non-null float64 1 VOLUME 47346 non-null Int64 dtypes: Int64(1), float64(1) memory usage: 1.1 MBIn [50]: data.head() ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png)Out[50]: VALUE VOLUME Date 2020-08-14 15:00:00.019 453.2499 60 2020-08-14 15:00:00.036 453.2294 3 2020-08-14 15:00:00.146 453.2100 5 2020-08-14 15:00:00.146 453.2100 100 2020-08-14 15:00:00.236 453.2100 2In [51]: resampled = data.resample('30s', label='right').agg( {'VALUE': 'last', 'VOLUME': 'sum'}) ![6](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/6.png)In [52]: resampled.tail() ![7](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/7.png)Out[52]: VALUE VOLUME Date 2020-08-14 15:28:00 453.9000 29746 2020-08-14 15:28:30 454.2869 86441 2020-08-14 15:29:00 454.3900 49513 2020-08-14 15:29:30 454.7550 98520 2020-08-14 15:30:00 454.6200 55592

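The time interval is…

…chosen to be half an hour due to data retrieval limits.

The interval parameter is set to tick.

Close to 50,000 price ticks are retrieved for the interval.

The time series data set shows highly irregular (heterogeneous) interval lengths between two ticks.

The tick data is resampled to a 30 second interval length (by taking the last value and the sum, respectively)…

…which is reflected in the DatetimeIndex of the new DataFrame object.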
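The resampling step can be tried on synthetic data as well. The sketch below generates irregularly spaced ticks (random timestamps and prices, not market data) and aggregates them into 30 second bars in the same way as the session above, with the last price and the summed volume per bar:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(100)

# Irregularly spaced synthetic ticks within a five-minute window.
n = 1000
offsets = np.sort(rng.uniform(0, 300, n))  # seconds after the start
index = pd.Timestamp('2020-08-14 15:00:00') + pd.to_timedelta(offsets, unit='s')
ticks = pd.DataFrame({
    'VALUE': 450 + rng.standard_normal(n).cumsum() * 0.01,
    'VOLUME': rng.integers(1, 200, n),
}, index=index)

# Last price and summed volume per 30-second bar, labeled by the bar's end.
bars = ticks.resample('30s', label='right').agg(
    {'VALUE': 'last', 'VOLUME': 'sum'})

print(bars.head())
```

With `label='right'`, each bar carries the timestamp at which it closes, so a bar never appears to be known before all of its ticks have arrived.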

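Retrieving Historical Unstructured Data

A major strength of working with the Eikon API via Python is the easy retrieval of unstructured data, which can then be parsed and analyzed with Python packages for natural language processing (NLP). Such a procedure is as simple and straightforward as for financial time series data.

The code that follows retrieves news headlines for a fixed time interval that include Apple Inc. as a company and "Macbook" as a keyword. At most, the five most recent hits are displayed: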

In [53]: headlines = ek.get_news_headlines(query='R:AAPL.O macbook', ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) count=5, ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) date_from='2020-4-1', ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png) date_to='2020-5-1') ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png)In [54]: headlines ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png)Out[54]: versionCreated \ 2020-04-20 21:33:37.332 2020-04-20 21:33:37.332000+00:00 2020-04-20 10:20:23.201 2020-04-20 10:20:23.201000+00:00 2020-04-20 02:32:27.721 2020-04-20 02:32:27.721000+00:00 2020-04-15 12:06:58.693 2020-04-15 12:06:58.693000+00:00 2020-04-09 21:34:08.671 2020-04-09 21:34:08.671000+00:00 text \ 2020-04-20 21:33:37.332 Apple said to launch new AirPods, MacBook Pro ... 2020-04-20 10:20:23.201 Apple might launch upgraded AirPods, 13-inch M... 2020-04-20 02:32:27.721 Apple to reportedly launch new AirPods alongsi... 2020-04-15 12:06:58.693 Apple files a patent for iPhones, MacBook indu... 2020-04-09 21:34:08.671 Apple rolls out new software update for MacBoo... 
storyId \ 2020-04-20 21:33:37.332 urn:newsml:reuters.com:20200420:nNRAble9rq:1 2020-04-20 10:20:23.201 urn:newsml:reuters.com:20200420:nNRAbl8eob:1 2020-04-20 02:32:27.721 urn:newsml:reuters.com:20200420:nNRAbl4mfz:1 2020-04-15 12:06:58.693 urn:newsml:reuters.com:20200415:nNRAbjvsix:1 2020-04-09 21:34:08.671 urn:newsml:reuters.com:20200409:nNRAbi2nbb:1 sourceCode 2020-04-20 21:33:37.332 NS:TIMIND 2020-04-20 10:20:23.201 NS:BUSSTA 2020-04-20 02:32:27.721 NS:HINDUT 2020-04-15 12:06:58.693 NS:HINDUT 2020-04-09 21:34:08.671 NS:TIMINDIn [55]: story = headlines.iloc[0] ![6](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/6.png)In [56]: story ![7](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/7.png)Out[56]: versionCreated 2020-04-20 21:33:37.332000+00:00 text Apple said to launch new AirPods, MacBook Pro ... storyId urn:newsml:reuters.com:20200420:nNRAble9rq:1 sourceCode NS:TIMIND Name: 2020-04-20 21:33:37.332000, dtype: objectIn [57]: news_text = ek.get_news_story(story['storyId']) ![8](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/8.png)In [58]: from IPython.display import HTML ![9](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/9.png)In [59]: HTML(news_text) ![10](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/10.png)Out[59]: <IPython.core.display.HTML object>

NEW DELHI: Apple recently launched its much-awaited affordable smartphone iPhone SE. Now it seems that the company is gearing up for another launch. Apple is said to launch the next generation of AirPods and the all-new 13-inch MacBook Pro next month. In February an online report revealed that the Cupertino-based tech giant is working on AirPods Pro Lite. Now a tweet by tipster Job Posser has revealed that Apple will soon come up with new AirPods and MacBook Pro. Jon Posser tweeted, "New AirPods (which were supposed to be at the March Event) is now ready to go. Probably alongside the MacBook Pro next month." However, not many details about the upcoming products are available right now. The company was supposed to launch these products at the March event along with the iPhone SE. But due to the ongoing pandemic coronavirus, the event got cancelled. It is expected that Apple will launch the AirPods Pro Lite and the 13-inch MacBook Pro just like the way it launched the iPhone SE. Meanwhile, Apple has scheduled its annual developer conference WWDC to take place in June. This year the company has decided to hold an online-only event due to the outbreak of coronavirus. Reports suggest that this year the company is planning to launch the all-new AirTags and a premium pair of over-ear Bluetooth headphones at the event. Using the Apple AirTags, users will be able to locate real-world items such as keys or suitcase in the Find My app. The AirTags will also have offline finding capabilities that the company introduced in the core of iOS 13. Apart from this, Apple is also said to unveil its high-end Bluetooth headphones. It is expected that the Bluetooth headphones will offer better sound quality and battery backup as compared to the AirPods. For Reprint Rights: timescontent.com Copyright (c) 2020 BENNETT, COLEMAN & CO. LTD.

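The query parameter for the retrieval operation.

Sets the maximum number of hits to five.

Defines the interval…

…for which to look for news headlines.

Gives out the results object (output shortened).

One particular headline is picked…

…and the storyId is shown.

This retrieves the news text as HTML code.

In Jupyter Notebook, for example, the HTML code…

…can be rendered for better reading.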
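Before NLP packages can work with such a story, the HTML markup usually has to be stripped. The following is a minimal sketch using only the standard library. The `news_html` string is a hypothetical stand-in for the value returned by `ek.get_news_story()`, and the tokenization is deliberately naive:

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


# Hypothetical stand-in for the string returned by ek.get_news_story().
news_html = ('<div><p>Apple is said to launch new AirPods.</p>'
             '<p>The MacBook Pro may follow; Apple has not confirmed.</p></div>')

parser = TextExtractor()
parser.feed(news_html)
text = ' '.join(part.strip() for part in parser.parts if part.strip())

# Naive tokenization: lowercase words with surrounding punctuation removed.
tokens = [w.strip('.,;:!?').lower() for w in text.split()]
counts = Counter(t for t in tokens if t)
print(text)
print(counts.most_common(3))
```

A dedicated NLP package would replace the naive tokenization step, but the plain text produced here is the typical input such packages expect.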

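This concludes the illustration of the Python wrapper package for the Refinitiv Eikon data API.

One of the most important scenarios for the management of data sets in algorithmic trading is "retrieve once, use multiple times." Or, from an input-output (IO) perspective, it is "write once, read multiple times." In the first case, data might be retrieved from a web service and then used to backtest a strategy multiple times based on a temporary, in-memory copy of the data set. In the second case, tick data that is received continually is written to disk and later on used multiple times for certain manipulations (like aggregations) in combination with a backtesting procedure.

This section assumes that the in-memory data structure to store the data is a pandas DataFrame object, no matter from which source the data is acquired (from a CSV file, a web service, etc.).

To have a somewhat meaningful data set available in terms of size, the section uses a sample financial data set generated with pseudorandom numbers. "Python Scripts" presents the Python module with a function called generate_sample_data() that accomplishes the task.

In principle, this function generates a sample financial data set in tabular form of arbitrary size (available memory, of course, sets a limit):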

In [60]: from sample_data import generate_sample_data ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [61]: print(generate_sample_data(rows=5, cols=4)) ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) No0 No1 No2 No3 2021-01-01 00:00:00 100.000000 100.000000 100.000000 100.000000 2021-01-01 00:01:00 100.019641 99.950661 100.052993 99.913841 2021-01-01 00:02:00 99.998164 99.796667 100.109971 99.955398 2021-01-01 00:03:00 100.051537 99.660550 100.136336 100.024150 2021-01-01 00:04:00 99.984614 99.729158 100.210888 99.976584

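Imports the function from the Python script.

Prints a sample financial data set with five rows and four columns.

Storing DataFrame Objects

The storage of a pandas DataFrame object as a whole is made simple by the pandas HDFStore wrapper functionality for the HDF5 binary storage standard. It allows one to dump complete DataFrame objects in a single step to a file-based database object. To illustrate the implementation, the first step is to create a sample data set of meaningful size. Here the size of the DataFrame generated is about 420 MB: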

In [62]: %time data = generate_sample_data(rows=5e6, cols=10).round(4) ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) CPU times: user 3.88 s, sys: 830 ms, total: 4.71 s Wall time: 4.72 sIn [63]: data.info() <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 5000000 entries, 2021-01-01 00:00:00 to 2030-07-05 05:19:00 Freq: T Data columns (total 10 columns): # Column Dtype --- ------ ----- 0 No0 float64 1 No1 float64 2 No2 float64 3 No3 float64 4 No4 float64 5 No5 float64 6 No6 float64 7 No7 float64 8 No8 float64 9 No9 float64 dtypes: float64(10) memory usage: 419.6 MB

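A sample financial data set with 5,000,000 rows and ten columns is generated; the generation takes a couple of seconds.

The second step is to open an HDFStore object (that is, an HDF5 database file) on disk and to write the DataFrame object to it.¹ The size on disk of about 440 MB is a bit larger than for the in-memory DataFrame object. However, writing is roughly four to five times faster than the in-memory generation of the sample data set.

Working in Python with binary stores like HDF5 database files usually gets you writing speeds close to the theoretical maximum of the hardware available:²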

In [64]: h5 = pd.HDFStore('data/data.h5', 'w') ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [65]: %time h5['data'] = data ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) CPU times: user 356 ms, sys: 472 ms, total: 828 ms Wall time: 1.08 sIn [66]: h5 ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)Out[66]: <class 'pandas.io.pytables.HDFStore'> File path: data/data.h5In [67]: ls -n data/data.* -rw-r--r--@ 1 501 20 440007240 Aug 25 11:48 data/data.h5In [68]: h5.close() ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png)

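This opens the database file on disk for writing (and overwrites a potentially existing file with the same name).

Writing the DataFrame object to disk takes about one second.

This prints out meta information for the database file.

This closes the database file.

The third step is to read the data from the file-based HDFStore object. Reading also generally takes place close to the theoretical maximum speed: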

In [69]: h5 = pd.HDFStore('data/data.h5', 'r') ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [70]: %time data_copy = h5['data'] ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) CPU times: user 388 ms, sys: 425 ms, total: 813 ms Wall time: 812 msIn [71]: data_copy.info() <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 5000000 entries, 2021-01-01 00:00:00 to 2030-07-05 05:19:00 Freq: T Data columns (total 10 columns): # Column Dtype --- ------ ----- 0 No0 float64 1 No1 float64 2 No2 float64 3 No3 float64 4 No4 float64 5 No5 float64 6 No6 float64 7 No7 float64 8 No8 float64 9 No9 float64 dtypes: float64(10) memory usage: 419.6 MBIn [72]: h5.close()In [73]: rm data/data.h5

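Opens the database file for reading.

Reading takes less than one second.

There is another, somewhat more flexible way of writing the data from a DataFrame object to an HDFStore object. To this end, one can use the to_hdf() method of the DataFrame object and set the format parameter to table (see the to_hdf API reference page). This allows the appending of new data to the table object on disk and also, for example, the searching over the data on disk, which is not possible with the first approach. The price to pay is slower writing and reading speeds: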

In [74]: %time data.to_hdf('data/data.h5', 'data', format='table') ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) CPU times: user 3.25 s, sys: 491 ms, total: 3.74 s Wall time: 3.8 sIn [75]: ls -n data/data.* -rw-r--r--@ 1 501 20 446911563 Aug 25 11:48 data/data.h5In [76]: %time data_copy = pd.read_hdf('data/data.h5', 'data') ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) CPU times: user 236 ms, sys: 266 ms, total: 502 ms Wall time: 503 msIn [77]: data_copy.info() <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 5000000 entries, 2021-01-01 00:00:00 to 2030-07-05 05:19:00 Freq: T Data columns (total 10 columns): # Column Dtype --- ------ ----- 0 No0 float64 1 No1 float64 2 No2 float64 3 No3 float64 4 No4 float64 5 No5 float64 6 No6 float64 7 No7 float64 8 No8 float64 9 No9 float64 dtypes: float64(10) memory usage: 419.6 MB

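This defines the writing format to be of type table. Writing becomes slower since this format type involves a bit more overhead and leads to a somewhat increased file size.

Reading can also be slower in this application scenario, although the single timing shown here does not reflect that.

In practice, the advantage of this approach is that one can work with the table_frame object on disk like with any other table object of the PyTables package that is used by pandas in this context. This provides access to certain basic capabilities of the PyTables package, such as appending rows to a table object: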

In [78]: import tables as tb ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [79]: h5 = tb.open_file('data/data.h5', 'r') ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)In [80]: h5 ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)Out[80]: File(filename=data/data.h5, title='', mode='r', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None)) / (RootGroup) '' /data (Group) '' /data/table (Table(5000000,)) '' description := { "index": Int64Col(shape=(), dflt=0, pos=0), "values_block_0": Float64Col(shape=(10,), dflt=0.0, pos=1)} byteorder := 'little' chunkshape := (2978,) autoindex := True colindexes := { "index": Index(6, medium, shuffle, zlib(1)).is_csi=False}In [81]: h5.root.data.table[:3] ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png)Out[81]: array([(1609459200000000000, [100. , 100. , 100. , 100. , 100. , 100. , 100. , 100. , 100. , 100. ]), (1609459260000000000, [100.0752, 100.1164, 100.0224, 100.0073, 100.1142, 100.0474, 99.9329, 100.0254, 100.1009, 100.066 ]), (1609459320000000000, [100.1593, 100.1721, 100.0519, 100.0933, 100.1578, 100.0301, 99.92 , 100.0965, 100.1441, 100.0717])], dtype=[('index', '<i8'), ('values_block_0', '<f8', (10,))])In [82]: h5.close() ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png)In [83]: rm data/data.h5

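Imports the PyTables package.

Opens the database file for reading.

Shows the contents of the database file.

Prints the first three rows in the table.

Closes the database.

Although this second approach provides more flexibility, it does not open the doors to the full capabilities of the PyTables package. Nevertheless, the two approaches introduced in this sub-section are convenient and efficient when you are working with more or less immutable data sets that fit into memory. Nowadays, however, algorithmic trading generally has to deal with continuously growing data sets, such as time series data for stock prices or foreign exchange rates. To cope with the requirements of such a scenario, alternative approaches might prove useful.

Using the HDFStore wrapper for the HDF5 binary storage standard, pandas is able to write and read financial data almost at the maximum speed the available hardware allows. Exports to other file-based formats, like CSV, are generally much slower alternatives.

Using TsTables

The PyTables package, with the import name tables, is a wrapper for the HDF5 binary storage library that is also used by pandas for its HDFStore implementation presented in the previous sub-section. The TsTables package (see the GitHub page for the package) in turn is dedicated to the efficient handling of large financial time series data sets based on the HDF5 binary storage library. It is effectively an enhancement of the PyTables package and adds support for time series data to its capabilities. It implements a hierarchical storage approach that allows for a fast retrieval of data sub-sets selected by providing start and end dates and times. The major scenario supported by TsTables is "write once, retrieve multiple times."

The setup illustrated in this sub-section is that data is continuously collected from a web source, a professional data provider, etc. and is stored temporarily and in-memory in a DataFrame object. After a while, or after a certain number of data points has been retrieved, the collected data is then stored in a TsTables table object in an HDF5 database.

First, here is the generation of the sample data: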

In [84]: %%time data = generate_sample_data(rows=2.5e6, cols=5, freq='1s').round(4) ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) CPU times: user 915 ms, sys: 191 ms, total: 1.11 s Wall time: 1.14 sIn [85]: data.info() <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 2500000 entries, 2021-01-01 00:00:00 to 2021-01-29 22:26:39 Freq: S Data columns (total 5 columns): # Column Dtype --- ------ ----- 0 No0 float64 1 No1 float64 2 No2 float64 3 No3 float64 4 No4 float64 dtypes: float64(5) memory usage: 114.4 MB

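This generates a sample financial data set with 2,500,000 rows and five columns with a one second frequency; the sample data is rounded to four digits.

Second, some more imports and the creation of the TsTables table object. The major part is the definition of the desc class, which provides the description for the table object's data structure:

Currently, TsTables only works with the old pandas version 0.19. A friendly fork, working with newer versions of pandas, is available under http://github.com/yhilpisch/tstables, which can be installed with the following: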

pip install git+https://github.com/yhilpisch/tstables.git

In [86]: import tstables ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [87]: import tables as tb ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)In [88]: class desc(tb.IsDescription): ''' Description of TsTables table structure. ''' timestamp = tb.Int64Col(pos=0) ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png) No0 = tb.Float64Col(pos=1) ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png) No1 = tb.Float64Col(pos=2) No2 = tb.Float64Col(pos=3) No3 = tb.Float64Col(pos=4) No4 = tb.Float64Col(pos=5)In [89]: h5 = tb.open_file('data/data.h5ts', 'w') ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png)In [90]: ts = h5.create_ts('/', 'data', desc) ![6](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/6.png)In [91]: h5 ![7](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/7.png)Out[91]: File(filename=data/data.h5ts, title='', mode='w', root_uep='/', filters=Filters(complevel=0, shuffle=False, bitshuffle=False, fletcher32=False, least_significant_digit=None)) / (RootGroup) '' /data (Group/Timeseries) '' /data/y2020 (Group) '' /data/y2020/m08 (Group) '' /data/y2020/m08/d25 (Group) '' /data/y2020/m08/d25/ts_data (Table(0,)) '' description := { "timestamp": Int64Col(shape=(), dflt=0, pos=0), "No0": Float64Col(shape=(), dflt=0.0, pos=1), "No1": Float64Col(shape=(), dflt=0.0, pos=2), "No2": Float64Col(shape=(), dflt=0.0, pos=3), "No3": Float64Col(shape=(), dflt=0.0, pos=4), "No4": Float64Col(shape=(), dflt=0.0, pos=5)} byteorder := 'little' chunkshape := (1365,)

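TsTables (installed from https://github.com/yhilpisch/tstables)…

…and PyTables are imported.

The first column of the table is a timestamp represented as an int value.

All data columns contain float values.

This opens a new database file for writing.

The TsTables table is created at the root node, with the name data and given the class-based description desc.

Inspecting the database file reveals the basic principle behind the hierarchical structuring in years, months, and days.

Third is the writing of the sample data stored in a DataFrame object to the table object on disk. One of the major benefits of TsTables is the convenience with which this operation is accomplished, namely by a simple method call. Even better, that convenience is coupled with speed. With regard to the structure in the database, TsTables chunks the data into sub-sets of a single day. In the example case where the frequency is set to one second, this translates into 24 x 60 x 60 = 86,400 data rows per full day's worth of data: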

In [92]: %time ts.append(data) ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) CPU times: user 476 ms, sys: 238 ms, total: 714 ms Wall time: 739 msIn [93]: # h5 ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)

File(filename=data/data.h5ts, title='', mode='w', root_uep='/',filters=Filters(complevel=0, shuffle=False, bitshuffle=False,fletcher32=False, least_significant_digit=None))/ (RootGroup) ''/data (Group/Timeseries) ''/data/y2020 (Group) ''/data/y2021 (Group) ''/data/y2021/m01 (Group) ''/data/y2021/m01/d01 (Group) ''/data/y2021/m01/d01/ts_data (Table(86400,)) '' description := { "timestamp": Int64Col(shape=(), dflt=0, pos=0), "No0": Float64Col(shape=(), dflt=0.0, pos=1), "No1": Float64Col(shape=(), dflt=0.0, pos=2), "No2": Float64Col(shape=(), dflt=0.0, pos=3), "No3": Float64Col(shape=(), dflt=0.0, pos=4), "No4": Float64Col(shape=(), dflt=0.0, pos=5)} byteorder := 'little' chunkshape := (1365,)/data/y2021/m01/d02 (Group) ''/data/y2021/m01/d02/ts_data (Table(86400,)) '' description := { "timestamp": Int64Col(shape=(), dflt=0, pos=0), "No0": Float64Col(shape=(), dflt=0.0, pos=1), "No1": Float64Col(shape=(), dflt=0.0, pos=2), "No2": Float64Col(shape=(), dflt=0.0, pos=3), "No3": Float64Col(shape=(), dflt=0.0, pos=4), "No4": Float64Col(shape=(), dflt=0.0, pos=5)} byteorder := 'little' chunkshape := (1365,)/data/y2021/m01/d03 (Group) ''/data/y2021/m01/d03/ts_data (Table(86400,)) '' description := { "timestamp": Int64Col(shape=(), dflt=0, pos=0),...

This appends the DataFrame object via a simple method call.

The table object shows 86,400 rows per day after the append() operation.
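The daily chunking can be illustrated with pandas alone. The following sketch does not use TsTables. It only groups a synthetic one-second index by calendar day to show where the 86,400 rows per full day come from, and why a time-range query needs to touch only the days it overlaps:

```python
import pandas as pd

# 200,000 one-second timestamps starting at midnight (synthetic index only).
index = pd.date_range('2021-01-01', periods=200_000, freq='1s')
data = pd.DataFrame({'No0': range(len(index))}, index=index)

# Group rows by calendar day, mirroring the y/m/d hierarchy shown above.
rows_per_day = data.groupby(data.index.normalize()).size()
print(rows_per_day)

# Selecting a time range only has to touch the daily chunks it overlaps.
subset = data.loc['2021-01-02 12:30:00':'2021-01-02 12:30:59']
print(len(subset))
```

Only the last day is shorter than 86,400 rows because the sample ends partway through it, which matches what happens to the final day of the 2,500,000-row data set in the text.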

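Reading sub-sets of the data from a TsTables table object is generally very fast since this is what it is optimized for in the first place. In this regard, TsTables supports typical algorithmic trading applications, like backtesting, well. Another contributing factor is that TsTables returns the data already as a DataFrame object, such that additional conversions are generally not necessary: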

In [94]: import datetimeIn [95]: start = datetime.datetime(2021, 1, 2) ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [96]: end = datetime.datetime(2021, 1, 3) ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)In [97]: %time subset = ts.read_range(start, end) ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png) CPU times: user 10.3 ms, sys: 3.63 ms, total: 14 ms Wall time: 12.8 msIn [98]: start = datetime.datetime(2021, 1, 2, 12, 30, 0)In [99]: end = datetime.datetime(2021, 1, 5, 17, 15, 30)In [100]: %time subset = ts.read_range(start, end) CPU times: user 28.6 ms, sys: 18.5 ms, total: 47.1 ms Wall time: 46.1 msIn [101]: subset.info() <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 276331 entries, 2021-01-02 12:30:00 to 2021-01-05 17:15:30 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 No0 276331 non-null float64 1 No1 276331 non-null float64 2 No2 276331 non-null float64 3 No3 276331 non-null float64 4 No4 276331 non-null float64 dtypes: float64(5) memory usage: 12.6 MBIn [102]: h5.close()In [103]: rm data/*

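This defines the starting date and…

…end date for the data retrieval operation.

The read_range() method takes the start and end dates as input; reading here is only a matter of milliseconds.

New data that is retrieved during a day can be appended to the TsTables table object, as illustrated previously. The package is therefore a valuable addition to the capabilities of pandas in combination with HDFStore objects when it comes to the efficient storage and retrieval of (large) financial time series data sets.

Storing Data with SQLite3

Financial time series data can also be written directly from a DataFrame object to a relational database like SQLite3. The use of a relational database might be useful in scenarios where the SQL query language is applied to implement more sophisticated analyses. With regard to speed and disk usage, however, relational databases cannot compare with the other approaches that rely on binary storage formats like HDF5.

The DataFrame class provides the method to_sql() (see the to_sql() API reference page) to write data to a table in a relational database. The size on disk of 100+ MB indicates that there is quite some overhead when using relational databases: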

In [104]: %time data = generate_sample_data(1e6, 5, '1min').round(4) ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) CPU times: user 342 ms, sys: 60.5 ms, total: 402 ms Wall time: 405 msIn [105]: data.info() ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 1000000 entries, 2021-01-01 00:00:00 to 2022-11-26 10:39:00 Freq: T Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 No0 1000000 non-null float64 1 No1 1000000 non-null float64 2 No2 1000000 non-null float64 3 No3 1000000 non-null float64 4 No4 1000000 non-null float64 dtypes: float64(5) memory usage: 45.8 MBIn [106]: import sqlite3 as sq3 ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)In [107]: con = sq3.connect('data/data.sql') ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)In [108]: %time data.to_sql('data', con) ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png) CPU times: user 4.6 s, sys: 352 ms, total: 4.95 s Wall time: 5.07 sIn [109]: ls -n data/data.* -rw-r--r--@ 1 501 20 105316352 Aug 25 11:48 data/data.sql

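The sample financial data set has 1,000,000 rows and five columns; memory usage is about 46 MB.

This imports the SQLite3 module.

A connection is opened to a new database file.

Writing the data to the relational database takes a couple of seconds.

One strength of relational databases is the ability to implement (out-of-memory) analytics tasks based on standardized SQL statements. As an example, consider a query that selects all rows in which the value in column No1 is above 105 and the value in column No2 is below 108: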

In [110]: query = 'SELECT * FROM data WHERE No1 > 105 and No2 < 108' ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [111]: %time res = con.execute(query).fetchall() ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) CPU times: user 109 ms, sys: 30.3 ms, total: 139 ms Wall time: 138 msIn [112]: res[:5] ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)Out[112]: [('2021-01-03 19:19:00', 103.6894, 105.0117, 103.9025, 95.8619, 93.6062), ('2021-01-03 19:20:00', 103.6724, 105.0654, 103.9277, 95.8915, 93.5673), ('2021-01-03 19:21:00', 103.6213, 105.1132, 103.8598, 95.7606, 93.5618), ('2021-01-03 19:22:00', 103.6724, 105.1896, 103.8704, 95.7302, 93.4139), ('2021-01-03 19:23:00', 103.8115, 105.1152, 103.8342, 95.706, 93.4436)]In [113]: len(res) ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png)Out[113]: 5035In [114]: con.close()In [115]: rm data/*

The SQL query as a Python str object.

The query executed to retrieve all results rows.

The first five results printed.

The length of the results list object.
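The same pattern can be tried without writing a file by using an in-memory SQLite3 database. The sketch below uses a small synthetic data set and thresholds chosen for illustration (not those of the session above). It runs a parameterized SQL query and the equivalent pandas boolean filter, which should select the same rows:

```python
import sqlite3

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
index = pd.date_range('2021-01-01', periods=1000, freq='1min')
data = pd.DataFrame(100 + rng.standard_normal((1000, 3)).cumsum(axis=0),
                    index=index, columns=['No0', 'No1', 'No2']).round(4)

con = sqlite3.connect(':memory:')  # in-memory database, nothing on disk
data.to_sql('data', con, index_label='Date')

# Parameterized query: the bounds are passed separately from the SQL text.
query = 'SELECT * FROM data WHERE No1 > ? AND No2 < ?'
res = con.execute(query, (100.0, 105.0)).fetchall()

# The same filter expressed directly on the in-memory DataFrame.
mask = (data['No1'] > 100.0) & (data['No2'] < 105.0)
print(len(res), int(mask.sum()))
con.close()
```

Passing the bounds as parameters rather than formatting them into the query string avoids quoting problems and is the safer habit once values come from user input.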

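Admittedly, such simple queries are also possible with pandas if the data set fits into memory. However, the SQL query language has proven useful and powerful for decades now and should be in the algorithmic trader's arsenal of data weapons.

pandas also supports database connections via SQLAlchemy, a Python abstraction layer package for diverse relational databases (refer to the SQLAlchemy home page). This in turn allows for the use of, for example, MySQL as the relational database backend.

This chapter covers the handling of financial time series data. It illustrates the reading of such data from different file-based sources, like CSV files. It also shows how to retrieve financial data from web services, such as Quandl, for end-of-day and options data. Open financial data sources are a valuable addition to the financial landscape. Quandl is a platform integrating thousands of open data sets under the umbrella of a unified API.

Another important topic covered in this chapter is the efficient storage of complete DataFrame objects on disk, as well as of the data contained in such an in-memory object in databases. Database flavors used in this chapter include the HDF5 database standard and the light-weight relational database SQLite3. This chapter lays the foundation for Chapter 4, which addresses vectorized backtesting; Chapter 5, which covers machine learning and deep learning for market prediction; and Chapter 6, which discusses event-based backtesting of trading strategies.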

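You can find more information about Quandl at the following link:

http://quandl.org

Information about the package used to retrieve data from that source is found here:

You should consult the official documentation pages for more on the packages used in this chapter:

Books and articles cited in this chapter:

Hilpisch, Yves. 2018. Python for Finance: Mastering Data-Driven Finance. 2nd ed. Sebastopol: O'Reilly.
McKinney, Wes. 2017. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. 2nd ed. Sebastopol: O'Reilly.
Thomas, Rob. "Bad Election Day Forecasts Deal Blow to Data Science: Prediction Models Suffered from Narrow Data, Faulty Algorithms and Human Foibles." Wall Street Journal, November 9, 2016.

The following Python script generates sample financial time series data based on a Monte Carlo simulation for a geometric Brownian motion; for more, see Hilpisch (2018, ch. 12):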

    #
    # Python Module to Generate a
    # Sample Financial Data Set
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    import numpy as np
    import pandas as pd

    r = 0.05  # constant short rate
    sigma = 0.5  # volatility factor


    def generate_sample_data(rows, cols, freq='1min'):
        '''
        Function to generate sample financial data.

        Parameters
        ==========
        rows: int
            number of rows to generate
        cols: int
            number of columns to generate
        freq: str
            frequency string for DatetimeIndex

        Returns
        =======
        df: DataFrame
            DataFrame object with the sample data
        '''
        rows = int(rows)
        cols = int(cols)
        # generate a DatetimeIndex object given the frequency
        index = pd.date_range('2021-1-1', periods=rows, freq=freq)
        # determine time delta in year fractions
        dt = (index[1] - index[0]) / pd.Timedelta(value='365D')
        # generate column names
        columns = ['No%d' % i for i in range(cols)]
        # generate sample paths for geometric Brownian motion
        raw = np.exp(np.cumsum((r - 0.5 * sigma ** 2) * dt +
                     sigma * np.sqrt(dt) *
                     np.random.standard_normal((rows, cols)), axis=0))
        # normalize the data to start at 100
        raw = raw / raw[0] * 100
        # generate the DataFrame object
        df = pd.DataFrame(raw, index=index, columns=columns)
        return df


    if __name__ == '__main__':
        rows = 5  # number of rows
        columns = 3  # number of columns
        freq = 'D'  # daily frequency
        print(generate_sample_data(rows, columns, freq))

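¹ Of course, multiple DataFrame objects could also be stored in a single HDFStore object.

² All values reported here are from the author's MacMini with Intel i7 hexa core processor (12 threads), 32 GB of random access memory (DDR4 RAM), and a 512 GB solid state drive (SSD).

[T]hey were silly enough to think you can predict the future by looking at the past.¹
The Economist

Developing ideas and hypotheses for an algorithmic trading program is generally the more creative and sometimes even fun part of the preparation stage. Thoroughly testing them is generally the more technical and time-consuming part. This chapter is about the vectorized backtesting of different algorithmic trading strategies. It covers the following types of strategies (refer also to "Trading Strategies"):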

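Simple moving averages (SMA) based strategies

The basic idea of using SMAs for buy and sell signal generation is already decades old. SMAs are a major tool in the so-called technical analysis of stock prices. A signal is derived, for example, when an SMA defined on a shorter time window (say 42 days) crosses an SMA defined on a longer time window (say 252 days).

Momentum strategies

These are strategies that are based on the hypothesis that recent performance will persist for some additional time. For example, a stock that is trending downward is assumed to continue to do so, which is why such a stock is to be shorted.

Mean-reversion strategies

The reasoning behind mean-reversion strategies is that stock prices or prices of other financial instruments tend to revert to some mean level or to some trend level when they have deviated too much from such levels.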
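The SMA crossover idea from the first item can be sketched in a few lines of pandas. The example below runs on a synthetic price path, not market data. The window lengths 42 and 252 are taken from the text, while the interpretation of the crossover as a long (+1) or short (-1) position is an assumption made for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic price path; real data would come from one of the data sources.
price = pd.Series(100 * np.exp(rng.standard_normal(500).cumsum() * 0.01),
                  index=pd.date_range('2020-01-01', periods=500, freq='B'),
                  name='price')

SMA1, SMA2 = 42, 252  # window lengths mentioned in the text

df = price.to_frame()
df['SMA1'] = df['price'].rolling(SMA1).mean()
df['SMA2'] = df['price'].rolling(SMA2).mean()
df = df.dropna()  # the longer SMA needs 252 observations before it exists

# +1 (long) while the shorter SMA is above the longer one, else -1 (short).
df['position'] = np.where(df['SMA1'] > df['SMA2'], 1, -1)

# A signal occurs wherever the position changes from one day to the next.
signals = df['position'].diff().fillna(0) != 0
print(df['position'].value_counts())
print(int(signals.sum()), 'crossovers')
```

No loop over the rows is needed: the rolling means, the comparison, and the change detection each operate on whole columns at once.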

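The chapter proceeds as follows. "Making Use of Vectorization" introduces vectorization as a useful technical approach to formulate and backtest trading strategies. "Strategies Based on Simple Moving Averages" is the core of this chapter and covers vectorized backtesting of SMA-based strategies in some depth. "Strategies Based on Momentum" introduces and backtests trading strategies based on the so-called time series momentum ("recent performance") of a stock. "Strategies Based on Mean Reversion" finishes the chapter with coverage of mean-reversion strategies. Finally, "Data Snooping and Overfitting" discusses the pitfalls of data snooping and overfitting in the context of the backtesting of algorithmic trading strategies.

The major goal of this chapter is to master the vectorized implementation approach, which packages like NumPy and pandas allow for, as an efficient and fast backtesting tool. To this end, the approaches presented make a number of simplifying assumptions to better focus the discussion on the major topic of vectorization.

Vectorized backtesting should be considered in the following cases:

Simple trading strategies

The vectorized backtesting approach clearly has limits when it comes to the modeling of algorithmic trading strategies. However, many popular, simple strategies can be backtested in vectorized fashion.

Interactive strategy exploration

Vectorized backtesting allows for an agile, interactive exploration of trading strategies and their characteristics. A few lines of code generally suffice to come up with first results, and different parameter combinations are easily tested.

Visualization as major goal

Vectorization with NumPy

The NumPy package for numerical computing (cf. NumPy home page) introduces vectorization to Python. The major class provided by NumPy is the ndarray class, which stands for n-dimensional array. An instance of such an object can be created, for example, on the basis of the list object v. Scalar multiplication, linear transformations, and similar operations from linear algebra then work as desired: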

In [5]: import numpy as np ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [6]: a = np.array(v) ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)In [7]: a ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)Out[7]: array([1, 2, 3, 4, 5])In [8]: type(a) ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png)Out[8]: numpy.ndarrayIn [9]: 2 * a ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png)Out[9]: array([ 2, 4, 6, 8, 10])In [10]: 0.5 * a + 2 ![6](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/6.png)Out[10]: array([2.5, 3. , 3.5, 4. , 4.5])

导入NumPy包。

基于list对象实例化一个ndarray对象。

以ndarray对象的形式打印存储的数据。

查找对象的类型。

实现矢量化的标量乘法。

实现矢量化的线性变换。

The transition from a one-dimensional array (a vector) to a two-dimensional array (a matrix) is natural. The same holds true for higher dimensions:

    In [11]: a = np.arange(12).reshape((4, 3))  # (1)

    In [12]: a
    Out[12]: array([[ 0,  1,  2],
                    [ 3,  4,  5],
                    [ 6,  7,  8],
                    [ 9, 10, 11]])

    In [13]: 2 * a
    Out[13]: array([[ 0,  2,  4],
                    [ 6,  8, 10],
                    [12, 14, 16],
                    [18, 20, 22]])

    In [14]: a ** 2  # (2)
    Out[14]: array([[  0,   1,   4],
                    [  9,  16,  25],
                    [ 36,  49,  64],
                    [ 81, 100, 121]])

(1) Creates a one-dimensional ndarray object and reshapes it to two dimensions.

(2) Calculates the square of every element of the object in vectorized fashion.

In addition, the ndarray class provides certain methods that allow vectorized operations. They often also have counterparts in the form of functions provided by NumPy at the package level (referred to here as universal functions):

    In [15]: a.mean()  # (1)
    Out[15]: 5.5

    In [16]: np.mean(a)  # (2)
    Out[16]: 5.5

    In [17]: a.mean(axis=0)  # (3)
    Out[17]: array([4.5, 5.5, 6.5])

    In [18]: np.mean(a, axis=1)  # (4)
    Out[18]: array([ 1.,  4.,  7., 10.])

(1) Calculates the mean of all elements by a method call.

(2) Calculates the mean of all elements by a function call.

(3) Calculates the mean along the first axis.

(4) Calculates the mean along the second axis.

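As a financial example, consider the function generate_sample_data() in "Python Scripts", which uses an Euler discretization to generate sample paths for a geometric Brownian motion. The implementation makes use of multiple vectorized operations that are combined into a single line of code.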
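To make the "single line of code" concrete, the following minimal sketch compares a step-by-step loop with the vectorized cumulative-sum formulation for one path. The constants r and sigma mirror those in generate_sample_data(); the step size dt, the seed, the starting level S0, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

r, sigma = 0.05, 0.5          # same constants as in the sample-data module
dt = 1 / 252                  # assumed step size in year fractions
rng = np.random.default_rng(seed=100)   # seed chosen arbitrarily
z = rng.standard_normal(250)  # standard normal shocks for one path

# log increments of the Euler scheme for geometric Brownian motion
increments = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z

# loop version: update the level one step at a time
S0 = 100.0
path_loop = np.empty(len(z))
level = S0
for i, inc in enumerate(increments):
    level = level * np.exp(inc)
    path_loop[i] = level

# vectorized version: cumulative sum of log increments, then exponentiate
path_vec = S0 * np.exp(np.cumsum(increments))
```

Both versions produce the same path up to floating-point error. Note that generate_sample_data() normalizes by the first row rather than multiplying by a starting level, so its first value is exactly 100.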

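For more details on vectorization with NumPy, see Appendix A. For a multitude of examples of vectorization applied in a financial context, refer to Hilpisch (2018).

The standard instruction set and data model of Python do not generally allow for vectorized numerical operations. NumPy introduces powerful vectorization techniques based on the regular array class ndarray. They lead to concise code that is close to the mathematical notation used, for example, in linear algebra for vectors and matrices.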

Vectorization with `pandas`

The pandas package and its central DataFrame class make heavy use of NumPy and the ndarray class. Therefore, most of the vectorization principles seen in the NumPy context carry over to pandas. The mechanics are again best explained with a concrete example. To begin with, define a two-dimensional ndarray object:

    In [19]: a = np.arange(15).reshape(5, 3)

    In [20]: a
    Out[20]: array([[ 0,  1,  2],
                    [ 3,  4,  5],
                    [ 6,  7,  8],
                    [ 9, 10, 11],
                    [12, 13, 14]])

To create a DataFrame object, first generate a list object with column names and a DatetimeIndex object, both of appropriate size for the given ndarray object:

    In [21]: import pandas as pd  # (1)

    In [22]: columns = list('abc')  # (2)

    In [23]: columns
    Out[23]: ['a', 'b', 'c']

    In [24]: index = pd.date_range('2021-7-1', periods=5, freq='B')  # (3)

    In [25]: index
    Out[25]: DatetimeIndex(['2021-07-01', '2021-07-02', '2021-07-05', '2021-07-06',
                            '2021-07-07'],
                           dtype='datetime64[ns]', freq='B')

    In [26]: df = pd.DataFrame(a, columns=columns, index=index)  # (4)

    In [27]: df
    Out[27]:
                 a   b   c
    2021-07-01   0   1   2
    2021-07-02   3   4   5
    2021-07-05   6   7   8
    2021-07-06   9  10  11
    2021-07-07  12  13  14

(1) Imports the pandas package.

(2) Creates a list object out of the str object.

(3) Creates a pandas DatetimeIndex object that has a "business day" frequency and covers five periods.

(4) Instantiates a DataFrame object based on the ndarray object a, with column labels and index values specified.

In principle, vectorization now works similarly to ndarray objects. One difference is that aggregation operations default to column-wise results:

    In [28]: 2 * df  # (1)
    Out[28]:
                 a   b   c
    2021-07-01   0   2   4
    2021-07-02   6   8  10
    2021-07-05  12  14  16
    2021-07-06  18  20  22
    2021-07-07  24  26  28

    In [29]: df.sum()  # (2)
    Out[29]: a    30
             b    35
             c    40
             dtype: int64

    In [30]: np.mean(df)  # (3)
    Out[30]: a    6.0
             b    7.0
             c    8.0
             dtype: float64

(1) Calculates the scalar product for the DataFrame object (treated as a matrix).

(2) Calculates the sum per column.

(3) Calculates the mean per column. Depending on the pandas version, np.mean(df) may instead reduce over both axes and return a single number; df.mean() is the explicit column-wise form.

Column-wise operations can be implemented by referencing the respective column names, either by bracket notation or by dot notation:

    In [31]: df['a'] + df['c']  # (1)
    Out[31]: 2021-07-01     2
             2021-07-02     8
             2021-07-05    14
             2021-07-06    20
             2021-07-07    26
             Freq: B, dtype: int64

    In [32]: 0.5 * df.a + 2 * df.b - df.c  # (2)
    Out[32]: 2021-07-01     0.0
             2021-07-02     4.5
             2021-07-05     9.0
             2021-07-06    13.5
             2021-07-07    18.0
             Freq: B, dtype: float64

(1) Calculates the element-wise sum over columns a and c.

(2) Calculates a linear transformation involving all three columns.

Similarly, conditions that yield Boolean result vectors and SQL-like selections based on such conditions are straightforward to implement:

    In [33]: df['a'] > 5  # (1)
    Out[33]: 2021-07-01    False
             2021-07-02    False
             2021-07-05     True
             2021-07-06     True
             2021-07-07     True
             Freq: B, Name: a, dtype: bool

    In [34]: df[df['a'] > 5]  # (2)
    Out[34]:
                 a   b   c
    2021-07-05   6   7   8
    2021-07-06   9  10  11
    2021-07-07  12  13  14

(1) Which elements in column a are greater than five?

(2) Selects all those rows for which the element in column a is greater than five.

For the vectorized backtesting of trading strategies, comparisons between two or more columns are typical:

    In [35]: df['c'] > df['b']  # (1)
    Out[35]: 2021-07-01    True
             2021-07-02    True
             2021-07-05    True
             2021-07-06    True
             2021-07-07    True
             Freq: B, dtype: bool

    In [36]: 0.15 * df.a + df.b > df.c  # (2)
    Out[36]: 2021-07-01    False
             2021-07-02    False
             2021-07-05    False
             2021-07-06     True
             2021-07-07     True
             Freq: B, dtype: bool

(1) For which dates is the element in column c greater than the element in column b?

(2) A condition comparing a linear combination of columns a and b with column c.

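Vectorization with pandas is a powerful concept, in particular for the implementation of financial algorithms and for vectorized backtesting, as the remainder of this chapter illustrates. For more on the basics of vectorization with pandas and for financial examples, refer to Hilpisch (2018, ch. 5).

While NumPy brings general vectorization approaches to the numerical computing world of Python, pandas allows vectorization over time series data. This is really helpful for implementing financial algorithms and for backtesting algorithmic trading strategies. With this approach, you can expect more concise code as well as faster execution compared to standard Python code that relies on for loops and similar idioms.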

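Trading based on simple moving averages (SMAs) is a decades-old strategy that has its origins in the technical analysis of stocks. Brock et al. (1992), for example, study such strategies empirically and systematically. They write:

The term "technical analysis" is a general heading for a myriad of trading techniques. In this paper, we explore two of the simplest and most popular technical rules: moving average-oscillator and trading-range break (resistance and support levels). In the first method, buy and sell signals are generated by two moving averages, a long period and a short period. Our study reveals that technical analysis helps to predict stock changes.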

Getting into the Basics

This subsection focuses on the basics of backtesting trading strategies that make use of two SMAs. The example to follow works with end-of-day (EOD) closing data for the EUR/USD exchange rate, as provided in the csv file with the EOD data. The data in the data set is from the Refinitiv Eikon Data API and represents EOD values for the respective instruments (RICs):

    In [37]: raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                               index_col=0, parse_dates=True).dropna()  # (1)

    In [38]: raw.info()  # (2)
             <class 'pandas.core.frame.DataFrame'>
             DatetimeIndex: 2516 entries, 2010-01-04 to 2019-12-31
             Data columns (total 12 columns):
              #   Column  Non-Null Count  Dtype
             ---  ------  --------------  -----
              0   AAPL.O  2516 non-null   float64
              1   MSFT.O  2516 non-null   float64
              2   INTC.O  2516 non-null   float64
              3   AMZN.O  2516 non-null   float64
              4   GS.N    2516 non-null   float64
              5   SPY     2516 non-null   float64
              6   .SPX    2516 non-null   float64
              7   .VIX    2516 non-null   float64
              8   EUR=    2516 non-null   float64
              9   XAU=    2516 non-null   float64
              10  GDX     2516 non-null   float64
              11  GLD     2516 non-null   float64
             dtypes: float64(12)
             memory usage: 255.5 KB

    In [39]: data = pd.DataFrame(raw['EUR='])  # (3)

    In [40]: data.rename(columns={'EUR=': 'price'}, inplace=True)  # (4)

    In [41]: data.info()  # (5)
             <class 'pandas.core.frame.DataFrame'>
             DatetimeIndex: 2516 entries, 2010-01-04 to 2019-12-31
             Data columns (total 1 columns):
              #   Column  Non-Null Count  Dtype
             ---  ------  --------------  -----
              0   price   2516 non-null   float64
             dtypes: float64(1)
             memory usage: 39.3 KB

(1) Reads the data from the remotely stored CSV file.

(2) Shows the meta information for the DataFrame object.

(3) Transforms the Series object into a DataFrame object.

(4) Renames the only column to price.

(5) Shows the meta information for the new DataFrame object.

The calculation of SMAs is made simple by the rolling() method, combined with a deferred calculation operation:

    In [42]: data['SMA1'] = data['price'].rolling(42).mean()  # (1)

    In [43]: data['SMA2'] = data['price'].rolling(252).mean()  # (2)

    In [44]: data.tail()  # (3)
    Out[44]:
                 price      SMA1      SMA2
    Date
    2019-12-24  1.1087  1.107698  1.119630
    2019-12-26  1.1096  1.107740  1.119529
    2019-12-27  1.1175  1.107924  1.119428
    2019-12-30  1.1197  1.108131  1.119333
    2019-12-31  1.1210  1.108279  1.119231

(1) Creates a column with 42 days of SMA values. The first 41 values will be NaN.

(2) Creates a column with 252 days of SMA values. The first 251 values will be NaN.

(3) Prints the final five rows of the data set.
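The leading NaN values follow directly from how a rolling window works: a window of length n needs n observations before it yields its first value. A minimal sketch on toy numbers (not market data; the variable names are assumptions) illustrates this:

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])   # toy data, not market prices
sma = prices.rolling(3).mean()                  # 3-period simple moving average
n_missing = int(sma.isna().sum())               # window length minus one
```

Here the first two entries of sma are NaN and the third equals the mean of the first three prices, in the same way that the 42-day SMA above starts with 41 missing values.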

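A visualization of the original time series data in combination with the SMAs best illustrates the results (see Figure 4-1):

    In [45]: %matplotlib inline
             from pylab import mpl, plt
             plt.style.use('seaborn')  # named 'seaborn-v0_8' on matplotlib >= 3.6
             mpl.rcParams['savefig.dpi'] = 300
             mpl.rcParams['font.family'] = 'serif'

    In [46]: data.plot(title='EUR/USD | 42 & 252 days SMAs',
                       figsize=(10, 6));

The next step is to generate signals, or rather market positionings, based on the relationship between the two SMAs. The rule is to go long whenever the shorter SMA is above the longer one and to go short otherwise. For our purposes, a long position is indicated by 1 and a short position by -1.

Figure 4-1. The EUR/USD exchange rate with two SMAs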

Being able to compare two columns of the DataFrame object directly makes the implementation of the rule a matter of a single line of code. The positioning over time is illustrated in Figure 4-2:

    In [47]: data['position'] = np.where(data['SMA1'] > data['SMA2'],
                                         1, -1)  # (1)

    In [48]: data.dropna(inplace=True)  # (2)

    In [49]: data['position'].plot(ylim=[-1.1, 1.1],
                                   title='Market Positioning',
                                   figsize=(10, 6));  # (3)

(1) Implements the trading rule in vectorized fashion. np.where() produces +1 for rows where the expression is True and -1 for rows where it is False.

(2) Deletes all rows of the data set that contain at least one NaN value.

(3) Plots the positioning over time.

Figure 4-2. Market positioning based on the strategy with two SMAs
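One detail worth spelling out is the order of the two steps: np.where() is applied before dropna(). Comparisons involving NaN evaluate to False, so rows with an incomplete SMA window are first assigned -1 and only then removed, because their SMA columns still contain NaN. The following minimal sketch uses made-up values (the frame toy and its numbers are assumptions) to show this:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'SMA1': [np.nan, 1.2, 1.1, 0.9],
                    'SMA2': [np.nan, np.nan, 1.0, 1.0]})
toy['position'] = np.where(toy['SMA1'] > toy['SMA2'], 1, -1)
# comparisons involving NaN are False, so incomplete rows get -1 ...
before = toy['position'].tolist()
# ... and dropna() then removes them, since SMA1/SMA2 are still NaN there
toy = toy.dropna()
after = toy['position'].tolist()
```

Only rows with both SMA values available survive, so the spurious -1 entries never enter the backtest.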

To calculate the performance of the strategy, next calculate the log returns based on the original financial time series. The code to do this is quite concise due to vectorization. Figure 4-3 shows the histogram of the log returns:

    In [50]: data['returns'] = np.log(data['price'] / data['price'].shift(1))  # (1)

    In [51]: data['returns'].hist(bins=35, figsize=(10, 6));  # (2)

(1) Calculates the log returns in vectorized fashion over the price column.

(2) Plots the log returns as a histogram (frequency distribution).

To derive the strategy returns, multiply the position column, shifted by one trading day, with the returns column. Since log returns are additive, calculating the sum over the returns and strategy columns provides a first comparison of the performance of the strategy relative to the base investment itself.

Figure 4-3. Frequency distribution of EUR/USD log returns

Comparing the returns shows that the strategy outperforms the passive benchmark investment:

    In [52]: data['strategy'] = data['position'].shift(1) * data['returns']  # (1)

    In [53]: data[['returns', 'strategy']].sum()  # (2)
    Out[53]: returns    -0.176731
             strategy    0.253121
             dtype: float64

    In [54]: data[['returns', 'strategy']].sum().apply(np.exp)  # (3)
    Out[54]: returns     0.838006
             strategy    1.288039
             dtype: float64

(1) Derives the log returns of the strategy given the positionings and market returns.

(2) Sums up the single log return values for both the instrument and the strategy (for illustration only).

(3) Applies the exponential function to the sum of the log returns to calculate the gross performance.
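Two properties carry this calculation: log returns add up to the log of the gross return, and the position is shifted by one period so that a signal derived from today's close is only applied to tomorrow's return. A minimal sketch with toy prices and hypothetical signals (all values and names are assumptions) makes both explicit:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'price': [100.0, 102.0, 101.0, 105.0]})
toy['returns'] = np.log(toy['price'] / toy['price'].shift(1))
total_log = toy['returns'].sum()           # the NaN in the first row is skipped
gross = np.exp(total_log)                  # equals last price / first price

toy['position'] = [1, -1, -1, 1]           # hypothetical signals, known at each close
toy['strategy'] = toy['position'].shift(1) * toy['returns']
strategy_gross = np.exp(toy['strategy'].sum())
```

Without the shift(1), each return would be multiplied by a position that already "knows" that return, which would introduce a foresight bias into the backtest.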

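Calculating the cumulative sum over time with cumsum and, based on this, the cumulative returns by applying the exponential function np.exp() gives a more comprehensive picture of how the strategy compares to the performance of the base financial instrument over time. Figure 4-4 shows the data graphically and illustrates the outperformance in this particular case:

    In [55]: data[['returns', 'strategy']].cumsum(
                     ).apply(np.exp).plot(figsize=(10, 6));

Figure 4-4. Gross performance of EUR/USD compared to the SMA-based strategy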

Average, annualized risk-return statistics for both the instrument and the strategy are easy to calculate:

    In [56]: data[['returns', 'strategy']].mean() * 252  # (1)
    Out[56]: returns    -0.019671
             strategy    0.028174
             dtype: float64

    In [57]: np.exp(data[['returns', 'strategy']].mean() * 252) - 1  # (1)
    Out[57]: returns    -0.019479
             strategy    0.028575
             dtype: float64

    In [58]: data[['returns', 'strategy']].std() * 252 ** 0.5  # (2)
    Out[58]: returns     0.085414
             strategy    0.085405
             dtype: float64

    In [59]: (data[['returns', 'strategy']].apply(np.exp) - 1).std() * 252 ** 0.5  # (2)
    Out[59]: returns     0.085405
             strategy    0.085373
             dtype: float64

(1) Calculates the annualized mean return in both log and regular space.

(2) Calculates the annualized standard deviation in both log and regular space.
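The same annualization steps can be sketched on a handful of toy log returns. The factor 252 is the number of trading days per year assumed in the session above; the return values and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

log_rets = pd.Series([0.001, -0.002, 0.0015, 0.0005, 0.001])  # toy daily log returns
ann_mean_log = log_rets.mean() * 252            # annualized mean, log space
ann_mean_simple = np.exp(ann_mean_log) - 1      # converted to regular space
ann_vol_log = log_rets.std() * 252 ** 0.5       # annualized volatility, log space
ann_vol_simple = (np.exp(log_rets) - 1).std() * 252 ** 0.5  # regular space
```

The mean scales linearly with time, while the standard deviation scales with the square root of time, which is why the two use different factors.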

Other risk statistics that are often of interest in the context of trading strategy performance are the maximum drawdown and the longest drawdown period. A helper statistic in this context is the running maximum of the gross performance, as calculated by the cummax() method applied to the gross performance of the strategy. Figure 4-5 shows the two time series for the SMA-based strategy:

    In [60]: data['cumret'] = data['strategy'].cumsum().apply(np.exp)  # (1)

    In [61]: data['cummax'] = data['cumret'].cummax()  # (2)

    In [62]: data[['cumret', 'cummax']].dropna().plot(figsize=(10, 6));  # (3)

(1) Defines a new column, cumret, with the gross performance of the strategy over time.

(2) Defines yet another column with the running maximum value of the gross performance.

(3) Plots the two new columns of the DataFrame object.

Figure 4-5. Gross performance and cumulative maximum performance of the SMA-based strategy

The maximum drawdown is then simply calculated as the maximum of the difference between the two relevant columns. In the example, the maximum drawdown is about 18 percentage points:

    In [63]: drawdown = data['cummax'] - data['cumret']  # (1)

    In [64]: drawdown.max()  # (2)
    Out[64]: 0.17779367070195917

(1) Calculates the element-wise difference between the two columns.

(2) Picks out the maximum value from all differences.

Determining the longest drawdown period is a bit more involved. It requires those dates at which the gross performance equals its cumulative maximum (that is, the dates on which a new maximum is set). This information is stored in a temporary object. Then the differences in days between all such dates are calculated, and the longest period is picked. Such periods can be as short as one day or longer than 100 days. Here, the longest drawdown period lasts 596 days, which is a pretty long time:²

    In [65]: temp = drawdown[drawdown == 0]  # (1)

    In [66]: periods = (temp.index[1:].to_pydatetime() -
                        temp.index[:-1].to_pydatetime())  # (2)

    In [67]: periods[12:15]
    Out[67]: array([datetime.timedelta(days=1), datetime.timedelta(days=1),
                    datetime.timedelta(days=10)], dtype=object)

    In [68]: periods.max()  # (3)
    Out[68]: datetime.timedelta(days=596)

(1) Where are the differences equal to zero?

(2) Calculates the timedelta values between all index values.

(3) Picks out the maximum timedelta value.
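Both drawdown statistics can be traced by hand on a small example. The sketch below uses a toy gross-performance series on daily dates (the values, dates, and names are assumptions) and subtracts the index objects directly instead of calling to_pydatetime(), which gives the same gaps here.

```python
import pandas as pd

dates = pd.date_range('2021-01-01', periods=6, freq='D')
cumret = pd.Series([1.0, 1.2, 0.9, 1.1, 1.3, 1.25], index=dates)  # toy gross performance

cummax = cumret.cummax()
drawdown = cummax - cumret                 # absolute drawdown, as in the text
max_drawdown = drawdown.max()

highs = drawdown[drawdown == 0].index      # dates on which a new high is set
periods = highs[1:] - highs[:-1]           # gaps between consecutive highs
longest = periods.max()
```

In this toy series, the deepest drawdown is the drop from 1.2 to 0.9, and the longest gap between two highs is three days. A drawdown that is still open at the end of the sample is not captured by this approach, because it only measures gaps between dates with a new high.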

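Vectorized backtesting with pandas is generally a rather efficient endeavor, thanks to the capabilities of the package and its main DataFrame class. However, the interactive approach illustrated so far does not work well when one wishes to implement a larger backtesting program that, for example, optimizes the parameters of an SMA-based strategy. For this reason, a more general approach is advisable.

pandas proves to be a powerful tool for the vectorized analysis of trading strategies. Many statistics of interest, such as log returns, cumulative returns, annualized returns and volatility, maximum drawdown, and maximum drawdown period, can be calculated with a single line or just a few lines of code. Being able to visualize results with a simple method call is an additional benefit.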

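Generalizing the Approach

"SMA Backtesting Class" presents Python code that contains a class for the vectorized backtesting of SMA-based trading strategies. In a sense, it is a generalization of the approach introduced in the previous subsection. It allows one to define an instance of the SMAVectorBacktester class by providing the following parameters:

symbol: RIC (instrument data) to be used
SMA1: time window in days for the shorter SMA
SMA2: time window in days for the longer SMA
start: start date for the data selection
end: end date for the data selection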

The application itself is best illustrated by an interactive session that makes use of the class. The example first replicates the backtest implemented previously, based on EUR/USD exchange rate data. It then optimizes the SMA parameters for maximum gross performance. Based on the optimal parameters, it plots the resulting gross performance of the strategy compared to the base instrument over the relevant period of time:

    In [69]: import SMAVectorBacktester as SMA  # (1)

    In [70]: smabt = SMA.SMAVectorBacktester('EUR=', 42, 252,
                                             '2010-1-1', '2019-12-31')  # (2)

    In [71]: smabt.run_strategy()  # (3)
    Out[71]: (1.29, 0.45)

    In [72]: %%time
             smabt.optimize_parameters((30, 50, 2),
                                       (200, 300, 2))  # (4)
             CPU times: user 3.76 s, sys: 15.8 ms, total: 3.78 s
             Wall time: 3.78 s
    Out[72]: (array([ 48., 238.]), 1.5)

    In [73]: smabt.plot_results()  # (5)

(1) This imports the module as SMA.

(2) An instance of the main class is instantiated.

(3) Backtests the SMA-based strategy, given the parameters during instantiation.

(4) The optimize_parameters() method takes as input parameter ranges with step sizes and determines the optimal combination by a brute force approach.

(5) The plot_results() method plots the strategy performance compared to the benchmark instrument, given the currently stored parameter values (here from the optimization procedure).
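What "brute force" means here can be sketched without the class: evaluate the strategy for every parameter combination on a grid and keep the best one. The following is a simplified stand-in that runs on synthetic prices and uses a plain Python loop rather than scipy.optimize.brute; the function sma_performance(), the grid ranges, the seed, and the data are all assumptions for illustration, not the class's actual implementation.

```python
from itertools import product

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)              # arbitrary seed
log_rets = rng.normal(0.0, 0.01, size=600)       # synthetic daily log returns
price = pd.Series(100 * np.exp(np.cumsum(log_rets)))


def sma_performance(price, sma1, sma2):
    '''Gross performance of the SMA crossover rule on `price` (no costs).'''
    data = pd.DataFrame({'price': price})
    data['returns'] = np.log(data['price'] / data['price'].shift(1))
    data['SMA1'] = data['price'].rolling(sma1).mean()
    data['SMA2'] = data['price'].rolling(sma2).mean()
    data = data.dropna()
    position = np.where(data['SMA1'] > data['SMA2'], 1, -1)
    strategy = pd.Series(position, index=data.index).shift(1) * data['returns']
    return float(np.exp(strategy.sum()))


# evaluate every combination on the grid and keep the best one
grid = product(range(10, 31, 5), range(60, 121, 20))
results = {(s1, s2): sma_performance(price, s1, s2) for s1, s2 in grid}
best_params = max(results, key=results.get)
best_perf = results[best_params]
```

The cost grows with the product of the grid sizes, which is why the optimization in the session above takes seconds rather than milliseconds. The "best" combination is also only best on the data it was selected on, a point the chapter returns to when discussing overfitting.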

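The gross performance of the strategy with the original parameterization is 1.29, or 129% (the value shown in Out[71]). The optimized strategy yields an absolute performance of 1.5, or 150%, for the parameter combination SMA1 = 48 and SMA2 = 238 (Out[72]). Figure 4-6 shows the gross performance over time graphically, again compared to the performance of the base instrument, which represents the benchmark.

Figure 4-6. Gross performance of EUR/USD and the optimized SMA strategy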

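There are two basic types of momentum strategies. The first type is cross-sectional momentum strategies. Selecting from a larger pool of instruments, these strategies buy those instruments that have recently outperformed relative to their peers (or a benchmark) and sell those instruments that have underperformed. The basic idea is that the instruments continue to outperform or underperform, respectively, at least for a certain period of time. Jegadeesh and Titman (1993, 2001) and Chan et al. (1996) study these types of trading strategies and their potential sources of profit.

Cross-sectional momentum strategies have traditionally performed quite well. Jegadeesh and Titman (1993) write:

This paper documents that strategies which buy stocks that have performed well in the past and sell stocks that have performed poorly in the past generate significant positive returns over 3- to 12-month holding periods.

The second type is time series momentum strategies. These strategies buy those instruments that have recently performed well and sell those instruments that have recently performed poorly. In this case, the benchmark is the past returns of the instrument itself. Moskowitz et al. (2012) analyze this type of momentum strategy in detail across a wide range of markets. They write:

Rather than focus on the relative returns of securities in the cross-section, time series momentum focuses purely on a security's own past return.... Our finding of time series momentum in virtually every instrument we examine seems to challenge the "random walk" hypothesis, which in its most basic form implies that knowing whether a price went up or down in the past should not be informative about whether it will go up or down in the future.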

Getting into the Basics

Consider end-of-day closing prices for the gold price in USD (XAU=):

    In [74]: data = pd.DataFrame(raw['XAU='])

    In [75]: data.rename(columns={'XAU=': 'price'}, inplace=True)

    In [76]: data['returns'] = np.log(data['price'] / data['price'].shift(1))

The simplest time series momentum strategy is to buy the instrument if the last return was positive and to sell it if it was negative. With NumPy and pandas, this is easy to formalize: just take the sign of the last available return as the market position. Figure 4-7 shows the performance of this strategy. The strategy significantly underperforms the base instrument:

    In [77]: data['position'] = np.sign(data['returns'])  # (1)

    In [78]: data['strategy'] = data['position'].shift(1) * data['returns']  # (2)

    In [79]: data[['returns', 'strategy']].dropna().cumsum(
                     ).apply(np.exp).plot(figsize=(10, 6));  # (3)

(1) Defines a new column with the sign of the relevant log return (that is, 1 or -1; np.sign() yields 0 for a return of exactly zero); the resulting values represent the market positionings (long or short).

(2) Calculates the strategy log returns given the market positionings.

(3) Plots and compares the strategy performance with the benchmark instrument.

Figure 4-7. Gross performance of gold price (USD) and momentum strategy (last return only)

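Using a rolling time window, the time series momentum strategy can be generalized to more than just the last return. For example, the average of the last three returns can be used to generate the signal for the positioning. Figure 4-8 shows that the strategy in this case does much better, both in absolute terms and relative to the base instrument:

    In [80]: data['position'] = np.sign(data['returns'].rolling(3).mean())  # (1)

    In [81]: data['strategy'] = data['position'].shift(1) * data['returns']

    In [82]: data[['returns', 'strategy']].dropna().cumsum(
                     ).apply(np.exp).plot(figsize=(10, 6));

(1) This time, the mean return over a rolling window of three days is taken.

However, the performance is quite sensitive to the time window parameter. Choosing the last two returns instead of three, for instance, leads to a much worse performance, as shown in Figure 4-9.

Figure 4-8. Gross performance of gold price (USD) and momentum strategy (last three returns)

Figure 4-9. Gross performance of gold price (USD) and momentum strategy (last two returns)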

Time series momentum might be expected intraday as well. In fact, one would expect it to be more pronounced intraday than interday. Figure 4-10 shows the gross performance of five time series momentum strategies for one, three, five, seven, and nine return observations, respectively. The data used is intraday stock price data for Apple Inc., as retrieved from the Eikon Data API. The figure is based on the code that follows. Basically all strategies outperform the stock over the course of this intraday time window, although some only slightly:

    In [83]: fn = '../data/AAPL_1min_05052020.csv'  # (1)
             # fn = '../data/SPX_1min_05052020.csv'  # (1)

    In [84]: data = pd.read_csv(fn, index_col=0, parse_dates=True)  # (1)

    In [85]: data.info()  # (1)
             <class 'pandas.core.frame.DataFrame'>
             DatetimeIndex: 241 entries, 2020-05-05 16:00:00 to 2020-05-05 20:00:00
             Data columns (total 6 columns):
              #   Column  Non-Null Count  Dtype
             ---  ------  --------------  -----
              0   HIGH    241 non-null    float64
              1   LOW     241 non-null    float64
              2   OPEN    241 non-null    float64
              3   CLOSE   241 non-null    float64
              4   COUNT   241 non-null    float64
              5   VOLUME  241 non-null    float64
             dtypes: float64(6)
             memory usage: 13.2 KB

    In [86]: data['returns'] = np.log(data['CLOSE'] /
                                      data['CLOSE'].shift(1))  # (2)

    In [87]: to_plot = ['returns']  # (3)

    In [88]: for m in [1, 3, 5, 7, 9]:
                 data['position_%d' % m] = np.sign(
                     data['returns'].rolling(m).mean())  # (4)
                 data['strategy_%d' % m] = (
                     data['position_%d' % m].shift(1) * data['returns'])  # (5)
                 to_plot.append('strategy_%d' % m)  # (6)

    In [89]: data[to_plot].dropna().cumsum().apply(np.exp).plot(
                 title='AAPL intraday 05. May 2020',
                 figsize=(10, 6),
                 style=['-', '--', '--', '--', '--', '--']);  # (7)

(1) Reads the intraday data from a CSV file.

(2) Calculates the intraday log returns.

(3) Defines a list object to select the columns to be plotted later.

(4) Derives positionings according to the momentum strategy parameter.

(5) Calculates the resulting strategy log returns.

(6) Appends the column name to the list object.

(7) Plots all relevant columns to compare the strategies' performances to the benchmark instrument's performance.

Figure 4-10. Gross intraday performance of the Apple stock and five momentum strategies (last one, three, five, seven, and nine returns)

Figure 4-11 shows the performance of the same five strategies for the S&P 500 index. Again, all five strategy configurations outperform the index, and all show a positive return (before transaction costs).

Figure 4-11. Gross intraday performance of the S&P 500 index and five momentum strategies (last one, three, five, seven, and nine returns)

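Generalizing the Approach

"Momentum Backtesting Class" presents a Python module containing the MomVectorBacktester class, which allows for a more standardized backtesting of momentum-based strategies. The class has the following attributes:

symbol: RIC (instrument data) to be used
start: start date for the data selection
end: end date for the data selection
amount: initial amount to be invested
tc: proportional transaction costs per trade

Compared to the SMAVectorBacktester class, this one introduces two important generalizations: a fixed amount to be invested at the beginning of the backtesting period and proportional transaction costs, which bring the backtest closer to market realities cost-wise. The addition of transaction costs is particularly important in the context of time series momentum strategies, which often lead to a large number of transactions over time.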

The application is as straightforward and convenient as before. The example first replicates the results from the interactive session before, but this time with an initial investment of 10,000 USD. Figure 4-12 visualizes the performance of the strategy, taking the mean of the last three returns to generate signals for the positioning. The second case covered is one with proportional transaction costs of 0.1% per trade. As Figure 4-13 illustrates, even small transaction costs deteriorate the performance significantly in this case. The driving factor is the relatively high frequency of trades that the strategy requires:

    In [90]: import MomVectorBacktester as Mom  # (1)

    In [91]: mombt = Mom.MomVectorBacktester('XAU=', '2010-1-1',
                                             '2019-12-31', 10000, 0.0)  # (2)

    In [92]: mombt.run_strategy(momentum=3)  # (3)
    Out[92]: (20797.87, 7395.53)

    In [93]: mombt.plot_results()

    In [94]: mombt = Mom.MomVectorBacktester('XAU=', '2010-1-1',
                                             '2019-12-31', 10000, 0.001)  # (4)

    In [95]: mombt.run_strategy(momentum=3)  # (5)
    Out[95]: (10749.4, -2652.93)

    In [96]: mombt.plot_results()

(1) Imports the module as Mom.

(2) Instantiates an object of the backtesting class, defining the starting capital to be 10,000 USD and the proportional transaction costs to be zero.

(3) Backtests the momentum strategy based on a time window of three days: the strategy outperforms the benchmark passive investment.

(4) This time, proportional transaction costs of 0.1% are assumed per trade.

(5) In this case, the strategy loses all of its outperformance and ends up about 2,653 USD below the benchmark.

Figure 4-12. Gross performance of the gold price (USD) and the momentum strategy (last three returns, no transaction costs)

Figure 4-13. Gross performance of the gold price (USD) and the momentum strategy (last three returns, transaction costs of 0.1%)
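The way proportional transaction costs enter a vectorized backtest can be sketched in a few lines: detect the dates on which the position changes and subtract the cost from the strategy's log return on those dates only. The toy positions and returns below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

tc = 0.001                                           # 0.1% per trade, as in the text
toy = pd.DataFrame({'position': [1.0, 1.0, -1.0, -1.0, 1.0],
                    'strategy': [0.01, 0.004, -0.003, 0.002, 0.005]})  # toy log returns

# a trade happens whenever the position differs from the previous one
trades = toy['position'].diff().fillna(0) != 0
toy.loc[trades, 'strategy'] -= tc                    # charge costs on trade dates only

n_trades = int(trades.sum())
gross_with_costs = np.exp(toy['strategy'].sum())
```

With a position that flips every few days, the number of trades grows quickly, and so does the total cost. Note that a switch from long to short is charged once in this scheme, which is a simplification.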

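Roughly speaking, mean-reversion strategies rely on reasoning that is the opposite of that behind momentum strategies. If a financial instrument has performed "too well" relative to its trend, it is shorted, and vice versa. To put it differently, while (time series) momentum strategies assume a positive correlation between returns, mean-reversion strategies assume a negative correlation. Balvers et al. (2000) write:

Mean reversion refers to a tendency of asset prices to return to a trend path.

Working with a simple moving average (SMA) as a proxy for a "trend path", a mean-reversion strategy in, say, the EUR/USD exchange rate can be backtested in a way similar to the backtests of the SMA-based and momentum-based strategies. The idea is to define a threshold for the distance between the current price and the SMA that signals a long or short position.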

Getting into the Basics

The examples that follow are for two different financial instruments for which one would expect significant mean reversion, since both are based on the gold price:

GLD is the symbol for SPDR Gold Shares, which is the largest physically backed exchange-traded fund (ETF) for gold (see the SPDR Gold Shares home page).
GDX is the symbol for the VanEck Vectors Gold Miners ETF, which invests in equity products to track the NYSE Arca Gold Miners Index (see the VanEck Vectors Gold Miners overview page).

The example starts with GDX and implements a mean-reversion strategy on the basis of an SMA of 25 days and a threshold value of 3.5 for the absolute deviation of the current price from the SMA to signal a positioning. Figure 4-14 shows the differences between the current price of GDX and the SMA, as well as the positive and negative threshold values that generate sell and buy signals, respectively:

    In [97]: data = pd.DataFrame(raw['GDX'])

    In [98]: data.rename(columns={'GDX': 'price'}, inplace=True)

    In [99]: data['returns'] = np.log(data['price'] /
                                      data['price'].shift(1))

    In [100]: SMA = 25  # (1)

    In [101]: data['SMA'] = data['price'].rolling(SMA).mean()  # (2)

    In [102]: threshold = 3.5  # (3)

    In [103]: data['distance'] = data['price'] - data['SMA']  # (4)

    In [104]: data['distance'].dropna().plot(figsize=(10, 6), legend=True)  # (5)
              plt.axhline(threshold, color='r')
              plt.axhline(-threshold, color='r')
              plt.axhline(0, color='r');

(1) The SMA parameter is defined...

(2) ...and the SMA ("trend path") is calculated.

(3) The threshold for the signal generation is defined.

(4) The distance is calculated for every point in time.

(5) The distance values are plotted.

Figure 4-14. Differences between the current price of `GDX` and the SMA, as well as threshold values for generating mean-reversion signals

Based on the differences and the fixed threshold values, positionings can again be derived in vectorized fashion. Figure 4-15 shows the resulting positionings:

    In [105]: data['position'] = np.where(data['distance'] > threshold,
                                          -1, np.nan)  # (1)

    In [106]: data['position'] = np.where(data['distance'] < -threshold,
                                          1, data['position'])  # (2)

    In [107]: data['position'] = np.where(data['distance'] *
                          data['distance'].shift(1) < 0,
                          0, data['position'])  # (3)

    In [108]: data['position'] = data['position'].ffill().fillna(0)  # (4)

    In [109]: data['position'].iloc[SMA:].plot(ylim=[-1.1, 1.1],
                                               figsize=(10, 6));  # (5)

(1) If the distance value is greater than the threshold value, go short (set -1 in the new column position); otherwise, set NaN.

(2) If the distance value is lower than the negative threshold value, go long (set 1); otherwise, keep the position column unchanged.

(3) If there is a change in the sign of the distance value, go market neutral (set 0); otherwise, keep the position column unchanged.

(4) Forward fills all NaN positions with the previous value; replaces all remaining NaN values with 0.

(5) Plots the resulting positionings from the index position SMA on.

Figure 4-15. Positionings generated for `GDX` based on the mean-reversion strategy
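The four positioning steps are easier to follow on a short, hand-checkable series. The sketch below applies exactly the same sequence of np.where(), ffill(), and fillna() calls to made-up distance values (the numbers and the frame toy are assumptions):

```python
import numpy as np
import pandas as pd

threshold = 3.5
toy = pd.DataFrame({'distance': [0.5, 4.0, 2.0, -1.0, -4.0, -2.0, 1.0]})  # toy values

# step 1: short above the threshold, undecided (NaN) elsewhere
toy['position'] = np.where(toy['distance'] > threshold, -1, np.nan)
# step 2: long below the negative threshold
toy['position'] = np.where(toy['distance'] < -threshold, 1, toy['position'])
# step 3: neutral when the distance changes sign (price crosses the SMA)
toy['position'] = np.where(toy['distance'] * toy['distance'].shift(1) < 0,
                           0, toy['position'])
# step 4: carry the last decision forward; start out neutral
toy['position'] = toy['position'].ffill().fillna(0)
positions = toy['position'].tolist()
```

The short position opened at 4.0 is held while the distance stays positive, closed when the distance turns negative, and the long position opened at -4.0 is likewise held until the distance turns positive again.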

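The final step is to derive the strategy returns, which are shown in Figure 4-16. The particular parameterization leads to long periods with a neutral position (neither long nor short); these neutral positions show up as the flat parts of the strategy curve in Figure 4-16:

    In [110]: data['strategy'] = data['position'].shift(1) * data['returns']

    In [111]: data[['returns', 'strategy']].dropna().cumsum(
                      ).apply(np.exp).plot(figsize=(10, 6));

Figure 4-16. Gross performance of the `GDX` ETF and the mean-reversion strategy (SMA = 25, threshold = 3.5)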

Generalizing the Approach

As before, the vectorized backtesting is implemented more efficiently on the basis of a respective Python class. The MRVectorBacktester class presented here inherits from the MomVectorBacktester class and just replaces the run_strategy() method to accommodate the specifics of the mean-reversion strategy.

The example now uses GLD and sets the proportional transaction costs to 0.1%. The initial amount to invest is again set to 10,000 USD. The SMA is 43 this time, and the threshold value is set to 7.5. Figure 4-17 shows the performance of the mean-reversion strategy compared to the GLD ETF:

    In [112]: import MRVectorBacktester as MR  # (1)

    In [113]: mrbt = MR.MRVectorBacktester('GLD', '2010-1-1', '2019-12-31',
                                           10000, 0.001)  # (2)

    In [114]: mrbt.run_strategy(SMA=43, threshold=7.5)  # (3)
    Out[114]: (13542.15, 646.21)

    In [115]: mrbt.plot_results()  # (4)

(1) Imports the module as MR.

(2) Instantiates an object of the MRVectorBacktester class with 10,000 USD initial capital and 0.1% proportional transaction costs per trade; the strategy outperforms the base instrument in this case.

(3) Backtests the mean-reversion strategy with an SMA value of 43 and a threshold value of 7.5.

(4) Plots the cumulative performance of the strategy against the base instrument.

Figure 4-17. Gross performance of the `GLD` ETF and the mean-reversion strategy (SMA = 43, threshold = 7.5, transaction costs of 0.1%)

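The emphasis in this chapter, as well as in the rest of this book, is on the technical implementation of important concepts in algorithmic trading by the use of Python. The strategies, parameters, data sets, and algorithms used are sometimes arbitrarily chosen and sometimes purposefully chosen to make a certain point. Without a doubt, when discussing technical methods applied to finance, it is more exciting and motivating to see examples that show "good results", even if they might not generalize to other financial instruments or time periods.

The ability to show examples with good results often comes at the cost of data snooping. According to White (2000), data snooping can be defined as follows:

Data snooping occurs when a given set of data is used more than once for purposes of inference or model selection.

In other words, a certain approach might be applied multiple or even many times to the same data set to arrive at satisfactory numbers and plots. In the context of trading strategy research, this of course is intellectually dishonest because it pretends that a trading strategy has some economic potential that might not be realistic in a real-world context. Because the focus of this book is the use of Python as a programming language for algorithmic trading, the data snooping approach might be justifiable. This is comparable to a mathematics book that, by way of an example, solves an equation that has a unique solution that is easy to identify. In mathematics, such straightforward examples are rather the exception than the rule, but they are frequently used for didactic purposes.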

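Another problem that arises in this context is overfitting. In a trading context, overfitting can be described as follows (see the Man Institute on overfitting):

Overfitting is when a model describes noise rather than signal. The model may have good performance on the data on which it was tested, but little or no predictive power on new data in the future. Overfitting can be described as finding patterns that aren't actually there. There is a cost associated with overfitting: an overfitted strategy will underperform in the future.

Even a simple strategy, such as the one based on two SMA values, allows for the backtesting of thousands of different parameter combinations. Some of those combinations are almost certain to show good performance results. As Bailey et al. (2015) discuss in detail, this easily leads to backtest overfitting, with the people responsible for the backtesting often not even being aware of the problem. They point out:

Recent advances in algorithmic research and high-performance computing have made it nearly trivial to test millions and billions of alternative investment strategies on a finite dataset of financial time series.... [I]t is common practice to use this computational power to calibrate the parameters of an investment strategy in order to maximize its performance. But because the signal-to-noise ratio is so weak, often the result of such calibration is that parameters are chosen to profit from past noise rather than future signal. The outcome is an overfit backtest.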
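The mechanism described in the quote can be reproduced on purely synthetic data. The following sketch, which is an illustration under stated assumptions rather than a statistical test, generates log returns that are pure noise, "calibrates" the window of a simple momentum rule on the first half of the data, and then evaluates the selected window on the second half. The seed, sample sizes, window range, and the helper momentum_performance() are all assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)                  # arbitrary seed
rets = pd.Series(rng.normal(0.0, 0.01, size=2000))    # pure noise: no exploitable signal
in_sample, out_sample = rets.iloc[:1000], rets.iloc[1000:]


def momentum_performance(returns, window):
    '''Gross performance of a sign-of-rolling-mean momentum rule (no costs).'''
    position = np.sign(returns.rolling(window).mean())
    strategy = position.shift(1) * returns
    return float(np.exp(strategy.sum()))


windows = range(1, 41)
in_perf = {w: momentum_performance(in_sample, w) for w in windows}
best_window = max(in_perf, key=in_perf.get)           # "calibrated" on in-sample noise
out_perf = momentum_performance(out_sample, best_window)
```

By construction, the selected window has the best in-sample figure among all 40 candidates. Since the data contains no signal, there is no reason to expect that figure to carry over to the out-of-sample period, which is exactly the gap between a calibrated backtest and future performance that the quote warns about.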

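The problem of the validity of empirical results, in a statistical sense, is of course not constrained to strategy backtesting in a financial context.

Ioannidis (2005), referring to medical publications, emphasizes probabilistic and statistical considerations when judging the reproducibility and validity of research results:

There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims. However, this should not be surprising. It can be proven that most claimed research findings are false.... As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance.

Against this background, if a trading strategy in this book is shown to perform well given a certain data set, a combination of parameters, and maybe a specific machine learning algorithm, this neither constitutes any kind of recommendation for the particular configuration nor allows more general conclusions to be drawn about the quality and performance potential of the strategy configuration at hand.

You are, of course, encouraged to use the code and examples presented in this book to explore your own algorithmic trading strategy ideas and to implement them in practice based on your own backtesting results, validations, and conclusions. After all, proper and diligent strategy research is what financial markets will compensate you for, not data snooping and overfitting.

Vectorization is a powerful concept in scientific computing, as well as for financial analytics, and it is particularly important in the context of backtesting algorithmic trading strategies. This chapter introduces vectorization with both NumPy and pandas and applies it to backtest three types of trading strategies: strategies based on simple moving averages, momentum, and mean reversion. The chapter admittedly makes a number of simplifying assumptions, and a rigorous backtest of trading strategies needs to take into account more factors, such as data issues, selection issues, the avoidance of overfitting, or market microstructure elements. However, the major goal of the chapter is to focus on the concept of vectorization and what it can do in algorithmic trading from a technological and implementation point of view. For all concrete examples and results presented, the problems of data snooping, overfitting, and statistical significance need to be kept in mind.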

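For the basics of vectorization with NumPy and pandas, refer to these books:

McKinney, Wes. 2017. Python for Data Analysis. 2nd ed. Sebastopol: O'Reilly.
VanderPlas, Jake. 2016. Python Data Science Handbook. Sebastopol: O'Reilly.

For the use of NumPy and pandas in a financial context, refer to these books:

Hilpisch, Yves. 2015. Derivatives Analytics with Python: Data Analysis, Models, Simulation, Calibration, and Hedging. Wiley Finance.
Hilpisch, Yves. 2017. Listed Volatility and Variance Derivatives: A Python-Based Guide. Wiley Finance.
Hilpisch, Yves. 2018. Python for Finance: Mastering Data-Driven Finance. 2nd ed. Sebastopol: O'Reilly.

For the topics of data snooping and overfitting, refer to these papers:

Bailey, David, Jonathan Borwein, Marcos López de Prado, and Qiji Jim Zhu. 2015. "The Probability of Backtest Overfitting." Journal of Computational Finance 20, (4): 39-69. https://oreil.ly/sOHlf.
Ioannidis, John. 2005. "Why Most Published Research Findings Are False." PLoS Medicine 2, (8): 696-701.
White, Halbert. 2000. "A Reality Check for Data Snooping." Econometrica 68, (5): 1097-1126.

For more background information and empirical results about trading strategies based on simple moving averages, refer to these sources:

Brock, William, Josef Lakonishok, and Blake LeBaron. 1992. "Simple Technical Trading Rules and the Stochastic Properties of Stock Returns." Journal of Finance 47, (5): 1731-1764.
Droke, Clif. 2001. Moving Averages Simplified. Columbia: Marketplace Books.

The book by Ernest Chan covers trading strategies based on momentum and on mean reversion in detail. The book is also a good source for the pitfalls of backtesting trading strategies:

Chan, Ernest. 2013. Algorithmic Trading: Winning Strategies and Their Rationale. Hoboken et al: John Wiley & Sons.

These research papers analyze characteristics and sources of profit for cross-sectional momentum strategies, the traditional approach to momentum-based trading:

Chan, Louis, Narasimhan Jegadeesh, and Josef Lakonishok. 1996. "Momentum Strategies." Journal of Finance 51, (5): 1681-1713.
Jegadeesh, Narasimhan, and Sheridan Titman. 1993. "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency." Journal of Finance 48, (1): 65-91.
Jegadeesh, Narasimhan, and Sheridan Titman. 2001. "Profitability of Momentum Strategies: An Evaluation of Alternative Explanations." Journal of Finance 56, (2): 699-720.

The paper by Moskowitz et al. provides an analysis of so-called time series momentum strategies:

Moskowitz, Tobias, Yao Hua Ooi, and Lasse Heje Pedersen. 2012. "Time Series Momentum." Journal of Financial Economics 104: 228-250.

These papers empirically analyze mean reversion in asset prices: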

Balvers, Ronald, Yangru Wu, and Erik Gilliland. 2000. “Mean Reversion across National Stock Markets and Parametric Contrarian Investment Strategies.” Journal of Finance 55, (2): 745-772.
Kim, Myung Jig, Charles Nelson, and Richard Startz. 1991. “Mean Reversion in Stock Prices? A Reappraisal of the Empirical Evidence.” Review of Economic Studies 58: 515-528.
Spierdijk, Laura, Jacob Bikker, and Peter van den Hoek. 2012. “Mean Reversion in International Stock Markets: An Empirical Analysis of the 20th Century.” Journal of International Money and Finance 31: 228-249.

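This section presents the Python scripts referenced and used in this chapter.

SMA Backtesting Class

The following presents Python code with a class for the vectorized backtesting of strategies based on simple moving averages: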

    #
    # Python Module with Class
    # for Vectorized Backtesting
    # of SMA-based Strategies
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    import numpy as np
    import pandas as pd
    from scipy.optimize import brute


    class SMAVectorBacktester(object):
        ''' Class for the vectorized backtesting of SMA-based trading strategies.

        Attributes
        ==========
        symbol: str
            RIC symbol with which to work
        SMA1: int
            time window in days for shorter SMA
        SMA2: int
            time window in days for longer SMA
        start: str
            start date for data retrieval
        end: str
            end date for data retrieval

        Methods
        =======
        get_data:
            retrieves and prepares the base data set
        set_parameters:
            sets one or two new SMA parameters
        run_strategy:
            runs the backtest for the SMA-based strategy
        plot_results:
            plots the performance of the strategy compared to the symbol
        update_and_run:
            updates SMA parameters and returns the (negative) absolute performance
        optimize_parameters:
            implements a brute force optimization for the two SMA parameters
        '''

        def __init__(self, symbol, SMA1, SMA2, start, end):
            self.symbol = symbol
            self.SMA1 = SMA1
            self.SMA2 = SMA2
            self.start = start
            self.end = end
            self.results = None
            self.get_data()

        def get_data(self):
            ''' Retrieves and prepares the data.
            '''
            raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                              index_col=0, parse_dates=True).dropna()
            raw = pd.DataFrame(raw[self.symbol])
            raw = raw.loc[self.start:self.end]
            raw.rename(columns={self.symbol: 'price'}, inplace=True)
            raw['return'] = np.log(raw / raw.shift(1))
            raw['SMA1'] = raw['price'].rolling(self.SMA1).mean()
            raw['SMA2'] = raw['price'].rolling(self.SMA2).mean()
            self.data = raw

        def set_parameters(self, SMA1=None, SMA2=None):
            ''' Updates SMA parameters and resp. time series.
            '''
            if SMA1 is not None:
                self.SMA1 = SMA1
                self.data['SMA1'] = self.data['price'].rolling(
                    self.SMA1).mean()
            if SMA2 is not None:
                self.SMA2 = SMA2
                self.data['SMA2'] = self.data['price'].rolling(self.SMA2).mean()

        def run_strategy(self):
            ''' Backtests the trading strategy.
            '''
            data = self.data.copy().dropna()
            data['position'] = np.where(data['SMA1'] > data['SMA2'], 1, -1)
            data['strategy'] = data['position'].shift(1) * data['return']
            data.dropna(inplace=True)
            data['creturns'] = data['return'].cumsum().apply(np.exp)
            data['cstrategy'] = data['strategy'].cumsum().apply(np.exp)
            self.results = data
            # gross performance of the strategy
            aperf = data['cstrategy'].iloc[-1]
            # out-/underperformance of strategy
            operf = aperf - data['creturns'].iloc[-1]
            return round(aperf, 2), round(operf, 2)

        def plot_results(self):
            ''' Plots the cumulative performance of the trading strategy
            compared to the symbol.
            '''
            if self.results is None:
                print('No results to plot yet. Run a strategy.')
                return  # nothing to plot without results
            title = '%s | SMA1=%d, SMA2=%d' % (self.symbol,
                                               self.SMA1, self.SMA2)
            self.results[['creturns', 'cstrategy']].plot(title=title,
                                                         figsize=(10, 6))

        def update_and_run(self, SMA):
            ''' Updates SMA parameters and returns negative absolute performance
            (for minimization algorithm).

            Parameters
            ==========
            SMA: tuple
                SMA parameter tuple
            '''
            self.set_parameters(int(SMA[0]), int(SMA[1]))
            return -self.run_strategy()[0]

        def optimize_parameters(self, SMA1_range, SMA2_range):
            ''' Finds global maximum given the SMA parameter ranges.

            Parameters
            ==========
            SMA1_range, SMA2_range: tuple
                tuples of the form (start, end, step size)
            '''
            opt = brute(self.update_and_run, (SMA1_range, SMA2_range), finish=None)
            return opt, -self.update_and_run(opt)


    if __name__ == '__main__':
        smabt = SMAVectorBacktester('EUR=', 42, 252,
                                    '2010-1-1', '2020-12-31')
        print(smabt.run_strategy())
        smabt.set_parameters(SMA1=20, SMA2=100)
        print(smabt.run_strategy())
        print(smabt.optimize_parameters((30, 56, 4), (200, 300, 4)))

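Momentum Backtesting Class

The following presents Python code with a class for the vectorized backtesting of strategies based on time series momentum: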

    #
    # Python Module with Class
    # for Vectorized Backtesting
    # of Momentum-Based Strategies
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    import numpy as np
    import pandas as pd


    class MomVectorBacktester(object):
        ''' Class for the vectorized backtesting of
        momentum-based trading strategies.

        Attributes
        ==========
        symbol: str
            RIC (financial instrument) to work with
        start: str
            start date for data selection
        end: str
            end date for data selection
        amount: int, float
            amount to be invested at the beginning
        tc: float
            proportional transaction costs (e.g., 0.5% = 0.005) per trade

        Methods
        =======
        get_data:
            retrieves and prepares the base data set
        run_strategy:
            runs the backtest for the momentum-based strategy
        plot_results:
            plots the performance of the strategy compared to the symbol
        '''

        def __init__(self, symbol, start, end, amount, tc):
            self.symbol = symbol
            self.start = start
            self.end = end
            self.amount = amount
            self.tc = tc
            self.results = None
            self.get_data()

        def get_data(self):
            ''' Retrieves and prepares the data.
            '''
            raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                              index_col=0, parse_dates=True).dropna()
            raw = pd.DataFrame(raw[self.symbol])
            raw = raw.loc[self.start:self.end]
            raw.rename(columns={self.symbol: 'price'}, inplace=True)
            raw['return'] = np.log(raw / raw.shift(1))
            self.data = raw

        def run_strategy(self, momentum=1):
            ''' Backtests the trading strategy.
            '''
            self.momentum = momentum
            data = self.data.copy().dropna()
            data['position'] = np.sign(data['return'].rolling(momentum).mean())
            data['strategy'] = data['position'].shift(1) * data['return']
            # determine when a trade takes place
            data.dropna(inplace=True)
            trades = data['position'].diff().fillna(0) != 0
            # subtract transaction costs from return when trade takes place
            # (.loc avoids chained assignment on a copy)
            data.loc[trades, 'strategy'] -= self.tc
            data['creturns'] = self.amount * data['return'].cumsum().apply(np.exp)
            data['cstrategy'] = self.amount * \
                data['strategy'].cumsum().apply(np.exp)
            self.results = data
            # absolute performance of the strategy
            aperf = self.results['cstrategy'].iloc[-1]
            # out-/underperformance of strategy
            operf = aperf - self.results['creturns'].iloc[-1]
            return round(aperf, 2), round(operf, 2)

        def plot_results(self):
            ''' Plots the cumulative performance of the trading strategy
            compared to the symbol.
            '''
            if self.results is None:
                print('No results to plot yet. Run a strategy.')
                return  # nothing to plot; avoids indexing into None
            title = '%s | TC = %.4f' % (self.symbol, self.tc)
            self.results[['creturns', 'cstrategy']].plot(title=title,
                                                         figsize=(10, 6))


    if __name__ == '__main__':
        mombt = MomVectorBacktester('XAU=', '2010-1-1', '2020-12-31',
                                    10000, 0.0)
        print(mombt.run_strategy())
        print(mombt.run_strategy(momentum=2))
        mombt = MomVectorBacktester('XAU=', '2010-1-1', '2020-12-31',
                                    10000, 0.001)
        print(mombt.run_strategy(momentum=2))

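Mean Reversion Backtesting Class

The following presents Python code with a class for the vectorized backtesting of strategies based on mean reversion: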

    #
    # Python Module with Class
    # for Vectorized Backtesting
    # of Mean-Reversion Strategies
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    import numpy as np
    from MomVectorBacktester import MomVectorBacktester


    class MRVectorBacktester(MomVectorBacktester):
        ''' Class for the vectorized backtesting of
        mean reversion-based trading strategies.

        Attributes
        ==========
        symbol: str
            RIC symbol with which to work
        start: str
            start date for data retrieval
        end: str
            end date for data retrieval
        amount: int, float
            amount to be invested at the beginning
        tc: float
            proportional transaction costs (e.g., 0.5% = 0.005) per trade

        Methods
        =======
        get_data:
            retrieves and prepares the base data set
        run_strategy:
            runs the backtest for the mean reversion-based strategy
        plot_results:
            plots the performance of the strategy compared to the symbol
        '''

        def run_strategy(self, SMA, threshold):
            ''' Backtests the trading strategy.
            '''
            data = self.data.copy().dropna()
            data['sma'] = data['price'].rolling(SMA).mean()
            data['distance'] = data['price'] - data['sma']
            data.dropna(inplace=True)
            # sell signals
            data['position'] = np.where(data['distance'] > threshold,
                                        -1, np.nan)
            # buy signals
            data['position'] = np.where(data['distance'] < -threshold,
                                        1, data['position'])
            # crossing of current price and SMA (zero distance)
            data['position'] = np.where(data['distance'] *
                                        data['distance'].shift(1) < 0,
                                        0, data['position'])
            data['position'] = data['position'].ffill().fillna(0)
            data['strategy'] = data['position'].shift(1) * data['return']
            # determine when a trade takes place
            trades = data['position'].diff().fillna(0) != 0
            # subtract transaction costs from return when trade takes place
            # (.loc avoids chained assignment on a copy)
            data.loc[trades, 'strategy'] -= self.tc
            data['creturns'] = self.amount * \
                data['return'].cumsum().apply(np.exp)
            data['cstrategy'] = self.amount * \
                data['strategy'].cumsum().apply(np.exp)
            self.results = data
            # absolute performance of the strategy
            aperf = self.results['cstrategy'].iloc[-1]
            # out-/underperformance of strategy
            operf = aperf - self.results['creturns'].iloc[-1]
            return round(aperf, 2), round(operf, 2)


    if __name__ == '__main__':
        mrbt = MRVectorBacktester('GDX', '2010-1-1', '2020-12-31',
                                  10000, 0.0)
        print(mrbt.run_strategy(SMA=25, threshold=5))
        mrbt = MRVectorBacktester('GDX', '2010-1-1', '2020-12-31',
                                  10000, 0.001)
        print(mrbt.run_strategy(SMA=25, threshold=5))
        mrbt = MRVectorBacktester('GLD', '2010-1-1', '2020-12-31',
                                  10000, 0.001)
        print(mrbt.run_strategy(SMA=42, threshold=7.5))

¹ Source: "Does the Past Predict the Future?" The Economist, 23 September 2009.

² For more on datetime and timedelta objects, see Appendix C of Hilpisch (2018).

Skynet begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m., Eastern time, August 29th.
The Terminator (Terminator 2)

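Recent years have seen tremendous progress in machine learning, deep learning, and artificial intelligence. The financial industry in general and algorithmic traders around the globe in particular also try to benefit from these technological advances.

This chapter introduces techniques from statistics, like linear regression, and from machine learning, like logistic regression, to predict future price movements based on past returns. It also illustrates the use of neural networks to predict stock market movements. The chapter cannot, of course, replace a thorough introduction to machine learning, but it can show, from a practitioner's point of view, how to concretely apply certain techniques to the price prediction problem. For more details, refer to Hilpisch (2020).¹

This chapter covers the following types of trading strategies: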

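Linear regression-based strategies

Such strategies use linear regression to extrapolate a trend or to derive the direction of a financial instrument's future price movement.

Machine learning-based strategies

In algorithmic trading, it is generally enough to predict the direction of movement for a financial instrument as opposed to the absolute magnitude of that movement. With this reasoning, the prediction problem basically boils down to a classification problem of deciding whether there will be an upward or downward movement. Different machine learning algorithms have been developed to attack such classification problems. This chapter introduces logistic regression as a typical baseline algorithm for classification.

Deep learning-based strategies

Deep learning has been popularized by technology giants such as Facebook. Similar to machine learning algorithms, deep learning algorithms based on neural networks allow one to attack the classification problems faced in financial market prediction.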

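The chapter is organized as follows. "Using Linear Regression for Market Movement Prediction" introduces linear regression as a technique to predict index levels and the direction of price movements. "Using Machine Learning for Market Movement Prediction" focuses on machine learning and introduces scikit-learn on the basis of linear regression. It mainly covers logistic regression as an alternative linear model explicitly applicable to classification problems. "Using Deep Learning for Market Movement Prediction" introduces Keras to predict the direction of stock market movements based on neural network algorithms.

The major goal of this chapter is to provide practical approaches to predict future price movements in financial markets based on past returns. The basic assumption is that the efficient market hypothesis does not hold universally and that, similar to the reasoning behind the technical analysis of stock price charts, the history might provide some insights about the future that can be mined with statistical techniques. In other words, it is assumed that certain patterns in financial markets repeat themselves such that past observations can be leveraged to predict future price movements. More details are found in Hilpisch (2020).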

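Ordinary least squares (OLS) and linear regression are decades-old statistical techniques that have proven useful in many different application areas. This section uses linear regression for price prediction purposes. However, it starts with a quick review of the basics and an introduction to the basic approach.

A Quick Review of Linear Regression

Before applying linear regression, a quick review of the approach based on some randomized data might be helpful. The example code uses NumPy to first generate an ndarray object with data for the independent variable x. Based on this data, randomized data ("noisy data") for the dependent variable y is generated. NumPy provides two functions, polyfit and polyval, for a convenient implementation of OLS regression based on simple monomials. For a linear regression, the highest degree for the monomials to be used is set to 1. Figure 5-1 shows the data and the regression line: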

    In [1]: import os
            import random
            import numpy as np  # (1)
            from pylab import mpl, plt  # (2)
            plt.style.use('seaborn')  # named 'seaborn-v0_8' on Matplotlib 3.6 and later
            mpl.rcParams['savefig.dpi'] = 300
            mpl.rcParams['font.family'] = 'serif'
            os.environ['PYTHONHASHSEED'] = '0'

    In [2]: x = np.linspace(0, 10)  # (3)

    In [3]: def set_seeds(seed=100):
                random.seed(seed)
                np.random.seed(seed)
            set_seeds()  # (4)

    In [4]: y = x + np.random.standard_normal(len(x))  # (5)

    In [5]: reg = np.polyfit(x, y, deg=1)  # (6)

    In [6]: reg  # (7)
    Out[6]: array([0.94612934, 0.22855261])

    In [7]: plt.figure(figsize=(10, 6))  # (8)
            plt.plot(x, y, 'bo', label='data')  # (9)
            plt.plot(x, np.polyval(reg, x), 'r', lw=2.5,
                     label='linear regression')  # (10)
            plt.legend(loc=0);  # (11)

(1) Imports NumPy.

(2) Imports matplotlib.

(3) Generates an evenly spaced grid of floats for the x values between 0 and 10.

(4) Fixes the seed values for all relevant random number generators.

(5) Generates the randomized data for the y values.

(6) Conducts an OLS regression of degree 1 (that is, a linear regression).

(7) Shows the optimal parameter values.

(8) Creates a new figure object.

(9) Plots the original data set as dots.

(10) Plots the regression line.

(11) Creates the legend.

Figure 5-1. Linear regression illustrated based on randomized data

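The interval for the independent variable x is $x \in [0, 10]$. Enlarging the interval to, say, $x \in [0, 20]$ allows one to predict values for the dependent variable y beyond the domain of the original data set by extrapolation, given the optimal regression parameters. Figure 5-2 visualizes the extrapolation:

    In [8]: plt.figure(figsize=(10, 6))
            plt.plot(x, y, 'bo', label='data')
            xn = np.linspace(0, 20)  # (1)
            plt.plot(xn, np.polyval(reg, xn), 'r', lw=2.5,
                     label='linear regression')
            plt.legend(loc=0);

(1) Generates an enlarged domain for the x values.

Figure 5-2. Prediction (extrapolation) based on linear regression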

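The Basic Idea for Price Prediction

Price prediction based on time series data has to deal with one special feature: the time-based ordering of the data. Generally, the ordering of the data is not important for the application of linear regression. In the first example of the previous section, the data on which the linear regression is implemented could have been compiled in completely different orderings, as long as the x and y pairs are kept together. Independent of the ordering, the optimal regression parameters would have been the same.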
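The ordering argument can be checked numerically. The following is a minimal sketch, not code from the book: it uses NumPy's `default_rng` (an assumption; the chapter seeds the legacy generator instead) to create noisy data, shuffles the (x, y) pairs with one common permutation, and compares the two parameter vectors.

```python
import numpy as np

rng = np.random.default_rng(100)        # illustrative seed
x = np.linspace(0, 10)
y = x + rng.standard_normal(len(x))     # noisy data, as in the review example

reg = np.polyfit(x, y, deg=1)           # regression on the original ordering

perm = rng.permutation(len(x))          # one permutation applied to both arrays
reg_shuffled = np.polyfit(x[perm], y[perm], deg=1)

# The parameters agree up to floating-point error.
print(np.allclose(reg, reg_shuffled))
```

The pairs stay intact because the same permutation indexes both arrays; shuffling x and y independently would destroy the relationship.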

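However, in the context of predicting tomorrow's index level, for example, it seems to be of paramount importance to have the historic index levels in the correct order. If this is the case, one would then try to predict tomorrow's index level given the index level of today, yesterday, the day before, and so on. The number of days used as input is generally called lags. Using today's index level and the two before it therefore translates into three lags.

The next example casts this idea again into a rather simple context. The data the example uses are the numbers from 0 to 11:

    In [9]: x = np.arange(12)

    In [10]: x
    Out[10]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Assume three lags for the regression. This implies three independent variables for the regression and one dependent one. More concretely, 0, 1, and 2 are values of the independent variables, while 3 is the corresponding value for the dependent variable. Moving one step forward ("in time"), the values are 1, 2, and 3, as well as 4. The final combination of values is 8, 9, and 10 with 11. The problem, therefore, is to cast this idea formally into a linear equation of the form $A \cdot x = b$, where $A$ is a matrix and $x$ and $b$ are vectors: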

    In [11]: lags = 3  # (1)

    In [12]: m = np.zeros((lags + 1, len(x) - lags))  # (2)

    In [13]: m[lags] = x[lags:]  # (3)
             for i in range(lags):  # (4)
                 m[i] = x[i:i - lags]  # (5)

    In [14]: m.T  # (6)
    Out[14]: array([[ 0.,  1.,  2.,  3.],
                    [ 1.,  2.,  3.,  4.],
                    [ 2.,  3.,  4.,  5.],
                    [ 3.,  4.,  5.,  6.],
                    [ 4.,  5.,  6.,  7.],
                    [ 5.,  6.,  7.,  8.],
                    [ 6.,  7.,  8.,  9.],
                    [ 7.,  8.,  9., 10.],
                    [ 8.,  9., 10., 11.]])

(1) Defines the number of lags.

(2) Instantiates an ndarray object with the appropriate dimensions.

(3) Defines the target values (dependent variable).

(4) Iterates over the numbers from 0 to lags - 1.

(5) Defines the basis vectors (independent variables).

(6) Shows the transpose of the ndarray object m.

In the transposed ndarray object m, the first three columns contain the values for the three independent variables. Together they form the matrix $A$. The fourth and final column represents the vector $b$. Linear regression then yields the missing vector $x$. Since there are now more independent variables, polyfit and polyval do not work anymore. However, there is a function in the NumPy sub-package for linear algebra (linalg) that solves general least-squares problems: lstsq. Only the first element of the results array is needed, since it contains the optimal regression parameters:

    In [15]: reg = np.linalg.lstsq(m[:lags].T, m[lags], rcond=None)[0]  # (1)

    In [16]: reg  # (2)
    Out[16]: array([-0.66666667,  0.33333333,  1.33333333])

    In [17]: np.dot(m[:lags].T, reg)  # (3)
    Out[17]: array([ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

(1) Implements the linear OLS regression.

(2) Prints out the optimal parameters.

(3) The dot product yields the prediction results.
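One detail of this toy example is easy to overlook: the three lag columns are linearly dependent (each is the previous one plus 1), so the parameter vector is not unique. The sketch below rebuilds the same matrix and inspects the rank reported by `np.linalg.lstsq`; the alternative parameter vector `alt` is chosen by hand for illustration and does not appear in the text.

```python
import numpy as np

x = np.arange(12)
lags = 3
m = np.zeros((lags + 1, len(x) - lags))
m[lags] = x[lags:]
for i in range(lags):
    m[i] = x[i:i - lags]

A, b = m[:lags].T, m[lags]

# lstsq also returns the residuals, the effective rank, and the singular values.
reg, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(rank)                      # effective rank of the 9 x 3 matrix

# A hand-picked alternative that reproduces b exactly: 2 * (x + 2) - (x + 1) = x + 3
alt = np.array([0.0, -1.0, 2.0])
print(np.allclose(A @ alt, b), np.allclose(A @ reg, b))

# Among all exact solutions, lstsq reports the one with the smallest Euclidean norm.
print(np.linalg.norm(reg), np.linalg.norm(alt))
```

With real price or return data the lag columns are not exactly collinear, so this ambiguity normally disappears, although highly correlated lags can still make individual parameters unstable.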

This basic idea easily carries over to real-world financial time series data.

Predicting Index Levels

The next step is to translate the basic approach to time series data for a real financial instrument, like the EUR/USD exchange rate:

    In [18]: import pandas as pd  # (1)

    In [19]: raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                               index_col=0, parse_dates=True).dropna()  # (2)

    In [20]: raw.info()  # (2)
             <class 'pandas.core.frame.DataFrame'>
             DatetimeIndex: 2516 entries, 2010-01-04 to 2019-12-31
             Data columns (total 12 columns):
              #   Column  Non-Null Count  Dtype
             ---  ------  --------------  -----
              0   AAPL.O  2516 non-null   float64
              1   MSFT.O  2516 non-null   float64
              2   INTC.O  2516 non-null   float64
              3   AMZN.O  2516 non-null   float64
              4   GS.N    2516 non-null   float64
              5   SPY     2516 non-null   float64
              6   .SPX    2516 non-null   float64
              7   .VIX    2516 non-null   float64
              8   EUR=    2516 non-null   float64
              9   XAU=    2516 non-null   float64
              10  GDX     2516 non-null   float64
              11  GLD     2516 non-null   float64
             dtypes: float64(12)
             memory usage: 255.5 KB

    In [21]: symbol = 'EUR='

    In [22]: data = pd.DataFrame(raw[symbol])  # (3)

    In [23]: data.rename(columns={symbol: 'price'}, inplace=True)  # (4)

(1) Imports the pandas package.

(2) Retrieves end-of-day (EOD) data and stores it in a DataFrame object.

(3) Selects the time series data for the specified symbol from the original DataFrame.

(4) Renames the single column to price.

Formally, the Python code from the preceding simple example hardly needs to be changed to implement the regression-based prediction approach. Only the data object needs to be replaced:

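    In [24]: lags = 5

    In [25]: cols = []
             for lag in range(1, lags + 1):
                 col = f'lag_{lag}'
                 data[col] = data['price'].shift(lag)  # (1)
                 cols.append(col)
             data.dropna(inplace=True)

    In [26]: reg = np.linalg.lstsq(data[cols], data['price'],
                                   rcond=None)[0]

    In [27]: reg
    Out[27]: array([ 0.98635864,  0.02292172, -0.04769849,  0.05037365, -0.01208135])

(1) Takes the price column and shifts it by lag.

The optimal regression parameters illustrate what is typically called the random walk hypothesis. This hypothesis states that stock prices or exchange rates, for example, follow a random walk with the consequence that the best predictor for tomorrow's price is today's price. The optimal parameters seem to support such a hypothesis: today's price, with a weight of about 0.99, almost completely explains the predicted price level for tomorrow. The four other values hardly have any weight assigned.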
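To see that this pattern is what a pure random walk produces, the regression can be repeated on simulated data. The following is a sketch under stated assumptions: the price path is synthetic (starting level 1.2, normally distributed steps with standard deviation 0.005, both chosen arbitrarily), and the lag matrix is built with NumPy slicing rather than pandas.

```python
import numpy as np

rng = np.random.default_rng(100)                    # illustrative seed
n, lags = 2500, 5
price = 1.2 + np.cumsum(rng.normal(0.0, 0.005, n))  # synthetic random walk

# Column k holds the price lagged by k + 1 steps; b holds the current price.
A = np.column_stack([price[lags - lag:n - lag] for lag in range(1, lags + 1)])
b = price[lags:]

reg = np.linalg.lstsq(A, b, rcond=None)[0]
print(reg.round(4))        # the first weight should dominate
print(reg.sum().round(4))  # the weights should sum to roughly 1
```

That real EUR/USD data yields a similar parameter pattern is consistent with the random walk hypothesis, but it does not prove it.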

Figure 5-3 shows the EUR/USD exchange rate and the predicted values. Due to the sheer amount of data for the multiyear time window, the two time series are almost indistinguishable in the plot:

    In [28]: data['prediction'] = np.dot(data[cols], reg)  # (1)

    In [29]: data[['price', 'prediction']].plot(figsize=(10, 6));  # (2)

(1) Calculates the prediction values as the dot product.

(2) Plots the price and prediction columns.

Figure 5-3. EUR/USD exchange rate and predicted values based on linear regression (five lags)

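Zooming in by plotting the results for a much shorter time window makes it easier to distinguish the two time series. Figure 5-4 shows the results for a three-month time window. The plot illustrates that the prediction for tomorrow's rate is roughly today's rate, that is, the original series shifted to the right by one trading day:

    In [30]: data[['price', 'prediction']].loc['2019-10-1':].plot(
                         figsize=(10, 6));

Applying linear OLS regression to predict EUR/USD rates based on historical rates provides support for the random walk hypothesis. The results of the numerical example show that today's rate is the best predictor for tomorrow's rate in a least-squares sense.

Figure 5-4. EUR/USD exchange rate and predicted values based on linear regression (five lags, three months only)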

Predicting Future Returns

So far, the analysis is based on absolute rate levels. However, (log) returns might be a better choice for such statistical applications because, for example, they make the time series data stationary. The code to apply linear regression to the returns data is almost the same as before. This time it is not only today's return that is relevant for predicting tomorrow's return, and the regression results are also completely different in nature:

    In [31]: data['return'] = np.log(data['price'] /
                                     data['price'].shift(1))  # (1)

    In [32]: data.dropna(inplace=True)  # (2)

    In [33]: cols = []
             for lag in range(1, lags + 1):
                 col = f'lag_{lag}'
                 data[col] = data['return'].shift(lag)  # (3)
                 cols.append(col)
             data.dropna(inplace=True)

    In [34]: reg = np.linalg.lstsq(data[cols], data['return'],
                                   rcond=None)[0]

    In [35]: reg
    Out[35]: array([-0.015689  ,  0.00890227, -0.03634858,  0.01290924, -0.00636023])

(1) Calculates the log returns.

(2) Deletes all lines with NaN values.

(3) Takes the return column for the lagged data.
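One practical reason for working with log returns is that they add up over time, which is why the backtests in this chapter can use a sum (or cumulative sum) followed by `np.exp`. A short sketch, using five EUR/USD closing levels that appear in the chapter's `data.head()` output as illustrative input:

```python
import numpy as np

# Five consecutive closing levels (illustrative input taken from data.head()).
price = np.array([1.4101, 1.4090, 1.4137, 1.4150, 1.4073])

log_ret = np.log(price[1:] / price[:-1])    # log returns
simple_ret = price[1:] / price[:-1] - 1     # simple returns, for comparison

# Log returns are additive: their sum equals the log of the total price ratio.
print(np.isclose(log_ret.sum(), np.log(price[-1] / price[0])))

# Hence exp(sum of log returns) recovers the gross performance.
print(np.isclose(np.exp(log_ret.sum()), price[-1] / price[0]))

# Simple returns compound multiplicatively instead.
print(np.isclose(np.prod(1 + simple_ret), price[-1] / price[0]))
```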

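Figure 5-5 shows the returns data and the prediction values. As the figure impressively illustrates, linear regression obviously cannot predict the magnitude of future returns to any significant extent:

    In [36]: data['prediction'] = np.dot(data[cols], reg)

    In [37]: data[['return', 'prediction']].iloc[lags:].plot(figsize=(10, 6));

Figure 5-5. EUR/USD log returns and predicted values based on linear regression (five lags)

From a trading point of view, one might argue that it is not the magnitude of the forecasted return that is relevant, but rather whether the direction is forecasted correctly or not. To this end, a simple calculation yields an overview. Whenever the linear regression gets the direction right, meaning that the sign of the forecasted return is correct, the product of the market return and the predicted return is positive; otherwise it is negative.

In the example case, the prediction is correct 1,250 times and wrong 1,242 times (13 observations have a product of zero), which translates into a hit ratio of about 49.9%, or almost exactly 50%: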

    In [38]: hits = np.sign(data['return'] *
                            data['prediction']).value_counts()  # (1)

    In [39]: hits  # (2)
    Out[39]:  1.0    1250
             -1.0    1242
              0.0      13
             dtype: int64

    In [40]: hits.values[0] / sum(hits)  # (3)
    Out[40]: 0.499001996007984

(1) Calculates the product of the market and predicted return, takes the sign of the results, and counts the values.

(2) Prints out the counts for the possible values.

(3) Calculates the hit ratio, defined as the number of correct predictions divided by the number of all predictions.
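The expression `hits.values[0] / sum(hits)` relies on `value_counts()` listing the +1 class first, which only holds when correct predictions are the most frequent outcome. A more explicit variant is sketched below; `hit_ratio` is a hypothetical helper written for this illustration, and the input arrays are made up.

```python
import numpy as np

def hit_ratio(actual, predicted):
    """Share of observations where the predicted sign matches the realized sign.

    Observations whose product is zero (no move or no signal) count as misses,
    which mirrors dividing the number of hits by the number of all predictions.
    """
    product = np.sign(np.asarray(actual) * np.asarray(predicted))
    return float((product == 1).mean())

# Made-up returns and predictions for illustration only.
actual = np.array([0.010, -0.020, 0.005, 0.000, -0.010])
predicted = np.array([0.002, 0.001, 0.003, 0.004, -0.002])

print(hit_ratio(actual, predicted))   # three of the five products are positive
```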

Predicting Future Market Direction

The question that arises is whether one can improve on the hit ratio by directly implementing the linear regression based on the sign of the log returns that serve as the dependent variable values. In theory at least, this simplifies the problem from predicting an absolute return value to predicting the sign of the return value. The only change in the Python code to implement this reasoning is to use the sign values (that is, 1.0 or -1.0 in Python) for the regression step. This indeed increases the number of hits to 1,301 and the hit ratio to about 51.9%, an improvement of two percentage points:

    In [41]: reg = np.linalg.lstsq(data[cols], np.sign(data['return']),
                                   rcond=None)[0]  # (1)

    In [42]: reg
    Out[42]: array([-5.11938725, -2.24077248, -5.13080606, -3.03753232, -2.14819119])

    In [43]: data['prediction'] = np.sign(np.dot(data[cols], reg))  # (2)

    In [44]: data['prediction'].value_counts()
    Out[44]:  1.0    1300
             -1.0    1205
             Name: prediction, dtype: int64

    In [45]: hits = np.sign(data['return'] *
                            data['prediction']).value_counts()

    In [46]: hits
    Out[46]:  1.0    1301
             -1.0    1191
              0.0      13
             dtype: int64

    In [47]: hits.values[0] / sum(hits)
    Out[47]: 0.5193612774451097

(1) This directly uses the sign of the return to be predicted for the regression.

(2) For the prediction step, too, only the sign is relevant.

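Vectorized Backtesting of Regression-Based Strategy

The hit ratio alone does not tell much about the economic potential of a trading strategy that uses linear regression. It is well known that the ten best and worst days in the markets for a given period of time considerably influence the overall performance of investments.² In an ideal world, a long-short trader would, of course, try to benefit from both the best and the worst days by going long and short, respectively, on the basis of appropriate market timing indicators. Translated to the current context, this implies that, in addition to the hit ratio, the quality of the market timing matters. Therefore, a backtest along the lines of the vectorized approach in Chapter 4 can give a better picture of the value of regression for prediction.

Given the data that is already available, vectorized backtesting boils down to two lines of Python code, including visualization. This is because the prediction values already reflect the market positions (long or short). Figure 5-6 shows that, in-sample, the strategy under the current assumptions outperforms the market significantly (ignoring, among other things, transaction costs):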

    In [48]: data.head()
    Out[48]:              price     lag_1     lag_2     lag_3     lag_4     lag_5  \
             Date
             2010-01-20  1.4101 -0.005858 -0.008309 -0.000551  0.001103 -0.001310
             2010-01-21  1.4090 -0.013874 -0.005858 -0.008309 -0.000551  0.001103
             2010-01-22  1.4137 -0.000780 -0.013874 -0.005858 -0.008309 -0.000551
             2010-01-25  1.4150  0.003330 -0.000780 -0.013874 -0.005858 -0.008309
             2010-01-26  1.4073  0.000919  0.003330 -0.000780 -0.013874 -0.005858

                         prediction    return
             Date
             2010-01-20         1.0 -0.013874
             2010-01-21         1.0 -0.000780
             2010-01-22         1.0  0.003330
             2010-01-25         1.0  0.000919
             2010-01-26         1.0 -0.005457

    In [49]: data['strategy'] = data['prediction'] * data['return']  # (1)

    In [50]: data[['return', 'strategy']].sum().apply(np.exp)  # (2)
    Out[50]: return      0.784026
             strategy    1.654154
             dtype: float64

    In [51]: data[['return', 'strategy']].dropna().cumsum(
                     ).apply(np.exp).plot(figsize=(10, 6));  # (3)

(1) Multiplies the prediction values (positionings) by the market returns.

(2) Calculates the gross performance of the base instrument and the strategy.

(3) Plots the gross performance of the base instrument and the strategy over time (in-sample, no transaction costs).

Figure 5-6. Gross performance of EUR/USD and the regression-based strategy (five lags)

The hit ratio of a prediction-based strategy is only one side of the coin when it comes to overall strategy performance. The other side is how well the strategy gets the market timing right. A strategy that correctly predicts the best and worst days over a certain period of time might outperform the market even with a hit ratio below 50%. On the other hand, a strategy with a hit ratio well above 50% might still underperform the base instrument if it gets the rare, large movements wrong.
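A tiny constructed example makes the point concrete. The numbers below are made up for illustration: the strategy is wrong on three small up days but right on the one large down day, so it ends ahead of the market despite a 40% hit ratio.

```python
import numpy as np

# Made-up log returns: three small gains, one large loss, one small gain.
returns = np.array([0.01, 0.01, 0.01, -0.10, 0.01])
# Positions held over each period: short four times, long once.
position = np.array([-1, -1, -1, -1, 1])

strategy = position * returns

hit_ratio = (np.sign(strategy) == 1).mean()   # 2 of 5 directions are correct
market_perf = np.exp(returns.sum())           # gross performance of the instrument
strategy_perf = np.exp(strategy.sum())        # gross performance of the strategy

print(hit_ratio, market_perf.round(4), strategy_perf.round(4))
```

The same arithmetic works in reverse: getting four small days right and one large day wrong gives an 80% hit ratio and a loss.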

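Generalizing the Approach

"Linear Regression Backtesting Class" presents a Python module containing a class for the vectorized backtesting of the regression-based trading strategy in the spirit of Chapter 4. In addition to allowing for an arbitrary amount to invest and proportional transaction costs, it allows the in-sample fitting of the linear regression model and the out-of-sample evaluation. This means that the regression model is fitted based on one part of the data set, say for the years 2010 to 2015, and is evaluated based on another part of the data set, say for the years 2016 to 2019. For all strategies that involve an optimization or fitting step, this provides a more realistic view of the performance in practice since it helps avoid the problems arising from data snooping and the overfitting of models (see also "Data Snooping and Overfitting").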
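The in-sample/out-of-sample idea can be sketched independently of the LRVectorBacktester class. The following is not that class's code: `lag_matrix` is a hypothetical helper, the returns are simulated noise (so no genuine edge should be expected), and the split index is arbitrary.

```python
import numpy as np

def lag_matrix(series, lags):
    """Return (A, b): A holds `lags` lagged columns, b the aligned targets."""
    n = len(series)
    A = np.column_stack([series[lags - lag:n - lag]
                         for lag in range(1, lags + 1)])
    return A, series[lags:]

rng = np.random.default_rng(100)            # illustrative seed
returns = rng.normal(0.0, 0.005, 2500)      # simulated log returns (pure noise)
lags, split = 5, 2000                       # arbitrary choices

A, b = lag_matrix(returns, lags)
A_train, b_train = A[:split], b[:split]     # in-sample part
A_test, b_test = A[split:], b[split:]       # out-of-sample part

# Fit on the training data only, using the sign of the return as the target.
reg = np.linalg.lstsq(A_train, np.sign(b_train), rcond=None)[0]

# Evaluate on data the model has never seen.
position = np.sign(A_test @ reg)
strategy = position * b_test
print(np.exp(b_test.sum()), np.exp(strategy.sum()))
```

Splitting by position in time, rather than randomly, matters here: a random split would let observations from the future leak into the training set.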

Figure 5-7 shows that, for the particular configuration, the regression-based strategy with five lags also outperforms the EUR/USD base instrument out-of-sample, before accounting for transaction costs:

    In [52]: import LRVectorBacktester as LR  # (1)

    In [53]: lrbt = LR.LRVectorBacktester('EUR=', '2010-1-1', '2019-12-31',
                                          10000, 0.0)  # (2)

    In [54]: lrbt.run_strategy('2010-1-1', '2019-12-31',
                               '2010-1-1', '2019-12-31', lags=5)  # (3)
    Out[54]: (17166.53, 9442.42)

    In [55]: lrbt.run_strategy('2010-1-1', '2017-12-31',
                               '2018-1-1', '2019-12-31', lags=5)  # (4)
    Out[55]: (10160.86, 791.87)

    In [56]: lrbt.plot_results()  # (5)

(1) Imports the module as LR.

(2) Instantiates an object of the LRVectorBacktester class.

(3) Trains and evaluates the strategy on the same data set.

(4) Uses two different data sets for the training and evaluation steps.

(5) Plots the out-of-sample strategy performance compared to the market.

Figure 5-7. Gross performance of EUR/USD and the regression-based strategy (five lags, out-of-sample, before transaction costs)

Consider the GDX ETF. The strategy configuration chosen shows an outperformance out-of-sample and after taking transaction costs into account (see Figure 5-8):

    In [57]: lrbt = LR.LRVectorBacktester('GDX', '2010-1-1', '2019-12-31',
                                          10000, 0.002)  # (1)

    In [58]: lrbt.run_strategy('2010-1-1', '2019-12-31',
                               '2010-1-1', '2019-12-31', lags=7)
    Out[58]: (23642.32, 17649.69)

    In [59]: lrbt.run_strategy('2010-1-1', '2014-12-31',
                               '2015-1-1', '2019-12-31', lags=7)
    Out[59]: (28513.35, 14888.41)

    In [60]: lrbt.plot_results()

(1) Changes to the time series data for GDX.

Figure 5-8. Gross performance of the `GDX` ETF and the regression-based strategy (seven lags, out-of-sample, after transaction costs)

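Nowadays, the Python ecosystem provides a number of packages in the machine learning field. The most popular of these is scikit-learn (see the scikit-learn home page), which is also one of the best documented and maintained packages. This section first introduces the API of the package based on linear regression, replicating some of the results of the previous section. It then goes on to use logistic regression as a classification algorithm to attack the problem of predicting the future market direction.

Linear Regression with `scikit-learn`

To introduce the scikit-learn API, revisiting the basic idea behind the prediction approach presented in this chapter is fruitful. Data preparation is the same as with NumPy only:

    In [61]: x = np.arange(12)

    In [62]: x
    Out[62]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

    In [63]: lags = 3

    In [64]: m = np.zeros((lags + 1, len(x) - lags))

    In [65]: m[lags] = x[lags:]
             for i in range(lags):
                 m[i] = x[i:i - lags]

Using scikit-learn for our purposes mainly consists of three steps:

Model selection: A model is to be picked and instantiated.
Model fitting: The model is to be fitted to the data at hand.
Prediction: Given the fitted model, the prediction is conducted.

To apply linear regression, this translates into the following code that makes use of the linear_model sub-package for generalized linear models (see the scikit-learn linear models page). By default, the LinearRegression model fits an intercept value: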

    In [66]: from sklearn import linear_model  # (1)

    In [67]: lm = linear_model.LinearRegression()  # (2)

    In [68]: lm.fit(m[:lags].T, m[lags])  # (3)
    Out[68]: LinearRegression()

    In [69]: lm.coef_  # (4)
    Out[69]: array([0.33333333, 0.33333333, 0.33333333])

    In [70]: lm.intercept_  # (5)
    Out[70]: 2.0

    In [71]: lm.predict(m[:lags].T)  # (6)
    Out[71]: array([ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

(1) Imports the generalized linear model classes.

(2) Instantiates a linear regression model.

(3) Fits the model to the data.

(4) Prints out the optimal regression parameters.

(5) Prints out the intercept values.

(6) Predicts the sought-after values given the fitted model.
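The coefficients of one third each and the intercept of 2.0 can be reproduced with NumPy alone. The sketch below shows one way to arrive at the same numbers, assuming the usual textbook treatment of an intercept (center the data, solve the least-squares problem, then back out the intercept); it is not a description of scikit-learn's internals.

```python
import numpy as np

x = np.arange(12)
lags = 3
# Same feature matrix and target vector as in the example, built by slicing.
A = np.column_stack([x[i:len(x) - lags + i] for i in range(lags)]).astype(float)
b = x[lags:].astype(float)

# Center features and target, then solve the least-squares problem.
A_mean, b_mean = A.mean(axis=0), b.mean()
coef = np.linalg.lstsq(A - A_mean, b - b_mean, rcond=None)[0]

# The intercept follows from the means.
intercept = b_mean - A_mean @ coef

print(coef, intercept)
print(A @ coef + intercept)      # reproduces the target values
```

After centering, all three columns are identical, so the minimum-norm solution spreads the weight evenly across them.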

Setting the parameter fit_intercept to False gives the same regression results as the earlier NumPy approach with np.linalg.lstsq:

    In [72]: lm = linear_model.LinearRegression(fit_intercept=False)  # (1)

    In [73]: lm.fit(m[:lags].T, m[lags])
    Out[73]: LinearRegression(fit_intercept=False)

    In [74]: lm.coef_
    Out[74]: array([-0.66666667,  0.33333333,  1.33333333])

    In [75]: lm.intercept_
    Out[75]: 0.0

    In [76]: lm.predict(m[:lags].T)
    Out[76]: array([ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

(1) Forces a fit without an intercept value.

This example already illustrates quite well how to apply scikit-learn to the prediction problem. Due to its consistent API design, the basic approach carries over to other models as well.

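A Simple Classification Problem

In a classification problem, it has to be decided to which of a limited set of categories ("classes") a new observation belongs. A classic problem studied in machine learning is the identification of handwritten digits from 0 to 9. Such an identification leads to a correct result, say 3, or to a wrong result, say 6 or 8, where all such wrong results are equally wrong. In a financial market context, predicting the price of a financial instrument can lead to a numerical result that is far off the correct one or quite close to it. Predicting tomorrow's market direction, in contrast, can only produce a correct or a ("completely") wrong result. The latter is a classification problem with the set of categories limited to, for example, "up" and "down" or "+1" and "–1" or "1" and "0". By contrast, the former is an estimation problem.

A simple example of a classification problem is found on Wikipedia under Logistic Regression. The data set relates the number of hours a number of students studied for an exam to whether each student passed the exam or not. While the number of hours studied is a real number (float object), passing the exam is either True or False (that is, the number 1 or 0). Figure 5-9 shows the data graphically: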

    In [77]: hours = np.array([0.5, 0.75, 1., 1.25, 1.5, 1.75, 1.75, 2.,
                               2.25, 2.5, 2.75, 3., 3.25, 3.5, 4., 4.25,
                               4.5, 4.75, 5., 5.5])  # (1)

    In [78]: success = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1,
                                 0, 1, 1, 1, 1, 1, 1])  # (2)

    In [79]: plt.figure(figsize=(10, 6))
             plt.plot(hours, success, 'ro')  # (3)
             plt.ylim(-0.2, 1.2);  # (4)

(1) The number of hours studied by the different students (the sequence matters).

(2) Whether each student passed the exam or not (the sequence matters).

(3) Plots the data set with hours as the x values and success as the y values.

(4) Adjusts the limits of the y-axis.

Figure 5-9. Example data for classification problem

The basic question typically raised in such a context is: given a certain number of hours studied by a student (not in the data set), will they pass the exam or not? What answer could linear regression give? Probably not a satisfying one, as Figure 5-10 shows. Given different numbers of hours studied, linear regression gives (prediction) values mainly between 0 and 1, as well as lower and higher ones. But there can only be failure or success as the outcome of taking the exam:

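    In [80]: reg = np.polyfit(hours, success, deg=1)  # (1)

    In [81]: plt.figure(figsize=(10, 6))
             plt.plot(hours, success, 'ro')
             plt.plot(hours, np.polyval(reg, hours), 'b')  # (2)
             plt.ylim(-0.2, 1.2);

(1) Implements a linear regression on the data set.

(2) Plots the regression line in addition to the data set.

Figure 5-10. Linear regression applied to the classification problem

This is where classification algorithms, like logistic regression and support vector machines, come into play. For illustration, the application of logistic regression suffices (see James et al. (2013, ch. 4) for more background information). The respective class is also found in the linear_model sub-package. Figure 5-11 shows the result of the following Python code. This time, there is a clear-cut (prediction) value for every input value. The model predicts that students who studied for 0 to 2 hours will fail. For all values equal to or higher than 2.75 hours, the model predicts that a student passes the exam: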

    In [82]: lm = linear_model.LogisticRegression(solver='lbfgs')  # (1)

    In [83]: hrs = hours.reshape(1, -1).T  # (2)

    In [84]: lm.fit(hrs, success)  # (3)
    Out[84]: LogisticRegression()

    In [85]: prediction = lm.predict(hrs)  # (4)

    In [86]: plt.figure(figsize=(10, 6))
             plt.plot(hours, success, 'ro', label='data')
             plt.plot(hours, prediction, 'b', label='prediction')
             plt.legend(loc=0)
             plt.ylim(-0.2, 1.2);

(1) Instantiates the logistic regression model.

(2) Reshapes the one-dimensional ndarray object to a two-dimensional one (required by scikit-learn).

(3) Implements the fitting step.

(4) Implements the prediction step given the fitted model.

Figure 5-11. Logistic regression applied to the classification problem

However, as Figure 5-11 shows, there is no guarantee that 2.75 hours or more lead to success. It is just "more probable" to succeed from that many hours on than to fail. This probabilistic reasoning can also be analyzed and visualized based on the same model instance, as the following code illustrates. In Figure 5-12, the dashed line shows the probability of failing (monotonically decreasing) and the dash-dotted line shows the probability of succeeding (monotonically increasing), in line with the line styles and labels used in the code:

    In [87]: prob = lm.predict_proba(hrs)  # (1)

    In [88]: plt.figure(figsize=(10, 6))
             plt.plot(hours, success, 'ro')
             plt.plot(hours, prediction, 'b')
             plt.plot(hours, prob.T[0], 'm--',
                      label='$p(h)$ for zero')  # (2)
             plt.plot(hours, prob.T[1], 'g-.',
                      label='$p(h)$ for one')  # (3)
             plt.ylim(-0.2, 1.2)
             plt.legend(loc=0);

(1) Predicts the probabilities of failing and succeeding, respectively.

(2) Plots the probabilities of failing.

(3) Plots the probabilities of succeeding.

Figure 5-12. Probabilities for failing and succeeding, respectively, based on logistic regression
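What the fitted model computes can be made explicit with a few lines of NumPy. The sketch below fits an unregularized logistic regression to the same data with Newton's method; this is a swapped-in technique for illustration, not the lbfgs solver with default regularization used by scikit-learn above, so the parameters will differ somewhat from those of `lm`. The pass probability has the form $p(h) = 1 / (1 + e^{-(\beta_0 + \beta_1 h)})$, and the predicted class flips where $p(h) = 0.5$.

```python
import numpy as np

hours = np.array([0.5, 0.75, 1., 1.25, 1.5, 1.75, 1.75, 2., 2.25, 2.5,
                  2.75, 3., 3.25, 3.5, 4., 4.25, 4.5, 4.75, 5., 5.5])
success = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0,
                    1, 1, 1, 1, 1, 1])

X = np.column_stack([np.ones_like(hours), hours])  # intercept column plus hours
beta = np.zeros(2)                                 # (beta_0, beta_1)

for _ in range(25):                                # Newton iterations
    p = 1 / (1 + np.exp(-X @ beta))                # current pass probabilities
    grad = X.T @ (success - p)                     # gradient of the log-likelihood
    hess = X.T @ (X * (p * (1 - p))[:, None])      # negative Hessian
    beta = beta + np.linalg.solve(hess, grad)

p_pass = 1 / (1 + np.exp(-X @ beta))               # probability of succeeding
p_fail = 1 - p_pass                                # probability of failing
boundary = -beta[0] / beta[1]                      # hours at which p_pass = 0.5

print(beta.round(4), round(boundary, 2))
```

The decision boundary lands between 2.5 and 3 hours, which is consistent with the clear-cut predictions shown in Figure 5-11.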

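scikit-learn does a good job of providing access to a great variety of machine learning models in a unified way. The examples show that the API for applying logistic regression does not differ from the one for linear regression. scikit-learn, therefore, is well suited to test a number of appropriate machine learning models in a certain application scenario without altering the Python code very much.

Equipped with the basics, the next step is to apply logistic regression to the problem of predicting market direction.

Using Logistic Regression to Predict Market Direction

In machine learning, one generally speaks of features instead of independent or explanatory variables as in a regression context. The simple classification example has a single feature only: the number of hours studied. In practice, one often has more than one feature that can be used for classification. Given the prediction approach introduced in this chapter, one can identify a feature by a lag. Therefore, working with three lags from the time series data means that there are three features. As possible outcomes or categories, there are only +1 and -1 for an upward and a downward movement, respectively. Although the wording changes, the formalism stays the same, particularly with regard to deriving the matrix, now called the feature matrix.

The following code presents an alternative way of creating a pandas DataFrame-based "feature matrix" to which the three-step procedure applies equally well, if not in a more Pythonic fashion. The feature matrix now is a subset of the columns in the original data set: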

    In [89]: symbol = 'GLD'

    In [90]: data = pd.DataFrame(raw[symbol])

    In [91]: data.rename(columns={symbol: 'price'}, inplace=True)

    In [92]: data['return'] = np.log(data['price'] / data['price'].shift(1))

    In [93]: data.dropna(inplace=True)

    In [94]: lags = 3

    In [95]: cols = []  # (1)
             for lag in range(1, lags + 1):
                 col = 'lag_{}'.format(lag)  # (2)
                 data[col] = data['return'].shift(lag)  # (3)
                 cols.append(col)  # (4)

    In [96]: data.dropna(inplace=True)  # (5)

(1) Instantiates an empty list object to collect column names.

(2) Creates a str object for the column name.

(3) Adds a new column to the DataFrame object with the respective lag data.

(4) Appends the column name to the list object.

(5) Makes sure that the data set is complete.

Logistic regression improves the hit ratio compared to linear regression by more than a percentage point to about 53.3%. Figure 5-13 shows the performance of the strategy based on logistic regression-based predictions. Although the hit ratio is higher, the performance is worse than with linear regression:

    In [97]: from sklearn.metrics import accuracy_score

    In [98]: lm = linear_model.LogisticRegression(C=1e7, solver='lbfgs',
                                                  multi_class='auto',
                                                  max_iter=1000)  # (1)

    In [99]: lm.fit(data[cols], np.sign(data['return']))  # (2)
    Out[99]: LogisticRegression(C=10000000.0, max_iter=1000)

    In [100]: data['prediction'] = lm.predict(data[cols])  # (3)

    In [101]: data['prediction'].value_counts()  # (4)
    Out[101]:  1.0    1983
              -1.0     529
              Name: prediction, dtype: int64

    In [102]: hits = np.sign(data['return'].iloc[lags:] *
                             data['prediction'].iloc[lags:]
                            ).value_counts()  # (5)

    In [103]: hits
    Out[103]:  1.0    1338
              -1.0    1159
               0.0      12
              dtype: int64

    In [104]: accuracy_score(data['prediction'],
                             np.sign(data['return']))  # (6)
    Out[104]: 0.5338375796178344

    In [105]: data['strategy'] = data['prediction'] * data['return']  # (7)

    In [106]: data[['return', 'strategy']].sum().apply(np.exp)  # (7)
    Out[106]: return      1.289478
              strategy    2.458716
              dtype: float64

    In [107]: data[['return', 'strategy']].cumsum().apply(np.exp).plot(
                                                      figsize=(10, 6));  # (8)

(1) Instantiates the model object using a C value that gives less weight to the regularization term (see the Generalized Linear Models page).

(2) Fits the model based on the sign of the returns to be predicted.

(3) Generates a new column in the DataFrame object and writes the prediction values to it.

(4) Shows the number of the resulting long and short positions, respectively.

(5) Calculates the number of correct and wrong predictions.

(6) The accuracy (hit ratio) is 53.3% in this case.

(7) However, the gross performance of the strategy...

(8) ...is much higher when compared with the passive benchmark investment.

Figure 5-13. Gross performance of the `GLD` ETF and the logistic regression-based strategy (three lags, in-sample)

Increasing the number of lags used from three to five decreases the hit ratio but improves the gross performance of the strategy to some extent (in-sample, before transaction costs). Figure 5-14 shows the resulting performance:

    In [108]: data = pd.DataFrame(raw[symbol])

    In [109]: data.rename(columns={symbol: 'price'}, inplace=True)

    In [110]: data['return'] = np.log(data['price'] / data['price'].shift(1))

    In [111]: lags = 5

    In [112]: cols = []
              for lag in range(1, lags + 1):
                  col = 'lag_%d' % lag
                  data[col] = data['price'].shift(lag)  # (1)
                  cols.append(col)

    In [113]: data.dropna(inplace=True)

    In [114]: lm.fit(data[cols], np.sign(data['return']))  # (2)
    Out[114]: LogisticRegression(C=10000000.0, max_iter=1000)

    In [115]: data['prediction'] = lm.predict(data[cols])

    In [116]: data['prediction'].value_counts()  # (3)
    Out[116]:  1.0    2047
              -1.0     464
              Name: prediction, dtype: int64

    In [117]: hits = np.sign(data['return'].iloc[lags:] *
                             data['prediction'].iloc[lags:]
                            ).value_counts()

    In [118]: hits
    Out[118]:  1.0    1331
              -1.0    1163
               0.0      12
              dtype: int64

    In [119]: accuracy_score(data['prediction'],
                             np.sign(data['return']))  # (4)
    Out[119]: 0.5312624452409399

    In [120]: data['strategy'] = data['prediction'] * data['return']  # (5)

    In [121]: data[['return', 'strategy']].sum().apply(np.exp)  # (5)
    Out[121]: return      1.283110
              strategy    2.656833
              dtype: float64

    In [122]: data[['return', 'strategy']].cumsum().apply(np.exp).plot(
                                                      figsize=(10, 6));

(1) Increases the number of lags to five.

(2) Fits the model based on five lags.

(3) There are now more long positions (2,047 compared to 1,983) and fewer short positions (464 compared to 529) given the new parametrization.

(4) The accuracy (hit ratio) decreases to 53.1%.

(5) The cumulative performance also increases significantly.

Figure 5-14. Gross performance of the `GLD` ETF and the logistic regression-based strategy (five lags, in-sample)

One has to be careful here not to fall into the overfitting trap. A more realistic picture is obtained by an approach that uses training data (= in-sample data) for the fitting of the model and test data (= out-of-sample data) for the evaluation of the strategy performance. This is done in the following section, when the approach is generalized again in the form of a Python class.

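Generalizing the Approach

"Classification Algorithm Backtesting Class" presents a Python module with a class for the vectorized backtesting of strategies based on linear models from scikit-learn. Although only linear and logistic regression are implemented, the number of models is easily increased. In principle, the ScikitVectorBacktester class could inherit selected methods from LRVectorBacktester, but it is presented in a self-contained fashion. This makes it easier to enhance and reuse this class for practical applications.

Based on the ScikitVectorBacktester class, an out-of-sample evaluation of the logistic regression-based strategy is possible. The example uses the EUR/USD exchange rate as the base instrument.

Figure 5-15 illustrates that the strategy outperforms the base instrument during the out-of-sample period (spanning the year 2019), however, without considering transaction costs as before: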

    In [123]: import ScikitVectorBacktester as SCI

    In [124]: scibt = SCI.ScikitVectorBacktester('EUR=',
                                                 '2010-1-1', '2019-12-31',
                                                 10000, 0.0, 'logistic')

    In [125]: scibt.run_strategy('2015-1-1', '2019-12-31',
                                 '2015-1-1', '2019-12-31', lags=15)
    Out[125]: (12192.18, 2189.5)

    In [126]: scibt.run_strategy('2016-1-1', '2018-12-31',
                                 '2019-1-1', '2019-12-31', lags=15)
    Out[126]: (10580.54, 729.93)

    In [127]: scibt.plot_results()

Figure 5-15. Gross performance of EUR/USD and the out-of-sample logistic regression-based strategy (15 lags, no transaction costs)

As another example, consider the same strategy applied to the GDX ETF, for which an out-of-sample outperformance (over the year 2018) is shown in Figure 5-16 (before transaction costs):

    In [128]: scibt = SCI.ScikitVectorBacktester('GDX', '2010-1-1', '2019-12-31',
                                                 10000, 0.00, 'logistic')
    In [129]: scibt.run_strategy('2013-1-1', '2017-12-31',
                                 '2018-1-1', '2018-12-31', lags=10)
    Out[129]: (12686.81, 4032.73)
    In [130]: scibt.plot_results()

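Figure 5-16. Gross performance of the `GDX` ETF and the logistic regression-based strategy (10 lags, out-of-sample, no transaction costs)

Figure 5-17 shows how the gross performance is diminished when taking transaction costs into account, even leading to a net loss, while all other parameters are kept constant: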

    In [131]: scibt = SCI.ScikitVectorBacktester('GDX', '2010-1-1', '2019-12-31',
                                                 10000, 0.0025, 'logistic')
    In [132]: scibt.run_strategy('2013-1-1', '2017-12-31',
                                 '2018-1-1', '2018-12-31', lags=10)
    Out[132]: (9588.48, 934.4)
    In [133]: scibt.plot_results()

Figure 5-17. Gross performance of the `GDX` ETF and the logistic regression-based strategy (10 lags, out-of-sample, with transaction costs)

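Applying sophisticated machine learning techniques to stock market prediction often yields promising results early on. In several examples, the backtested strategies outperform the base instrument significantly in-sample. Quite often, such stellar performance is due to a mix of simplifying assumptions and an overfitting of the prediction model. Testing the very same strategy on an out-of-sample data set instead of in-sample, and adding transaction costs, are two ways of getting to a more realistic picture. Both often show that the performance of the considered strategy “suddenly” trails the base instrument or turns into a net loss.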
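The effect of proportional transaction costs on a vectorized backtest can be sketched as follows. This is a minimal illustration with made-up numbers; the function name `net_strategy_returns` is an assumption, while the logic (a trade occurs whenever the position changes, and the cost is subtracted from that bar's log return) mirrors what the backtesting classes at the end of this chapter do.

```python
import numpy as np
import pandas as pd


def net_strategy_returns(returns, positions, tc):
    """Strategy log returns after proportional transaction costs.
    A trade is assumed whenever the position differs from the one before.
    Hypothetical helper for illustration only."""
    strategy = positions * returns
    trades = positions.diff().fillna(0) != 0
    return strategy - tc * trades


# Made-up log returns and +1/-1 positions (not real market data).
returns = pd.Series([0.01, -0.02, 0.015, 0.005])
positions = pd.Series([1, -1, -1, 1])

gross = net_strategy_returns(returns, positions, tc=0.0)
net = net_strategy_returns(returns, positions, tc=0.0025)

# Two position changes occur, so the net log return is lower by 2 * tc.
print(np.exp(gross.sum()), np.exp(net.sum()))
```

The more often a strategy switches positions, the more of these deductions accumulate, which is how a gross outperformance can turn into a net loss.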

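Since being open sourced and published by Google, the deep learning library TensorFlow has attracted much interest and broad application. This section applies TensorFlow, in the same way the previous section applied scikit-learn, to the prediction of market movements modeled as a classification problem. However, TensorFlow is not used directly; it is used via the equally popular Keras deep learning package. Keras can be thought of as a higher-level abstraction over the TensorFlow package, with an API that is easier to understand and use.

The libraries are best installed via pip install tensorflow and pip install keras. scikit-learn also offers classes for applying neural networks to classification problems.

For more background information on deep learning and Keras, see Goodfellow et al. (2016) and Chollet (2017).

The Simple Classification Problem Revisited

To illustrate the basic approach of applying neural networks to classification problems, the simple classification problem introduced in the previous section proves useful again: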

    In [134]: hours = np.array([0.5, 0.75, 1., 1.25, 1.5, 1.75, 1.75, 2.,
                                2.25, 2.5, 2.75, 3., 3.25, 3.5, 4., 4.25,
                                4.5, 4.75, 5., 5.5])
    In [135]: success = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1,
                                  0, 1, 1, 1, 1, 1, 1])
    In [136]: data = pd.DataFrame({'hours': hours, 'success': success})  # (1)
    In [137]: data.info()  # (2)
              <class 'pandas.core.frame.DataFrame'>
              RangeIndex: 20 entries, 0 to 19
              Data columns (total 2 columns):
               #   Column   Non-Null Count  Dtype
              ---  ------   --------------  -----
               0   hours    20 non-null     float64
               1   success  20 non-null     int64
              dtypes: float64(1), int64(1)
              memory usage: 448.0 bytes

(1) Stores the two data subsets in a DataFrame object.

(2) Prints out the meta information for the DataFrame object.

With these preparations, MLPClassifier from scikit-learn can be imported and applied directly.³ “MLP” in this context stands for multi-layer perceptron, which is another expression for dense neural network. As before, the API for applying neural networks with scikit-learn is basically the same:

    In [138]: from sklearn.neural_network import MLPClassifier  # (1)
    In [139]: model = MLPClassifier(hidden_layer_sizes=[32],
                                    max_iter=1000, random_state=100)  # (2)

(1) Imports the MLPClassifier object from scikit-learn.

(2) Instantiates the MLPClassifier object.

The following code fits the model, generates the predictions, and plots the results, as shown in Figure 5-18:

    In [140]: model.fit(data['hours'].values.reshape(-1, 1), data['success'])  # (1)
    Out[140]: MLPClassifier(hidden_layer_sizes=[32], max_iter=1000,
                            random_state=100)
    In [141]: data['prediction'] = model.predict(
                  data['hours'].values.reshape(-1, 1))  # (2)
    In [142]: data.tail()
    Out[142]:     hours  success  prediction
              15   4.25        1           1
              16   4.50        1           1
              17   4.75        1           1
              18   5.00        1           1
              19   5.50        1           1
    In [143]: data.plot(x='hours', y=['success', 'prediction'],
                        style=['ro', 'b-'], ylim=[-.1, 1.1],
                        figsize=(10, 6));  # (3)

(1) Fits the neural network for classification.

(2) Generates the prediction values based on the fitted model.

(3) Plots the original data and the prediction values.

This simple example shows that applying the deep learning approach is very similar to the approach with scikit-learn and the LogisticRegression model object. The API is basically the same; only the parameters are different.

Figure 5-18. Base data and prediction results with `MLPClassifier` for the simple classification example

Using Deep Neural Networks to Predict Market Direction

The next step is to apply the approach to financial market data in the form of log returns from a financial time series. First, the data needs to be retrieved and prepared:

    In [144]: symbol = 'EUR='  # (1)
    In [145]: data = pd.DataFrame(raw[symbol])  # (2)
    In [146]: data.rename(columns={symbol: 'price'}, inplace=True)  # (3)
    In [147]: data['return'] = np.log(data['price'] /
                                      data['price'].shift(1))  # (4)
    In [148]: data['direction'] = np.where(data['return'] > 0, 1, 0)  # (4)
    In [149]: lags = 5
    In [150]: cols = []
              for lag in range(1, lags + 1):  # (5)
                  col = f'lag_{lag}'
                  data[col] = data['return'].shift(lag)  # (6)
                  cols.append(col)
              data.dropna(inplace=True)  # (7)
    In [151]: data.round(4).tail()  # (8)
    Out[151]:              price  return  direction   lag_1   lag_2   lag_3   lag_4   lag_5
              Date
              2019-12-24  1.1087  0.0001          1  0.0007 -0.0038  0.0008 -0.0034  0.0006
              2019-12-26  1.1096  0.0008          1  0.0001  0.0007 -0.0038  0.0008 -0.0034
              2019-12-27  1.1175  0.0071          1  0.0008  0.0001  0.0007 -0.0038  0.0008
              2019-12-30  1.1197  0.0020          1  0.0071  0.0008  0.0001  0.0007 -0.0038
              2019-12-31  1.1210  0.0012          1  0.0020  0.0071  0.0008  0.0001  0.0007

(1) Specifies the symbol to select from raw, the data read from the CSV file.

(2) Picks the single time series column of interest.

(3) Renames the only column to price.

(4) Calculates the log returns and defines direction as a binary column.

(5) Creates the lagged data.

(6) Creates new DataFrame columns with the log returns shifted by the respective number of lags.

(7) Deletes rows containing NaN values.

(8) Prints out the final five rows, indicating the “patterns” emerging in the five feature columns.
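The lag construction above can be wrapped into a small reusable function. The sketch below is only an illustration on made-up numbers; the function name `add_lags` and its `column` parameter are assumptions, not part of the book's code.

```python
import pandas as pd


def add_lags(data, lags, column='return'):
    """Return a copy of data with lag_1 ... lag_n feature columns,
    plus the list of feature column names. Hypothetical helper."""
    out = data.copy()
    cols = []
    for lag in range(1, lags + 1):
        col = f'lag_{lag}'
        # shift(lag) moves past values forward, so each row only sees
        # returns that were already known before that row's return
        out[col] = out[column].shift(lag)
        cols.append(col)
    return out.dropna(), cols


# Made-up log returns (not real market data).
data = pd.DataFrame({'return': [0.01, -0.02, 0.03, 0.005, -0.01]})
lagged, cols = add_lags(data, lags=2)
print(lagged)
```

The first `lags` rows are dropped because they lack a complete history, which is why the row count shrinks with every additional lag.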

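The following code applies a dense neural network (DNN) with the Keras package: it defines the training and test data subsets, defines the feature columns and labels, and fits the classifier. In the backend, Keras uses the TensorFlow package to accomplish the task. Figure 5-19 shows how the accuracy of the DNN classifier changes on both the training and validation data sets during training. As the validation data set, 20% of the training data (without shuffling) is used: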

    In [152]: import random
              import tensorflow as tf  # (1)
              from keras.models import Sequential  # (2)
              from keras.layers import Dense  # (3)
              from keras.optimizers import Adam, RMSprop
    In [153]: optimizer = Adam(learning_rate=0.0001)
    In [154]: def set_seeds(seed=100):
                  random.seed(seed)
                  np.random.seed(seed)
                  tf.random.set_seed(seed)
    In [155]: set_seeds()
              model = Sequential()  # (4)
              model.add(Dense(64, activation='relu',
                              input_shape=(lags,)))  # (5)
              model.add(Dense(64, activation='relu'))  # (5)
              model.add(Dense(1, activation='sigmoid'))  # (5)
              model.compile(optimizer=optimizer,
                            loss='binary_crossentropy',
                            metrics=['accuracy'])  # (6)
    In [156]: cutoff = '2017-12-31'  # (7)
    In [157]: training_data = data[data.index < cutoff].copy()  # (8)
    In [158]: mu, std = training_data.mean(), training_data.std()  # (9)
    In [159]: training_data_ = (training_data - mu) / std  # (9)
    In [160]: test_data = data[data.index >= cutoff].copy()  # (8)
    In [161]: test_data_ = (test_data - mu) / std  # (9)
    In [162]: %%time
              # note: the fit uses the raw features in training_data, while
              # the evaluation further below uses the normalized training_data_
              model.fit(training_data[cols],
                        training_data['direction'],
                        epochs=50, verbose=False,
                        validation_split=0.2, shuffle=False)  # (10)
              CPU times: user 4.86 s, sys: 989 ms, total: 5.85 s
              Wall time: 3.34 s
    Out[162]: <tensorflow.python.keras.callbacks.History at 0x7f996a0a2880>
    In [163]: res = pd.DataFrame(model.history.history)
    In [164]: res[['accuracy', 'val_accuracy']].plot(figsize=(10, 6), style='--');

(1) Imports the TensorFlow package.

(2) Imports the required model object from Keras.

(3) Imports the relevant layer object from Keras.

(4) Instantiates a Sequential model.

(5) Defines the hidden layers and the output layer.

(6) Compiles the Sequential model object for classification.

(7) Defines the cutoff date between the training and test data.

(8) Defines the training and test data sets.

(9) Normalizes the features data by Gaussian normalization.

(10) Fits the model to the training data set.
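The Gaussian normalization step deserves a closer look: the mean and standard deviation are estimated on the training data only and then reused for the test data, so that no information from the test period leaks into the features. The following is a minimal sketch on synthetic numbers; the function name `normalize_with_training_moments` is an assumption for illustration.

```python
import numpy as np
import pandas as pd


def normalize_with_training_moments(train, test):
    """Gaussian (z-score) normalization of both data sets using the
    moments of the training data only. Hypothetical helper."""
    mu, std = train.mean(), train.std()
    return (train - mu) / std, (test - mu) / std


# Synthetic feature data (not real market data).
rng = np.random.default_rng(100)
train = pd.DataFrame(rng.normal(5, 2, (200, 2)), columns=['lag_1', 'lag_2'])
test = pd.DataFrame(rng.normal(5, 2, (50, 2)), columns=['lag_1', 'lag_2'])

train_, test_ = normalize_with_training_moments(train, test)

# The training features now have zero mean and unit standard deviation;
# the test features are close to that, but generally not exactly.
print(train_.mean().round(4).tolist(), train_.std().round(4).tolist())
```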

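Figure 5-19. Accuracy of the DNN classifier on training and validation data per training step

Equipped with the fitted classifier, the model can generate predictions on the training data set. Figure 5-20 shows the strategy gross performance compared to the base instrument (in-sample):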

    In [165]: model.evaluate(training_data_[cols], training_data['direction'])
              63/63 [==============================] - 0s 586us/step - loss: 0.7556 - accuracy: 0.5152
    Out[165]: [0.7555528879165649, 0.5151968002319336]
    In [166]: pred = np.where(model.predict(training_data_[cols]) > 0.5, 1, 0)  # (1)
    In [167]: pred[:30].flatten()  # (1)
    Out[167]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0,
                     0, 1, 0, 1, 0, 1, 0, 0])
    In [168]: training_data['prediction'] = np.where(pred > 0, 1, -1)  # (2)
    In [169]: training_data['strategy'] = (training_data['prediction'] *
                                           training_data['return'])  # (3)
    In [170]: training_data[['return', 'strategy']].sum().apply(np.exp)
    Out[170]: return      0.826569
              strategy    1.317303
              dtype: float64
    In [171]: training_data[['return', 'strategy']].cumsum(
                  ).apply(np.exp).plot(figsize=(10, 6));  # (4)

(1) Predicts the market direction in-sample.

(2) Transforms the predictions into long-short positions, +1 and -1.

(3) Calculates the strategy returns given the positions.

(4) Plots and compares the strategy performance to the benchmark performance (in-sample).

Figure 5-20. Gross performance of EUR/USD compared to the deep learning-based strategy (in-sample, no transaction costs)

The strategy seems to perform somewhat better than the base instrument on the training data set (in-sample, without transaction costs). However, the more interesting question is how it performs on the test data set (out-of-sample). After a wobbly start, the strategy also outperforms the base instrument on the test data set, as Figure 5-21 illustrates. This is despite the fact that the accuracy of the classifier is only slightly above 50% on the test data set:

    In [172]: model.evaluate(test_data_[cols], test_data['direction'])
              16/16 [==============================] - 0s 676us/step - loss: 0.7292 - accuracy: 0.5050
    Out[172]: [0.7292129993438721, 0.5049701929092407]
    In [173]: pred = np.where(model.predict(test_data_[cols]) > 0.5, 1, 0)
    In [174]: test_data['prediction'] = np.where(pred > 0, 1, -1)
    In [175]: test_data['prediction'].value_counts()
    Out[175]: -1    368
               1    135
              Name: prediction, dtype: int64
    In [176]: test_data['strategy'] = (test_data['prediction'] *
                                       test_data['return'])
    In [177]: test_data[['return', 'strategy']].sum().apply(np.exp)
    Out[177]: return      0.934478
              strategy    1.109065
              dtype: float64
    In [178]: test_data[['return', 'strategy']].cumsum(
                  ).apply(np.exp).plot(figsize=(10, 6));

Figure 5-21. Gross performance of EUR/USD compared to the deep learning-based strategy (out-of-sample, no transaction costs)
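That a hit ratio close to 50% can coexist with an outperformance is explained by the fact that accuracy counts only the direction, while the performance also depends on the size of the moves that are predicted correctly. The toy numbers below are made up purely to illustrate this point.

```python
import numpy as np

# Made-up log returns and +1/-1 positions (not real market data).
returns = np.array([0.03, -0.001, -0.002, -0.001])
positions = np.array([1, 1, 1, -1])

# Share of bars on which the position matches the sign of the return.
hit_ratio = np.mean(np.sign(returns) == positions)

# Gross performance of the instrument and of the strategy.
base = np.exp(returns.sum())
strategy = np.exp((positions * returns).sum())

print(hit_ratio, base, strategy)
```

Here only half of the directional calls are correct, yet the strategy ends above the instrument because it is on the right side of the one large move.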

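Adding Different Types of Features

So far, the analysis has mainly focused on the log returns directly. It is, of course, possible not only to add more classes/categories but also to add other types of features to the mix, such as ones based on momentum, volatility, or distance measures. The code that follows derives the additional features and adds them to the data set: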

    In [179]: data['momentum'] = data['return'].rolling(5).mean().shift(1)  # (1)
    In [180]: data['volatility'] = data['return'].rolling(20).std().shift(1)  # (2)
    In [181]: data['distance'] = (data['price'] -
                                  data['price'].rolling(50).mean()).shift(1)  # (3)
    In [182]: data.dropna(inplace=True)
    In [183]: cols.extend(['momentum', 'volatility', 'distance'])
    In [184]: print(data.round(4).tail())
                           price  return  direction   lag_1   lag_2   lag_3   lag_4   lag_5
              Date
              2019-12-24  1.1087  0.0001          1  0.0007 -0.0038  0.0008 -0.0034  0.0006
              2019-12-26  1.1096  0.0008          1  0.0001  0.0007 -0.0038  0.0008 -0.0034
              2019-12-27  1.1175  0.0071          1  0.0008  0.0001  0.0007 -0.0038  0.0008
              2019-12-30  1.1197  0.0020          1  0.0071  0.0008  0.0001  0.0007 -0.0038
              2019-12-31  1.1210  0.0012          1  0.0020  0.0071  0.0008  0.0001  0.0007

                          momentum  volatility  distance
              Date
              2019-12-24   -0.0010      0.0024    0.0005
              2019-12-26   -0.0011      0.0024    0.0004
              2019-12-27   -0.0003      0.0024    0.0012
              2019-12-30    0.0010      0.0028    0.0089
              2019-12-31    0.0021      0.0028    0.0110

(1) The momentum-based feature.

(2) The volatility-based feature.

(3) The distance-based feature.
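Each of the three features ends with `.shift(1)`, which is what keeps them free of look-ahead: the value in a given row is computed only from data up to the previous row. The sketch below wraps the same three definitions in a function and checks this on synthetic data; the function name `add_features` and the random-walk prices are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def add_features(data):
    """Add momentum, volatility, and distance features, each shifted by
    one bar so that only past information is used. Hypothetical helper."""
    out = data.copy()
    out['momentum'] = out['return'].rolling(5).mean().shift(1)
    out['volatility'] = out['return'].rolling(20).std().shift(1)
    out['distance'] = (out['price'] -
                       out['price'].rolling(50).mean()).shift(1)
    return out.dropna()


# Synthetic random-walk prices (not real market data).
rng = np.random.default_rng(100)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 200)))
data = pd.DataFrame({'price': prices})
data['return'] = np.log(data['price'] / data['price'].shift(1))
data = data.dropna().reset_index(drop=True)

feat = add_features(data)

# The first surviving row is row 50; its momentum value is the mean of
# the five returns in rows 45 to 49, that is, strictly earlier rows.
print(feat['momentum'].iloc[0], data['return'].iloc[45:50].mean())
```

The longest window (50 bars for the distance feature) determines how many initial rows are lost to `dropna()`.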

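The next steps are to redefine the training and test data sets, to normalize the features data, and to update the model to reflect the new feature columns: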

    In [185]: training_data = data[data.index < cutoff].copy()
    In [186]: mu, std = training_data.mean(), training_data.std()
    In [187]: training_data_ = (training_data - mu) / std
    In [188]: test_data = data[data.index >= cutoff].copy()
    In [189]: test_data_ = (test_data - mu) / std
    In [190]: set_seeds()
              model = Sequential()
              model.add(Dense(32, activation='relu',
                              input_shape=(len(cols),)))  # (1)
              model.add(Dense(32, activation='relu'))
              model.add(Dense(1, activation='sigmoid'))
              model.compile(optimizer=optimizer,
                            loss='binary_crossentropy',
                            metrics=['accuracy'])

(1) The input_shape parameter is adjusted to reflect the new number of features.

Based on the enriched feature set, the classifier can be trained. The in-sample performance of the strategy is quite a bit better than before, as illustrated in Figure 5-22:

    In [191]: %%time
              model.fit(training_data_[cols], training_data['direction'],
                        verbose=False, epochs=25)
              CPU times: user 2.32 s, sys: 577 ms, total: 2.9 s
              Wall time: 1.48 s
    Out[191]: <tensorflow.python.keras.callbacks.History at 0x7f996d35c100>
    In [192]: model.evaluate(training_data_[cols], training_data['direction'])
              62/62 [==============================] - 0s 649us/step - loss: 0.6816 - accuracy: 0.5646
    Out[192]: [0.6816270351409912, 0.5646397471427917]
    In [193]: pred = np.where(model.predict(training_data_[cols]) > 0.5, 1, 0)
    In [194]: training_data['prediction'] = np.where(pred > 0, 1, -1)
    In [195]: training_data['strategy'] = (training_data['prediction'] *
                                           training_data['return'])
    In [196]: training_data[['return', 'strategy']].sum().apply(np.exp)
    Out[196]: return      0.901074
              strategy    2.703377
              dtype: float64
    In [197]: training_data[['return', 'strategy']].cumsum(
                  ).apply(np.exp).plot(figsize=(10, 6));

Figure 5-22. Gross performance of EUR/USD compared to the deep learning-based strategy (in-sample, additional features)

The final step is the evaluation of the classifier and the derivation of the strategy performance out-of-sample. Compared to the case without the additional features, the classifier also performs significantly better, ceteris paribus. As before, the start is a bit wobbly (see Figure 5-23):

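    In [198]: model.evaluate(test_data_[cols], test_data['direction'])
              16/16 [==============================] - 0s 800us/step - loss: 0.6931 - accuracy: 0.5507
    Out[198]: [0.6931276321411133, 0.5506958365440369]
    In [199]: pred = np.where(model.predict(test_data_[cols]) > 0.5, 1, 0)
    In [200]: test_data['prediction'] = np.where(pred > 0, 1, -1)
    In [201]: test_data['prediction'].value_counts()
    Out[201]: -1    335
               1    168
              Name: prediction, dtype: int64
    In [202]: test_data['strategy'] = (test_data['prediction'] *
                                       test_data['return'])
    In [203]: test_data[['return', 'strategy']].sum().apply(np.exp)
    Out[203]: return      0.934478
              strategy    1.144385
              dtype: float64
    In [204]: test_data[['return', 'strategy']].cumsum(
                  ).apply(np.exp).plot(figsize=(10, 6));

Figure 5-23. Gross performance of EUR/USD compared to the deep learning-based strategy (out-of-sample, additional features)

The Keras package, in combination with the TensorFlow package as its backend, makes it possible to use recent advances in deep learning, such as deep neural network (DNN) classifiers, for algorithmic trading. The application is as straightforward as applying other machine learning models with scikit-learn. The approach illustrated in this section makes it easy to extend the different types of features used.

As an exercise, write a Python class (in the spirit of “Linear Regression Backtesting Class” and “Classification Algorithm Backtesting Class”) that allows for a more systematic and realistic use of the Keras package for financial market prediction and the backtesting of the respective trading strategies.

Predicting future market movements is the holy grail in finance. It means finding the truth. It means overcoming efficient markets. If one can do it with a considerable edge, then stellar investment and trading returns are the consequence. This chapter introduces statistical techniques from the fields of traditional statistics, machine learning, and deep learning to predict the future market direction based on past returns or similar financial quantities. Some first in-sample results are promising, both for linear and logistic regression. However, a more reliable impression is gained when evaluating such strategies out-of-sample and when factoring in transaction costs.

This chapter does not claim to have found the holy grail. It rather offers a glimpse at techniques that could prove useful in the search for it. The unified API of scikit-learn also makes it easy to replace, for example, one linear model with another one. In that sense, the ScikitVectorBacktester class can be used as a starting point to explore more machine learning models and to apply them to financial time series prediction.

The quote at the beginning of the chapter, from the 1991 movie Terminator 2, is rather optimistic with regard to how fast and to what extent computers might be able to learn and acquire consciousness. Whether or not you believe that computers will replace human beings in most areas of life, or that they will one day become self-aware, they have proven useful to human beings as supporting devices in almost any area of life. And algorithms like those used in machine learning, deep learning, or artificial intelligence hold at least the promise of making them better algorithmic traders in the near future. A more detailed account of these topics and considerations is found in Hilpisch (2020).

The books by Guido and Müller (2016) and VanderPlas (2016) provide practical introductions to machine learning with Python and scikit-learn. The book by Hilpisch (2020) focuses exclusively on the application of algorithms for machine and deep learning to the problem of identifying statistical inefficiencies and exploiting economic inefficiencies through algorithmic trading: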

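Guido, Sarah, and Andreas Müller. 2016. Introduction to Machine Learning with Python: A Guide for Data Scientists. Sebastopol: O’Reilly.
Hilpisch, Yves. 2020. Artificial Intelligence in Finance: A Python-Based Guide. Sebastopol: O’Reilly.
VanderPlas, Jake. 2016. Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol: O’Reilly.

The books by Hastie et al. (2008) and James et al. (2013) provide a thorough, mathematical overview of popular machine learning techniques and algorithms:

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2008. The Elements of Statistical Learning. 2nd ed. New York: Springer.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. New York: Springer.

For more background information on deep learning and Keras, refer to these books:

Chollet, Francois. 2017. Deep Learning with Python. Shelter Island: Manning.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Cambridge: MIT Press. http://deeplearningbook.org.

This section presents Python scripts referenced and used in this chapter.

Linear Regression Backtesting Class

The following presents Python code with a class for the vectorized backtesting of strategies based on linear regression used for the prediction of the direction of market movements: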

    #
    # Python Module with Class
    # for Vectorized Backtesting
    # of Linear Regression-Based Strategies
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    import numpy as np
    import pandas as pd


    class LRVectorBacktester(object):
        ''' Class for the vectorized backtesting of
        linear regression-based trading strategies.

        Attributes
        ==========
        symbol: str
           TR RIC (financial instrument) to work with
        start: str
            start date for data selection
        end: str
            end date for data selection
        amount: int, float
            amount to be invested at the beginning
        tc: float
            proportional transaction costs (e.g., 0.5% = 0.005) per trade

        Methods
        =======
        get_data:
            retrieves and prepares the base data set
        select_data:
            selects a sub-set of the data
        prepare_lags:
            prepares the lagged data for the regression
        fit_model:
            implements the regression step
        run_strategy:
            runs the backtest for the regression-based strategy
        plot_results:
            plots the performance of the strategy compared to the symbol
        '''

        def __init__(self, symbol, start, end, amount, tc):
            self.symbol = symbol
            self.start = start
            self.end = end
            self.amount = amount
            self.tc = tc
            self.results = None
            self.get_data()

        def get_data(self):
            ''' Retrieves and prepares the data.
            '''
            raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                              index_col=0, parse_dates=True).dropna()
            raw = pd.DataFrame(raw[self.symbol])
            raw = raw.loc[self.start:self.end]
            raw.rename(columns={self.symbol: 'price'}, inplace=True)
            raw['returns'] = np.log(raw / raw.shift(1))
            self.data = raw.dropna()

        def select_data(self, start, end):
            ''' Selects sub-sets of the financial data.
            '''
            data = self.data[(self.data.index >= start) &
                             (self.data.index <= end)].copy()
            return data

        def prepare_lags(self, start, end):
            ''' Prepares the lagged data for the regression and prediction steps.
            '''
            data = self.select_data(start, end)
            self.cols = []
            for lag in range(1, self.lags + 1):
                col = f'lag_{lag}'
                data[col] = data['returns'].shift(lag)
                self.cols.append(col)
            data.dropna(inplace=True)
            self.lagged_data = data

        def fit_model(self, start, end):
            ''' Implements the regression step.
            '''
            self.prepare_lags(start, end)
            reg = np.linalg.lstsq(self.lagged_data[self.cols],
                                  np.sign(self.lagged_data['returns']),
                                  rcond=None)[0]
            self.reg = reg

        def run_strategy(self, start_in, end_in, start_out, end_out, lags=3):
            ''' Backtests the trading strategy.
            '''
            self.lags = lags
            self.fit_model(start_in, end_in)
            self.results = self.select_data(start_out, end_out).iloc[lags:]
            self.prepare_lags(start_out, end_out)
            prediction = np.sign(np.dot(self.lagged_data[self.cols], self.reg))
            self.results['prediction'] = prediction
            self.results['strategy'] = self.results['prediction'] * \
                self.results['returns']
            # determine when a trade takes place
            trades = self.results['prediction'].diff().fillna(0) != 0
            # subtract transaction costs from return when trade takes place
            self.results.loc[trades, 'strategy'] -= self.tc
            self.results['creturns'] = self.amount * \
                self.results['returns'].cumsum().apply(np.exp)
            self.results['cstrategy'] = self.amount * \
                self.results['strategy'].cumsum().apply(np.exp)
            # gross performance of the strategy
            aperf = self.results['cstrategy'].iloc[-1]
            # out-/underperformance of strategy
            operf = aperf - self.results['creturns'].iloc[-1]
            return round(aperf, 2), round(operf, 2)

        def plot_results(self):
            ''' Plots the cumulative performance of the trading strategy
            compared to the symbol.
            '''
            if self.results is None:
                print('No results to plot yet. Run a strategy.')
            title = '%s | TC = %.4f' % (self.symbol, self.tc)
            self.results[['creturns', 'cstrategy']].plot(title=title,
                                                         figsize=(10, 6))


    if __name__ == '__main__':
        lrbt = LRVectorBacktester('.SPX', '2010-1-1', '2018-06-29', 10000, 0.0)
        print(lrbt.run_strategy('2010-1-1', '2019-12-31',
                                '2010-1-1', '2019-12-31'))
        print(lrbt.run_strategy('2010-1-1', '2015-12-31',
                                '2016-1-1', '2019-12-31'))
        lrbt = LRVectorBacktester('GDX', '2010-1-1', '2019-12-31', 10000, 0.001)
        print(lrbt.run_strategy('2010-1-1', '2019-12-31',
                                '2010-1-1', '2019-12-31', lags=5))
        print(lrbt.run_strategy('2010-1-1', '2016-12-31',
                                '2017-1-1', '2019-12-31', lags=5))

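Classification Algorithm Backtesting Class

The following presents Python code with a class for the vectorized backtesting of strategies based on logistic regression, as a standard classification algorithm, used for the prediction of the direction of market movements: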

    #
    # Python Module with Class
    # for Vectorized Backtesting
    # of Machine Learning-Based Strategies
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    import numpy as np
    import pandas as pd
    from sklearn import linear_model


    class ScikitVectorBacktester(object):
        ''' Class for the vectorized backtesting of
        machine learning-based trading strategies.

        Attributes
        ==========
        symbol: str
            TR RIC (financial instrument) to work with
        start: str
            start date for data selection
        end: str
            end date for data selection
        amount: int, float
            amount to be invested at the beginning
        tc: float
            proportional transaction costs (e.g., 0.5% = 0.005) per trade
        model: str
            either 'regression' or 'logistic'

        Methods
        =======
        get_data:
            retrieves and prepares the base data set
        select_data:
            selects a sub-set of the data
        prepare_features:
            prepares the features data for the model fitting
        fit_model:
            implements the fitting step
        run_strategy:
            runs the backtest for the regression-based strategy
        plot_results:
            plots the performance of the strategy compared to the symbol
        '''

        def __init__(self, symbol, start, end, amount, tc, model):
            self.symbol = symbol
            self.start = start
            self.end = end
            self.amount = amount
            self.tc = tc
            self.results = None
            if model == 'regression':
                self.model = linear_model.LinearRegression()
            elif model == 'logistic':
                self.model = linear_model.LogisticRegression(
                    C=1e6, solver='lbfgs', multi_class='ovr', max_iter=1000)
            else:
                raise ValueError('Model not known or not yet implemented.')
            self.get_data()

        def get_data(self):
            ''' Retrieves and prepares the data.
            '''
            raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                              index_col=0, parse_dates=True).dropna()
            raw = pd.DataFrame(raw[self.symbol])
            raw = raw.loc[self.start:self.end]
            raw.rename(columns={self.symbol: 'price'}, inplace=True)
            raw['returns'] = np.log(raw / raw.shift(1))
            self.data = raw.dropna()

        def select_data(self, start, end):
            ''' Selects sub-sets of the financial data.
            '''
            data = self.data[(self.data.index >= start) &
                             (self.data.index <= end)].copy()
            return data

        def prepare_features(self, start, end):
            ''' Prepares the feature columns for the regression and prediction steps.
            '''
            self.data_subset = self.select_data(start, end)
            self.feature_columns = []
            for lag in range(1, self.lags + 1):
                col = 'lag_{}'.format(lag)
                self.data_subset[col] = self.data_subset['returns'].shift(lag)
                self.feature_columns.append(col)
            self.data_subset.dropna(inplace=True)

        def fit_model(self, start, end):
            ''' Implements the fitting step.
            '''
            self.prepare_features(start, end)
            self.model.fit(self.data_subset[self.feature_columns],
                           np.sign(self.data_subset['returns']))

        def run_strategy(self, start_in, end_in, start_out, end_out, lags=3):
            ''' Backtests the trading strategy.
            '''
            self.lags = lags
            self.fit_model(start_in, end_in)
            # data = self.select_data(start_out, end_out)
            self.prepare_features(start_out, end_out)
            prediction = self.model.predict(
                self.data_subset[self.feature_columns])
            self.data_subset['prediction'] = prediction
            self.data_subset['strategy'] = (self.data_subset['prediction'] *
                                            self.data_subset['returns'])
            # determine when a trade takes place
            trades = self.data_subset['prediction'].diff().fillna(0) != 0
            # subtract transaction costs from return when trade takes place
            self.data_subset.loc[trades, 'strategy'] -= self.tc
            self.data_subset['creturns'] = (
                self.amount * self.data_subset['returns'].cumsum().apply(np.exp))
            self.data_subset['cstrategy'] = (
                self.amount * self.data_subset['strategy'].cumsum().apply(np.exp))
            self.results = self.data_subset
            # absolute performance of the strategy
            aperf = self.results['cstrategy'].iloc[-1]
            # out-/underperformance of strategy
            operf = aperf - self.results['creturns'].iloc[-1]
            return round(aperf, 2), round(operf, 2)

        def plot_results(self):
            ''' Plots the cumulative performance of the trading strategy
            compared to the symbol.
            '''
            if self.results is None:
                print('No results to plot yet. Run a strategy.')
            title = '%s | TC = %.4f' % (self.symbol, self.tc)
            self.results[['creturns', 'cstrategy']].plot(title=title,
                                                         figsize=(10, 6))


    if __name__ == '__main__':
        scibt = ScikitVectorBacktester('.SPX', '2010-1-1', '2019-12-31',
                                       10000, 0.0, 'regression')
        print(scibt.run_strategy('2010-1-1', '2019-12-31',
                                 '2010-1-1', '2019-12-31'))
        print(scibt.run_strategy('2010-1-1', '2016-12-31',
                                 '2017-1-1', '2019-12-31'))
        scibt = ScikitVectorBacktester('.SPX', '2010-1-1', '2019-12-31',
                                       10000, 0.0, 'logistic')
        print(scibt.run_strategy('2010-1-1', '2019-12-31',
                                 '2010-1-1', '2019-12-31'))
        print(scibt.run_strategy('2010-1-1', '2016-12-31',
                                 '2017-1-1', '2019-12-31'))
        scibt = ScikitVectorBacktester('.SPX', '2010-1-1', '2019-12-31',
                                       10000, 0.001, 'logistic')
        print(scibt.run_strategy('2010-1-1', '2019-12-31',
                                 '2010-1-1', '2019-12-31', lags=15))
        print(scibt.run_strategy('2010-1-1', '2013-12-31',
                                 '2014-1-1', '2019-12-31', lags=15))

¹ The books by Guido and Müller (2016) and VanderPlas (2016) provide practical, general introductions to machine learning with Python.

² See, for example, the discussion in “The Tale of 10 Days.”

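³ For details, see https://oreil.ly/hOwsE.

⁴ For details, see https://keras.io/layers/core/.

The actual tragedies of life bear no relation to one’s preconceived ideas. In the event, one is always bewildered by their simplicity, their grandeur of design, and by that element of the bizarre which seems inherent in them.
Jean Cocteau

On the one hand, vectorized backtesting with NumPy and pandas is generally convenient to implement because of the concise code, and it is fast to execute because these packages are optimized for such operations. On the other hand, the approach cannot cope with all types of trading strategies, nor with all phenomena that trading reality presents to an algorithmic trader. When it comes to vectorized backtesting, potential shortcomings of the approach are the following: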

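Look-ahead bias

Vectorized backtesting is based on the complete data set available and does not take into account that new data arrives incrementally.

Simplification

For example, fixed transaction costs cannot be modeled by vectorization, which is mainly based on relative returns. Also, fixed amounts per trade or the non-divisibility of single financial instruments (for example, a share of a stock) cannot be modeled properly.

Non-recursiveness

Algorithms, embodying trading strategies, might take recourse to state variables over time, like profit and loss up to a certain point in time or similar path-dependent statistics. Vectorization cannot cope with such features.

Event-based backtesting, in contrast, addresses these issues through a more realistic approach to modeling trading reality. On a basic level, an event is characterized by the arrival of new data. When backtesting a trading strategy for the Apple Inc. stock based on end-of-day data, an event would be a new closing price for the Apple stock. It can also be a change in an interest rate or the hitting of a stop loss level. Advantages of the event-based backtesting approach generally are the following: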

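Incremental approach

As in trading reality, backtesting takes place on the premise that new data arrives incrementally, tick-by-tick and quote-by-quote.

Realistic modeling

One has complete freedom to model those processes that are triggered by a new and specific event.

Path dependency

It is straightforward to keep track of conditional, recursive, or otherwise path-dependent statistics, such as the maximum or minimum price seen so far, and to include them in the trading algorithm.

Reusability

Backtesting different types of trading strategies requires a similar base functionality that can be integrated and reused through object-oriented programming.

Close to trading

Certain elements of an event-based backtesting system can sometimes also be used for the automated implementation of the trading strategy.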
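The incremental, path-dependent character of event-based backtesting can be sketched with a plain loop over bars. The example below tracks the maximum price seen so far and exits once the price falls a given fraction below it; the function name `run_event_loop`, the trailing-stop rule, and the prices are all assumptions chosen for illustration, not part of the classes developed in this chapter.

```python
def run_event_loop(prices, stop_fraction):
    """Process prices bar by bar and return the bar at which the price
    first falls stop_fraction below the running maximum, together with
    that maximum. Returns (None, maximum) if the stop is never hit.
    Hypothetical helper for illustration only."""
    running_max = None
    for bar, price in enumerate(prices):
        # state variable: only data up to the current bar is known
        if running_max is None or price > running_max:
            running_max = price
        if price <= running_max * (1 - stop_fraction):
            return bar, running_max
    return None, running_max


# Made-up prices (not real market data); 5% trailing stop.
prices = [100, 102, 105, 103, 99, 101]
stop_bar, high = run_event_loop(prices, stop_fraction=0.05)
print(stop_bar, high)
```

A rule like this depends on the whole price path up to each bar, which is exactly the kind of recursive state that is awkward to express in a purely vectorized backtest.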

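In what follows, a new event is generally identified by a bar, which represents one unit of new data. For example, events can be one-minute bars for an intraday trading strategy or one-day bars for a trading strategy based on daily closing prices.

The chapter is organized as follows. “Backtesting Base Class” presents a base class for the event-based backtesting of trading strategies. “Long-Only Backtesting Class” and “Long-Short Backtesting Class” make use of the base class to implement long-only and long-short backtesting classes, respectively.

The goals of this chapter are to understand event-based modeling, to create classes that allow a more realistic backtesting, and to have a foundational backtesting infrastructure available as a starting point for further enhancements and refinements.

When building the infrastructure for event-based backtesting in the form of a Python class, several requirements must be met:

Retrieving and preparing data

The base class shall take care of the data retrieval and possibly the preparation for the backtesting itself. To keep the discussion focused, end-of-day (EOD) data as read from a CSV file shall be the type of data the base class allows for.

Helper and convenience functions

It shall provide a couple of helper and convenience functions that make backtesting easier. Examples are functions for plotting data, printing out state variables, or returning date and price information for a given bar.

Placing orders

The base class shall cover the placing of basic buy and sell orders. For simplicity, only market buy and sell orders are modeled.

Closing out positions

At the end of any backtesting, any market positions need to be closed out. The base class shall take care of this final trade.

If the base class meets these requirements, classes to backtest strategies based on simple moving averages (SMAs), momentum, or mean reversion (see Chapter 4), as well as on machine learning-based prediction (see Chapter 5), can be built upon it. “Backtesting Base Class” presents an implementation of such a base class called BacktestBase. The following is a walk through the single methods of this class to get an overview of its design.

Regarding the special method __init__, there are only a few noteworthy things. First, the initial amount available is stored twice: once in the attribute initial_amount, which is kept constant, and once in the regular attribute amount, which represents the running balance. The default assumption is that there are no transaction costs: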

    def __init__(self, symbol, start, end, amount,
                 ftc=0.0, ptc=0.0, verbose=True):
        self.symbol = symbol
        self.start = start
        self.end = end
        self.initial_amount = amount  # (1)
        self.amount = amount  # (2)
        self.ftc = ftc  # (3)
        self.ptc = ptc  # (4)
        self.units = 0  # (5)
        self.position = 0  # (6)
        self.trades = 0  # (7)
        self.verbose = verbose  # (8)
        self.get_data()

(1) Stores the initial amount in an attribute that is kept constant.

(2) Sets the starting cash balance value.

(3) Defines fixed transaction costs per trade.

(4) Defines proportional transaction costs per trade.

(5) Units of the instrument (for example, number of shares) in the portfolio initially.

(6) Sets the initial position to market neutral.

(7) Sets the initial number of trades to zero.

(8) Sets self.verbose to True to get full output.

During initialization, the get_data method is called, which retrieves EOD data from a CSV file for the symbol provided and the given time interval. It also calculates the log returns. The Python code that follows has been used extensively in Chapters 4 and 5. Therefore, it does not need to be explained in detail here:

    def get_data(self):
        ''' Retrieves and prepares the data.
        '''
        raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                          index_col=0, parse_dates=True).dropna()
        raw = pd.DataFrame(raw[self.symbol])
        raw = raw.loc[self.start:self.end]
        raw.rename(columns={self.symbol: 'price'}, inplace=True)
        raw['return'] = np.log(raw / raw.shift(1))
        self.data = raw.dropna()

The .plot_data() method is just a simple helper method to plot the (adjusted close) values for the provided symbol:

    def plot_data(self, cols=None):
        ''' Plots the closing prices for symbol.
        '''
        if cols is None:
            cols = ['price']
        # plot the selected columns (the price column by default)
        self.data[cols].plot(figsize=(10, 6), title=self.symbol)

A method that gets called frequently is .get_date_price(). For a given bar, it returns the date and price information:

    def get_date_price(self, bar):
        ''' Return date and price for bar.
        '''
        date = str(self.data.index[bar])[:10]
        price = self.data.price.iloc[bar]
        return date, price

.print_balance() prints out the current cash balance given a certain bar, while .print_net_wealth() does the same for the net wealth (= current balance plus value of trading position):

    def print_balance(self, bar):
        ''' Print out current cash balance info.
        '''
        date, price = self.get_date_price(bar)
        print(f'{date} | current balance {self.amount:.2f}')

    def print_net_wealth(self, bar):
        ''' Print out current net wealth info.
        '''
        date, price = self.get_date_price(bar)
        net_wealth = self.units * price + self.amount
        print(f'{date} | current net wealth {net_wealth:.2f}')

Two core methods are .place_buy_order() and .place_sell_order(). They allow the emulated buying and selling of units of a financial instrument. First is the .place_buy_order() method, which is commented on in detail:

    def place_buy_order(self, bar, units=None, amount=None):
        ''' Place a buy order.
        '''
        date, price = self.get_date_price(bar)  # (1)
        if units is None:  # (2)
            units = int(amount / price)  # (3)
        self.amount -= (units * price) * (1 + self.ptc) + self.ftc  # (4)
        self.units += units  # (5)
        self.trades += 1  # (6)
        if self.verbose:  # (7)
            print(f'{date} | buying {units} units at {price:.2f}')  # (8)
            self.print_balance(bar)  # (9)
            self.print_net_wealth(bar)  # (10)

(1) The date and price information for the given bar is retrieved.

(2) If no value for units is given...

(3) ...the number of units is calculated given the value for amount. (Note that one of the two needs to be given.) The calculation does not include transaction costs.

(4) The current cash balance is reduced by the cash outlays for the units of the instrument to be bought plus the proportional and fixed transaction costs. Note that it is not checked whether there is enough liquidity available.

(5) The value of self.units is increased by the number of units bought.

(6) This increases the counter for the number of trades by one.

(7) If self.verbose is True...

(8) ...print out information about trade execution...

(9) ...the current cash balance...

(10) ...and the current net wealth.

Second, the .place_sell_order() method has only two minor adjustments compared to the .place_buy_order() method:

    def place_sell_order(self, bar, units=None, amount=None):
        ''' Place a sell order.
        '''
        date, price = self.get_date_price(bar)
        if units is None:
            units = int(amount / price)
        self.amount += (units * price) * (1 - self.ptc) - self.ftc  # (1)
        self.units -= units  # (2)
        self.trades += 1
        if self.verbose:
            print(f'{date} | selling {units} units at {price:.2f}')
            self.print_balance(bar)
            self.print_net_wealth(bar)

(1) The current cash balance is increased by the proceeds of the sale minus transaction costs.

(2) The value of self.units is decreased by the number of units sold.
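The cash arithmetic of the two order methods can be checked in isolation. The following sketch restates the two balance updates as plain functions on made-up numbers; the function names `cash_after_buy` and `cash_after_sell` are assumptions for illustration and are not methods of BacktestBase.

```python
def cash_after_buy(cash, units, price, ptc=0.0, ftc=0.0):
    """Cash balance after buying units at price, with proportional (ptc)
    and fixed (ftc) transaction costs. Hypothetical helper."""
    return cash - (units * price) * (1 + ptc) - ftc


def cash_after_sell(cash, units, price, ptc=0.0, ftc=0.0):
    """Cash balance after selling units at price, with proportional (ptc)
    and fixed (ftc) transaction costs. Hypothetical helper."""
    return cash + (units * price) * (1 - ptc) - ftc


# Made-up numbers: 10,000 in cash, price of 50, 1% proportional costs
# and a fixed cost of 1 per trade.
cash, price = 10000.0, 50.0
units = int(cash / price)  # unit count ignores transaction costs

after_buy = cash_after_buy(cash, units, price, ptc=0.01, ftc=1.0)
after_sell = cash_after_sell(after_buy, units, price, ptc=0.01, ftc=1.0)
print(units, after_buy, after_sell)
```

Because the number of units is derived from the full amount before costs, the balance after the purchase turns slightly negative here. This is the missing liquidity check mentioned in the callouts; buying and selling at the same price then leaves the account lower by the costs of both trades.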

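No matter what kind of trading strategy is backtested, the position at the end of the backtesting period needs to be closed out. The code in the BacktestBase class assumes that the position is not liquidated but rather accounted for with its asset value to calculate and print the performance figures: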

    def close_out(self, bar):
        ''' Closing out a long or short position.
        '''
        date, price = self.get_date_price(bar)
        self.amount += self.units * price  # (1)
        self.units = 0
        self.trades += 1
        if self.verbose:
            print(f'{date} | inventory {self.units} units at {price:.2f}')
            print('=' * 55)
        print('Final balance   [$] {:.2f}'.format(self.amount))  # (2)
        perf = ((self.amount - self.initial_amount) /
                self.initial_amount * 100)
        print('Net Performance [%] {:.2f}'.format(perf))  # (3)
        print('Trades Executed [#] {:.2f}'.format(self.trades))  # (3)
        print('=' * 55)

(1) No transaction costs are subtracted at the end.

(2) The final balance consists of the current cash balance plus the value of the trading position.

(3) This calculates the net performance in percent.

The final part of the Python script is the __main__ section, which is executed when the file is run as a script:

    if __name__ == '__main__':
        bb = BacktestBase('AAPL.O', '2010-1-1', '2019-12-31', 10000)
        print(bb.data.info())
        print(bb.data.tail())
        bb.plot_data()

It instantiates an object based on the BacktestBase class, which automatically retrieves the data for the given symbol. Figure 6-1 shows the resulting plot. The following output shows the meta information of the respective DataFrame object and its five most recent rows:

    In [1]: %run BacktestBase.py
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 2515 entries, 2010-01-05 to 2019-12-31
    Data columns (total 2 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   price   2515 non-null   float64
     1   return  2515 non-null   float64
    dtypes: float64(2)
    memory usage: 58.9 KB
    None
                 price    return
    Date
    2019-12-24  284.27  0.000950
    2019-12-26  289.91  0.019646
    2019-12-27  289.80 -0.000380
    2019-12-30  291.52  0.005918
    2019-12-31  293.65  0.007280

    In [2]:

Figure 6-1. Plot of the data retrieved for `symbol` by the `BacktestBase` class

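The next two sections present classes for backtesting long-only and long-short trading strategies. Since these classes rely on the base class presented in this section, the implementations of the backtesting routines are rather concise.

Object-oriented programming makes it possible to build a basic backtesting infrastructure in the form of a Python class. Such a class provides the standard functionality needed to backtest different kinds of algorithmic trading strategies in a nonredundant, easy-to-maintain way. It is also straightforward to enhance the base class so that it offers more features by default, which can benefit many other classes built on top of it.

Certain investor preferences or regulations might prohibit short selling as part of a trading strategy. A trader or portfolio manager is then only allowed to enter long positions or to park capital in cash or similar low-risk assets, such as money market accounts. "Long-Only Backtesting Class" shows the code of a backtesting class for long-only strategies called BacktestLongOnly. Since it relies on and inherits from the BacktestBase class, the code that implements the three strategies based on SMAs, momentum, and mean reversion is rather concise.

The method .run_mean_reversion_strategy() implements the backtesting procedure for the mean reversion-based strategy. This method is commented in detail, since it might be a bit trickier from an implementation standpoint. The basic insights, however, carry over easily to the methods that implement the other two strategies: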

        def run_mean_reversion_strategy(self, SMA, threshold):
            ''' Backtesting a mean reversion-based strategy.

            Parameters
            ==========
            SMA: int
                simple moving average in days
            threshold: float
                absolute value for deviation-based signal relative to SMA
            '''
            msg = f'\n\nRunning mean reversion strategy | '
            msg += f'SMA={SMA} & thr={threshold}'
            msg += f'\nfixed costs {self.ftc} | '
            msg += f'proportional costs {self.ptc}'
            print(msg)  # (1)
            print('=' * 55)
            self.position = 0  # (2)
            self.trades = 0  # (2)
            self.amount = self.initial_amount  # (3)

            self.data['SMA'] = self.data['price'].rolling(SMA).mean()  # (4)

            for bar in range(SMA, len(self.data)):  # (5)
                if self.position == 0:  # (6)
                    if (self.data['price'].iloc[bar] <
                            self.data['SMA'].iloc[bar] - threshold):  # (7)
                        self.place_buy_order(bar, amount=self.amount)  # (8)
                        self.position = 1  # (9)
                elif self.position == 1:  # (10)
                    if self.data['price'].iloc[bar] >= self.data['SMA'].iloc[bar]:  # (11)
                        self.place_sell_order(bar, units=self.units)  # (12)
                        self.position = 0  # (13)
            self.close_out(bar)  # (14)

(1) At the beginning, the method prints an overview of the main parameters for the backtest.

(2) The position is set to market neutral and the trade counter to zero. This is done here for clarity and should be the case anyway.

(3) The current cash balance is reset to the initial amount in case another backtest run has overwritten the value.

(4) This calculates the SMA values needed for the strategy implementation.

(5) The start value SMA ensures that SMA values are available to start implementing and backtesting the strategy.

(6) The condition checks whether the position is market neutral.

(7) If the position is market neutral, it is checked whether the current price is low enough relative to the SMA to trigger a buy order and to go long.

(8) This executes a buy order in the amount of the current cash balance.

(9) The market position is set to long.

(10) The condition checks whether the position is long the market.

(11) If that is the case, it is checked whether the current price has returned to the SMA level or above.

(12) In that case, a sell order is placed for all units of the financial instrument.

(13) The market position is set to neutral again.

(14) At the end of the backtesting period, the market position gets closed out if one is open.
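The position logic behind callouts (6) to (13) is a small state machine that can be studied apart from the order handling. The sketch below replays it on a plain Python list; the function name `mean_reversion_positions` and the toy prices are illustrative assumptions, and the SMA is computed by hand so that the window ends at the current bar, as pandas' `rolling(SMA).mean()` does.

```python
def mean_reversion_positions(prices, sma_window, threshold):
    """Return the long-only position (0 or 1) held after each bar.

    Mirrors the loop above: go long when the price is more than
    threshold below the SMA, go neutral once it is back at or above it.
    """
    position = 0
    positions = []
    for bar in range(sma_window, len(prices)):
        # trailing window that ends at (and includes) the current bar
        window = prices[bar - sma_window + 1:bar + 1]
        sma = sum(window) / sma_window
        if position == 0:
            if prices[bar] < sma - threshold:
                position = 1  # enter long
        elif position == 1:
            if prices[bar] >= sma:
                position = 0  # back to market neutral
        positions.append(position)
    return positions


# hypothetical toy prices, not market data
toy_prices = [10, 10, 10, 10, 7, 8, 11, 10]
print(mean_reversion_positions(toy_prices, sma_window=3, threshold=1))
```

Separating the signal logic from the bookkeeping in this way makes it easier to test a strategy rule before wiring it into the backtesting class.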

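Executing the Python script from "Long-Only Backtesting Class" produces the backtesting results shown below. The examples illustrate the influence of fixed and proportional transaction costs. First, transaction costs reduce performance in every case shown. Second, they bring out the importance of the number of trades a strategy triggers over time. Without transaction costs, the momentum strategy significantly outperforms the SMA-based strategy. With transaction costs, the SMA-based strategy outperforms the momentum strategy because it relies on fewer trades:

    Running SMA strategy | SMA1=42 & SMA2=252
    fixed costs 0.0 | proportional costs 0.0
    =======================================================
    Final balance   [$] 56204.95
    Net Performance [%] 462.05
    =======================================================

    Running momentum strategy | 60 days
    fixed costs 0.0 | proportional costs 0.0
    =======================================================
    Final balance   [$] 136716.52
    Net Performance [%] 1267.17
    =======================================================

    Running mean reversion strategy | SMA=50 & thr=5
    fixed costs 0.0 | proportional costs 0.0
    =======================================================
    Final balance   [$] 53907.99
    Net Performance [%] 439.08
    =======================================================

    Running SMA strategy | SMA1=42 & SMA2=252
    fixed costs 10.0 | proportional costs 0.01
    =======================================================
    Final balance   [$] 51959.62
    Net Performance [%] 419.60
    =======================================================

    Running momentum strategy | 60 days
    fixed costs 10.0 | proportional costs 0.01
    =======================================================
    Final balance   [$] 38074.26
    Net Performance [%] 280.74
    =======================================================

    Running mean reversion strategy | SMA=50 & thr=5
    fixed costs 10.0 | proportional costs 0.01
    =======================================================
    Final balance   [$] 15375.48
    Net Performance [%] 53.75
    =======================================================

Chapter 5 emphasizes that there are two sides of the performance coin: the hit ratio for the correct prediction of the market direction and the market timing (that is, when exactly the prediction is correct). The results shown here illustrate that there is even a "third side": the number of trades triggered by the strategy. A strategy that demands a higher frequency of trades has to bear higher transaction costs, which can easily eat up an alleged outperformance over another strategy with no or low transaction costs. Among other things, this often makes the case for low-cost passive investment strategies based, for example, on exchange-traded funds (ETFs).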
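The "third side" can be made concrete with simple arithmetic. The sketch below assumes a constant average trade volume and uses made-up trade counts; neither figure comes from the backtests above, and `total_costs` is a hypothetical helper.

```python
def total_costs(trades, avg_volume, ptc=0.0, ftc=0.0):
    """Approximate cumulative transaction costs over a backtest.

    Assumes every trade has the same volume (avg_volume), which is a
    simplification: in the classes above, volume changes with wealth.
    """
    return trades * (avg_volume * ptc + ftc)


# hypothetical trade counts for a low- and a high-frequency strategy
for n_trades in (10, 100):
    costs = total_costs(n_trades, avg_volume=10000, ptc=0.01, ftc=10.0)
    print(f'{n_trades:4d} trades -> costs of about {costs:,.2f}')
```

Under these assumptions, costs grow linearly with the number of trades, so ten times as many trades lead to ten times the costs.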

"Long-Short Backtesting Class" presents the BacktestLongShort class, which also inherits from the BacktestBase class. In addition to the methods that backtest the different strategies, it implements two further methods to go long and short, respectively. Only the .go_long() method is commented in detail, since the .go_short() method does exactly the same in the opposite direction:

        def go_long(self, bar, units=None, amount=None):  # (1)
            if self.position == -1:  # (2)
                self.place_buy_order(bar, units=-self.units)  # (3)
            if units:  # (4)
                self.place_buy_order(bar, units=units)  # (5)
            elif amount:  # (6)
                if amount == 'all':  # (7)
                    amount = self.amount  # (8)
                self.place_buy_order(bar, amount=amount)  # (9)

        def go_short(self, bar, units=None, amount=None):
            if self.position == 1:
                self.place_sell_order(bar, units=self.units)
            if units:
                self.place_sell_order(bar, units=units)
            elif amount:
                if amount == 'all':
                    amount = self.amount
                self.place_sell_order(bar, amount=amount)

(1) In addition to bar, the methods expect either a number of units of the traded instrument or a currency amount.

(2) In the .go_long() case, it is first checked whether there is a short position.

(3) If so, this short position gets closed first.

(4) It is then checked whether units is given...

(5) ...which triggers a buy order accordingly.

(6) If amount is given, there can be two cases.

(7) First, the value is all, which translates into...

(8) ...all the available cash in the current cash balance.

(9) Second, the value is a number that is then simply taken to place the respective buy order. Note that it is not checked whether there is enough liquidity.

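To keep the implementation concise throughout, the Python classes contain many simplifications that shift responsibility to the user. For example, the classes do not check whether there is enough liquidity to execute a trade. This is an economic simplification since, in theory, one could assume that the algorithmic trader has enough or even unlimited credit. As another example, certain methods expect that at least one of two parameters (units or amount) is specified. No code catches the case in which neither is set. This is a technical simplification.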
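The technical simplification could be removed with a small guard. The following sketch uses a hypothetical helper, `resolve_units`, that is not part of the book's classes; it raises an error when neither argument is set instead of failing later on `int(None / price)`.

```python
def resolve_units(price, units=None, amount=None):
    """Return the number of units to trade.

    Hypothetical helper: units takes precedence; otherwise the unit
    count is derived from amount, as in .place_buy_order().
    """
    if units is None and amount is None:
        raise ValueError('either units or amount must be specified')
    if units is not None:
        return int(units)
    return int(amount / price)


print(resolve_units(50.0, amount=1000))  # derived from the amount
print(resolve_units(50.0, units=7))      # passed through
```

A method such as .place_buy_order() could call a guard like this as its first step, at the cost of a few extra lines per method.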

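The following presents the core loop of the .run_mean_reversion_strategy() method of the BacktestLongShort class. The mean-reversion strategy is picked again because its implementation is a bit more involved. For instance, it is the only strategy that also leads to intermediate market neutral positions. This requires more checks than the other two strategies, as seen in "Long-Short Backtesting Class":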

        for bar in range(SMA, len(self.data)):
            if self.position == 0:  # (1)
                if (self.data['price'].iloc[bar] <
                        self.data['SMA'].iloc[bar] - threshold):  # (2)
                    self.go_long(bar, amount=self.initial_amount)  # (3)
                    self.position = 1  # (4)
                elif (self.data['price'].iloc[bar] >
                        self.data['SMA'].iloc[bar] + threshold):  # (5)
                    self.go_short(bar, amount=self.initial_amount)
                    self.position = -1  # (6)
            elif self.position == 1:  # (7)
                if self.data['price'].iloc[bar] >= self.data['SMA'].iloc[bar]:  # (8)
                    self.place_sell_order(bar, units=self.units)  # (9)
                    self.position = 0  # (10)
            elif self.position == -1:  # (11)
                if self.data['price'].iloc[bar] <= self.data['SMA'].iloc[bar]:  # (12)
                    self.place_buy_order(bar, units=-self.units)  # (13)
                    self.position = 0  # (14)
        self.close_out(bar)

(1) The first top-level condition checks whether the position is market neutral.

(2) If so, it is checked whether the current price is low enough relative to the SMA.

(3) In that case, the .go_long() method is called...

(4) ...and the market position is set to long.

(5) If the current price is high enough relative to the SMA, the .go_short() method is called...

(6) ...and the market position is set to short.

(7) The second top-level condition checks for a long market position.

(8) In that case, it is further checked whether the current price is at or above the SMA level again.

(9) If so, the long position gets closed out by selling all units in the portfolio.

(10) The market position is reset to neutral.

(11) Finally, the third top-level condition checks for a short position.

(12) If the current price is at or below the SMA...

(13) ...a buy order for all units sold short is triggered to close out the short position.

(14) The market position is then reset to neutral.

Executing the Python script in "Long-Short Backtesting Class" produces performance results that shed further light on the characteristics of the strategies. One might be inclined to assume that adding the flexibility to short a financial instrument yields better results. However, the results show that this is not necessarily true. All strategies perform worse than their long-only counterparts, both without and with transaction costs. Some configurations even pile up net losses or end in a position of debt. Although these are specific results only, they illustrate that it is risky in such a context to jump to conclusions too early and to ignore limits on piling up debt:

    Running SMA strategy | SMA1=42 & SMA2=252
    fixed costs 0.0 | proportional costs 0.0
    =======================================================
    Final balance   [$] 45631.83
    Net Performance [%] 356.32
    =======================================================

    Running momentum strategy | 60 days
    fixed costs 0.0 | proportional costs 0.0
    =======================================================
    Final balance   [$] 105236.62
    Net Performance [%] 952.37
    =======================================================

    Running mean reversion strategy | SMA=50 & thr=5
    fixed costs 0.0 | proportional costs 0.0
    =======================================================
    Final balance   [$] 17279.15
    Net Performance [%] 72.79
    =======================================================

    Running SMA strategy | SMA1=42 & SMA2=252
    fixed costs 10.0 | proportional costs 0.01
    =======================================================
    Final balance   [$] 38369.65
    Net Performance [%] 283.70
    =======================================================

    Running momentum strategy | 60 days
    fixed costs 10.0 | proportional costs 0.01
    =======================================================
    Final balance   [$] 6883.45
    Net Performance [%] -31.17
    =======================================================

    Running mean reversion strategy | SMA=50 & thr=5
    fixed costs 10.0 | proportional costs 0.01
    =======================================================
    Final balance   [$] -5110.97
    Net Performance [%] -151.11
    =======================================================

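Situations in which trading eats up all the initial equity or even leads to debt can arise, for example, when trading contracts for difference (CFDs). These are highly leveraged products for which the trader only needs to put down, say, 5% of the position value as initial margin (when the leverage is 20). If the position value changes by 10%, the trader might be required to meet a corresponding margin call. For a long position of 100,000 USD, equity of 5,000 USD is required. If the position value drops to 90,000 USD, the loss of 10,000 USD wipes out the equity, and the trader must put up another 5,000 USD to cover the remaining loss. This assumes that no margin stop-out is in place that would close the position as soon as the remaining equity drops to 0 USD.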
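The arithmetic of this example can be reproduced in a few lines. The function below is an illustrative sketch with hypothetical names; it assumes, as in the text, that no margin stop-out closes the position early.

```python
def cfd_outcome(position_value, leverage, pct_change):
    """Return (initial margin, profit/loss, remaining equity).

    position_value: value of the long position at entry
    leverage: e.g., 20 means an initial margin of 1/20 = 5%
    pct_change: relative change of the position value (-0.10 = -10%)
    """
    margin = position_value / leverage
    pnl = position_value * pct_change
    equity = margin + pnl  # negative equity means the trader owes money
    return margin, pnl, equity


margin, pnl, equity = cfd_outcome(100000, leverage=20, pct_change=-0.10)
print(f'margin {margin:.0f} | P&L {pnl:.0f} | equity {equity:.0f}')
```

A negative remaining equity is the amount the trader would have to pay in on top of the lost margin.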

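This chapter presents classes for the event-based backtesting of trading strategies. Compared to vectorized backtesting, event-based backtesting makes deliberate and heavy use of loops and iterations so that every single new event (generally, the arrival of new data) can be handled individually. This allows for a more flexible approach that can, among other things, easily cope with fixed transaction costs or more complex strategies (and variations thereof).

"Backtesting Base Class" presents a base class with certain methods that are useful for backtesting a variety of trading strategies. "Long-Only Backtesting Class" and "Long-Short Backtesting Class" build on this infrastructure to implement classes that allow the backtesting of long-only and long-short trading strategies. Mainly for comparison reasons, the implementations include all three strategies formally introduced in Chapter 4. With the classes of this chapter as a starting point, enhancements and refinements are easy to implement.

Previous chapters introduce the basic ideas and concepts of the three trading strategies covered in this chapter. This chapter makes more systematic use of Python classes and object-oriented programming (OOP) for the first time. A good introduction to OOP with Python and to Python's data model is found in Ramalho (2021). A more concise introduction to OOP applied to finance is found in Hilpisch (2018, ch. 6):

Hilpisch, Yves. 2018. Python for Finance: Mastering Data-Driven Finance. 2nd ed. Sebastopol: O'Reilly.
Ramalho, Luciano. 2021. Fluent Python: Clear, Concise, and Effective Programming. 2nd ed. Sebastopol: O'Reilly.

The Python ecosystem provides a number of optional packages for backtesting algorithmic trading strategies.

Zipline, for example, powers the popular Quantopian platform for backtesting algorithmic trading strategies, but it can also be installed and used locally.

Although these packages might allow for a more thorough backtesting of algorithmic trading strategies than the rather simple classes presented in this chapter, the main goal of this book is to empower readers and algorithmic traders to implement Python code in a self-contained fashion. Even if standard packages are later used for the actual backtesting, a good understanding of the different approaches and their mechanics is beneficial, if not required.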

This section presents the Python scripts referenced and used in this chapter.

Backtesting Base Class

The following Python code contains the base class for event-based backtesting:

    #
    # Python Script with Base Class
    # for Event-Based Backtesting
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    import numpy as np
    import pandas as pd
    from pylab import mpl, plt
    plt.style.use('seaborn')  # on Matplotlib >= 3.6, use 'seaborn-v0_8'
    mpl.rcParams['font.family'] = 'serif'


    class BacktestBase(object):
        ''' Base class for event-based backtesting of trading strategies.

        Attributes
        ==========
        symbol: str
            TR RIC (financial instrument) to be used
        start: str
            start date for data selection
        end: str
            end date for data selection
        amount: float
            amount to be invested either once or per trade
        ftc: float
            fixed transaction costs per trade (buy or sell)
        ptc: float
            proportional transaction costs per trade (buy or sell)

        Methods
        =======
        get_data:
            retrieves and prepares the base data set
        plot_data:
            plots the closing price for the symbol
        get_date_price:
            returns the date and price for the given bar
        print_balance:
            prints out the current (cash) balance
        print_net_wealth:
            prints out the current net wealth
        place_buy_order:
            places a buy order
        place_sell_order:
            places a sell order
        close_out:
            closes out a long or short position
        '''

        def __init__(self, symbol, start, end, amount,
                     ftc=0.0, ptc=0.0, verbose=True):
            self.symbol = symbol
            self.start = start
            self.end = end
            self.initial_amount = amount
            self.amount = amount
            self.ftc = ftc
            self.ptc = ptc
            self.units = 0
            self.position = 0
            self.trades = 0
            self.verbose = verbose
            self.get_data()

        def get_data(self):
            ''' Retrieves and prepares the data.
            '''
            raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                              index_col=0, parse_dates=True).dropna()
            raw = pd.DataFrame(raw[self.symbol])
            raw = raw.loc[self.start:self.end]
            raw.rename(columns={self.symbol: 'price'}, inplace=True)
            raw['return'] = np.log(raw / raw.shift(1))
            self.data = raw.dropna()

        def plot_data(self, cols=None):
            ''' Plots the closing prices for symbol.
            '''
            if cols is None:
                cols = ['price']
            # plot the selected columns (the price column by default)
            self.data[cols].plot(figsize=(10, 6), title=self.symbol)

        def get_date_price(self, bar):
            ''' Return date and price for bar.
            '''
            date = str(self.data.index[bar])[:10]
            price = self.data.price.iloc[bar]
            return date, price

        def print_balance(self, bar):
            ''' Print out current cash balance info.
            '''
            date, price = self.get_date_price(bar)
            print(f'{date} | current balance {self.amount:.2f}')

        def print_net_wealth(self, bar):
            ''' Print out current net wealth info.
            '''
            date, price = self.get_date_price(bar)
            net_wealth = self.units * price + self.amount
            print(f'{date} | current net wealth {net_wealth:.2f}')

        def place_buy_order(self, bar, units=None, amount=None):
            ''' Place a buy order.
            '''
            date, price = self.get_date_price(bar)
            if units is None:
                units = int(amount / price)
            self.amount -= (units * price) * (1 + self.ptc) + self.ftc
            self.units += units
            self.trades += 1
            if self.verbose:
                print(f'{date} | buying {units} units at {price:.2f}')
                self.print_balance(bar)
                self.print_net_wealth(bar)

        def place_sell_order(self, bar, units=None, amount=None):
            ''' Place a sell order.
            '''
            date, price = self.get_date_price(bar)
            if units is None:
                units = int(amount / price)
            self.amount += (units * price) * (1 - self.ptc) - self.ftc
            self.units -= units
            self.trades += 1
            if self.verbose:
                print(f'{date} | selling {units} units at {price:.2f}')
                self.print_balance(bar)
                self.print_net_wealth(bar)

        def close_out(self, bar):
            ''' Closing out a long or short position.
            '''
            date, price = self.get_date_price(bar)
            self.amount += self.units * price
            self.units = 0
            self.trades += 1
            if self.verbose:
                print(f'{date} | inventory {self.units} units at {price:.2f}')
                print('=' * 55)
            print('Final balance   [$] {:.2f}'.format(self.amount))
            perf = ((self.amount - self.initial_amount) /
                    self.initial_amount * 100)
            print('Net Performance [%] {:.2f}'.format(perf))
            print('Trades Executed [#] {:.2f}'.format(self.trades))
            print('=' * 55)


    if __name__ == '__main__':
        bb = BacktestBase('AAPL.O', '2010-1-1', '2019-12-31', 10000)
        print(bb.data.info())
        print(bb.data.tail())
        bb.plot_data()

Long-Only Backtesting Class

The following Python code contains a class for the event-based backtesting of long-only strategies, with implementations for strategies based on simple moving averages (SMAs), momentum, and mean reversion:

    #
    # Python Script with Long Only Class
    # for Event-Based Backtesting
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    from BacktestBase import *


    class BacktestLongOnly(BacktestBase):

        def run_sma_strategy(self, SMA1, SMA2):
            ''' Backtesting an SMA-based strategy.

            Parameters
            ==========
            SMA1, SMA2: int
                shorter and longer term simple moving average (in days)
            '''
            msg = f'\n\nRunning SMA strategy | SMA1={SMA1} & SMA2={SMA2}'
            msg += f'\nfixed costs {self.ftc} | '
            msg += f'proportional costs {self.ptc}'
            print(msg)
            print('=' * 55)
            self.position = 0  # initial neutral position
            self.trades = 0  # no trades yet
            self.amount = self.initial_amount  # reset initial capital
            self.data['SMA1'] = self.data['price'].rolling(SMA1).mean()
            self.data['SMA2'] = self.data['price'].rolling(SMA2).mean()

            for bar in range(SMA2, len(self.data)):
                if self.position == 0:
                    if self.data['SMA1'].iloc[bar] > self.data['SMA2'].iloc[bar]:
                        self.place_buy_order(bar, amount=self.amount)
                        self.position = 1  # long position
                elif self.position == 1:
                    if self.data['SMA1'].iloc[bar] < self.data['SMA2'].iloc[bar]:
                        self.place_sell_order(bar, units=self.units)
                        self.position = 0  # market neutral
            self.close_out(bar)

        def run_momentum_strategy(self, momentum):
            ''' Backtesting a momentum-based strategy.

            Parameters
            ==========
            momentum: int
                number of days for mean return calculation
            '''
            msg = f'\n\nRunning momentum strategy | {momentum} days'
            msg += f'\nfixed costs {self.ftc} | '
            msg += f'proportional costs {self.ptc}'
            print(msg)
            print('=' * 55)
            self.position = 0  # initial neutral position
            self.trades = 0  # no trades yet
            self.amount = self.initial_amount  # reset initial capital
            self.data['momentum'] = self.data['return'].rolling(momentum).mean()
            for bar in range(momentum, len(self.data)):
                if self.position == 0:
                    if self.data['momentum'].iloc[bar] > 0:
                        self.place_buy_order(bar, amount=self.amount)
                        self.position = 1  # long position
                elif self.position == 1:
                    if self.data['momentum'].iloc[bar] < 0:
                        self.place_sell_order(bar, units=self.units)
                        self.position = 0  # market neutral
            self.close_out(bar)

        def run_mean_reversion_strategy(self, SMA, threshold):
            ''' Backtesting a mean reversion-based strategy.

            Parameters
            ==========
            SMA: int
                simple moving average in days
            threshold: float
                absolute value for deviation-based signal relative to SMA
            '''
            msg = f'\n\nRunning mean reversion strategy | '
            msg += f'SMA={SMA} & thr={threshold}'
            msg += f'\nfixed costs {self.ftc} | '
            msg += f'proportional costs {self.ptc}'
            print(msg)
            print('=' * 55)
            self.position = 0
            self.trades = 0
            self.amount = self.initial_amount

            self.data['SMA'] = self.data['price'].rolling(SMA).mean()

            for bar in range(SMA, len(self.data)):
                if self.position == 0:
                    if (self.data['price'].iloc[bar] <
                            self.data['SMA'].iloc[bar] - threshold):
                        self.place_buy_order(bar, amount=self.amount)
                        self.position = 1
                elif self.position == 1:
                    if self.data['price'].iloc[bar] >= self.data['SMA'].iloc[bar]:
                        self.place_sell_order(bar, units=self.units)
                        self.position = 0
            self.close_out(bar)


    if __name__ == '__main__':
        def run_strategies():
            lobt.run_sma_strategy(42, 252)
            lobt.run_momentum_strategy(60)
            lobt.run_mean_reversion_strategy(50, 5)
        lobt = BacktestLongOnly('AAPL.O', '2010-1-1', '2019-12-31', 10000,
                                verbose=False)
        run_strategies()
        # transaction costs: 10 USD fix, 1% variable
        lobt = BacktestLongOnly('AAPL.O', '2010-1-1', '2019-12-31',
                                10000, 10.0, 0.01, False)
        run_strategies()

Long-Short Backtesting Class

The following Python code contains a class for the event-based backtesting of long-short strategies, with implementations for strategies based on simple moving averages (SMAs), momentum, and mean reversion:

    #
    # Python Script with Long-Short Class
    # for Event-Based Backtesting
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    from BacktestBase import *


    class BacktestLongShort(BacktestBase):

        def go_long(self, bar, units=None, amount=None):
            if self.position == -1:
                self.place_buy_order(bar, units=-self.units)
            if units:
                self.place_buy_order(bar, units=units)
            elif amount:
                if amount == 'all':
                    amount = self.amount
                self.place_buy_order(bar, amount=amount)

        def go_short(self, bar, units=None, amount=None):
            if self.position == 1:
                self.place_sell_order(bar, units=self.units)
            if units:
                self.place_sell_order(bar, units=units)
            elif amount:
                if amount == 'all':
                    amount = self.amount
                self.place_sell_order(bar, amount=amount)

        def run_sma_strategy(self, SMA1, SMA2):
            msg = f'\n\nRunning SMA strategy | SMA1={SMA1} & SMA2={SMA2}'
            msg += f'\nfixed costs {self.ftc} | '
            msg += f'proportional costs {self.ptc}'
            print(msg)
            print('=' * 55)
            self.position = 0  # initial neutral position
            self.trades = 0  # no trades yet
            self.amount = self.initial_amount  # reset initial capital
            self.data['SMA1'] = self.data['price'].rolling(SMA1).mean()
            self.data['SMA2'] = self.data['price'].rolling(SMA2).mean()

            for bar in range(SMA2, len(self.data)):
                if self.position in [0, -1]:
                    if self.data['SMA1'].iloc[bar] > self.data['SMA2'].iloc[bar]:
                        self.go_long(bar, amount='all')
                        self.position = 1  # long position
                if self.position in [0, 1]:
                    if self.data['SMA1'].iloc[bar] < self.data['SMA2'].iloc[bar]:
                        self.go_short(bar, amount='all')
                        self.position = -1  # short position
            self.close_out(bar)

        def run_momentum_strategy(self, momentum):
            msg = f'\n\nRunning momentum strategy | {momentum} days'
            msg += f'\nfixed costs {self.ftc} | '
            msg += f'proportional costs {self.ptc}'
            print(msg)
            print('=' * 55)
            self.position = 0  # initial neutral position
            self.trades = 0  # no trades yet
            self.amount = self.initial_amount  # reset initial capital
            self.data['momentum'] = self.data['return'].rolling(momentum).mean()
            for bar in range(momentum, len(self.data)):
                if self.position in [0, -1]:
                    if self.data['momentum'].iloc[bar] > 0:
                        self.go_long(bar, amount='all')
                        self.position = 1  # long position
                if self.position in [0, 1]:
                    if self.data['momentum'].iloc[bar] <= 0:
                        self.go_short(bar, amount='all')
                        self.position = -1  # short position
            self.close_out(bar)

        def run_mean_reversion_strategy(self, SMA, threshold):
            msg = f'\n\nRunning mean reversion strategy | '
            msg += f'SMA={SMA} & thr={threshold}'
            msg += f'\nfixed costs {self.ftc} | '
            msg += f'proportional costs {self.ptc}'
            print(msg)
            print('=' * 55)
            self.position = 0  # initial neutral position
            self.trades = 0  # no trades yet
            self.amount = self.initial_amount  # reset initial capital

            self.data['SMA'] = self.data['price'].rolling(SMA).mean()

            for bar in range(SMA, len(self.data)):
                if self.position == 0:
                    if (self.data['price'].iloc[bar] <
                            self.data['SMA'].iloc[bar] - threshold):
                        self.go_long(bar, amount=self.initial_amount)
                        self.position = 1
                    elif (self.data['price'].iloc[bar] >
                            self.data['SMA'].iloc[bar] + threshold):
                        self.go_short(bar, amount=self.initial_amount)
                        self.position = -1
                elif self.position == 1:
                    if self.data['price'].iloc[bar] >= self.data['SMA'].iloc[bar]:
                        self.place_sell_order(bar, units=self.units)
                        self.position = 0
                elif self.position == -1:
                    if self.data['price'].iloc[bar] <= self.data['SMA'].iloc[bar]:
                        self.place_buy_order(bar, units=-self.units)
                        self.position = 0
            self.close_out(bar)


    if __name__ == '__main__':
        def run_strategies():
            lsbt.run_sma_strategy(42, 252)
            lsbt.run_momentum_strategy(60)
            lsbt.run_mean_reversion_strategy(50, 5)
        # AAPL.O is used in both runs so that the script matches
        # the results reported in this chapter
        lsbt = BacktestLongShort('AAPL.O', '2010-1-1', '2019-12-31', 10000,
                                 verbose=False)
        run_strategies()
        # transaction costs: 10 USD fix, 1% variable
        lsbt = BacktestLongShort('AAPL.O', '2010-1-1', '2019-12-31',
                                 10000, 10.0, 0.01, False)
        run_strategies()

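If you want to find the secrets of the universe, think in terms of energy, frequency and vibration.
Nikola Tesla

Developing trading ideas and backtesting them is a rather asynchronous and non-critical process. Steps may or may not be repeated, no capital is at stake, and performance and speed are not the most important requirements. Turning to the markets to deploy a trading strategy changes the rules considerably. Data arrives in real time and usually in massive amounts, which makes real-time processing of the data and real-time decision making based on the streaming data a necessity. This chapter is about working with real-time data, for which sockets are in general the technological tool of choice. In this context, here are brief definitions of some central technical terms: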

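Network socket

Endpoint of a connection in a computer network, also simply called a socket for short.

Socket address

Combination of an Internet Protocol (IP) address and a port number.

Socket protocol

A protocol that defines and handles the socket communication, such as the Transmission Control Protocol (TCP).

Socket pair

Combination of a local and a remote socket that communicate with each other.

Socket API

The application programming interface that allows for the controlling of sockets and their communication.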

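This chapter focuses on the use of ZeroMQ as a lightweight, fast, and scalable socket programming library. It is available on multiple platforms, with wrappers for the most popular programming languages. ZeroMQ supports different patterns for socket communication. One of those patterns is the so-called publisher-subscriber (PUB-SUB) pattern, in which a single socket publishes data and multiple sockets retrieve the data simultaneously. This is similar to a radio station that broadcasts its program, which thousands of people listen to simultaneously via their radio devices.

Given the PUB-SUB pattern, a fundamental application scenario in algorithmic trading is the retrieval of real-time financial data from an exchange, a trading platform, or a data service provider. Suppose you have developed an intraday trading idea based on the EUR/USD currency pair and have backtested it thoroughly. When deploying it, you need to be able to receive and process the price data in real time. This fits exactly such a PUB-SUB pattern. A central instance broadcasts the new tick data as it becomes available, and you, as well as probably thousands of others, receive and process it at the same time.¹

This chapter is organized as follows. "Running a Simple Tick Data Server" describes how to implement and run a tick data server for sample financial data. "Connecting a Simple Tick Data Client" implements a tick data client to connect to the tick data server. "Signal Generation in Real Time" shows how to generate trading signals in real time based on data from the tick data server. Finally, "Visualizing Streaming Data with Plotly" introduces the Plotly plotting package as an efficient way to plot streaming data in real time.

The goal of this chapter is to provide a tool set and approaches for working with streaming data in the context of algorithmic trading.

The code in this chapter makes heavy use of ports over which socket communication takes place, and it requires the simultaneous execution of two or more scripts. It is therefore recommended to execute the code of this chapter in different terminal instances, running different Python kernels. Execution within a single Jupyter Notebook, for instance, does not work in general. What works, however, is to execute the tick data server script ("Running a Simple Tick Data Server") in a terminal and to retrieve the data in a Jupyter Notebook ("Visualizing Streaming Data with Plotly").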

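This section shows how to run a simple tick data server based on simulated financial instrument prices. The model used for the data generation is the geometric Brownian motion (without dividends), for which an exact Euler discretization is available, as shown in Equation 7-1. Here, $S$ is the instrument price, $r$ is the constant short rate, $\sigma$ is the constant volatility factor, and $z$ is a standard normal random variable. $\Delta t$ is the interval between two discrete observations of the instrument price.

Equation 7-1. Euler discretization of geometric Brownian motion

$S_{t} = S_{t - \Delta t} \cdot \exp\left(\left(r - \frac{\sigma^{2}}{2}\right) \Delta t + \sigma \sqrt{\Delta t}\, z\right)$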
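Equation 7-1 translates almost one-to-one into Python. The sketch below uses only the standard library; the function names `gbm_step` and `simulate_path` and the parameter values in the usage lines are illustrative assumptions, not part of the tick data server script.

```python
import math
import random


def gbm_step(s_prev, r, sigma, dt, z):
    """One step of Equation 7-1: maps S_{t - dt} to S_t."""
    return s_prev * math.exp((r - 0.5 * sigma ** 2) * dt
                             + sigma * math.sqrt(dt) * z)


def simulate_path(s0, r, sigma, dt, steps, seed=None):
    """Simulate a price path on an equidistant time grid."""
    rng = random.Random(seed)  # a seed makes the path reproducible
    path = [s0]
    for _ in range(steps):
        z = rng.gauss(0, 1)  # standard normal random variable
        path.append(gbm_step(path[-1], r, sigma, dt, z))
    return path


# illustrative parameters: one trading year in daily steps
path = simulate_path(100.0, r=0.01, sigma=0.4, dt=1 / 252, steps=252, seed=42)
print(f'start {path[0]:.2f} | end {path[-1]:.2f}')
```

Because the exponential is always positive, simulated prices can never become negative, which is one reason the model is popular for instrument prices.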

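Making use of this model, "Sample Tick Data Server" presents a Python script that uses ZeroMQ and a class called InstrumentPrice to publish new, simulated tick data in a randomized fashion. The publishing is randomized in two ways. First, the stock price value is based on a Monte Carlo simulation. Second, the length of the time interval between two publishing events is random. The remainder of this section explains the major parts of the script in detail.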

The first part of the following script does some imports, among other things of the Python wrapper for ZeroMQ. It also instantiates the major objects needed to open a socket of type PUB:

    import zmq  # (1)
    import math
    import time
    import random

    context = zmq.Context()  # (2)
    socket = context.socket(zmq.PUB)  # (3)
    socket.bind('tcp://0.0.0.0:5555')  # (4)

(1) This imports the Python wrapper for the ZeroMQ library.

(2) A Context object is instantiated. It is the central object for the socket communication.

(3) The socket itself is defined based on the PUB socket type ("communication pattern").

(4) The socket gets bound to the local IP address (0.0.0.0 on Linux and Mac OS, 127.0.0.1 on Windows) and the port number 5555.

The class InstrumentPrice is for the simulation of instrument price values over time. As attributes, it holds the major parameters of the geometric Brownian motion in addition to the instrument symbol and the time at which an instance is created. The only method, .simulate_value(), generates a new value for the stock price given the time passed since it was last called and a random factor:

    class InstrumentPrice(object):
        def __init__(self):
            self.symbol = 'SYMBOL'
            self.t = time.time()  # (1)
            self.value = 100.
            self.sigma = 0.4
            self.r = 0.01

        def simulate_value(self):
            ''' Generates a new, random stock price.
            '''
            t = time.time()  # (2)
            dt = (t - self.t) / (252 * 8 * 60 * 60)  # (3)
            dt *= 500  # (4)
            self.t = t  # (5)
            self.value *= math.exp((self.r - 0.5 * self.sigma ** 2) * dt +
                                   self.sigma * math.sqrt(dt) *
                                   random.gauss(0, 1))  # (6)
            return self.value

(1) The attribute t stores the time of the initialization.

(2) When the .simulate_value() method is called, the current time is recorded.

(3) dt represents the time interval between the current time and the time stored in self.t, as a fraction of a (trading) year.

(4) To have larger instrument price movements, this line of code scales the dt variable by an arbitrary factor.

(5) The attribute t is updated with the current time, which serves as the reference point for the next call of the method.

(6) Based on the Euler scheme for the geometric Brownian motion, a new instrument price is simulated.
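Callouts (3) and (4) compress a unit conversion into two lines. The sketch below spells it out; the constant and function names are hypothetical, and the convention of 252 trading days with 8 trading hours each is taken from the code above.

```python
# seconds in a trading year: 252 days of 8 hours each (as in the class)
SECONDS_PER_TRADING_YEAR = 252 * 8 * 60 * 60


def year_fraction(elapsed_seconds, scale=500):
    """Convert elapsed wall-clock seconds into a scaled year fraction.

    scale is the arbitrary factor from callout (4); scale=1 gives the
    plain fraction of a trading year.
    """
    return elapsed_seconds / SECONDS_PER_TRADING_YEAR * scale


print(SECONDS_PER_TRADING_YEAR)
print(f'{year_fraction(1.0, scale=1):.3e}')  # one second, unscaled
print(f'{year_fraction(1.0):.3e}')           # one second, scaled
```

Without the scaling factor, one second corresponds to such a tiny fraction of a trading year that simulated prices would barely move between two ticks.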

The main part of the script consists of the instantiation of an object of type InstrumentPrice and an infinite while loop. During the while loop, a new instrument price gets simulated, and a message is created, printed, and sent via the socket.

Finally, the execution pauses for a random amount of time:

    ip = InstrumentPrice()  # (1)

    while True:  # (2)
        msg = '{} {:.2f}'.format(ip.symbol, ip.simulate_value())  # (3)
        print(msg)  # (4)
        socket.send_string(msg)  # (5)
        time.sleep(random.random() * 2)  # (6)

(1) This line instantiates an InstrumentPrice object.

(2) An infinite while loop is started.

(3) The message text gets generated based on the symbol attribute and a newly simulated stock price value.

(4) The message str object is printed to the standard output.

(5) It is also sent to subscribed sockets.

(6) The execution of the loop is paused for a random amount of time (between 0 and 2 seconds), simulating the random arrival of new tick data in the markets.

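Executing the script prints out messages as follows:

    (base) pro:ch07 yves$ python TickServer.py
    SYMBOL 100.00
    SYMBOL 99.65
    SYMBOL 99.28
    SYMBOL 99.09
    SYMBOL 98.76
    SYMBOL 98.83
    SYMBOL 98.82
    SYMBOL 98.92
    SYMBOL 98.57
    SYMBOL 98.81
    SYMBOL 98.79
    SYMBOL 98.80

At this point, it cannot yet be verified whether the script also sends the same messages via the socket bound to tcp://0.0.0.0:5555 (tcp://127.0.0.1:5555 on Windows). To this end, another socket that subscribes to the publishing socket is needed to complete the socket pair.

Often, the Monte Carlo simulation of prices for financial instruments relies on homogeneous time intervals (such as "one trading day"). In many cases, this is a "good enough" approximation when working with, say, end-of-day closing prices over longer horizons. In the context of intraday tick data, the random arrival of the data is an important characteristic that needs to be taken into account. The Python script for the tick data server implements random arrival times by pausing the execution for random time intervals.

The code for the tick data server is already quite concise, with the InstrumentPrice simulation class representing the longest part. The code for the corresponding tick data client, as shown in "Tick Data Client", is even more concise. It takes only a few lines of code to instantiate the main Context object, connect to the publishing socket, and subscribe to the SYMBOL channel, which happens to be the only available channel here. In the while loop, the string-based messages are received and printed.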

The initial part of the following script is almost symmetrical to the tick data server script:

    import zmq  # (1)

    context = zmq.Context()  # (2)
    socket = context.socket(zmq.SUB)  # (3)
    socket.connect('tcp://0.0.0.0:5555')  # (4)
    socket.setsockopt_string(zmq.SUBSCRIBE, 'SYMBOL')  # (5)

(1) This imports the Python wrapper for the ZeroMQ library.

(2) For the client, the main object is also an instance of zmq.Context.

(3) From here, the code is different; the socket type is set to SUB.

(4) This socket connects to the respective combination of IP address and port.

(5) This line of code defines the so-called channel to which the socket subscribes. Here, there is only one channel, but it still needs to be specified. In real-world applications, however, you might receive data for a multitude of different symbols via a socket connection.

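The while loop boils down to the retrieval of the messages sent by the server socket and printing them out:

    while True:  # (1)
        data = socket.recv_string()  # (2)
        print(data)  # (3)

(1) This socket receives data in an infinite loop.

(2) This is the main line of code where the data (string-based message) is received.

(3) data is printed to stdout.

The output of the Python script for the socket client matches that of the Python script for the socket server for all messages published while the client is subscribed:

    (base) pro:ch07 yves$ python TickClient.py
    SYMBOL 100.00
    SYMBOL 99.65
    SYMBOL 99.28
    SYMBOL 99.09
    SYMBOL 98.76
    SYMBOL 98.83
    SYMBOL 98.82
    SYMBOL 98.92
    SYMBOL 98.57
    SYMBOL 98.81
    SYMBOL 98.79
    SYMBOL 98.80

Retrieving data in the form of string-based messages via socket communication is only a prerequisite for the tasks to be accomplished based on the data, such as generating trading signals in real time or visualizing the data. This is what the next two sections cover.

ZeroMQ also allows the transmission of other object types. For example, Python objects can be sent via a socket. To this end, the object is, by default, serialized and deserialized with pickle. The respective methods are .send_pyobj() and .recv_pyobj() (see the PyZMQ API). In practice, however, platforms and data providers cater to a diverse set of environments, with Python being only one out of many languages. Therefore, string-based socket communication is often used, for example in combination with standard data formats such as JSON.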
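As a minimal sketch of the JSON approach, the helpers below build and parse a string message with the standard library only; `encode_tick`, `decode_tick`, and the field names are illustrative assumptions. The symbol is kept as a plain-text prefix because a ZeroMQ SUB socket filters messages by prefix, so a subscription to 'SYMBOL' would still match. The resulting string could be passed to socket.send_string() and read with socket.recv_string().

```python
import json


def encode_tick(symbol, price, timestamp):
    """Build a message of the form '<symbol> <json payload>'."""
    payload = json.dumps({'price': round(price, 2), 'time': timestamp})
    return f'{symbol} {payload}'


def decode_tick(msg):
    """Split off the symbol prefix and parse the JSON payload."""
    symbol, payload = msg.split(' ', 1)  # split at the first space only
    data = json.loads(payload)
    return symbol, float(data['price']), data['time']


msg = encode_tick('SYMBOL', 98.7654, 1590226411.0)
print(msg)
print(decode_tick(msg))
```

Compared to pickle, a message in this form can be parsed by clients written in any language that has a JSON library.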

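An online algorithm is an algorithm based on data that is received incrementally (bit by bit) over time. Such an algorithm only knows the current and previous states of the relevant variables and parameters, but nothing about the future. This is a realistic setting for financial trading algorithms, for which any element of (perfect) foresight is to be excluded. By contrast, an offline algorithm knows the complete data set from the beginning. Many algorithms in computer science fall into the category of offline algorithms, such as a sorting algorithm over a list of numbers.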
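The difference can be illustrated with the mean of a series of prices. The sketch below is an illustration and is not taken from the book's scripts; the class name `OnlineMean` is hypothetical. The online version updates its estimate with every new value and never stores the history, while the offline version needs the complete list.

```python
class OnlineMean:
    """Running mean that is updated one observation at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, value):
        # incremental update: only the count and the current mean are kept
        self.n += 1
        self.mean += (value - self.mean) / self.n
        return self.mean


def offline_mean(values):
    """Offline counterpart: requires the complete data set."""
    return sum(values) / len(values)


ticks = [100.00, 99.65, 99.28, 99.09]  # first ticks from the server output
om = OnlineMean()
for tick in ticks:
    print(f'tick {tick:6.2f} | running mean {om.update(tick):.4f}')
print(f'offline mean {offline_mean(ticks):.4f}')
```

After the last tick, both versions agree, but only the online version could have produced a value at every intermediate point in time.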

To generate signals in real time on the basis of an online algorithm, data needs to be collected and processed over time. Consider, for example, a trading strategy based on the time series momentum of the last three five-second intervals (see Chapter 4). Tick data needs to be collected and then resampled, and the momentum needs to be calculated based on the resampled data set. Over time, a continuous, incremental update takes place. "Momentum Online Algorithm" presents a Python script that implements the momentum strategy, as described previously, as an online algorithm. Technically, there are two major parts in addition to handling the socket communication. First is the retrieval and storage of the tick data:

    df = pd.DataFrame()  # (1)
    mom = 3  # (2)
    min_length = mom + 1  # (3)

    while True:
        data = socket.recv_string()  # (4)
        t = datetime.datetime.now()  # (5)
        sym, value = data.split()  # (6)
        # DataFrame.append() was removed in pandas 2.0; pd.concat() is used instead
        df = pd.concat((df, pd.DataFrame({sym: float(value)}, index=[t])))  # (7)

(1) Instantiates an empty pandas DataFrame to collect the tick data.

(2) Defines the number of time intervals for the momentum calculation.

(3) Specifies the (initial) minimum length for the signal generation to be triggered.

(4) The retrieval of the tick data via the socket connection.

(5) A timestamp is generated for the data retrieval.

(6) The string-based message is split into the symbol and the numerical value (still a str object here).

(7) This line of code first generates a temporary DataFrame object with the new data and then appends it to the existing DataFrame object.

The second is the resampling and processing of the data, as shown in the following Python code. This happens on the basis of the tick data collected up to a certain point in time. During this step, log returns are calculated from the resampled data and the momentum is derived. The sign of the momentum defines the positioning to be taken in the financial instrument:

```python
    dr = df.resample('5s', label='right').last()  # (1)
    dr['returns'] = np.log(dr / dr.shift(1))  # (2)
    if len(dr) > min_length:
        min_length += 1  # (3)
        dr['momentum'] = np.sign(dr['returns'].rolling(mom).mean())  # (4)
        print('\n' + '=' * 51)
        print('NEW SIGNAL | {}'.format(datetime.datetime.now()))
        print('=' * 51)
        print(dr.iloc[:-1].tail())  # (5)
        if dr['momentum'].iloc[-2] == 1.0:  # (6)
            print('\nLong market position.')
            # take some action (e.g., place buy order)
        elif dr['momentum'].iloc[-2] == -1.0:  # (7)
            print('\nShort market position.')
            # take some action (e.g., place sell order)
```

1. The tick data is resampled to five-second intervals, taking the last available tick value as the relevant one.
2. This calculates the log returns over the five-second intervals.
3. This increases the minimum length of the resampled `DataFrame` object by one.
4. The momentum and, based on its sign, the positioning are derived from the log returns of three resampled time intervals.
5. This prints the final five rows of the resampled `DataFrame` object, excluding the last, still incomplete interval.
6. A momentum value of `1.0` means a long market position. In production, the first signal or a change of the signal then triggers certain actions, such as placing an order with the broker. Note that the second-to-last value of the `momentum` column is used, since the last value is based at this stage on incomplete data for the relevant (not yet finished) time interval. Technically, this is due to using the pandas `.resample()` method with the `label='right'` parameterization.
7. Similarly, a momentum value of `-1.0` implies a short market position and potentially certain actions that might be triggered, such as placing a sell order with a broker. Again, the second-to-last value of the `momentum` column is used.

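When the script is executed, it takes a while, depending on the parameters chosen, until there is enough (resampled) data available to generate the first signal.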
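Why the most recent bar has to be skipped can be seen in a small, pandas-free sketch. The helper `last_per_bar` is hypothetical; it only mimics the idea behind `resample('5s', label='right').last()` by assigning each tick to the right edge of its five-second interval and keeping the last price per interval. Unlike pandas, it does not create empty rows for intervals without ticks.

```python
import math


def last_per_bar(ticks, seconds=5):
    """Map each (time_in_seconds, price) tick to the right edge of its bar.

    Returns a dict {right_edge: last_price_seen_in_that_bar}.
    """
    bars = {}
    for ts, price in ticks:
        right_edge = (math.floor(ts / seconds) + 1) * seconds
        bars[right_edge] = price  # later ticks overwrite earlier ones
    return bars


# Five ticks; the last one (at 11.0s) belongs to the bar ending at 15s,
# which is still open and may receive further ticks.
ticks = [(0.4, 100.00), (3.9, 99.65), (5.2, 99.28), (9.8, 99.09), (11.0, 98.76)]
bars = last_per_bar(ticks)
complete = list(bars.items())[:-1]  # drop the bar that is still forming
print(bars)
print(complete)
```

The last entry is labeled with a time that lies in the future relative to the latest tick, which is why the script reads the signal from position `-2`.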

Here is an intermediate example output of the online trading algorithm script:

```
(base) yves@pro ch07 $ python OnlineAlgorithm.py

===================================================
NEW SIGNAL | 2020-05-23 11:33:31.233606
===================================================
                     SYMBOL  ...  momentum
2020-05-23 11:33:15   98.65  ...       NaN
2020-05-23 11:33:20   98.53  ...       NaN
2020-05-23 11:33:25   98.83  ...       NaN
2020-05-23 11:33:30   99.33  ...       1.0

[4 rows x 3 columns]

Long market position.

===================================================
NEW SIGNAL | 2020-05-23 11:33:36.185453
===================================================
                     SYMBOL  ...  momentum
2020-05-23 11:33:15   98.65  ...       NaN
2020-05-23 11:33:20   98.53  ...       NaN
2020-05-23 11:33:25   98.83  ...       NaN
2020-05-23 11:33:30   99.33  ...       1.0
2020-05-23 11:33:35   97.76  ...      -1.0

[5 rows x 3 columns]

Short market position.

===================================================
NEW SIGNAL | 2020-05-23 11:33:40.077869
===================================================
                     SYMBOL  ...  momentum
2020-05-23 11:33:20   98.53  ...       NaN
2020-05-23 11:33:25   98.83  ...       NaN
2020-05-23 11:33:30   99.33  ...       1.0
2020-05-23 11:33:35   97.76  ...      -1.0
2020-05-23 11:33:40   98.51  ...      -1.0

[5 rows x 3 columns]

Short market position.
```

It is a good exercise to implement, based on the presented tick client script, both an SMA-based strategy and a mean-reversion strategy as online algorithms.

Visualizing streaming data in real time is generally a demanding task. Fortunately, quite a few technologies and Python packages are available nowadays that significantly simplify it. In what follows, we work with Plotly, which is both a technology and a service used to generate nice-looking, interactive plots for static and streaming data. To follow along, the `plotly` package needs to be installed. In addition, several JupyterLab extensions need to be installed when working with JupyterLab. The following commands should be executed on the terminal:

```
conda install plotly ipywidgets
jupyter labextension install jupyterlab-plotly
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install plotlywidget
```

The Basics

Once the required packages and extensions are installed, generating a streaming plot is quite efficient. The first step is to create a Plotly figure widget:

```python
In [1]: import zmq
        from datetime import datetime
        import plotly.graph_objects as go  # (1)

In [2]: symbol = 'SYMBOL'

In [3]: fig = go.FigureWidget()  # (2)
        fig.add_scatter()  # (2)
        fig  # (2)
Out[3]: FigureWidget({
            'data': [{'type': 'scatter', 'uid': 'e1a65f25-287d-4021-a210-c2f41f32426a'}],
            'layout': {'t…
```

1. This imports the graphical objects from `plotly`.
2. This instantiates a Plotly figure widget within the Jupyter Notebook.

The second step is to set up the socket communication with the sample tick data server, which needs to run on the same machine in a separate Python process. The incoming data is enriched with a timestamp and collected in `list` objects. These `list` objects are in turn used to update the `data` objects of the figure widget (see Figure 7-1):

```python
In [4]: context = zmq.Context()

In [5]: socket = context.socket(zmq.SUB)

In [6]: socket.connect('tcp://0.0.0.0:5555')

In [7]: socket.setsockopt_string(zmq.SUBSCRIBE, 'SYMBOL')

In [8]: times = list()  # (1)
        prices = list()  # (2)

In [9]: for _ in range(50):
            msg = socket.recv_string()
            t = datetime.now()  # (3)
            times.append(t)  # (3)
            _, price = msg.split()
            prices.append(float(price))
            fig.data[0].x = times  # (4)
            fig.data[0].y = prices  # (4)
```

1. `list` object for the timestamps.
2. `list` object for the real-time prices.
3. Generates a timestamp and appends it.
4. Updates the `data` object with the amended `x` (`times`) and `y` (`prices`) data sets.

Figure 7-1. Plot of streaming price data, as retrieved in real time via socket connection

Three Real-Time Streams

A streaming plot with Plotly can have multiple graph objects. This comes in handy when, for instance, two simple moving averages (SMAs) are to be visualized in real time in addition to the price ticks. The following code instantiates a figure widget again, this time with three `scatter` objects. The tick data from the sample tick data server is collected in a pandas `DataFrame` object. The two SMAs are calculated after each update from the socket. The amended data sets are used to update the `data` object of the figure widget (see Figure 7-2):

```python
In [10]: fig = go.FigureWidget()
         fig.add_scatter(name='SYMBOL')
         fig.add_scatter(name='SMA1', line=dict(width=1, dash='dot'),
                         mode='lines+markers')
         fig.add_scatter(name='SMA2', line=dict(width=1, dash='dash'),
                         mode='lines+markers')
         fig
Out[10]: FigureWidget({
             'data': [{'name': 'SYMBOL', 'type': 'scatter', 'uid':
              'bcf83157-f015-411b-a834-d5fd6ac509ba…

In [11]: import pandas as pd

In [12]: df = pd.DataFrame()  # (1)

In [13]: for _ in range(75):
             msg = socket.recv_string()
             t = datetime.now()
             sym, price = msg.split()
             # DataFrame.append() was removed in pandas 2.0; use pd.concat().
             df = pd.concat([df, pd.DataFrame({sym: float(price)}, index=[t])])  # (1)
             df['SMA1'] = df[sym].rolling(5).mean()  # (2)
             df['SMA2'] = df[sym].rolling(10).mean()  # (2)
             fig.data[0].x = df.index
             fig.data[1].x = df.index
             fig.data[2].x = df.index
             fig.data[0].y = df[sym]
             fig.data[1].y = df['SMA1']
             fig.data[2].y = df['SMA2']
```

1. Collects the tick data in a `DataFrame` object.
2. Adds the two SMAs in separate columns to the `DataFrame` object.

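Again, it is a good exercise to combine the plotting of streaming tick data and the two SMAs with the implementation of an online trading algorithm based on the two SMAs. In this case, resampling should be added to the implementation, since such trading algorithms are hardly ever based on tick data but rather on bars of fixed length (five seconds, one minute, etc.).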
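One possible starting point for this exercise is sketched below. It is not the book's solution: the class name `OnlineSMACrossover` and its interface are hypothetical, and the prices passed to `.update()` are assumed to be closing values of already resampled bars. The signal is `1` (long) when the short SMA is above the long SMA, `-1` (short) when it is below, and `0` while there is not yet enough data or the two are equal.

```python
from collections import deque


class OnlineSMACrossover:
    """Online SMA crossover signal based on the most recent bar closes."""

    def __init__(self, short=5, long=10):
        self.short = short
        # The deque discards the oldest value once `long` values are stored.
        self.window = deque(maxlen=long)

    def update(self, price):
        """Add one bar close and return the signal: 1, -1, or 0."""
        self.window.append(price)
        if len(self.window) < self.window.maxlen:
            return 0  # not enough data for the long SMA yet
        values = list(self.window)
        sma_long = sum(values) / len(values)
        sma_short = sum(values[-self.short:]) / self.short
        if sma_short > sma_long:
            return 1
        if sma_short < sma_long:
            return -1
        return 0


algo = OnlineSMACrossover(short=5, long=10)
rising = [algo.update(p) for p in range(1, 11)]
print(rising)
```

Only the last `long` values are stored, so memory use stays constant no matter how long the stream runs.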

Figure 7-2. Streaming price data and two SMAs calculated in real time

Three Sub-Plots for Three Streams

As with conventional Plotly plots, streaming plots based on figure widgets can also have multiple sub-plots. The example that follows creates a streaming plot with three sub-plots. The first plots the real-time tick data. The second plots the log returns data. The third plots the time series momentum based on the log returns data. Figure 7-3 shows a snapshot of the whole figure object:

```python
In [14]: from plotly.subplots import make_subplots

In [15]: f = make_subplots(rows=3, cols=1, shared_xaxes=True)  # (1)
         f.append_trace(go.Scatter(name='SYMBOL'), row=1, col=1)  # (2)
         f.append_trace(go.Scatter(name='RETURN', line=dict(width=1, dash='dot'),
                 mode='lines+markers', marker={'symbol': 'triangle-up'}),
                 row=2, col=1)  # (3)
         f.append_trace(go.Scatter(name='MOMENTUM', line=dict(width=1, dash='dash'),
                 mode='lines+markers', marker={'symbol': 'x'}), row=3, col=1)  # (4)
         # f.update_layout(height=600)  # (5)

In [16]: fig = go.FigureWidget(f)

In [17]: fig
Out[17]: FigureWidget({
             'data': [{'name': 'SYMBOL',
                       'type': 'scatter',
                       'uid': 'c8db0cac…

In [18]: import numpy as np

In [19]: df = pd.DataFrame()

In [20]: for _ in range(75):
             msg = socket.recv_string()
             t = datetime.now()
             sym, price = msg.split()
             # DataFrame.append() was removed in pandas 2.0; use pd.concat().
             df = pd.concat([df, pd.DataFrame({sym: float(price)}, index=[t])])
             df['RET'] = np.log(df[sym] / df[sym].shift(1))
             df['MOM'] = df['RET'].rolling(10).mean()
             fig.data[0].x = df.index
             fig.data[1].x = df.index
             fig.data[2].x = df.index
             fig.data[0].y = df[sym]
             fig.data[1].y = df['RET']
             fig.data[2].y = df['MOM']
```

1. Creates three sub-plots that share the x-axis.
2. Creates the first sub-plot for the price data.
3. Creates the second sub-plot for the log returns data.
4. Creates the third sub-plot for the momentum data.
5. Adjusts the height of the figure object (commented out here).

Figure 7-3. Streaming price data, log returns, and momentum in different sub-plots

Streaming Data as Bars

Not all streaming data is best visualized as a time series (`Scatter` object). Some streaming data is better visualized as bars with changing height. "Sample Data Server for Bar Plot" contains a Python script that serves sample data suited for a bar-based visualization. A single data set (message) consists of eight floating-point numbers. The following Python code generates a streaming bar plot (see Figure 7-4). In this context, the `x` data usually does not change. For the following code to work, the `BarsServer.py` script needs to be executed in a separate, local Python instance:

```python
In [21]: socket = context.socket(zmq.SUB)

In [22]: socket.connect('tcp://0.0.0.0:5556')

In [23]: socket.setsockopt_string(zmq.SUBSCRIBE, '')

In [24]: for _ in range(5):
             msg = socket.recv_string()
             print(msg)
         60.361 53.504 67.782 64.165 35.046 94.227 20.221 54.716
         79.508 48.210 84.163 73.430 53.288 38.673 4.962 78.920
         53.316 80.139 73.733 55.549 21.015 20.556 49.090 29.630
         86.664 93.919 33.762 82.095 3.108 92.122 84.194 36.666
         37.192 85.305 48.397 36.903 81.835 98.691 61.818 87.121

In [25]: fig = go.FigureWidget()
         fig.add_bar()
         fig
Out[25]: FigureWidget({
             'data': [{'type': 'bar', 'uid':
              '51c6069f-4924-458d-a1ae-c5b5b5f3b07f'}], 'layout': {'templ…

In [26]: x = list('abcdefgh')
         fig.data[0].x = x
         for _ in range(25):
             msg = socket.recv_string()
             y = msg.split()
             y = [float(n) for n in y]
             fig.data[0].y = y
```

Figure 7-4. Streaming data as bars with changing height

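Nowadays, algorithmic trading has to deal with different types of streaming (real-time) data. The most important type in this regard is tick data for financial instruments that is, in principle, generated and published around the clock.² Sockets are the technological tool of choice to deal with streaming data. A powerful and at the same time easy-to-use library in this regard is ZeroMQ, which is used in this chapter to create a simple tick data server that endlessly emits sample tick data.

Different tick data clients are introduced and explained to generate trading signals in real time based on online algorithms and to visualize the incoming tick data by streaming plots using Plotly. Plotly makes streaming visualization within Jupyter Notebook an efficient affair, allowing for, among other things, multiple streams at the same time, both in a single plot or in different sub-plots.

Based on the topics covered in this chapter and the previous ones, you are now able to work with both historical structured data (for example, in the context of backtesting trading strategies) and real-time streaming data (for example, in the context of generating trading signals in real time). This represents a major milestone in the endeavor to build an automated, algorithmic trading operation.

The best starting point for a thorough overview of ZeroMQ is the ZeroMQ home page. The Learning ZeroMQ with Python tutorial page provides an overview of the PUB-SUB pattern based on the Python wrapper for the socket communication library.

A good place to start working with Plotly is the Plotly home page and in particular the Getting Started with Plotly page for Python.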

This section presents the Python scripts referenced and used in this chapter.

Sample Tick Data Server

The following is a script that runs a sample tick data server based on ZeroMQ. It makes use of Monte Carlo simulation for the geometric Brownian motion:

```python
#
# Python Script to Simulate a
# Financial Tick Data Server
#
# Python for Algorithmic Trading
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#
import zmq
import math
import time
import random

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind('tcp://0.0.0.0:5555')


class InstrumentPrice(object):
    def __init__(self):
        self.symbol = 'SYMBOL'
        self.t = time.time()
        self.value = 100.
        self.sigma = 0.4
        self.r = 0.01

    def simulate_value(self):
        ''' Generates a new, random stock price.
        '''
        t = time.time()
        dt = (t - self.t) / (252 * 8 * 60 * 60)
        dt *= 500
        self.t = t
        self.value *= math.exp((self.r - 0.5 * self.sigma ** 2) * dt +
                               self.sigma * math.sqrt(dt) * random.gauss(0, 1))
        return self.value


ip = InstrumentPrice()

while True:
    msg = '{} {:.2f}'.format(ip.symbol, ip.simulate_value())
    print(msg)
    socket.send_string(msg)
    time.sleep(random.random() * 2)
```
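Read as a formula, the update in `simulate_value()` is the standard exact discretization of a geometric Brownian motion. The notation below is introduced here for illustration: $S_t$ stands for `self.value`, $r$ for `self.r`, $\sigma$ for `self.sigma`, and $z$ for the standard normal draw from `random.gauss(0, 1)`:

$$
S_t = S_{t-\Delta t} \cdot \exp\left(\left(r - \tfrac{1}{2}\sigma^2\right)\Delta t + \sigma \sqrt{\Delta t}\, z\right), \qquad z \sim \mathcal{N}(0, 1)
$$

with $r = 0.01$, $\sigma = 0.4$, and

$$
\Delta t = 500 \cdot \frac{\text{elapsed seconds}}{252 \cdot 8 \cdot 60 \cdot 60}
$$

The denominator appears to express the elapsed time as a fraction of a year with 252 trading days of eight hours each, and the factor of 500 seems intended to speed up the simulated clock so that price changes are visible within seconds. Both readings are interpretations of the code rather than statements from the text.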

Tick Data Client

The following is a script that runs a tick data client based on ZeroMQ. It connects to the tick data server from "Sample Tick Data Server":

```python
#
# Python Script
# with Tick Data Client
#
# Python for Algorithmic Trading
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#
import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect('tcp://0.0.0.0:5555')
socket.setsockopt_string(zmq.SUBSCRIBE, 'SYMBOL')

while True:
    data = socket.recv_string()
    print(data)
```

Momentum Online Algorithm

The following is a script that implements a trading strategy based on time series momentum as an online algorithm. It connects to the tick data server from "Sample Tick Data Server":

```python
#
# Python Script
# with Online Trading Algorithm
#
# Python for Algorithmic Trading
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#
import zmq
import datetime
import numpy as np
import pandas as pd

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect('tcp://0.0.0.0:5555')
socket.setsockopt_string(zmq.SUBSCRIBE, 'SYMBOL')

df = pd.DataFrame()
mom = 3
min_length = mom + 1

while True:
    data = socket.recv_string()
    t = datetime.datetime.now()
    sym, value = data.split()
    # DataFrame.append() was removed in pandas 2.0; pd.concat() is the equivalent.
    df = pd.concat([df, pd.DataFrame({sym: float(value)}, index=[t])])
    dr = df.resample('5s', label='right').last()
    dr['returns'] = np.log(dr / dr.shift(1))
    if len(dr) > min_length:
        min_length += 1
        dr['momentum'] = np.sign(dr['returns'].rolling(mom).mean())
        print('\n' + '=' * 51)
        print('NEW SIGNAL | {}'.format(datetime.datetime.now()))
        print('=' * 51)
        print(dr.iloc[:-1].tail())
        if dr['momentum'].iloc[-2] == 1.0:
            print('\nLong market position.')
            # take some action (e.g., place buy order)
        elif dr['momentum'].iloc[-2] == -1.0:
            print('\nShort market position.')
            # take some action (e.g., place sell order)
```

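Sample Data Server for Bar Plot

The following is a Python script that generates sample data for a streaming bar plot:

```python
#
# Python Script to Serve
# Random Bars Data
#
# Python for Algorithmic Trading
# (c) Dr. Yves J. Hilpisch
# The Python Quants GmbH
#
import zmq
import math
import time
import random

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind('tcp://0.0.0.0:5556')

while True:
    bars = [random.random() * 100 for _ in range(8)]
    msg = ' '.join([f'{bar:.3f}' for bar in bars])
    print(msg)
    socket.send_string(msg)
    time.sleep(random.random() * 2)
```

¹ When speaking of "simultaneously" or "at the same time," this is meant in a theoretical, idealized sense. In practical applications, the different distances between the sending and receiving sockets, network speeds, and other factors affect the exact retrieval time per subscriber socket.

² Not all markets are open 24 hours a day, 7 days a week, and certainly not all financial instruments are traded around the clock. However, cryptocurrency markets, for example for Bitcoin, do operate around the clock, constantly creating new data that needs to be digested in real time by players active in these markets.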

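Today, even small entities that trade complex instruments or are granted sufficient leverage can threaten the global financial system.
Paul Singer

Today, it is easier than ever to get started with trading in the financial markets. A large number of online trading platforms (brokers) are available from which an algorithmic trader can choose. The choice of a platform might be influenced by multiple factors:

Instruments

The first criterion that comes to mind is the type of instrument one is interested in trading. For example, one might be interested in trading stocks, exchange-traded funds (ETFs), bonds, currencies, commodities, options, or futures.

Strategies

Some traders are interested in long-only strategies, while others require short selling as well. Some focus on single-instrument strategies, while others focus on those involving multiple instruments at the same time.

Costs

Fixed and variable transaction costs are an important factor for many traders. They might even decide whether a certain strategy is profitable or not (see, for instance, Chapters 4 and 6).

Technology

Technology has become an important factor in the selection of trading platforms. First, there are the tools that the platforms offer to traders. Trading tools are generally available for desktop/notebook computers, tablets, and smartphones. Second, there are the application programming interfaces (APIs) that traders can access programmatically.

Jurisdiction

Financial trading is a heavily regulated field, with different legal frameworks in place for different countries or regions. Depending on their residence, this might prohibit certain traders from using certain platforms and/or financial instruments.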

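This chapter focuses on Oanda, an online trading platform that is well suited to deploying automated, algorithmic trading strategies, even for retail traders. The following is a brief description of Oanda along the criteria outlined previously:

Instruments

Oanda offers a wide range of so-called contracts for difference (CFD) products (see also "Contracts for Difference (CFDs)" and "Disclaimer"). The main characteristics of CFDs are that they are leveraged (for example, 10:1 or 50:1) and traded on margin, such that losses might exceed the initial capital.

Strategies

Oanda allows both the buying (going long) and the selling (going short) of CFDs. Different order types are available, such as market or limit orders, with or without profit targets and/or (trailing) stop losses.

Costs

There are no fixed transaction costs associated with the trading of CFDs at Oanda. However, there is a bid-ask spread that leads to variable transaction costs when trading CFDs.

Technology

Oanda provides the trading application fxTrade (Practice), which retrieves data in real time and allows the (manual, discretionary) trading of all instruments (see Figure 8-1). There is also a browser-based trading application (see Figure 8-2). A major strength of the platform is its RESTful and streaming APIs (see Oanda v20 API), via which traders can programmatically access historical and streaming data, place buy and sell orders, or retrieve account information. A Python wrapper package is available (see v20 on PyPi). Oanda offers free paper trading accounts that provide full access to all technological capabilities, which is really helpful in getting started on the platform. This also simplifies the transition from paper to live trading.

Jurisdiction

Depending on the residence of the account holder, the selection of CFDs that can be traded varies. FX-related CFDs are available basically everywhere Oanda is active. CFDs on stock indices, for instance, might not be available in certain jurisdictions.

Figure 8-1. Oanda trading application fxTrade Practice

Figure 8-2. Oanda browser-based trading application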

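The chapter is organized as follows. "Setting Up an Account" briefly discusses how to set up an account. "The Oanda API" illustrates the necessary steps to access the API. Based on the API access, "Retrieving Historical Data" retrieves and works with historical data for a particular CFD. "Working with Streaming Data" introduces the streaming API of Oanda for data retrieval and visualization. "Implementing Trading Strategies in Real Time" implements an automated, algorithmic trading strategy in real time. Finally, "Retrieving Account Information" deals with retrieving data about the account itself, such as the current balance or recent trades. Throughout, the code makes use of a Python wrapper class called tpqoa (see GitHub repository).

The goal of this chapter is to make use of the approaches and technologies introduced in previous chapters to automatically trade on the Oanda platform.

The process for registering an account with Oanda is simple and efficient. You can choose between a real account and a free demo ("practice") account, which absolutely suffices to implement what follows (see Figures 8-3 and 8-4).

Figure 8-3. Oanda account registration (account types)

If the registration is successful and you are logged in to the account on the platform, you should see a starting page, as shown in Figure 8-5. In the middle, you will find a download link for the fxTrade Practice for Desktop application, which you should install. Once it is running, it looks similar to the screenshot shown in Figure 8-1.

Figure 8-4. Oanda account registration (registration form)

Figure 8-5. Oanda account starting page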

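After registration, getting access to Oanda's API is straightforward. The major ingredients needed are the account number and the access token (API key). You will find the account number, for instance, in the area Manage Funds. The access token can be generated in the area Manage API Access (see Figure 8-6).¹

From now on, the `configparser` module is used to manage account credentials. The module expects a text file, named, say, `pyalgo.cfg`, in the following format for use with an Oanda practice account:

```
[oanda]
account_id = YOUR_ACCOUNT_ID
access_token = YOUR_ACCESS_TOKEN
account_type = practice
```

Figure 8-6. Oanda API access management page

To access the API via Python, it is recommended to use the Python wrapper package `tpqoa` (see GitHub repository), which in turn relies on the `v20` package from Oanda (see GitHub repository).

It is installed with the following command:

```
pip install git+https://github.com/yhilpisch/tpqoa.git
```

With these prerequisites, you can connect to the API with a single line of code:

```python
In [1]: import tpqoa

In [2]: api = tpqoa.tpqoa('../pyalgo.cfg')  # (1)
```

1. Adjust the path and filename if required.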

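This is a major milestone: being connected to the Oanda API allows for the retrieval of historical data, the programmatic placement of orders, and more.

The upside of using the `configparser` module is that it simplifies the storage and management of account credentials. In algorithmic trading, the number of accounts needed can quickly grow. Examples are a cloud instance or server, a data service provider, an online trading platform, and so on.

The downside is that the account information is stored in the form of plain text, which represents a considerable security risk, particularly since the information about multiple accounts is stored in a single file. When moving to production, you should therefore apply, for example, file encryption methods to keep the credentials safe.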
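For reference, the following minimal sketch shows how such a file can be read with `configparser`. To stay self-contained, it parses the sample content from a string; for a real file, `config.read('pyalgo.cfg')` would be used instead. The environment variable name `OANDA_ACCESS_TOKEN` is a hypothetical choice that illustrates one simple way to keep the token out of the file; it is not something the `tpqoa` package is known to read.

```python
import configparser
import os

# Sample content in the format shown above (placeholders, not real credentials).
SAMPLE = """
[oanda]
account_id = YOUR_ACCOUNT_ID
access_token = YOUR_ACCESS_TOKEN
account_type = practice
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)  # for a file on disk: config.read('pyalgo.cfg')

account_id = config['oanda']['account_id']
account_type = config['oanda'].get('account_type', 'practice')

# Hypothetical override: prefer a token from the environment if one is set,
# so that the secret does not have to be stored in the plain-text file.
access_token = os.environ.get('OANDA_ACCESS_TOKEN',
                              config['oanda']['access_token'])

print(account_id, account_type)
```

Whatever approach is chosen, the configuration file should be excluded from version control and readable only by the user running the trading code.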

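A major benefit of working with the Oanda platform is that the complete price history of all Oanda instruments is accessible via the RESTful API. In this context, complete history refers to the different CFDs themselves, not the underlying instruments they are defined on.

Looking Up Instruments Available for Trading

For an overview of what instruments can be traded for a given account, use the `.get_instruments()` method. It only retrieves the display names and technical instrument names from the API. More details are available via the API, such as the minimum position size:

```python
In [3]: api.get_instruments()[:15]
Out[3]: [('AUD/CAD', 'AUD_CAD'),
         ('AUD/CHF', 'AUD_CHF'),
         ('AUD/HKD', 'AUD_HKD'),
         ('AUD/JPY', 'AUD_JPY'),
         ('AUD/NZD', 'AUD_NZD'),
         ('AUD/SGD', 'AUD_SGD'),
         ('AUD/USD', 'AUD_USD'),
         ('Australia 200', 'AU200_AUD'),
         ('Brent Crude Oil', 'BCO_USD'),
         ('Bund', 'DE10YB_EUR'),
         ('CAD/CHF', 'CAD_CHF'),
         ('CAD/HKD', 'CAD_HKD'),
         ('CAD/JPY', 'CAD_JPY'),
         ('CAD/SGD', 'CAD_SGD'),
         ('CHF/HKD', 'CHF_HKD')]
```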

Backtesting a Momentum Strategy on Minute Bars

The example that follows uses the instrument `EUR_USD`, based on the EUR/USD currency pair. The goal is to backtest momentum-based strategies on one-minute bars. The data used is for two days in August 2020. The first step is to retrieve the raw data from Oanda:

```python
In [4]: help(api.get_history)  # (1)
        Help on method get_history in module tpqoa.tpqoa:

        get_history(instrument, start, end, granularity, price, localize=True)
        method of tpqoa.tpqoa.tpqoa instance
            Retrieves historical data for instrument.

            Parameters
            ==========
            instrument: string
                valid instrument name
            start, end: datetime, str
                Python datetime or string objects for start and end
            granularity: string
                a string like 'S5', 'M1' or 'D'
            price: string
                one of 'A' (ask), 'B' (bid) or 'M' (middle)

            Returns
            =======
            data: pd.DataFrame
                pandas DataFrame object with data

In [5]: instrument = 'EUR_USD'  # (2)
        start = '2020-08-10'  # (2)
        end = '2020-08-12'  # (2)
        granularity = 'M1'  # (2)
        price = 'M'  # (2)

In [6]: data = api.get_history(instrument, start, end,
                               granularity, price)  # (3)

In [7]: data.info()  # (4)
        <class 'pandas.core.frame.DataFrame'>
        DatetimeIndex: 2814 entries, 2020-08-10 00:00:00 to 2020-08-11 23:59:00
        Data columns (total 6 columns):
         #   Column    Non-Null Count  Dtype
        ---  ------    --------------  -----
         0   o         2814 non-null   float64
         1   h         2814 non-null   float64
         2   l         2814 non-null   float64
         3   c         2814 non-null   float64
         4   volume    2814 non-null   int64
         5   complete  2814 non-null   bool
        dtypes: bool(1), float64(4), int64(1)
        memory usage: 134.7 KB

In [8]: data[['c', 'volume']].head()  # (5)
Out[8]:                            c  volume
        time
        2020-08-10 00:00:00  1.17822      18
        2020-08-10 00:01:00  1.17836      32
        2020-08-10 00:02:00  1.17828      25
        2020-08-10 00:03:00  1.17834      13
        2020-08-10 00:04:00  1.17847      43
```

1. Shows the docstring (help text) for the `.get_history()` method.
2. Defines the parameter values.
3. Retrieves the raw data from the API.
4. Shows the meta information for the retrieved data set.
5. Shows the first five rows for two columns.

The second step is to implement the vectorized backtesting. The idea is to backtest several momentum strategies simultaneously. The code is straightforward and concise (see also Chapter 4).

For simplicity, the following code uses close (`c`) values of mid prices only:²

```python
In [9]: import numpy as np

In [10]: data['returns'] = np.log(data['c'] / data['c'].shift(1))  # (1)

In [11]: cols = []  # (2)

In [12]: for momentum in [15, 30, 60, 120]:  # (3)
             col = 'position_{}'.format(momentum)  # (4)
             data[col] = np.sign(data['returns'].rolling(momentum).mean())  # (5)
             cols.append(col)  # (6)
```

1. Calculates the log returns based on the close values of the mid prices.
2. Instantiates an empty `list` object to collect column names.
3. Defines the time intervals, in minute bars, for the momentum strategies.
4. Defines the name of the column to be used for storage in the `DataFrame` object.
5. Adds the strategy positionings as a new column.
6. Appends the name of the column to the `list` object.

The final step is the derivation and plotting of the absolute performance of the different momentum strategies. Figure 8-7 shows the performance of the momentum-based strategies graphically and compares it to the performance of the base instrument itself:

```python
In [13]: from pylab import plt
         plt.style.use('seaborn')
         import matplotlib as mpl
         mpl.rcParams['savefig.dpi'] = 300
         mpl.rcParams['font.family'] = 'serif'

In [14]: strats = ['returns']  # (1)

In [15]: for col in cols:  # (2)
             strat = 'strategy_{}'.format(col.split('_')[1])  # (3)
             data[strat] = data[col].shift(1) * data['returns']  # (4)
             strats.append(strat)  # (5)

In [16]: data[strats].dropna().cumsum(
                     ).apply(np.exp).plot(figsize=(10, 6));  # (6)
```

1. Defines another `list` object to store the column names to be plotted later on.
2. Iterates over the columns with the positionings of the different strategies.
3. Derives the name of the new column in which the strategy performance is stored.
4. Calculates the log returns of the different strategies and stores them as new columns.
5. Appends the column names to the `list` object for later plotting.
6. Plots the cumulative performance of the instrument and the strategies.

Figure 8-7. Gross performance of different momentum strategies for the `EUR_USD` instrument (minute bars)

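Factoring In Leverage and Margin

In general, when you buy a share of a stock for, say, 100 USD, the profit and loss (P&L) calculations are straightforward: if the stock price rises by 1 USD, you earn 1 USD (unrealized profit); if the stock price falls by 1 USD, you lose 1 USD (unrealized loss). If you buy 10 shares, just multiply the results by 10.

Trading CFDs on the Oanda platform involves leverage and margin. This significantly influences the P&L calculation. For an introduction to and overview of this topic, refer to Oanda fxTrade Margin Rules. A simple example can illustrate the major aspects in this context.

Consider a EUR-based algorithmic trader who wants to trade the `EUR_USD` instrument on the Oanda platform and wants to get a long exposure of 10,000 EUR at an ask price of 1.1. Without leverage or margin, the trader (or Python program) would buy 10,000 units of the CFD.³ If the price of the instrument (exchange rate) rises to 1.105 (as the midpoint rate between bid and ask prices), the absolute profit is 10,000 x 0.005 = 50 or 0.5%.

What impact do leverage and margin have? Suppose the algorithmic trader chooses a leverage ratio of 20:1, which translates into a 5% margin (= 100% / 20). This in turn implies that the trader only needs to put up a margin upfront of 10,000 EUR x 5% = 500 EUR to get the same exposure. If the price of the instrument then rises to 1.105, the absolute profit stays the same at 50 EUR, but the relative profit rises to 50 EUR / 500 EUR = 10%. The return is considerably amplified, by a factor of 20; this is the benefit of leverage when things go as desired.

What happens if things go south? Suppose the price of the instrument drops to 1.08 (as the midpoint rate between bid and ask prices), leading to a loss of 10,000 x (1.08 - 1.1) = -200 EUR. The relative loss now is -200 EUR / 500 EUR = -40%. If the account the algorithmic trader is trading with has less than 200 EUR left in equity/cash, the position needs to be closed out since the (regulatory) margin requirements cannot be met anymore. If losses eat up the margin completely, additional funds need to be allocated as margin to keep the trade alive.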
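The arithmetic of this example can be reproduced with a few lines of Python. The function `leveraged_pnl` is a hypothetical helper written for this illustration; like the text, it ignores the bid-ask spread, financing costs, and currency conversion.

```python
def leveraged_pnl(units, entry_price, exit_price, leverage):
    """Return (margin, absolute P&L, P&L relative to margin) for a long position."""
    margin = units / leverage                 # e.g., 20:1 leverage -> 5% margin
    pnl = units * (exit_price - entry_price)  # absolute profit or loss
    return margin, pnl, pnl / margin


# Favorable move from 1.1 to 1.105 with 20:1 leverage
margin, profit, rel_profit = leveraged_pnl(10_000, 1.1, 1.105, 20)
print(f'margin={margin:.0f}  P&L={profit:.2f}  relative={rel_profit:.1%}')

# Adverse move from 1.1 to 1.08 with the same leverage
_, loss, rel_loss = leveraged_pnl(10_000, 1.1, 1.08, 20)
print(f'P&L={loss:.2f}  relative={rel_loss:.1%}')
```

With a leverage of 1 (no leverage), the same function returns the unlevered figures, which makes the amplification by the leverage factor easy to see.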

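Figure 8-8 shows the amplifying effect on the performance of the momentum strategies for a leverage ratio of 20:1. The initial margin of 5% suffices to cover potential losses since it is not eaten up even in the worst case depicted:

```python
In [17]: data[strats].dropna().cumsum().apply(
                     lambda x: x * 20).apply(np.exp).plot(figsize=(10, 6));  # (1)
```

1. Multiplies the log returns by a factor of 20 according to the leverage ratio assumed.

Leveraged trading does not only amplify potential profits, but it also amplifies potential losses. With leveraged trading based on a 10:1 factor (10% margin), a 10% adverse move in the base instrument already wipes out the complete margin. In other words, a 10% move leads to a 100% loss. Therefore, you should make sure to fully understand all risks involved in leveraged trading. You should also make sure to apply appropriate risk measures, such as stop loss orders, that are in line with your risk profile and appetite.

Figure 8-8. Gross performance of momentum strategies for the `EUR_USD` instrument with 20:1 leverage (minute bars)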

Working with streaming data is again made simple and straightforward by the Python wrapper package `tpqoa`. The package, in combination with the `v20` package, takes care of the socket communication, so that the algorithmic trader only needs to decide what to do with the streaming data:

```python
In [18]: instrument = 'EUR_USD'

In [19]: api.stream_data(instrument, stop=10)  # (1)
         2020-08-19T14:39:13.560138152Z 1.19131 1.1915
         2020-08-19T14:39:14.088511060Z 1.19134 1.19152
         2020-08-19T14:39:14.390081879Z 1.19124 1.19145
         2020-08-19T14:39:15.105974700Z 1.19129 1.19144
         2020-08-19T14:39:15.375370451Z 1.19128 1.19144
         2020-08-19T14:39:15.501380756Z 1.1912 1.19141
         2020-08-19T14:39:15.951793928Z 1.1912 1.19138
         2020-08-19T14:39:16.354844135Z 1.19123 1.19138
         2020-08-19T14:39:16.661440356Z 1.19118 1.19133
         2020-08-19T14:39:16.912150908Z 1.19112 1.19132
```

1. The `stop` parameter stops the streaming after a certain number of ticks has been retrieved.

Similarly, it is straightforward to place market buy or sell orders with the `create_order()` method:

```python
In [20]: help(api.create_order)  # (1)
         Help on method create_order in module tpqoa.tpqoa:

         create_order(instrument, units, price=None, sl_distance=None,
         tsl_distance=None, tp_price=None, comment=None, touch=False,
         suppress=False, ret=False) method of tpqoa.tpqoa.tpqoa instance
             Places order with Oanda.

             Parameters
             ==========
             instrument: string
                 valid instrument name
             units: int
                 number of units of instrument to be bought
                 (positive int, e.g., 'units=50')
                 or to be sold (negative int, e.g., 'units=-100')
             price: float
                 limit order price, touch order price
             sl_distance: float
                 stop loss distance price, mandatory e.g., in Germany
             tsl_distance: float
                 trailing stop loss distance
             tp_price: float
                 take profit price to be used for the trade
             comment: str
                 string
             touch: boolean
                 market_if_touched order (requires price to be set)
             suppress: boolean
                 whether to suppress print out
             ret: boolean
                 whether to return the order object

In [21]: api.create_order(instrument, 1000)  # (2)

          {'id': '1721', 'time': '2020-08-19T14:39:17.062399275Z', 'userID':
          13834683, 'accountID': '101-004-13834683-001', 'batchID': '1720',
          'requestID': '24716258589170956', 'type': 'ORDER_FILL', 'orderID':
          '1720', 'instrument': 'EUR_USD', 'units': '1000.0',
          'gainQuoteHomeConversionFactor': '0.835288642787',
          'lossQuoteHomeConversionFactor': '0.843683503518', 'price': 1.19131,
          'fullVWAP': 1.19131, 'fullPrice': {'type': 'PRICE', 'bids': [{'price':
          1.1911, 'liquidity': '10000000'}], 'asks': [{'price': 1.19131,
          'liquidity': '10000000'}], 'closeoutBid': 1.1911, 'closeoutAsk':
          1.19131}, 'reason': 'MARKET_ORDER', 'pl': '0.0', 'financing': '0.0',
          'commission': '0.0', 'guaranteedExecutionFee': '0.0',
          'accountBalance': '98510.7986', 'tradeOpened': {'tradeID': '1721',
          'units': '1000.0', 'price': 1.19131, 'guaranteedExecutionFee': '0.0',
          'halfSpreadCost': '0.0881', 'initialMarginRequired': '33.3'},
          'halfSpreadCost': '0.0881'}

In [22]: api.create_order(instrument, -1500)  # (3)

          {'id': '1723', 'time': '2020-08-19T14:39:17.200434462Z', 'userID':
          13834683, 'accountID': '101-004-13834683-001', 'batchID': '1722',
          'requestID': '24716258589171315', 'type': 'ORDER_FILL', 'orderID':
          '1722', 'instrument': 'EUR_USD', 'units': '-1500.0',
          'gainQuoteHomeConversionFactor': '0.835288642787',
          'lossQuoteHomeConversionFactor': '0.843683503518', 'price': 1.1911,
          'fullVWAP': 1.1911, 'fullPrice': {'type': 'PRICE', 'bids': [{'price':
          1.1911, 'liquidity': '10000000'}], 'asks': [{'price': 1.19131,
          'liquidity': '9999000'}], 'closeoutBid': 1.1911, 'closeoutAsk':
          1.19131}, 'reason': 'MARKET_ORDER', 'pl': '-0.1772', 'financing':
          '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0',
          'accountBalance': '98510.6214', 'tradeOpened': {'tradeID': '1723',
          'units': '-500.0', 'price': 1.1911, 'guaranteedExecutionFee': '0.0',
          'halfSpreadCost': '0.0441', 'initialMarginRequired': '16.65'},
          'tradesClosed': [{'tradeID': '1721', 'units': '-1000.0', 'price':
          1.1911, 'realizedPL': '-0.1772', 'financing': '0.0',
          'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '0.0881'}],
          'halfSpreadCost': '0.1322'}

In [23]: api.create_order(instrument, 500)  # (4)

          {'id': '1725', 'time': '2020-08-19T14:39:17.348231507Z', 'userID':
          13834683, 'accountID': '101-004-13834683-001', 'batchID': '1724',
          'requestID': '24716258589171775', 'type': 'ORDER_FILL', 'orderID':
          '1724', 'instrument': 'EUR_USD', 'units': '500.0',
          'gainQuoteHomeConversionFactor': '0.835313189428',
          'lossQuoteHomeConversionFactor': '0.84370829686', 'price': 1.1913,
          'fullVWAP': 1.1913, 'fullPrice': {'type': 'PRICE', 'bids': [{'price':
          1.19104, 'liquidity': '9998500'}], 'asks': [{'price': 1.1913,
          'liquidity': '9999000'}], 'closeoutBid': 1.19104, 'closeoutAsk':
          1.1913}, 'reason': 'MARKET_ORDER', 'pl': '-0.0844', 'financing':
          '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0',
          'accountBalance': '98510.537', 'tradesClosed': [{'tradeID': '1723',
          'units': '500.0', 'price': 1.1913, 'realizedPL': '-0.0844',
          'financing': '0.0', 'guaranteedExecutionFee': '0.0',
          'halfSpreadCost': '0.0546'}], 'halfSpreadCost': '0.0546'}
```

1. Shows all options for placing market, limit, and market-if-touched orders.
2. Opens a long position via a market order.
3. Goes short, via a market order, after closing the long position.
4. Closes the short position via a market order.

Although the Oanda API allows the placement of different order types, this chapter and the following one mainly focus on market orders to instantly go long or short whenever a new signal appears.

This section presents a custom class that automatically trades the `EUR_USD` instrument on the Oanda platform based on a momentum strategy. It is called `MomentumTrader` and is presented in "Python Script". The following walks through the class line by line, beginning with the `__init__` method. The class itself inherits from the `tpqoa` class:

```python
import tpqoa
import numpy as np
import pandas as pd


class MomentumTrader(tpqoa.tpqoa):
    def __init__(self, conf_file, instrument, bar_length, momentum, units,
                 *args, **kwargs):
        super(MomentumTrader, self).__init__(conf_file)
        self.position = 0  # (1)
        self.instrument = instrument  # (2)
        self.momentum = momentum  # (3)
        self.bar_length = bar_length  # (4)
        self.units = units  # (5)
        self.raw_data = pd.DataFrame()  # (6)
        self.min_length = self.momentum + 1  # (7)
```

1. The initial position value (market neutral).
2. The instrument to be traded.
3. The number of intervals for the momentum calculation.
4. The length of the bar for the resampling of the tick data.
5. The number of units to be traded.
6. An empty `DataFrame` object to be filled with tick data.
7. The initial minimum bar length for the start of the trading itself.

主要方法是 .on_success() 方法，用于实现动量策略的交易逻辑：

    def on_success(self, time, bid, ask):  # (1)
        ''' Takes actions when new tick data arrives. '''
        print(self.ticks, end=' ')  # (2)
        # DataFrame.append() was removed in pandas 2.0; pd.concat() is equivalent here
        self.raw_data = pd.concat((self.raw_data, pd.DataFrame(
            {'bid': bid, 'ask': ask}, index=[pd.Timestamp(time)])))  # (3)
        self.data = self.raw_data.resample(
            self.bar_length, label='right').last().ffill().iloc[:-1]  # (4)
        self.data['mid'] = self.data.mean(axis=1)  # (5)
        self.data['returns'] = np.log(self.data['mid'] /
                                      self.data['mid'].shift(1))  # (6)
        self.data['position'] = np.sign(
            self.data['returns'].rolling(self.momentum).mean())  # (7)
        if len(self.data) > self.min_length:  # (8)
            self.min_length += 1  # (8)
            if self.data['position'].iloc[-1] == 1:  # (9)
                if self.position == 0:  # (10)
                    self.create_order(self.instrument, self.units)  # (11)
                elif self.position == -1:  # (12)
                    self.create_order(self.instrument, self.units * 2)  # (13)
                self.position = 1  # (14)
            elif self.data['position'].iloc[-1] == -1:  # (15)
                if self.position == 0:  # (16)
                    self.create_order(self.instrument, -self.units)  # (17)
                elif self.position == 1:  # (18)
                    self.create_order(self.instrument, -self.units * 2)  # (19)
                self.position = -1  # (20)

1. This method is called whenever new tick data arrives.
2. Prints the number of ticks retrieved so far.
3. Collects and stores the tick data.
4. Resamples the tick data to the chosen bar length.
5. Calculates the mid prices…
6. …and derives the log returns from them.
7. Derives the signal (positioning) from the momentum parameter/attribute (via an online algorithm).
8. When there is enough or new data, applies the trading logic and increases the minimum length by one each time.
9. Checks whether the latest positioning ("signal") is 1 (long).
10. If the current market position is 0 (neutral)…
11. …places a buy order for self.units.
12. If it is -1 (short)…
13. …places a buy order for 2 * self.units.
14. Sets the market position self.position to +1 (long).
15. Checks whether the latest positioning ("signal") is -1 (short).
16. If the current market position is 0 (neutral)…
17. …places a sell order for -self.units.
18. If it is +1 (long)…
19. …places a sell order for -2 * self.units.
20. Sets the market position self.position to -1 (short).
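The signal derivation in on_success() can be tried out offline, without an Oanda connection. The following is a minimal sketch on synthetic tick data; the timestamps, quotes, and the values chosen for `bar_length` and `momentum` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic tick data (assumption): one tick per second, steadily rising quotes.
index = pd.date_range('2020-08-19 14:40:00', periods=60, freq='s')
bid = 1.19 + 0.00001 * np.arange(60)
raw_data = pd.DataFrame({'bid': bid, 'ask': bid + 0.0001}, index=index)

bar_length = '10s'  # plays the role of self.bar_length
momentum = 3        # plays the role of self.momentum

# Resample to bars, keep the last tick of each bar,
# and drop the final bar, which may still be incomplete.
data = raw_data.resample(bar_length, label='right').last().ffill().iloc[:-1]
data['mid'] = data.mean(axis=1)
data['returns'] = np.log(data['mid'] / data['mid'].shift(1))
data['position'] = np.sign(data['returns'].rolling(momentum).mean())

print(data[['mid', 'returns', 'position']])
```

With rising prices, the signal is undefined until `momentum` returns are available and turns to +1 afterward. This is why the class only trades once `len(self.data)` exceeds `self.min_length`.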

Based on this class, getting started with automated, algorithmic trading takes just four lines of code. The Python code that follows initiates an automated trading session:

    In [24]: import MomentumTrader as MT

    In [25]: mt = MT.MomentumTrader('../pyalgo.cfg',  # (1)
                                    instrument=instrument,  # (2)
                                    bar_length='10s',  # (3)
                                    momentum=6,  # (4)
                                    units=10000)  # (5)

    In [26]: mt.stream_data(mt.instrument, stop=500)  # (6)

1. The configuration file with the credentials.
2. The instrument parameter is specified.
3. The bar_length parameter for the resampling is provided.
4. The momentum parameter is defined; it is applied to the resampled data intervals.
5. The units parameter is set; it specifies the position size for long and short positions.
6. This starts the streaming and the trading; it stops after 500 ticks.

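The preceding code produces the following output (the running tick count, interleaved with the order fill messages):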

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153{'id': '1727', 'time': '2020-08-19T14:40:30.443867492Z', 'userID': 13834683, 'accountID': '101-004-13834683-001', 'batchID': '1726', 'requestID': '42730657405829101', 'type': 'ORDER_FILL', 'orderID': '1726', 'instrument': 'EUR_USD', 'units': '10000.0', 'gainQuoteHomeConversionFactor': '0.8350012403', 'lossQuoteHomeConversionFactor': '0.843393212565', 'price': 1.19168, 'fullVWAP': 1.19168, 'fullPrice': {'type': 'PRICE', 'bids': [{'price': 1.19155, 'liquidity': '10000000'}], 'asks': [{'price': 1.19168, 'liquidity': '10000000'}], 'closeoutBid': 1.19155, 'closeoutAsk': 1.19168}, 'reason': 'MARKET_ORDER', 'pl': '0.0', 'financing': '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0', 'accountBalance': '98510.537', 'tradeOpened': {'tradeID': '1727', 'units': '10000.0', 'price': 1.19168, 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '0.5455', 'initialMarginRequired': '333.0'}, 'halfSpreadCost': '0.5455'}154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223{'id': '1729', 'time': '2020-08-19T14:41:11.436438078Z', 'userID': 13834683, 'accountID': '101-004-13834683-001', 'batchID': '1728', 'requestID': '42730657577912600', 'type': 'ORDER_FILL', 'orderID': '1728', 'instrument': 'EUR_USD', 'units': '-20000.0', 'gainQuoteHomeConversionFactor': 
'0.83519398913', 'lossQuoteHomeConversionFactor': '0.843587898569', 'price': 1.19124, 'fullVWAP': 1.19124, 'fullPrice': {'type': 'PRICE', 'bids': [{'price': 1.19124, 'liquidity': '10000000'}], 'asks': [{'price': 1.19144, 'liquidity': '10000000'}], 'closeoutBid': 1.19124, 'closeoutAsk': 1.19144}, 'reason': 'MARKET_ORDER', 'pl': '-3.7118', 'financing': '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0', 'accountBalance': '98506.8252', 'tradeOpened': {'tradeID': '1729', 'units': '-10000.0', 'price': 1.19124, 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '0.8394', 'initialMarginRequired': '333.0'}, 'tradesClosed': [{'tradeID': '1727', 'units': '-10000.0', 'price': 1.19124, 'realizedPL': '-3.7118', 'financing': '0.0', 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '0.8394'}], 'halfSpreadCost': '1.6788'}224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394{'id': '1731', 'time': '2020-08-19T14:42:20.525804142Z', 'userID': 13834683, 'accountID': '101-004-13834683-001', 'batchID': '1730', 'requestID': '42730657867512554', 'type': 'ORDER_FILL', 'orderID': '1730', 'instrument': 'EUR_USD', 'units': '20000.0', 'gainQuoteHomeConversionFactor': '0.835400847964', 'lossQuoteHomeConversionFactor': '0.843796836386', 'price': 1.19111, 'fullVWAP': 1.19111, 'fullPrice': {'type': 'PRICE', 'bids': [{'price': 1.19098, 'liquidity': '10000000'}], 'asks': 
[{'price': 1.19111, 'liquidity': '10000000'}], 'closeoutBid': 1.19098, 'closeoutAsk': 1.19111}, 'reason': 'MARKET_ORDER', 'pl': '1.086', 'financing': '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0', 'accountBalance': '98507.9112', 'tradeOpened': {'tradeID': '1731', 'units': '10000.0', 'price': 1.19111, 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '0.5457', 'initialMarginRequired': '333.0'}, 'tradesClosed': [{'tradeID': '1729', 'units': '10000.0', 'price': 1.19111, 'realizedPL': '1.086', 'financing': '0.0', 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '0.5457'}], 'halfSpreadCost': '1.0914'}395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500

Finally, close out the final position:

    In [27]: oo = mt.create_order(instrument, units=-mt.position * mt.units,
                                  ret=True, suppress=True)  # (1)
             oo
    Out[27]: {'id': '1733',
              'time': '2020-08-19T14:43:17.107985242Z',
              'userID': 13834683,
              'accountID': '101-004-13834683-001',
              'batchID': '1732',
              'requestID': '42730658106750652',
              'type': 'ORDER_FILL',
              'orderID': '1732',
              'instrument': 'EUR_USD',
              'units': '-10000.0',
              'gainQuoteHomeConversionFactor': '0.835327206922',
              'lossQuoteHomeConversionFactor': '0.843722455232',
              'price': 1.19109,
              'fullVWAP': 1.19109,
              'fullPrice': {'type': 'PRICE',
               'bids': [{'price': 1.19109, 'liquidity': '10000000'}],
               'asks': [{'price': 1.19121, 'liquidity': '10000000'}],
               'closeoutBid': 1.19109,
               'closeoutAsk': 1.19121},
              'reason': 'MARKET_ORDER',
              'pl': '-0.1687',
              'financing': '0.0',
              'commission': '0.0',
              'guaranteedExecutionFee': '0.0',
              'accountBalance': '98507.7425',
              'tradesClosed': [{'tradeID': '1731',
                'units': '-10000.0',
                'price': 1.19109,
                'realizedPL': '-0.1687',
                'financing': '0.0',
                'guaranteedExecutionFee': '0.0',
                'halfSpreadCost': '0.5037'}],
              'halfSpreadCost': '0.5037'}

1. Closes out the final position.
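The order sizes used by the class (self.units, 2 * self.units, and the closing order -mt.position * mt.units) all follow one rule: trade the difference between the target and the current position. A minimal sketch of that rule follows; `order_units` is a hypothetical helper introduced here for illustration and is not part of tpqoa or of the MomentumTrader class.

```python
def order_units(target, position, units):
    """Signed order size that moves `position` to `target`.

    `target` and `position` take values in {-1, 0, 1} (short, neutral, long);
    `units` is the base trade size. Positive results are buys, negative sells.
    """
    return (target - position) * units


# Neutral -> long and short -> long, as in the long branch of on_success()
print(order_units(1, 0, 10000), order_units(1, -1, 10000))
# Long -> neutral, as in the closing order above
print(order_units(0, 1, 10000))
```

Going from short to long therefore needs twice the base size: one half closes the short position, the other half opens the long one.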

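Oanda's RESTful API is also convenient with regard to account information, transaction history, and the like. For example, after executing the momentum strategy from the previous section, the algorithmic trader might want to inspect the current balance of the trading account. This is possible via the .get_account_summary() method: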

In [28]: api.get_account_summary()Out[28]: {'id': '101-004-13834683-001', 'alias': 'Primary', 'currency': 'EUR', 'balance': '98507.7425', 'createdByUserID': 13834683, 'createdTime': '2020-03-19T06:08:14.363139403Z', 'guaranteedStopLossOrderMode': 'DISABLED', 'pl': '-1273.126', 'resettablePL': '-1273.126', 'resettablePLTime': '0', 'financing': '-219.1315', 'commission': '0.0', 'guaranteedExecutionFees': '0.0', 'marginRate': '0.0333', 'openTradeCount': 1, 'openPositionCount': 1, 'pendingOrderCount': 0, 'hedgingEnabled': False, 'unrealizedPL': '929.8862', 'NAV': '99437.6287', 'marginUsed': '377.76', 'marginAvailable': '99064.4945', 'positionValue': '3777.6', 'marginCloseoutUnrealizedPL': '935.8183', 'marginCloseoutNAV': '99443.5608', 'marginCloseoutMarginUsed': '377.76', 'marginCloseoutPercent': '0.0019', 'marginCloseoutPositionValue': '3777.6', 'withdrawalLimit': '98507.7425', 'marginCallMarginUsed': '377.76', 'marginCallPercent': '0.0038', 'lastTransactionID': '1733'}

The .get_transactions() method returns information about the most recent transactions:

In [29]: api.get_transactions(tid=int(oo['id']) - 2)Out[29]: [{'id': '1732', 'time': '2020-08-19T14:43:17.107985242Z', 'userID': 13834683, 'accountID': '101-004-13834683-001', 'batchID': '1732', 'requestID': '42730658106750652', 'type': 'MARKET_ORDER', 'instrument': 'EUR_USD', 'units': '-10000.0', 'timeInForce': 'FOK', 'positionFill': 'DEFAULT', 'reason': 'CLIENT_ORDER'}, {'id': '1733', 'time': '2020-08-19T14:43:17.107985242Z', 'userID': 13834683, 'accountID': '101-004-13834683-001', 'batchID': '1732', 'requestID': '42730658106750652', 'type': 'ORDER_FILL', 'orderID': '1732', 'instrument': 'EUR_USD', 'units': '-10000.0', 'gainQuoteHomeConversionFactor': '0.835327206922', 'lossQuoteHomeConversionFactor': '0.843722455232', 'price': 1.19109, 'fullVWAP': 1.19109, 'fullPrice': {'type': 'PRICE', 'bids': [{'price': 1.19109, 'liquidity': '10000000'}], 'asks': [{'price': 1.19121, 'liquidity': '10000000'}], 'closeoutBid': 1.19109, 'closeoutAsk': 1.19121}, 'reason': 'MARKET_ORDER', 'pl': '-0.1687', 'financing': '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0', 'accountBalance': '98507.7425', 'tradesClosed': [{'tradeID': '1731', 'units': '-10000.0', 'price': 1.19109, 'realizedPL': '-0.1687', 'financing': '0.0', 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '0.5037'}], 'halfSpreadCost': '0.5037'}]

For a concise overview, the .print_transactions() method is also available:

    In [30]: api.print_transactions(tid=int(oo['id']) - 18)
     1717 | 2020-08-19T14:37:00.803426931Z | EUR_USD | -10000.0 | 0.0
     1719 | 2020-08-19T14:38:21.953399006Z | EUR_USD |  10000.0 | 6.8444
     1721 | 2020-08-19T14:39:17.062399275Z | EUR_USD |   1000.0 | 0.0
     1723 | 2020-08-19T14:39:17.200434462Z | EUR_USD |  -1500.0 | -0.1772
     1725 | 2020-08-19T14:39:17.348231507Z | EUR_USD |    500.0 | -0.0844
     1727 | 2020-08-19T14:40:30.443867492Z | EUR_USD |  10000.0 | 0.0
     1729 | 2020-08-19T14:41:11.436438078Z | EUR_USD | -20000.0 | -3.7118
     1731 | 2020-08-19T14:42:20.525804142Z | EUR_USD |  20000.0 | 1.086
     1733 | 2020-08-19T14:43:17.107985242Z | EUR_USD | -10000.0 | -0.1687

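The Oanda platform allows for an easy and straightforward entry into the world of automated, algorithmic trading. Oanda specializes in so-called contracts for difference (CFDs). Depending on the trader's country of residence, a great variety of instruments can be traded.

From a technological point of view, a major advantage of Oanda is its modern, powerful API, which can be accessed easily via a dedicated Python wrapper package (v20). This chapter shows how to set up an account, connect to the API with Python, retrieve historical data (one-minute bars) for backtesting, retrieve streaming data in real time, automatically trade a CFD based on a momentum strategy, and retrieve account information and the detailed transaction history.

Visit Oanda's Help and Support pages to learn more about the Oanda platform and about important aspects of CFD trading.

The Getting Started guide on Oanda's developer portal provides a detailed description of the API.

The following Python script contains a custom streaming class for Oanda that automatically trades a momentum strategy: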

    #
    # Python Script
    # with Momentum Trading Class
    # for Oanda v20
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    # The Python Quants GmbH
    #
    import tpqoa
    import numpy as np
    import pandas as pd


    class MomentumTrader(tpqoa.tpqoa):
        def __init__(self, conf_file, instrument, bar_length, momentum, units,
                     *args, **kwargs):
            super(MomentumTrader, self).__init__(conf_file)
            self.position = 0  # current market position: -1, 0, or +1
            self.instrument = instrument
            self.momentum = momentum
            self.bar_length = bar_length
            self.units = units
            self.raw_data = pd.DataFrame()
            self.min_length = self.momentum + 1

        def on_success(self, time, bid, ask):
            ''' Takes actions when new tick data arrives. '''
            print(self.ticks, end=' ')
            # DataFrame.append() was removed in pandas 2.0; pd.concat() is
            # equivalent here
            self.raw_data = pd.concat((self.raw_data, pd.DataFrame(
                {'bid': bid, 'ask': ask}, index=[pd.Timestamp(time)])))
            self.data = self.raw_data.resample(
                self.bar_length, label='right').last().ffill().iloc[:-1]
            self.data['mid'] = self.data.mean(axis=1)
            self.data['returns'] = np.log(self.data['mid'] /
                                          self.data['mid'].shift(1))
            self.data['position'] = np.sign(
                self.data['returns'].rolling(self.momentum).mean())
            if len(self.data) > self.min_length:
                self.min_length += 1
                if self.data['position'].iloc[-1] == 1:
                    if self.position == 0:
                        self.create_order(self.instrument, self.units)
                    elif self.position == -1:
                        self.create_order(self.instrument, self.units * 2)
                    self.position = 1
                elif self.data['position'].iloc[-1] == -1:
                    if self.position == 0:
                        self.create_order(self.instrument, -self.units)
                    elif self.position == 1:
                        self.create_order(self.instrument, -self.units * 2)
                    self.position = -1


    if __name__ == '__main__':
        strat = 2
        if strat == 1:
            mom = MomentumTrader('../pyalgo.cfg', 'DE30_EUR', '5s', 3, 1)
            mom.stream_data(mom.instrument, stop=100)
            # close out the final position
            mom.create_order(mom.instrument, units=-mom.position * mom.units)
        elif strat == 2:
            mom = MomentumTrader('../pyalgo.cfg', instrument='EUR_USD',
                                 bar_length='5s', momentum=6, units=100000)
            mom.stream_data(mom.instrument, stop=100)
            # close out the final position
            mom.create_order(mom.instrument, units=-mom.position * mom.units)
        else:
            print('Strategy not known.')

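¹ The naming of certain objects is not completely consistent in the context of the Oanda API. For example, API key and access token are used interchangeably. Also, account ID and account number refer to the same number.

² This implicitly neglects transaction costs in the form of the bid-ask spread when selling and buying units of the instrument.

³ Note that for some instruments, one unit means 1 USD, as with currency-related CFDs. For others, such as index-related CFDs (for example, DE30_EUR), one unit means a currency exposure at the (bid/ask) price of the CFD (for example, EUR 11,750).

⁴ The simplified calculation neglects, for example, financing costs that might become due for leveraged trading.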

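Financial institutions like to call what they do trading. Let's be honest. It's not trading; it's betting.
Graydon Carter

This chapter introduces the trading platform of the FXCM Group, with its RESTful and streaming application programming interface (API) and the Python wrapper package fxcmpy. Similar to Oanda, it is well suited for deploying automated, algorithmic trading strategies, even for retail traders with smaller capital positions. FXCM offers retail and institutional traders a number of financial products that can be traded both via traditional trading applications and programmatically via its API. The focus of the products lies on currency pairs as well as contracts for difference (CFDs) on, among other things, major stock indices and commodities. In this context, also refer to "Contracts for Difference (CFDs)" and "Disclaimer".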

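With regard to the platform criteria discussed in Chapter 8, FXCM offers the following:

Instruments

FX products (for example, the trading of currency pairs), contracts for difference (CFDs) on stock indices, commodities, or rates products.

Strategies

FXCM allows for, among other things, (leveraged) long and short positions, market entry orders, stop loss orders, and take profit targets.

Costs

In addition to the bid-ask spread, a fixed fee is generally due for every trade with FXCM. Different pricing models are available.

Technology

FXCM provides the algorithmic trader with a modern RESTful API that can be accessed by, for example, the Python wrapper package fxcmpy. Standard trading applications for desktop computers, tablets, and smartphones are also available.

Jurisdiction

FXCM is active in a number of countries around the globe (for instance, the United Kingdom or Germany). Depending on the country, certain products might not be available or offered due to regulations and restrictions.

This chapter covers the basic functionality of the FXCM trading API and the fxcmpy Python package required to implement an automated, algorithmic trading strategy programmatically. It is structured as follows. "Getting Started" shows how to set everything up to work with the FXCM REST API for algorithmic trading. "Retrieving Data" shows how to retrieve and work with financial data (down to the tick level). "Working with the API" is at the core, since it illustrates typical tasks implemented using the RESTful API, such as retrieving historical and streaming data, placing orders, or looking up account information.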

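Detailed documentation of the FXCM API is found under https://oreil.ly/Df_7e. To install the Python package fxcmpy, execute the following on the shell:

pip install fxcmpy

The documentation of the fxcmpy package is found under http://fxcmpy.tpq.io.

To get started with the FXCM trading API and the fxcmpy package, a free demo account with FXCM is sufficient. Such an account can be opened under FXCM Demo Account.¹ The next step is to create a unique API token (for example, YOUR_FXCM_API_TOKEN) from within the demo account. A connection to the API is then opened as follows: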

    import fxcmpy
    api = fxcmpy.fxcmpy(access_token=YOUR_FXCM_API_TOKEN, log_level='error')

Alternatively, the configuration file created in Chapter 8 can be used to connect to the API. The content of this file should be amended as follows:

    [FXCM]
    log_level = error
    log_file = PATH_TO_AND_NAME_OF_LOG_FILE
    access_token = YOUR_FXCM_API_TOKEN

One can then connect to the API via the following:

    import fxcmpy
    api = fxcmpy.fxcmpy(config_file='pyalgo.cfg')
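The [FXCM] section follows the usual INI layout, so the entries can be checked with Python's standard configparser module before they are handed to fxcmpy. The sketch below only validates the file content; it makes no claim about how fxcmpy itself reads the file, and the values are the placeholders from the text, not real credentials.

```python
import configparser

# Placeholder content (assumption): mirrors the [FXCM] section shown above.
cfg_text = """
[FXCM]
log_level = error
log_file = PATH_TO_AND_NAME_OF_LOG_FILE
access_token = YOUR_FXCM_API_TOKEN
"""

config = configparser.ConfigParser()
config.read_string(cfg_text)  # for a file on disk: config.read('pyalgo.cfg')

fxcm = config['FXCM']
# Report any entry that is missing from the section.
missing = [key for key in ('log_level', 'log_file', 'access_token')
           if key not in fxcm]
print('missing entries:', missing)
```

A check like this catches a misspelled section or key name early, before an API connection is attempted.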

By default, the API connects to the demo server. However, the server parameter allows the connection to be made to the live trading server (if such an account exists):

    api = fxcmpy.fxcmpy(config_file='pyalgo.cfg', server='demo')  # (1)
    api = fxcmpy.fxcmpy(config_file='pyalgo.cfg', server='real')  # (2)

1. Connects to the demo server.
2. Connects to the live trading server.

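FXCM provides access to historical market price data sets, such as tick data, in a prepackaged variant. This means that one can retrieve, for instance, compressed files from FXCM servers that contain tick data for the EUR/USD exchange rate for week 10 of 2020. The retrieval of historical candles data from the API is explained in a subsequent section.

Retrieving Tick Data

For a number of currency pairs, FXCM provides historical tick data. The fxcmpy package makes the retrieval and handling of such tick data convenient. First, some imports:

    In [1]: import time
            import numpy as np
            import pandas as pd
            import datetime as dt
            from pylab import mpl, plt
            # newer Matplotlib releases name this style 'seaborn-v0_8'
            plt.style.use('seaborn')
            mpl.rcParams['savefig.dpi'] = 300
            mpl.rcParams['font.family'] = 'serif'

Second, a look at the available symbols (currency pairs) for which tick data is available:

    In [2]: from fxcmpy import fxcmpy_tick_data_reader as tdr

    In [3]: print(tdr.get_available_symbols())
            ('AUDCAD', 'AUDCHF', 'AUDJPY', 'AUDNZD', 'CADCHF', 'EURAUD', 'EURCHF',
             'EURGBP', 'EURJPY', 'EURUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'GBPUSD',
             'GBPCHF', 'GBPJPY', 'GBPNZD', 'NZDCAD', 'NZDCHF', 'NZDJPY', 'NZDUSD',
             'USDCAD', 'USDCHF', 'USDJPY')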

The following code retrieves one week's worth of tick data for a single symbol. The resulting pandas DataFrame object has more than 4.5 million rows:

    In [4]: start = dt.datetime(2020, 3, 25)  # (1)
            stop = dt.datetime(2020, 3, 30)  # (1)

    In [5]: td = tdr('EURUSD', start, stop)  # (1)

    In [6]: td.get_raw_data().info()  # (2)
            <class 'pandas.core.frame.DataFrame'>
            Index: 4504288 entries, 03/22/2020 21:12:02.256 to 03/27/2020 20:59:00.022
            Data columns (total 2 columns):
             #   Column  Dtype
            ---  ------  -----
             0   Bid     float64
             1   Ask     float64
            dtypes: float64(2)
            memory usage: 103.1+ MB

    In [7]: td.get_data().info()  # (3)
            <class 'pandas.core.frame.DataFrame'>
            DatetimeIndex: 4504288 entries, 2020-03-22 21:12:02.256000 to 2020-03-27 20:59:00.022000
            Data columns (total 2 columns):
             #   Column  Dtype
            ---  ------  -----
             0   Bid     float64
             1   Ask     float64
            dtypes: float64(2)
            memory usage: 103.1 MB

    In [8]: td.get_data().head()
    Out[8]:                              Bid      Ask
            2020-03-22 21:12:02.256  1.07006  1.07050
            2020-03-22 21:12:02.258  1.07002  1.07050
            2020-03-22 21:12:02.259  1.07003  1.07033
            2020-03-22 21:12:02.653  1.07003  1.07034
            2020-03-22 21:12:02.749  1.07000  1.07034

1. This retrieves the data file, unpacks it, and stores the raw data in a DataFrame object (as an attribute of the resulting object).
2. The .get_raw_data() method returns the DataFrame object with the raw data, for which the index values are still str objects.
3. The .get_data() method returns a DataFrame object for which the index has been transformed to a DatetimeIndex.²
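The difference between the two methods comes down to the index type. As a minimal sketch, the conversion can be reproduced on a two-row frame whose string index mimics the raw format shown above. The explicit format string is an assumption inferred from those index values, and this is not necessarily how fxcmpy performs the conversion internally.

```python
import pandas as pd

# Two rows in the raw layout (assumption): string index, Bid/Ask columns.
raw = pd.DataFrame(
    {'Bid': [1.07006, 1.07002], 'Ask': [1.07050, 1.07050]},
    index=['03/22/2020 21:12:02.256', '03/22/2020 21:12:02.258'])

data = raw.copy()
# Passing the format explicitly spares pandas from inferring it.
data.index = pd.to_datetime(raw.index, format='%m/%d/%Y %H:%M:%S.%f')

print(type(raw.index[0]).__name__, type(data.index).__name__)
```

Only with a DatetimeIndex do time-based selections such as `start=`/`end=` slicing and resampling become available, which is what the examples that follow rely on.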

Since the tick data is stored in a DataFrame object, it is straightforward to pick a subset of the data and to implement typical financial analytics tasks on it. Figure 9-1 shows a plot of the mid prices derived for the subset and a simple moving average (SMA):

    In [9]: sub = td.get_data(start='2020-03-25 12:00:00',
                              end='2020-03-25 12:15:00')  # (1)

    In [10]: sub.head()
    Out[10]:                              Bid     Ask
             2020-03-25 12:00:00.067  1.08109  1.0811
             2020-03-25 12:00:00.072  1.08110  1.0811
             2020-03-25 12:00:00.074  1.08109  1.0811
             2020-03-25 12:00:00.078  1.08111  1.0811
             2020-03-25 12:00:00.121  1.08112  1.0811

    In [11]: sub['Mid'] = sub.mean(axis=1)  # (2)

    In [12]: sub['SMA'] = sub['Mid'].rolling(1000).mean()  # (3)

    In [13]: sub[['Mid', 'SMA']].plot(figsize=(10, 6), lw=1.5);

1. Picks a subset of the complete data set.
2. Calculates the mid prices from the bid and ask prices.
3. Derives SMA values over windows of 1,000 ticks.

Figure 9-1. Historical mid tick prices for EUR/USD and SMA
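Note that rolling(1000) counts ticks, not time: since ticks arrive irregularly, the window covers a varying time span. pandas also accepts a time offset as the window when the index is a DatetimeIndex. The sketch below contrasts both variants on evenly spaced synthetic ticks; the data and the window sizes are assumptions chosen so that the two results can be compared directly.

```python
import numpy as np
import pandas as pd

# Synthetic mid prices (assumption): one tick every 250 milliseconds.
index = pd.date_range('2020-03-25 12:00:00', periods=10, freq='250ms')
mid = pd.Series(np.arange(10, dtype=float), index=index)

sma_count = mid.rolling(4).mean()    # window of the last 4 ticks
sma_time = mid.rolling('1s').mean()  # window of the last second

print(pd.DataFrame({'mid': mid, 'count': sma_count, 'time': sma_time}))
```

With regular spacing, both windows hold the same four ticks once enough data is available. The count-based variant yields NaN until the window is full, whereas the time-based variant averages over whatever ticks it has seen so far.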

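Retrieving Candles Data

In addition, FXCM provides access to historical candles data (beyond the API). Candles data is data for certain homogeneous time intervals ("bars") that contains open, high, low, and close values for both bid and ask prices.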
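To make the relation between ticks and candles concrete, the following sketch aggregates a handful of synthetic bid ticks into one-minute bars with pandas. The tick values are invented for illustration, and the lowercase column names are those produced by pandas, not the FXCM naming.

```python
import pandas as pd

# Five synthetic bid ticks (assumption) spread over two minutes.
index = pd.to_datetime(['2020-03-25 12:00:05', '2020-03-25 12:00:20',
                        '2020-03-25 12:00:40', '2020-03-25 12:01:10',
                        '2020-03-25 12:01:50'])
bid = pd.Series([1.0810, 1.0813, 1.0809, 1.0811, 1.0815], index=index)

# One candle per minute: first, highest, lowest, and last tick of the interval.
bid_bars = bid.resample('1min').ohlc()
print(bid_bars)
```

Applying the same aggregation to the ask prices gives the second half of a candle as described above.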

First, a look at the available symbols for which candles data is provided:

    In [14]: from fxcmpy import fxcmpy_candles_data_reader as cdr

    In [15]: print(cdr.get_available_symbols())
             ('AUDCAD', 'AUDCHF', 'AUDJPY', 'AUDNZD', 'CADCHF', 'EURAUD', 'EURCHF',
              'EURGBP', 'EURJPY', 'EURUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'GBPUSD',
              'GBPCHF', 'GBPJPY', 'GBPNZD', 'NZDCAD', 'NZDCHF', 'NZDJPY', 'NZDUSD',
              'USDCAD', 'USDCHF', 'USDJPY')

Second, the data retrieval itself. It is similar to the tick data retrieval. The only difference is that a period value, or the bar length, needs to be specified (for example, m1 for one minute, H1 for one hour, or D1 for one day):

    In [16]: start = dt.datetime(2020, 4, 1)
             stop = dt.datetime(2020, 5, 1)

    In [17]: period = 'H1'  # (1)

    In [18]: candles = cdr('EURUSD', start, stop, period)

    In [19]: data = candles.get_data()

    In [20]: data.info()
             <class 'pandas.core.frame.DataFrame'>
             DatetimeIndex: 600 entries, 2020-03-29 21:00:00 to 2020-05-01 20:00:00
             Data columns (total 8 columns):
              #   Column    Non-Null Count  Dtype
             ---  ------    --------------  -----
              0   BidOpen   600 non-null    float64
              1   BidHigh   600 non-null    float64
              2   BidLow    600 non-null    float64
              3   BidClose  600 non-null    float64
              4   AskOpen   600 non-null    float64
              5   AskHigh   600 non-null    float64
              6   AskLow    600 non-null    float64
              7   AskClose  600 non-null    float64
             dtypes: float64(8)
             memory usage: 42.2 KB

    In [21]: data[data.columns[:4]].tail()  # (2)
    Out[21]:                      BidOpen  BidHigh   BidLow  BidClose
             2020-05-01 16:00:00  1.09976  1.09996  1.09850   1.09874
             2020-05-01 17:00:00  1.09874  1.09888  1.09785   1.09818
             2020-05-01 18:00:00  1.09818  1.09820  1.09757   1.09766
             2020-05-01 19:00:00  1.09766  1.09816  1.09747   1.09793
             2020-05-01 20:00:00  1.09793  1.09812  1.09730   1.09788

    In [22]: data[data.columns[4:]].tail()  # (3)
    Out[22]:                      AskOpen  AskHigh   AskLow  AskClose
             2020-05-01 16:00:00  1.09980  1.09998  1.09853   1.09876
             2020-05-01 17:00:00  1.09876  1.09891  1.09786   1.09818
             2020-05-01 18:00:00  1.09818  1.09822  1.09758   1.09768
             2020-05-01 19:00:00  1.09768  1.09818  1.09748   1.09795
             2020-05-01 20:00:00  1.09795  1.09856  1.09733   1.09841

1. Specifies the period value.
2. Open, high, low, and close values for the bid prices.
3. Open, high, low, and close values for the ask prices.

To conclude this section, the Python code that follows calculates mid close prices, calculates two SMAs, and plots the results (see Figure 9-2):

    In [23]: data['MidClose'] = data[['BidClose', 'AskClose']].mean(axis=1)  # (1)

    In [24]: data['SMA1'] = data['MidClose'].rolling(30).mean()  # (2)
             data['SMA2'] = data['MidClose'].rolling(100).mean()  # (2)

    In [25]: data[['MidClose', 'SMA1', 'SMA2']].plot(figsize=(10, 6));

1. Calculates the mid close prices from the bid and ask close prices.
2. Calculates two SMAs: one for a shorter time interval and one for a longer one.

Figure 9-2. Historical hourly mid close prices for EUR/USD and two SMAs
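Two SMAs of this kind are the basis of the SMA crossover strategies covered in the vectorized backtesting chapter: long when the shorter SMA is above the longer one, short otherwise. A minimal sketch of that positioning rule on a synthetic random-walk price series follows; the seed, the series length, and the price scale are assumptions, while the window lengths of 30 and 100 are taken from the code above.

```python
import numpy as np
import pandas as pd

# Synthetic hourly mid close prices (assumption): a small random walk.
rng = np.random.default_rng(100)
data = pd.DataFrame(
    {'MidClose': 1.09 + 0.001 * rng.standard_normal(300).cumsum()})

data['SMA1'] = data['MidClose'].rolling(30).mean()   # shorter window
data['SMA2'] = data['MidClose'].rolling(100).mean()  # longer window
data.dropna(inplace=True)  # the first 99 rows have no SMA2 value

# Long (+1) when the short SMA is above the long SMA, short (-1) otherwise.
data['position'] = np.where(data['SMA1'] > data['SMA2'], 1, -1)
print(data['position'].value_counts())
```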

While the previous sections retrieve historical tick data and candles data prepackaged from FXCM servers, this section shows how to retrieve historical data via the API. This requires a connection object to the FXCM API. Therefore, the first steps are the import of the fxcmpy package, the connection to the API (based on the unique API token), and a look at the available instruments. More instruments might be available than for the prepackaged data sets:

    In [26]: import fxcmpy

    In [27]: fxcmpy.__version__
    Out[27]: '1.2.6'

    In [28]: api = fxcmpy.fxcmpy(config_file='../pyalgo.cfg')  # (1)

    In [29]: instruments = api.get_instruments()

    In [30]: print(instruments)
             ['EUR/USD', 'USD/JPY', 'GBP/USD', 'USD/CHF', 'EUR/CHF', 'AUD/USD',
              'USD/CAD', 'NZD/USD', 'EUR/GBP', 'EUR/JPY', 'GBP/JPY', 'CHF/JPY',
              'GBP/CHF', 'EUR/AUD', 'EUR/CAD', 'AUD/CAD', 'AUD/JPY', 'CAD/JPY',
              'NZD/JPY', 'GBP/CAD', 'GBP/NZD', 'GBP/AUD', 'AUD/NZD', 'USD/SEK',
              'EUR/SEK', 'EUR/NOK', 'USD/NOK', 'USD/MXN', 'AUD/CHF', 'EUR/NZD',
              'USD/ZAR', 'USD/HKD', 'ZAR/JPY', 'USD/TRY', 'EUR/TRY', 'NZD/CHF',
              'CAD/CHF', 'NZD/CAD', 'TRY/JPY', 'USD/ILS', 'USD/CNH', 'AUS200',
              'ESP35', 'FRA40', 'GER30', 'HKG33', 'JPN225', 'NAS100', 'SPX500',
              'UK100', 'US30', 'Copper', 'CHN50', 'EUSTX50', 'USDOLLAR', 'US2000',
              'USOil', 'UKOil', 'SOYF', 'NGAS', 'USOilSpot', 'UKOilSpot', 'WHEATF',
              'CORNF', 'Bund', 'XAU/USD', 'XAG/USD', 'EMBasket', 'JPYBasket',
              'BTC/USD', 'BCH/USD', 'ETH/USD', 'LTC/USD', 'XRP/USD', 'CryptoMajor',
              'EOS/USD', 'XLM/USD', 'ESPORTS', 'BIOTECH', 'CANNABIS', 'FAANG',
              'CHN.TECH', 'CHN.ECOMM', 'USEquities']

1. This connects to the API; adjust the path/filename.

Retrieving Historical Data

Once connected, data retrieval for specific time intervals is accomplished via a single method call. When using the .get_candles() method, the parameter period can be one of m1, m5, m15, m30, H1, H2, H3, H4, H6, H8, D1, W1, or M1. Figure 9-3 shows one-minute bar ask close prices for the EUR/USD instrument (currency pair):

    In [31]: candles = api.get_candles('USD/JPY', period='D1', number=10)  # (1)

    In [32]: candles[candles.columns[:4]]  # (1)
    Out[32]:                      bidopen  bidclose  bidhigh   bidlow
             date
             2020-08-07 21:00:00  105.538   105.898  106.051  105.452
             2020-08-09 21:00:00  105.871   105.846  105.871  105.844
             2020-08-10 21:00:00  105.846   105.914  106.197  105.702
             2020-08-11 21:00:00  105.914   106.466  106.679  105.870
             2020-08-12 21:00:00  106.466   106.848  107.009  106.434
             2020-08-13 21:00:00  106.848   106.893  107.044  106.560
             2020-08-14 21:00:00  106.893   106.535  107.033  106.429
             2020-08-17 21:00:00  106.559   105.960  106.648  105.937
             2020-08-18 21:00:00  105.960   105.378  106.046  105.277
             2020-08-19 21:00:00  105.378   105.528  105.599  105.097

    In [33]: candles[candles.columns[4:]]  # (1)
    Out[33]:                      askopen  askclose  askhigh   asklow  tickqty
             date
             2020-08-07 21:00:00  105.557   105.969  106.062  105.484   253759
             2020-08-09 21:00:00  105.983   105.952  105.989  105.925       20
             2020-08-10 21:00:00  105.952   105.986  106.209  105.715   161841
             2020-08-11 21:00:00  105.986   106.541  106.689  105.929   243813
             2020-08-12 21:00:00  106.541   106.950  107.022  106.447   248989
             2020-08-13 21:00:00  106.950   106.983  107.056  106.572   214735
             2020-08-14 21:00:00  106.983   106.646  107.044  106.442   164244
             2020-08-17 21:00:00  106.680   106.047  106.711  105.948   163629
             2020-08-18 21:00:00  106.047   105.431  106.101  105.290   215574
             2020-08-19 21:00:00  105.431   105.542  105.612  105.109   151255

    In [34]: start = dt.datetime(2019, 1, 1)  # (2)
             end = dt.datetime(2020, 6, 1)  # (2)

    In [35]: candles = api.get_candles('EUR/GBP', period='D1',
                                       start=start, stop=end)  # (2)

    In [36]: candles.info()  # (2)
             <class 'pandas.core.frame.DataFrame'>
             DatetimeIndex: 438 entries, 2019-01-02 22:00:00 to 2020-06-01 21:00:00
             Data columns (total 9 columns):
              #   Column    Non-Null Count  Dtype
             ---  ------    --------------  -----
              0   bidopen   438 non-null    float64
              1   bidclose  438 non-null    float64
              2   bidhigh   438 non-null    float64
              3   bidlow    438 non-null    float64
              4   askopen   438 non-null    float64
              5   askclose  438 non-null    float64
              6   askhigh   438 non-null    float64
              7   asklow    438 non-null    float64
              8   tickqty   438 non-null    int64
             dtypes: float64(8), int64(1)
             memory usage: 34.2 KB

    In [37]: candles = api.get_candles('EUR/USD', period='m1', number=250)  # (3)

    In [38]: candles['askclose'].plot(figsize=(10, 6))

1. Retrieves the ten most recent end-of-day prices.
2. Retrieves end-of-day prices for the period between the given start and end dates.
3. Retrieves the most recent one-minute bar prices available.

Historical data retrieved from the FXCM RESTful API can change with the pricing model of the account. In particular, the average bid-ask spreads can be higher or lower for the different pricing models that FXCM offers to different groups of traders.

Figure 9-3. Historical ask close prices for EUR/USD (minute bars)

Retrieving Streaming Data

While historical data is important to, for example, backtest algorithmic trading strategies, continuous access to real-time or streaming data (during trading hours) is required to deploy and automate algorithmic trading strategies. Similar to the Oanda API, the FXCM API allows for the subscription to real-time data streams for all instruments. The fxcmpy wrapper package supports this functionality in that it allows user-defined functions (so-called callback functions) to process the subscribed real-time data stream.

The following Python code presents such a simple callback function, which only prints out selected elements of the data set retrieved, and uses it to process data retrieved in real time after a subscription to the desired instrument (here EUR/USD):

    In [39]: def output(data, dataframe):
                 print('%3d | %s | %s | %6.5f, %6.5f'
                       % (len(dataframe), data['Symbol'],
                          pd.to_datetime(int(data['Updated']), unit='ms'),
                          data['Rates'][0], data['Rates'][1]))  # (1)

    In [40]: api.subscribe_market_data('EUR/USD', (output,))  # (2)
               2 | EUR/USD | 2020-08-19 14:32:36.204000 | 1.19319, 1.19331
               3 | EUR/USD | 2020-08-19 14:32:37.005000 | 1.19320, 1.19331
               4 | EUR/USD | 2020-08-19 14:32:37.940000 | 1.19323, 1.19333
               5 | EUR/USD | 2020-08-19 14:32:38.429000 | 1.19321, 1.19332
               6 | EUR/USD | 2020-08-19 14:32:38.915000 | 1.19323, 1.19334
               7 | EUR/USD | 2020-08-19 14:32:39.436000 | 1.19321, 1.19332
               8 | EUR/USD | 2020-08-19 14:32:39.883000 | 1.19317, 1.19328
               9 | EUR/USD | 2020-08-19 14:32:40.437000 | 1.19317, 1.19328
              10 | EUR/USD | 2020-08-19 14:32:40.810000 | 1.19318, 1.19329

    In [41]: api.get_last_price('EUR/USD')  # (3)
    Out[41]: Bid     1.19318
             Ask     1.19329
             High    1.19534
             Low     1.19217
             Name: 2020-08-19 14:32:40.810000, dtype: float64
              11 | EUR/USD | 2020-08-19 14:32:41.410000 | 1.19319, 1.19329

    In [42]: api.unsubscribe_market_data('EUR/USD')  # (4)

1. This is the callback function that prints out certain elements of the retrieved data set.
2. This is the subscription to a specific real-time data stream. Data is processed asynchronously as long as there is no "unsubscribe" event.
3. During the subscription, the .get_last_price() method returns the last available data set.
4. This unsubscribes from the real-time data stream.

Callback functions are a flexible way to process real-time streaming data with a Python function, or even several such functions. They can be used for simple tasks, such as printing incoming data, or for complex tasks, such as generating trading signals based on online trading algorithms.
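A callback can be developed and tested without a live subscription by feeding it hand-made messages. The sketch below uses the same (data, dataframe) signature as output() above and collects mid prices instead of printing them. The messages are simulated and model only the keys that output() accesses; a real FXCM message may carry further fields.

```python
import pandas as pd

ticks = []  # collected mid prices, one dict per tick


def collect_mid(data, dataframe):
    """Callback with the same (data, dataframe) signature as output()."""
    bid, ask = data['Rates'][0], data['Rates'][1]
    ticks.append({'time': pd.to_datetime(int(data['Updated']), unit='ms'),
                  'mid': (bid + ask) / 2})


# Simulated messages (assumption): only 'Symbol', 'Updated', 'Rates' are modeled.
messages = [
    {'Symbol': 'EUR/USD', 'Updated': 1597847556204, 'Rates': [1.19319, 1.19331]},
    {'Symbol': 'EUR/USD', 'Updated': 1597847557005, 'Rates': [1.19320, 1.19331]},
]
for message in messages:
    collect_mid(message, None)  # the dataframe argument is not used here

mids = pd.DataFrame(ticks).set_index('time')
print(mids)
```

The collected DataFrame has a DatetimeIndex and can be resampled into bars, which is the starting point for an online trading algorithm.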

Placing Orders

The FXCM API allows for the placement and management of all types of orders that are also available via FXCM's trading application (such as entry orders or trailing stop loss orders).³ However, the following code illustrates basic market buy and sell orders only, since they are in general sufficient to at least get started with algorithmic trading.

The following code first verifies that there are no open positions and then opens different positions via the .create_market_buy_order() method:

    In [43]: api.get_open_positions()  # (1)
    Out[43]: Empty DataFrame
             Columns: []
             Index: []

    In [44]: order = api.create_market_buy_order('EUR/USD', 100)  # (2)

    In [45]: sel = ['tradeId', 'amountK', 'currency', 'grossPL', 'isBuy']  # (3)

    In [46]: api.get_open_positions()[sel]  # (3)
    Out[46]:      tradeId  amountK currency  grossPL  isBuy
             0  169122817      100  EUR/USD -9.21945   True

    In [47]: order = api.create_market_buy_order('EUR/GBP', 50)  # (4)

    In [48]: api.get_open_positions()[sel]
    Out[48]:      tradeId  amountK currency  grossPL  isBuy
             0  169122817      100  EUR/USD -8.38125   True
             1  169122819       50  EUR/GBP -9.40900   True

1. Shows the open positions for the connected (default) account.
2. Opens a position of 100,000 in the EUR/USD currency pair.⁴
3. Shows the open positions for selected elements only.
4. Opens another position of 50,000 in the EUR/GBP currency pair.

While the .create_market_buy_order() method opens or increases positions, the .create_market_sell_order() method allows one to close or decrease positions. There are also more general methods that allow the closing out of positions, as the following code illustrates:

    In [49]: order = api.create_market_sell_order('EUR/USD', 25)  # (1)

    In [50]: order = api.create_market_buy_order('EUR/GBP', 50)  # (2)

    In [51]: api.get_open_positions()[sel]  # (3)
    Out[51]:      tradeId  amountK currency   grossPL  isBuy
             0  169122817      100  EUR/USD  -7.54306   True
             1  169122819       50  EUR/GBP -11.62340   True
             2  169122834       25  EUR/USD  -2.30463  False
             3  169122835       50  EUR/GBP  -9.96292   True

    In [52]: api.close_all_for_symbol('EUR/GBP')  # (4)

    In [53]: api.get_open_positions()[sel]
    Out[53]:      tradeId  amountK currency  grossPL  isBuy
             0  169122817      100  EUR/USD -5.02858   True
             1  169122834       25  EUR/USD -3.14257  False

    In [54]: api.close_all()  # (5)

    In [55]: api.get_open_positions()
    Out[55]: Empty DataFrame
             Columns: []
             Index: []

1. Reduces the position in the EUR/USD currency pair.
2. Increases the position in the EUR/GBP currency pair.
3. For EUR/GBP, there are now two open long positions; contrary to the EUR/USD position, it is not netted.
4. The .close_all_for_symbol() method closes all positions for the specified symbol.
5. The .close_all() method closes all open positions at once.

By default, FXCM sets up demo accounts as hedge accounts. This means that going long EUR/USD with 10,000 and going short the same instrument with 10,000 leads to two different open positions. The default with Oanda is net accounts, which net orders and positions for the same instrument.
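The difference between the two account types can be illustrated with a deliberately simplified bookkeeping model. Both functions below are hypothetical and written for illustration only; they are not part of fxcmpy or tpqoa, and they ignore prices, margin, and fees.

```python
def apply_order_netting(position, units):
    """Netting account: a single signed position per instrument."""
    return position + units


def apply_order_hedging(positions, units):
    """Hedging account: every order opens a separate position."""
    return positions + [units]


orders = [10000, -10000]  # go long, then go short the same instrument

net_position = 0
hedged_positions = []
for units in orders:
    net_position = apply_order_netting(net_position, units)
    hedged_positions = apply_order_hedging(hedged_positions, units)

print(net_position)      # the two orders cancel out
print(hedged_positions)  # two separate open positions remain
```

Code that assumes netting, such as the momentum class for Oanda, therefore needs adjusting before it is used with a hedging account, for example by closing existing positions explicitly.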

Account Information

Beyond, for example, open positions, the FXCM API allows one to retrieve more general account information as well. For example, one can look up the default account (if there are multiple accounts) or get an overview of the equity and margin situation:

    In [56]: api.get_default_account()  # (1)
    Out[56]: 1233279

    In [57]: api.get_accounts().T  # (2)
    Out[57]:                           0
             t                         6
             ratePrecision             0
             accountId           1233279
             balance             47555.2
             usdMr                     0
             mc                        N
             mcDate
             accountName        01233279
             usdMr3                    0
             hedging                   Y
             usableMargin3       47555.2
             usableMarginPerc        100
             usableMargin3Perc       100
             equity              47555.2
             usableMargin        47555.2
             bus                    1000
             dayPL                653.16
             grossPL                   0

1. Shows the default accountId value.
2. Shows the financial situation and some parameters for all accounts.

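This chapter is about the RESTful API of FXCM for algorithmic trading and covers the following topics:

Setting everything up for API usage
Retrieving historical tick data
Retrieving historical candles data
Retrieving streaming data in real time
Placing market buy and sell orders
Looking up account information

Beyond these aspects, the FXCM API and the fxcmpy wrapper package provide, of course, more functionality. However, the topics of this chapter are the basic building blocks needed to get started with algorithmic trading.

With Oanda and FXCM, algorithmic traders have two trading platforms (brokers) available that provide a wide-ranging spectrum of financial instruments and appropriate APIs to implement automated, algorithmic trading strategies. Chapter 10 adds some important aspects.

The following resources cover the FXCM trading API and the Python wrapper package:

Trading API: https://fxcm.github.io/rest-api-docs
fxcmpy package: http://fxcmpy.tpq.io

¹ Note that FXCM demo accounts are only offered for certain countries.

² The DatetimeIndex conversion is time consuming, which is why there are two different methods related to tick data retrieval.

³ See the documentation under http://fxcmpy.tpq.io.

⁴ Quantities are in 1,000s of the instrument for currency pairs. Also, note that different accounts might have different leverage ratios. This implies that the same position might require more or less equity (margin) depending on the relevant leverage ratio. Adjust the example quantities to lower values if necessary. See https://oreil.ly/xUHMP.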

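People worry that computers will get too smart and take over the world, but the real problem is that they're too stupid and they've already taken over the world.
Pedro Domingos

"Now what?" you might think. A trading platform is available that allows one to retrieve historical data and streaming data, to place buy and sell orders, and to check the account status. This book has introduced a number of methods for deriving algorithmic trading strategies by predicting the direction of market price movements. You may ask, "How, after all, can all of this be put together to work in automated fashion?" This cannot be answered in any generality. However, this chapter addresses a number of topics that are important in this context. The chapter assumes that a single automated, algorithmic trading strategy is to be deployed. This simplifies, for example, aspects like capital and risk management.

The chapter covers the following topics. "Capital Management" discusses the Kelly criterion. Depending on the strategy characteristics and the trading capital available, the Kelly criterion helps with sizing the trades. To gain confidence in an algorithmic trading strategy, the strategy needs to be backtested thoroughly with regard to both performance and risk characteristics. "ML-Based Trading Strategy" backtests an example strategy based on a classification algorithm from machine learning (ML), as introduced in "Trading Strategies". To deploy the algorithmic trading strategy for automated trading, it needs to be translated into an online algorithm that works with incoming streaming data in real time. "Online Algorithm" covers the transformation of an offline algorithm into an online algorithm.

"Infrastructure and Deployment" then sets out to make sure that the automated, algorithmic trading strategy runs robustly and reliably in the cloud. Not all topics of relevance can be covered in detail, but cloud deployment seems to be the only viable option from an availability, performance, and security point of view. "Logging and Monitoring" covers logging and monitoring. Logging is important in order to be able to analyze the history and certain events during the deployment of an automated trading strategy. Monitoring via socket communication, as introduced in Chapter 7, allows one to observe events remotely in real time. The chapter concludes with "Visual Step-by-Step Overview", which provides a visual summary of the core steps for the automated deployment of algorithmic trading strategies in the cloud.

A central question in algorithmic trading is how much capital to deploy to a given algorithmic trading strategy given the total available capital. The answer to this question depends on the main goal one is trying to achieve by algorithmic trading. Most individuals and financial institutions will agree that the maximization of long-term wealth is a good candidate objective. This is what Edward Thorp had in mind when he derived the Kelly criterion for investing, as described in Rotando and Thorp (1992). Simply speaking, the Kelly criterion allows for an explicit calculation of the fraction of the available capital a trader should deploy to a strategy, given its statistical return characteristics.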

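Kelly Criterion in a Binomial Setting

The common way of introducing the theory of the Kelly criterion for investing is on the basis of a coin tossing game or, more generally, a binomial setting (only two outcomes are possible). This section follows that path. Assume a gambler is playing a coin tossing game against an infinitely rich bank or casino. Assume further that the probability for heads is some value $p$ for which the following holds:

$\frac{1}{2} < p < 1$

The probability for tails is defined by the following:

$q = 1 - p < \frac{1}{2}$

The gambler can place bets $b > 0$ of arbitrary size, winning the same amount if right and losing it all if wrong. Given the assumptions about the probabilities, the gambler would of course want to bet on heads.

Therefore, the expected value of this betting game $B$ (that is, of the random variable representing this game) in a one-shot setting is as follows:

$\mathbf{E}(B) = p \cdot b - q \cdot b = (p - q) \cdot b > 0$

A risk-neutral gambler with unlimited funds would like to bet as large an amount as possible, since this would maximize the expected payoff. However, trading in financial markets is generally not a one-shot game. It is a repeated game. Therefore, assume that $b_{i}$ represents the amount that is bet on day $i$ and that $c_{0}$ represents the initial capital. The capital $c_{1}$ at the end of day one depends on the betting success on that day and might be either $c_{0} + b_{1}$ or $c_{0} - b_{1}$. The expected value for a gamble that is repeated $n$ times is then as follows:

$\mathbf{E}(B^{n}) = c_{0} + \sum_{i = 1}^{n} (p - q) \cdot b_{i}$

In classical economic theory, with risk-neutral, expected utility-maximizing agents, a gambler would try to maximize the preceding expression. It is easily seen that it is maximized by betting all available funds, $b_{i} = c_{i - 1}$, as in the one-shot scenario. However, this in turn implies that a single loss will wipe out all available funds and lead to ruin (unless unlimited borrowing is possible). Therefore, this strategy does not lead to a maximization of long-term wealth.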
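The tension described above can be made concrete with a few numbers. The following is a minimal sketch, assuming $p = 0.55$ and an initial capital of 100 (illustrative values); the helper names are hypothetical. When everything is bet each round, the capital doubles with probability $p$ and drops to zero otherwise, so the expected capital after $n$ rounds is $c_0 \cdot (2p)^n$, while the probability of not being ruined is only $p^n$.

```python
def all_in_survival_probability(p, n):
    """Probability of winning n consecutive all-in bets (no ruin)."""
    return p ** n


def all_in_expected_capital(c0, p, n):
    """Expected capital after n all-in bets: doubles w.p. p, else zero."""
    return c0 * (2 * p) ** n


p, c0 = 0.55, 100
for n in (1, 5, 10, 50):
    print(n,
          all_in_survival_probability(p, n),
          all_in_expected_capital(c0, p, n))
```

The expected capital keeps growing with $n$, yet after only ten rounds the gambler is ruined in more than 99% of all cases.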

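While betting the maximum amount may lead to sudden ruin, betting nothing at all avoids any kind of loss but does not benefit from the advantageous gamble either. This is where the Kelly criterion comes into play, since it derives the optimal fraction $f^{*}$ of the available capital to bet per round of betting. Assume that $n = h + t$, where $h$ stands for the number of heads observed during $n$ rounds of betting and where $t$ stands for the number of tails. With these definitions, the available capital after $n$ rounds is the following:

$c_{n} = c_{0} \cdot {(1 + f)}^{h} \cdot {(1 - f)}^{t}$

In such a context, long-term wealth maximization boils down to maximizing the average geometric growth rate per bet, which is given as follows:

$\begin{matrix} r^{g} & = \log {\left(\frac{c_{n}}{c_{0}}\right)}^{1 / n} \\ & = \log {\left(\frac{c_{0} \cdot {(1 + f)}^{h} \cdot {(1 - f)}^{t}}{c_{0}}\right)}^{1 / n} \\ & = \log {\left({(1 + f)}^{h} \cdot {(1 - f)}^{t}\right)}^{1 / n} \\ & = \frac{h}{n} \log (1 + f) + \frac{t}{n} \log (1 - f) \end{matrix}$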
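The last line of the derivation can be checked numerically. The sketch below uses illustrative values ($c_0 = 100$, $f = 0.1$, 55 heads, 45 tails) and hypothetical function names; it computes the capital after $n$ rounds directly and compares $\frac{1}{n} \log (c_n / c_0)$ with the closed-form expression.

```python
import math


def capital_after(c0, f, h, t):
    """Capital after h winning and t losing rounds with fraction f."""
    return c0 * (1 + f) ** h * (1 - f) ** t


def realized_growth_rate(f, h, t):
    """Average geometric growth rate per bet (closed form)."""
    n = h + t
    return h / n * math.log(1 + f) + t / n * math.log(1 - f)


c0, f, h, t = 100.0, 0.1, 55, 45
cn = capital_after(c0, f, h, t)

# Growth rate computed directly from start and end capital.
direct = math.log(cn / c0) / (h + t)
print(cn, direct, realized_growth_rate(f, h, t))
```

Both ways of computing the growth rate agree; the order in which heads and tails occur does not matter, only their counts.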

The problem then formally is to maximize the expected average rate of growth by choosing $f$ optimally. With $\mathbf{E}(h) = n \cdot p$ and $\mathbf{E}(t) = n \cdot q$, one gets:

$\begin{matrix} \mathbf{E}(r^{g}) & = \mathbf{E}\left(\frac{h}{n} \log (1 + f) + \frac{t}{n} \log (1 - f)\right) \\ & = \mathbf{E}(p \log (1 + f) + q \log (1 - f)) \\ & = p \log (1 + f) + q \log (1 - f) \\ & \equiv G(f) \end{matrix}$

One can now maximize the term by choosing the optimal fraction $f^{*}$ according to the first-order condition. The first derivative is given by the following (the last step uses $p + q = 1$):

$\begin{matrix} G^{'}(f) & = \frac{p}{1 + f} - \frac{q}{1 - f} \\ & = \frac{p - p f - q - q f}{(1 + f)(1 - f)} \\ & = \frac{p - q - f}{(1 + f)(1 - f)} \end{matrix}$

From the first-order condition, one gets the following:

$G^{'}(f) \overset{!}{=} 0 \Rightarrow f^{*} = p - q$

If one trusts this to be the maximum (and not the minimum), this result implies that it is optimal to invest a fraction $f^{*} = p - q$ per round of betting. With, for example, $p = 0.55$, one has $f^{*} = 0.55 - 0.45 = 0.1$, or an optimal fraction of 10%.
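That the stationary point is indeed a maximum can be confirmed with the second-order condition. Differentiating $G^{'}(f) = \frac{p}{1 + f} - \frac{q}{1 - f}$ once more gives:

```latex
G''(f) = -\frac{p}{(1 + f)^{2}} - \frac{q}{(1 - f)^{2}} < 0
\quad \text{for all } 0 \le f < 1
```

Both terms are strictly negative because $p, q > 0$, so $G$ is strictly concave on $[0, 1)$ and $f^{*} = p - q$ is its unique maximizer.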

The following Python code formalizes these concepts and results through simulation. First, some imports and configurations:

    In [1]: import math
            import time
            import numpy as np
            import pandas as pd
            import datetime as dt
            from pylab import plt, mpl

    In [2]: np.random.seed(1000)
            plt.style.use('seaborn')
            mpl.rcParams['savefig.dpi'] = 300
            mpl.rcParams['font.family'] = 'serif'

The idea is to simulate, for example, 50 series with 100 coin tosses per series. The Python code for this is straightforward:

    In [3]: p = 0.55  # (1)

    In [4]: f = p - (1 - p)  # (2)

    In [5]: f  # (2)
    Out[5]: 0.10000000000000009

    In [6]: I = 50  # (3)

    In [7]: n = 100  # (4)

(1) Fixes the probability for heads.

(2) Calculates the optimal fraction according to the Kelly criterion.

(3) The number of series to be simulated.

(4) The number of trials per series.

The major part is the Python function run_simulation(), which performs the simulation according to the preceding assumptions. Figure 10-1 shows the simulation results:

    In [8]: def run_simulation(f):
                c = np.zeros((n, I))  # (1)
                c[0] = 100  # (2)
                for i in range(I):  # (3)
                    for t in range(1, n):  # (4)
                        o = np.random.binomial(1, p)  # (5)
                        if o > 0:  # (6)
                            c[t, i] = (1 + f) * c[t - 1, i]  # (7)
                        else:  # (8)
                            c[t, i] = (1 - f) * c[t - 1, i]  # (9)
                return c

    In [9]: c_1 = run_simulation(f)  # (10)

    In [10]: c_1.round(2)
    Out[10]: array([[100.  , 100.  , 100.  , ..., 100.  , 100.  , 100.  ],
                    [ 90.  , 110.  ,  90.  , ..., 110.  ,  90.  , 110.  ],
                    [ 99.  , 121.  ,  99.  , ..., 121.  ,  81.  , 121.  ],
                    ...,
                    [226.35, 338.13, 413.27, ..., 123.97, 123.97, 123.97],
                    [248.99, 371.94, 454.6 , ..., 136.37, 136.37, 136.37],
                    [273.89, 409.14, 409.14, ..., 122.73, 150.01, 122.73]])

    In [11]: plt.figure(figsize=(10, 6))
             plt.plot(c_1, 'b', lw=0.5)  # (11)
             plt.plot(c_1.mean(axis=1), 'r', lw=2.5);  # (12)

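(1) Instantiates an ndarray object to store the simulation results.

(2) Initializes the starting capital with 100.

(3) Outer loop over the simulated series.

(4) Inner loop over the trials within a series.

(5) Simulates the tossing of a coin.

(6) If 1 or heads…

(7) …then add the win to the capital.

(8) If 0 or tails…

(9) …then subtract the loss from the capital.

(10) Runs the simulation.

(11) Plots all 50 series.

(12) Plots the average over all 50 series.

Figure 10-1. 50 simulated series with 100 trials each (red line = average)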
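The nested loops can also be replaced by array operations, in the spirit of the vectorization approach used elsewhere in the book. The following is a sketch rather than the book's implementation: the function name `run_simulation_vectorized` and its keyword arguments are assumptions, and because it draws random numbers from a different generator (`np.random.default_rng`), its paths will not match the ones shown above number for number.

```python
import numpy as np


def run_simulation_vectorized(f, p=0.55, n=100, I=50, c0=100.0, seed=1000):
    """Simulate I capital paths with n rows each for betting fraction f."""
    rng = np.random.default_rng(seed)
    # True where the coin shows heads; one row per bet, one column per series.
    wins = rng.random((n - 1, I)) < p
    # Per-round growth factor: (1 + f) after a win, (1 - f) after a loss.
    factors = np.where(wins, 1 + f, 1 - f)
    c = np.empty((n, I))
    c[0] = c0
    # Cumulative product along time gives the capital after each round.
    c[1:] = c0 * np.cumprod(factors, axis=0)
    return c


c = run_simulation_vectorized(0.1)
print(c.shape)
print(c.mean(axis=1)[-1])
```

The result has the same layout as the output of `run_simulation()` (time along the rows, series along the columns), so it can be plotted with the same calls.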

The following code repeats the simulation for different values of $f$. As shown in Figure 10-2, a lower fraction leads to a lower growth rate on average. Higher values might lead both to a higher average capital at the end of the simulation ($f = 0.25$) or to a much lower average capital ($f = 0.5$). In both cases where the fraction $f$ is higher, the volatility increases considerably:

    In [12]: c_2 = run_simulation(0.05)  # (1)

    In [13]: c_3 = run_simulation(0.25)  # (2)

    In [14]: c_4 = run_simulation(0.5)  # (3)

    In [15]: plt.figure(figsize=(10, 6))
             plt.plot(c_1.mean(axis=1), 'r', label='$f^*=0.1$')
             plt.plot(c_2.mean(axis=1), 'b', label='$f=0.05$')
             plt.plot(c_3.mean(axis=1), 'y', label='$f=0.25$')
             plt.plot(c_4.mean(axis=1), 'm', label='$f=0.5$')
             plt.legend(loc=0);

(1) Simulation with $f = 0.05$.

(2) Simulation with $f = 0.25$.

(3) Simulation with $f = 0.5$.

Figure 10-2. Average capital over time for different values of $f$
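The simulated averages can be put next to the theoretical quantity the Kelly criterion maximizes. The sketch below (with the hypothetical helper `growth_rate`) evaluates $G(f) = p \log (1 + f) + q \log (1 - f)$ for the four fractions used above, assuming $p = 0.55$ as before. Note that $G(f)$ describes the growth of a typical path, not the cross-sectional mean that the figure plots, so the two need not rank the fractions identically.

```python
import math


def growth_rate(f, p=0.55):
    """Expected log growth rate per bet, G(f) = p log(1+f) + q log(1-f)."""
    q = 1 - p
    return p * math.log(1 + f) + q * math.log(1 - f)


fractions = (0.05, 0.1, 0.25, 0.5)
rates = {f: growth_rate(f) for f in fractions}
best = max(rates, key=rates.get)

for f, g in rates.items():
    print(f'f={f:4.2f} | G(f)={g: .5f}')
print('fraction with the highest growth rate:', best)
```

Among these four values, only the Kelly fraction $f^{*} = 0.1$ and the more conservative $f = 0.05$ have a positive expected log growth rate; for $f = 0.25$ and $f = 0.5$ it is negative, which is consistent with the much higher volatility seen in the simulation.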

Kelly Criterion for Stocks and Indices

Assume now a stock market setting in which the relevant stock (index) can take on only two values after a period of one year from today, given its known value today. The setting is again binomial but this time a bit closer on the modeling side to stock market realities.¹ Specifically, assume the following holds true:

$P(r^{S} = \mu + \sigma) = P(r^{S} = \mu - \sigma) = \frac{1}{2}$

Here, $\mathbf{E}(r^{S}) = \mu > 0$ is the expected return of the stock over one year, and $\sigma > 0$ is the standard deviation of returns (volatility). In a one-period setting, one gets the following for the available capital after one year (with $c_{0}$ and $f$ defined as before):

$c(f) = c_{0} \cdot (1 + (1 - f) \cdot r + f \cdot r^{S})$

Here, $r$ is the constant short rate earned on cash not invested in the stock. Maximizing the geometric growth rate means maximizing the term:

$G(f) = \mathbf{E}\left(\log \frac{c(f)}{c_{0}}\right)$

Assume now that there are $n$ relevant trading days in the year so that for each such trading day $i$ the following holds true:

$P\left(r_{i}^{S} = \frac{\mu}{n} + \frac{\sigma}{\sqrt{n}}\right) = P\left(r_{i}^{S} = \frac{\mu}{n} - \frac{\sigma}{\sqrt{n}}\right) = \frac{1}{2}$

Note that volatility scales with the square root of the number of trading days. Under these assumptions, the daily values scale up to the yearly ones from before and one gets the following:

$c_{n}(f) = c_{0} \cdot \prod_{i = 1}^{n} \left(1 + (1 - f) \cdot \frac{r}{n} + f \cdot r_{i}^{S}\right)$

One now has to maximize the following quantity to achieve maximum long-term wealth when investing in the stock:

$\begin{matrix} G_{n}(f) & = \mathbf{E}\left(\log \frac{c_{n}(f)}{c_{0}}\right) \\ & = \mathbf{E}\left(\sum_{i = 1}^{n} \log \left(1 + (1 - f) \cdot \frac{r}{n} + f \cdot r_{i}^{S}\right)\right) \\ & = \frac{1}{2} \sum_{i = 1}^{n} \left[\log \left(1 + (1 - f) \cdot \frac{r}{n} + f \cdot \left(\frac{\mu}{n} + \frac{\sigma}{\sqrt{n}}\right)\right) + \log \left(1 + (1 - f) \cdot \frac{r}{n} + f \cdot \left(\frac{\mu}{n} - \frac{\sigma}{\sqrt{n}}\right)\right)\right] \\ & = \frac{n}{2} \log \left({\left(1 + (1 - f) \cdot \frac{r}{n} + f \cdot \frac{\mu}{n}\right)}^{2} - \frac{f^{2} \sigma^{2}}{n}\right) \end{matrix}$

Using a Taylor series expansion, one finally arrives at the following:

$G_{n}(f) = r + (\mu - r) \cdot f - \frac{\sigma^{2}}{2} \cdot f^{2} + \mathcal{O}\left(\frac{1}{\sqrt{n}}\right)$

Or, for infinitely many trading points in time (that is, for continuous trading), one arrives at the following:

$G_{\infty}(f) = r + (\mu - r) \cdot f - \frac{\sigma^{2}}{2} \cdot f^{2}$

The optimal fraction $f^{*}$ then is given through the first-order condition by the following expression:

$f^{*} = \frac{\mu - r}{\sigma^{2}}$

This represents the expected excess return of the stock over the risk-free rate divided by the variance of the returns. The expression looks similar to the Sharpe ratio but is different.

A real-world example shall illustrate the application of the preceding formula and its role in leveraging the equity deployed to trading strategies. The trading strategy under consideration is simply a passive long position in the S&P 500 index. To this end, base data is quickly retrieved and the required statistics are easily derived:

    In [16]: raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                               index_col=0, parse_dates=True)

    In [17]: symbol = '.SPX'

    In [18]: data = pd.DataFrame(raw[symbol])

    In [19]: data['return'] = np.log(data / data.shift(1))

    In [20]: data.dropna(inplace=True)

    In [21]: data.tail()
    Out[21]:                .SPX    return
             Date
             2019-12-23  3224.01  0.000866
             2019-12-24  3223.38 -0.000195
             2019-12-27  3240.02  0.000034
             2019-12-30  3221.29 -0.005798
             2019-12-31  3230.78  0.002942

The statistical properties of the S&P 500 index over the period covered suggest an optimal fraction of about 4.5 to be invested in the long position in the index. In other words, for every dollar available, 4.5 dollars shall be invested, implying a leverage ratio of 4.5 in accordance with the optimal Kelly fraction or, in this case, the optimal Kelly factor. Everything else being equal, the Kelly criterion implies a higher leverage when the expected return is higher and the volatility (variance) is lower:

    In [22]: mu = data['return'].mean() * 252  # (1)

    In [23]: mu  # (1)
    Out[23]: 0.09992181916534204

    In [24]: sigma = data['return'].std() * 252 ** 0.5  # (2)

    In [25]: sigma  # (2)
    Out[25]: 0.14761569775486563

    In [26]: r = 0.0  # (3)

    In [27]: f = (mu - r) / sigma ** 2  # (4)

    In [28]: f  # (4)
    Out[28]: 4.585590244019818

(1) Calculates the annualized return.

(2) Calculates the annualized volatility.

(3) Sets the risk-free rate to 0 (for simplicity).

(4) Calculates the optimal Kelly fraction to be invested in the strategy.

The following Python code simulates the application of the Kelly criterion and the optimal leverage ratio. For simplicity and comparison reasons, the initial equity is set to 1 while the initially invested total capital is set to $1 \cdot f^{*}$. Depending on the performance of the capital deployed to the strategy, the total capital itself is adjusted daily according to the available equity. After a loss, the capital is reduced; after a profit, the capital is increased. The evolution of the equity position compared to the index itself is shown in Figure 10-3:

    In [29]: equs = []

    In [30]: def kelly_strategy(f):
                 global equs
                 equ = 'equity_{:.2f}'.format(f)
                 equs.append(equ)
                 cap = 'capital_{:.2f}'.format(f)
                 data[equ] = 1  # (1)
                 data[cap] = data[equ] * f  # (2)
                 for i, t in enumerate(data.index[1:]):
                     t_1 = data.index[i]  # (3)
                     data.loc[t, cap] = data[cap].loc[t_1] * \
                                         math.exp(data['return'].loc[t])  # (4)
                     data.loc[t, equ] = data[cap].loc[t] - \
                                         data[cap].loc[t_1] + \
                                         data[equ].loc[t_1]  # (5)
                     data.loc[t, cap] = data[equ].loc[t] * f  # (6)

    In [31]: kelly_strategy(f * 0.5)  # (7)

    In [32]: kelly_strategy(f * 0.66)  # (8)

    In [33]: kelly_strategy(f)  # (9)

    In [34]: print(data[equs].tail())
                         equity_2.29  equity_3.03  equity_4.59
             Date
             2019-12-23     6.628865     9.585294    14.205748
             2019-12-24     6.625895     9.579626    14.193019
             2019-12-27     6.626410     9.580610    14.195229
             2019-12-30     6.538582     9.412991    13.818934
             2019-12-31     6.582748     9.496919    14.005618

    In [35]: ax = data['return'].cumsum().apply(np.exp).plot(figsize=(10, 6))
             data[equs].plot(ax=ax, legend=True);

(1) Generates a new column for equity and sets the initial value to 1.

(2) Generates a new column for capital and sets the initial value to $1 \cdot f^{*}$.

(3) Picks the right DatetimeIndex value for the previous values.

(4) Calculates the new capital position given the return.

(5) Adjusts the equity value according to the capital position performance.

(6) Adjusts the capital position given the new equity position and the fixed leverage ratio.

(7) Simulates the Kelly criterion-based strategy for half of $f$…

(8) …for two thirds of $f$…

(9) …and $f$ itself.

Figure 10-3. Gross performance of the S&P 500 compared to equity positions given different values of $f$

As Figure 10-3 illustrates, applying the optimal Kelly leverage leads to a rather erratic evolution of the equity position (high volatility), which is intuitively plausible given the leverage ratio of 4.59.
One would expect the volatility of the equity position to increase with increasing leverage. Therefore, practitioners often do not use "full Kelly" (4.6) but rather "half Kelly" (2.3). In the current example, this is reduced to:

$\frac{1}{2} \cdot f^{*} \approx 2.3$

Against this background, Figure 10-3 also shows the evolution of the equity position for values lower than "full Kelly". The risk indeed reduces with lower values of $f$.

ML-Based Trading Strategy

Chapter 8 introduces the Oanda trading platform, its RESTful API, and the Python wrapper package tpqoa. This section combines an ML-based approach for predicting the direction of market price movements with historical data from the Oanda v20 RESTful API to backtest an algorithmic trading strategy for the EUR/USD currency pair. It uses vectorized backtesting, this time taking the bid-ask spread into account as proportional transaction costs. Compared to the plain vectorized backtesting approach introduced in Chapter 4, it also adds a more in-depth analysis of the risk characteristics of the trading strategy tested.

Vectorized Backtesting

The backtest is based on intraday data, more specifically on bars of 10 minutes in length. The following code connects to the Oanda v20 API and retrieves 10-minute bar data for one week. Figure 10-4 visualizes the mid close prices over the period for which data is retrieved:

    In [36]: import tpqoa

    In [37]: %time api = tpqoa.tpqoa('../pyalgo.cfg')  # (1)
             CPU times: user 893 µs, sys: 198 µs, total: 1.09 ms
             Wall time: 1.04 ms

    In [38]: instrument = 'EUR_USD'  # (1)

    In [39]: raw = api.get_history(instrument,
                                   start='2020-06-08',
                                   end='2020-06-13',
                                   granularity='M10',
                                   price='M')  # (1)

    In [40]: raw.tail()
    Out[40]:                            o        h        l        c  volume  complete
             time
             2020-06-12 20:10:00  1.12572  1.12593  1.12532  1.12568     221      True
             2020-06-12 20:20:00  1.12569  1.12578  1.12532  1.12558     163      True
             2020-06-12 20:30:00  1.12560  1.12573  1.12534  1.12543     192      True
             2020-06-12 20:40:00  1.12544  1.12594  1.12528  1.12542     219      True
             2020-06-12 20:50:00  1.12544  1.12624  1.12541  1.12554     296      True

    In [41]: raw.info()
             <class 'pandas.core.frame.DataFrame'>
             DatetimeIndex: 701 entries, 2020-06-08 00:00:00 to 2020-06-12 20:50:00
             Data columns (total 6 columns):
              #   Column    Non-Null Count  Dtype
             ---  ------    --------------  -----
              0   o         701 non-null    float64
              1   h         701 non-null    float64
              2   l         701 non-null    float64
              3   c         701 non-null    float64
              4   volume    701 non-null    int64
              5   complete  701 non-null    bool
             dtypes: bool(1), float64(4), int64(1)
             memory usage: 33.5 KB

    In [42]: spread = 0.00012  # (2)

    In [43]: mean = raw['c'].mean()  # (3)

    In [44]: ptc = spread / mean  # (4)
             ptc  # (4)
    Out[44]: 0.00010599557439495706

    In [45]: raw['c'].plot(figsize=(10, 6), legend=True);

(1) Connects to the API and retrieves the data.

(2) Specifies the average bid-ask spread.

(3) Calculates the mean closing price for the data set.

(4) Calculates the average proportional transaction costs given the average spread and the average mid close price.

Figure 10-4. EUR/USD exchange rate (10-minute bars)

The ML-based strategy uses a number of time series features, such as the log return and the minimum and maximum of the closing price. In addition, the features data is lagged. In other words, the ML algorithm shall learn from historical patterns as embodied by the lagged features data:

    In [46]: data = pd.DataFrame(raw['c'])

    In [47]: data.columns = [instrument,]

    In [48]: window = 20  # (1)
             data['return'] = np.log(data / data.shift(1))  # (2)
             data['vol'] = data['return'].rolling(window).std()  # (3)
             data['mom'] = np.sign(data['return'].rolling(window).mean())  # (4)
             data['sma'] = data[instrument].rolling(window).mean()  # (5)
             data['min'] = data[instrument].rolling(window).min()  # (6)
             data['max'] = data[instrument].rolling(window).max()  # (7)

    In [49]: data.dropna(inplace=True)

    In [50]: lags = 6  # (8)

    In [51]: features = ['return', 'vol', 'mom', 'sma', 'min', 'max']  # (8)

    In [52]: cols = []
             for f in features:
                 for lag in range(1, lags + 1):
                     col = f'{f}_lag_{lag}'
                     data[col] = data[f].shift(lag)  # (8)
                     cols.append(col)

    In [53]: data.dropna(inplace=True)

    In [54]: data['direction'] = np.where(data['return'] > 0, 1, -1)  # (9)

    In [55]: data[cols].iloc[:lags, :lags]  # (10)
    Out[55]:                      return_lag_1  return_lag_2  return_lag_3  return_lag_4  \
             time
             2020-06-08 04:20:00      0.000097      0.000018     -0.000452      0.000035
             2020-06-08 04:30:00     -0.000115      0.000097      0.000018     -0.000452
             2020-06-08 04:40:00      0.000027     -0.000115      0.000097      0.000018
             2020-06-08 04:50:00     -0.000142      0.000027     -0.000115      0.000097
             2020-06-08 05:00:00      0.000035     -0.000142      0.000027     -0.000115
             2020-06-08 05:10:00     -0.000159      0.000035     -0.000142      0.000027

                                  return_lag_5  return_lag_6
             time
             2020-06-08 04:20:00      0.000000      0.000009
             2020-06-08 04:30:00      0.000035      0.000000
             2020-06-08 04:40:00     -0.000452      0.000035
             2020-06-08 04:50:00      0.000018     -0.000452
             2020-06-08 05:00:00      0.000097      0.000018
             2020-06-08 05:10:00     -0.000115      0.000097

(1) Specifies the window length for certain features.

(2) Calculates the log returns from the closing prices.

(3) Calculates the rolling volatility.

(4) Derives the time series momentum as the sign of the mean of the recent log returns.

(5) Calculates the simple moving average.

(6) Calculates the rolling minimum value.

(7) Calculates the rolling maximum value.

(8) Adds the lagged features data to the DataFrame object.

(9) Defines the labels data as the market direction (+1 or up and -1 or down).

(10) Shows a small subset of the resulting lagged features data.

Given the features and labels data, different supervised learning algorithms can now be applied. In what follows, a so-called AdaBoost algorithm for classification from the scikit-learn ML package is used (see AdaBoostClassifier). The idea of boosting in the context of classification is to use an ensemble of base classifiers to arrive at a superior predictor that is supposed to be less prone to overfitting (see "Data Snooping and Overfitting"). As the base classifier, a decision tree classification algorithm from scikit-learn is used (see DecisionTreeClassifier).

The code trains and tests the algorithmic trading strategy based on a sequential train-test split. The accuracy scores of the model for the training and test data are both above 50% (about 81% in-sample and about 57% out-of-sample). Instead of accuracy scores, one would also speak in the context of financial trading of the hit ratio of the trading strategy (that is, the number of winning trades compared to all trades). Since the hit ratio is above 50%, this might indicate, in the context of the Kelly criterion, a statistical edge compared to a random walk setting:

    In [56]: from sklearn.metrics import accuracy_score
             from sklearn.tree import DecisionTreeClassifier
             from sklearn.ensemble import AdaBoostClassifier

    In [57]: n_estimators=15  # (1)
             random_state=100  # (1)
             max_depth=2  # (1)
             min_samples_leaf=15  # (1)
             subsample=0.33  # (1)

    In [58]: dtc = DecisionTreeClassifier(random_state=random_state,
                                          max_depth=max_depth,
                                          min_samples_leaf=min_samples_leaf)  # (2)

    In [59]: model = AdaBoostClassifier(base_estimator=dtc,
                                        n_estimators=n_estimators,
                                        random_state=random_state)  # (3)

    In [60]: split = int(len(data) * 0.7)

    In [61]: train = data.iloc[:split].copy()

    In [62]: mu, std = train.mean(), train.std()  # (4)

    In [63]: train_ = (train - mu) / std  # (4)

    In [64]: model.fit(train_[cols], train['direction'])  # (5)
    Out[64]: AdaBoostClassifier(algorithm='SAMME.R',
                     base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                         class_weight=None, criterion='gini', max_depth=2,
                         max_features=None, max_leaf_nodes=None,
                         min_impurity_decrease=0.0, min_impurity_split=None,
                         min_samples_leaf=15, min_samples_split=2,
                         min_weight_fraction_leaf=0.0, presort='deprecated',
                         random_state=100, splitter='best'),
                     learning_rate=1.0, n_estimators=15, random_state=100)

    In [65]: accuracy_score(train['direction'], model.predict(train_[cols]))  # (6)
    Out[65]: 0.8050847457627118

    In [66]: test = data.iloc[split:].copy()  # (7)

    In [67]: test_ = (test - mu) / std  # (7)

    In [68]: test['position'] = model.predict(test_[cols])  # (8)

    In [69]: accuracy_score(test['direction'], test['position'])  # (9)
    Out[69]: 0.5665024630541872

(1) Specifies major parameters for the ML algorithm (see the references for the model classes provided previously).

(2) Instantiates the base classification algorithm (decision tree).

(3) Instantiates the AdaBoost classification algorithm.

(4) Applies Gaussian normalization to the training features data set.

(5) Fits the model based on the training data set.

(6) Shows the accuracy of the predictions from the trained model in-sample (training data set).

(7) Applies Gaussian normalization to the testing features data set (using the parameters from the training features data set).

(8) Generates the predictions for the test data set.

(9) Shows the accuracy of the predictions from the trained model out-of-sample (test data set).

It is well known that the hit ratio is only one side of the coin of success in financial trading. The other side comprises, among other things, getting the important trades right, as well as the transaction costs implied by the trading strategy.² To this end, only a formal vectorized backtesting approach allows judging the quality of the trading strategy. The following code takes into account the proportional transaction costs based on the average bid-ask spread. Figure 10-5 compares the performance of the algorithmic trading strategy (without and with proportional transaction costs) to the performance of the passive benchmark investment:

    In [70]: test['strategy'] = test['position'] * test['return']  # (1)

    In [71]: sum(test['position'].diff() != 0)  # (2)
    Out[71]: 77

    In [72]: test['strategy_tc'] = np.where(test['position'].diff() != 0,
                                            test['strategy'] - ptc,  # (3)
                                            test['strategy'])

    In [73]: test[['return', 'strategy', 'strategy_tc']].sum(
                     ).apply(np.exp)
    Out[73]: return         0.990182
             strategy       1.015827
             strategy_tc    1.007570
             dtype: float64

    In [74]: test[['return', 'strategy', 'strategy_tc']].cumsum(
                     ).apply(np.exp).plot(figsize=(10, 6));

(1) Derives the log returns for the ML-based algorithmic trading strategy.

(2) Calculates the number of trades implied by the trading strategy based on changes in the position.

(3) Whenever a trade takes place, the proportional transaction costs are subtracted from the strategy's log return for that bar.

Figure 10-5. Gross performance of the EUR/USD exchange rate and the algorithmic trading strategy (before and after transaction costs)

Vectorized backtesting has its limits with regard to how close to market realities strategies can be tested. For example, it does not allow one to include fixed transaction costs per trade directly. One could, as an approximation, take a multiple of the average proportional transaction costs (based on average position sizes) to account indirectly for fixed transaction costs. However, this would not be precise in general. If a higher degree of precision is required, other approaches, such as event-based backtesting (see Chapter 6) with explicit loops over every bar of the price data, need to be applied.

Optimal Leverage

Equipped with the trading strategy's log returns data, the mean and variance values can be calculated in order to derive the optimal leverage according to the Kelly criterion. The code that follows scales the numbers to annualized values, although this does not change the optimal leverage values according to the Kelly criterion since the mean return and the variance scale with the same factor:

    In [75]: mean = test[['return', 'strategy_tc']].mean() * len(data) * 52  # (1)
             mean
    Out[75]: return        -1.705965
             strategy_tc    1.304023
             dtype: float64

    In [76]: var = test[['return', 'strategy_tc']].var() * len(data) * 52  # (2)
             var
    Out[76]: return         0.011306
             strategy_tc    0.011370
             dtype: float64

    In [77]: vol = var ** 0.5  # (3)
             vol
    Out[77]: return         0.106332
             strategy_tc    0.106631
             dtype: float64

    In [78]: mean / var  # (4)
    Out[78]: return        -150.884961
             strategy_tc    114.687875
             dtype: float64

    In [79]: mean / var * 0.5  # (5)
    Out[79]: return        -75.442481
             strategy_tc    57.343938
             dtype: float64

(1) Annualized mean returns.

(2) Annualized variances.

(3) Annualized volatilities.

(4) Optimal leverage according to the Kelly criterion ("full Kelly").

(5) Optimal leverage according to the Kelly criterion ("half Kelly").

Using the "half Kelly" criterion, the optimal leverage for the trading strategy is above 50. With a number of brokers, such as Oanda, and certain financial instruments, such as foreign exchange (FX) pairs and contracts for difference (CFDs), such leverage ratios are feasible, even for retail traders. Figure 10-6 shows, in comparison, the performance of the trading strategy with transaction costs for different leverage values:

    In [80]: to_plot = ['return', 'strategy_tc']

    In [81]: for lev in [10, 20, 30, 40, 50]:
                 label = 'lstrategy_tc_%d' % lev
                 test[label] = test['strategy_tc'] * lev  # (1)
                 to_plot.append(label)

    In [82]: test[to_plot].cumsum().apply(np.exp).plot(figsize=(10, 6));

(1) Scales the strategy returns for different leverage values.

Figure 10-6. Gross performance of the algorithmic trading strategy for different leverage values

Leverage increases the risks associated with trading strategies significantly. Traders should read the risk disclaimers and regulations carefully. A positive backtesting performance is also no guarantee whatsoever of future performance. All results shown are illustrative only and are meant to demonstrate the application of programming and analytics approaches. In some jurisdictions, such as Germany, leverage ratios are capped for retail traders based on different groups of financial instruments.

Risk Analysis

Since leverage increases the risk associated with a certain trading strategy considerably, a more in-depth risk analysis is in order. The risk analysis that follows assumes a leverage ratio of 30. First, the maximum drawdown and the longest drawdown period are calculated. Maximum drawdown is the largest loss (dip) after a recent high. Accordingly, the longest drawdown period is the longest period that the trading strategy needs to get back to a recent high. The analysis assumes that the initial equity position is 3,333 EUR, leading to an initial position size of 100,000 EUR for a leverage ratio of 30. It also assumes that there are no adjustments to the equity over time, no matter what the performance is:

    In [83]: equity = 3333  # (1)

    In [84]: risk = pd.DataFrame(test['lstrategy_tc_30'])  # (2)

    In [85]: risk['equity'] = risk['lstrategy_tc_30'].cumsum(
                     ).apply(np.exp) * equity  # (3)

    In [86]: risk['cummax'] = risk['equity'].cummax()  # (4)

    In [87]: risk['drawdown'] = risk['cummax'] - risk['equity']  # (5)

    In [88]: risk['drawdown'].max()  # (6)
    Out[88]: 511.38321383258017

    In [89]: t_max = risk['drawdown'].idxmax()  # (7)
             t_max  # (7)
    Out[89]: Timestamp('2020-06-12 10:30:00')

(1) The initial equity.

(2) The relevant log returns time series…

(3) …scaled by the initial equity.

(4) The cumulative maximum values over time.

(5) The drawdown values over time.

(6) The maximum drawdown value.

(7) The point in time when it happens.

Technically, a new high is characterized by a drawdown value of 0. The drawdown period is the time between two such highs. Figure 10-7 visualizes both the maximum drawdown and the drawdown periods:

    In [90]: temp = risk['drawdown'][risk['drawdown'] == 0]  # (1)

    In [91]: periods = (temp.index[1:].to_pydatetime() -
                        temp.index[:-1].to_pydatetime())  # (2)

    In [92]: periods[20:30]  # (2)
    Out[92]: array([datetime.timedelta(seconds=600),
                    datetime.timedelta(seconds=1200),
                    datetime.timedelta(seconds=1200),
                    datetime.timedelta(seconds=1200)], dtype=object)

    In [93]: t_per = periods.max()  # (3)

    In [94]: t_per  # (3)
    Out[94]: datetime.timedelta(seconds=26400)

    In [95]: t_per.seconds / 60 / 60  # (4)
    Out[95]: 7.333333333333333

    In [96]: risk[['equity', 'cummax']].plot(figsize=(10, 6))
             plt.axvline(t_max, c='r', alpha=0.5);

(1) Identifies highs for which the drawdown must be 0.

(2) Calculates the timedelta values between all highs.

(3) The longest drawdown period in seconds…

(4) …and transformed to hours.

Figure 10-7. Maximum drawdown (vertical line) and drawdown periods (horizontal lines)

Another important risk measure is value-at-risk (VaR). It is quoted as a currency amount and represents the maximum loss to be expected given both a certain time horizon and a confidence level.

The code that follows derives VaR values based on the log returns of the equity position of the leveraged trading strategy over time for different confidence levels. The time interval is fixed to the bar length of ten minutes:

    In [97]: import scipy.stats as scs

    In [98]: percs = [0.01, 0.1, 1., 2.5, 5.0, 10.0]  # (1)

    In [99]: risk['return'] = np.log(risk['equity'] /
                                     risk['equity'].shift(1))

    In [100]: VaR = scs.scoreatpercentile(equity * risk['return'], percs)  # (2)

    In [101]: def print_var():
                  print('{} {}'.format('Confidence Level', 'Value-at-Risk'))
                  print(33 * '-')
                  for pair in zip(percs, VaR):
                      print('{:16.2f} {:16.3f}'.format(100 - pair[0], -pair[1]))  # (3)

    In [102]: print_var()  # (3)
              Confidence Level Value-at-Risk
              ---------------------------------
                         99.99          162.570
                         99.90          161.348
                         99.00          132.382
                         97.50          122.913
                         95.00          100.950
                         90.00           62.622

(1) Defines the percentile values to be used.

(2) Calculates the VaR values given the percentile values.

(3) Translates the percentile values into confidence levels and the VaR values (negative values) to positive values for printing.

Finally, the following code calculates the VaR values for a time horizon of one hour by resampling the original DataFrame object. In effect, the VaR values increase for all confidence levels except the lowest one (90%):

    In [103]: hourly = risk.resample('1H', label='right').last()  # (1)

    In [104]: hourly['return'] = np.log(hourly['equity'] /
                                        hourly['equity'].shift(1))

    In [105]: VaR = scs.scoreatpercentile(equity * hourly['return'], percs)  # (2)

    In [106]: print_var()
              Confidence Level Value-at-Risk
              ---------------------------------
                         99.99          252.460
                         99.90          251.744
                         99.00          244.593
                         97.50          232.674
                         95.00          125.498
                         90.00           61.701

(1) Resamples the data from 10-minute bars to 1-hour bars.

(2) Calculates the VaR values given the percentile values.

Persisting the Model Object

Once the algorithmic trading strategy is accepted based on the backtesting, leveraging, and risk analysis results, the model object and other relevant algorithm components might be persisted for later use in deployment. It now embodies the ML-based trading strategy or the trading algorithm:

    In [107]: import pickle

    In [108]: algorithm = {'model': model, 'mu': mu, 'std': std}

    In [109]: pickle.dump(algorithm, open('algorithm.pkl', 'wb'))

Online Algorithm

The trading algorithm tested so far is an offline algorithm. Such algorithms use a complete data set to solve a problem at hand. The problem has been to train an AdaBoost classification algorithm based on decision trees as base classifiers, a number of different time series features, and directional label data. In practice, when deploying the trading algorithm in financial markets, it must consume data piece by piece as it arrives to predict the direction of the market movement for the next time interval (bar). This section makes use of the persisted model object from the previous section and embeds it into a streaming data environment.

The code that transforms the offline trading algorithm into an online trading algorithm mainly addresses the following issues:

Tick data

Tick data arrives in real time and is to be processed in real time, such as being collected in a DataFrame object.

Resampling

The tick data is to be resampled to the appropriate bar length given the trading algorithm. For illustration, a shorter bar length is used for resampling than for the training and backtesting.

Prediction

The trading algorithm generates a prediction for the direction of the market movement over the relevant time interval that by nature lies in the future.

Orders

Given the current position and the prediction ("signal") generated by the algorithm, an order is placed or the position is kept unchanged.

Chapter 8, and in particular "Working with Streaming Data", shows how to retrieve tick data from the Oanda API in real time. The basic approach is to redefine the .on_success() method of the tpqoa.tpqoa class to implement the trading logic.

First, the persisted trading algorithm is loaded; it represents the trading logic to be followed. It consists of the trained model itself and the parameters for the normalization of the features data, which are integral parts of the algorithm:

    In [110]: algorithm = pickle.load(open('algorithm.pkl', 'rb'))

    In [111]: algorithm['model']
    Out[111]: AdaBoostClassifier(algorithm='SAMME.R',
                      base_estimator=DecisionTreeClassifier(ccp_alpha=0.0,
                          class_weight=None, criterion='gini', max_depth=2,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=15, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, presort='deprecated',
                          random_state=100, splitter='best'),
                      learning_rate=1.0, n_estimators=15, random_state=100)

In the following code, the new class MLTrader inherits from tpqoa.tpqoa and, via the .on_success() and additional helper methods, transforms the trading algorithm into a real-time context. It is the transformation of the offline algorithm to a so-called online algorithm:

    In [112]: class MLTrader(tpqoa.tpqoa):
                  def __init__(self, config_file, algorithm):
                      super(MLTrader, self).__init__(config_file)
                      self.model = algorithm['model']  # (1)
                      self.mu = algorithm['mu']  # (1)
                      self.std = algorithm['std']  # (1)
                      self.units = 100000  # (2)
                      self.position = 0  # (3)
                      self.bar = '5s'  # (4)
                      self.window = 2  # (5)
                      self.lags = 6  # (6)
                      self.min_length = self.lags + self.window + 1
                      self.features = ['return', 'sma', 'min', 'max', 'vol', 'mom']
                      self.raw_data = pd.DataFrame()

                  def prepare_features(self):  # (7)
                      self.data['return'] = np.log(self.data['mid'] /
                                                   self.data['mid'].shift(1))
                      self.data['sma'] = self.data['mid'].rolling(self.window).mean()
                      self.data['min'] = self.data['mid'].rolling(self.window).min()
                      self.data['mom'] = np.sign(
                          self.data['return'].rolling(self.window).mean())
                      self.data['max'] = self.data['mid'].rolling(self.window).max()
                      self.data['vol'] = self.data['return'].rolling(
                          self.window).std()
                      self.data.dropna(inplace=True)
                      self.data[self.features] -= self.mu
                      self.data[self.features] /= self.std
                      self.cols = []
                      for f in self.features:
                          for lag in range(1, self.lags + 1):
                              col = f'{f}_lag_{lag}'
                              self.data[col] = self.data[f].shift(lag)
                              self.cols.append(col)

                  def on_success(self, time, bid, ask):  # (8)
                      df = pd.DataFrame({'bid': float(bid), 'ask': float(ask)},
                                        index=[pd.Timestamp(time).tz_localize(None)])
                      self.raw_data = self.raw_data.append(df)
                      self.data = self.raw_data.resample(self.bar,
                                              label='right').last().ffill()
                      self.data = self.data.iloc[:-1]
                      if len(self.data) > self.min_length:
                          self.min_length +=1
                          self.data['mid'] = (self.data['bid'] +
                                              self.data['ask']) / 2
                          self.prepare_features()
                          features = self.data[
                              self.cols].iloc[-1].values.reshape(1, -1)
                          signal = self.model.predict(features)[0]
                          print(f'NEW SIGNAL: {signal}', end='\r')
                          if self.position in [0, -1] and signal == 1:  # (9)
                              print('*** GOING LONG ***')
                              self.create_order(self.stream_instrument,
                                          units=(1 - self.position) * self.units)
                              self.position = 1
                          elif self.position in [0, 1] and signal == -1:  # (10)
                              print('*** GOING SHORT ***')
                              self.create_order(self.stream_instrument,
                                          units=-(1 + self.position) * self.units)
                              self.position = -1

(1) The trained AdaBoost model object and the normalization parameters.

(2) The number of units traded.

(3) The initial, neutral position.

(4) The bar length on which the algorithm is implemented.

(5) The length of the window for selected features.

(6) The number of lags (must be in line with algorithm training).

(7) The method that generates the lagged features data.

(8) The redefined method that embodies the trading logic.

(9) Checks for a long signal and long trade.

(10) Checks for a short signal and short trade.

With the new class MLTrader, automated trading is made simple. A few lines of code are enough in an interactive context. The parameters are set such that the first order is placed after a short while. In reality, however, all parameters must, of course, be in line with the original ones from the research and backtesting phase. They could, for example, also be persisted on disk and be read with the algorithm:

    In [113]: mlt = MLTrader('../pyalgo.cfg', algorithm)  # (1)

    In [114]: mlt.stream_data(instrument, stop=500)  # (2)
              print('*** CLOSING OUT ***')
              mlt.create_order(mlt.stream_instrument,
                               units=-mlt.position * mlt.units)  # (3)

(1) Instantiates the trading object.

(2) Starts the streaming, data processing, and trading.

(3) Closes out the final open position.

The preceding code generates output similar to the following:

*** GOING LONG *** {'id': '1735',
'time': '2020-08-19T14:46:15.552233563Z', 'userID': 13834683, 'accountID': '101-004-13834683-001', 'batchID': '1734', 'requestID': '42730658849646182', 'type': 'ORDER_FILL', 'orderID': '1734', 'instrument': 'EUR_USD', 'units': '100000.0', 'gainQuoteHomeConversionFactor': '0.835983419025', 'lossQuoteHomeConversionFactor': '0.844385262432', 'price': 1.1903, 'fullVWAP': 1.1903, 'fullPrice': {'type': 'PRICE', 'bids': [{'price': 1.19013, 'liquidity': '10000000'}], 'asks': [{'price': 1.1903, 'liquidity': '10000000'}], 'closeoutBid': 1.19013, 'closeoutAsk': 1.1903}, 'reason': 'MARKET_ORDER', 'pl': '0.0', 'financing': '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0', 'accountBalance': '98507.7425', 'tradeOpened': {'tradeID': '1735', 'units': '100000.0', 'price': 1.1903, 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '7.1416', 'initialMarginRequired': '3330.0'}, 'halfSpreadCost': '7.1416'} *** GOING SHORT *** {'id': '1737', 'time': '2020-08-19T14:48:10.510726213Z', 'userID': 13834683, 'accountID': '101-004-13834683-001', 'batchID': '1736', 'requestID': '42730659332312267', 'type': 'ORDER_FILL', 'orderID': '1736', 'instrument': 'EUR_USD', 'units': '-200000.0', 'gainQuoteHomeConversionFactor': '0.835885095595', 'lossQuoteHomeConversionFactor': '0.844285950827', 'price': 1.19029, 'fullVWAP': 1.19029, 'fullPrice': {'type': 'PRICE', 'bids': [{'price': 1.19029, 'liquidity': '10000000'}], 'asks': [{'price': 1.19042, 'liquidity': '10000000'}], 'closeoutBid': 1.19029, 'closeoutAsk': 1.19042}, 'reason': 'MARKET_ORDER', 'pl': '-0.8443', 'financing': '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0', 'accountBalance': '98506.8982', 'tradeOpened': {'tradeID': '1737', 'units': '-100000.0', 'price': 1.19029, 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '5.4606', 'initialMarginRequired': '3330.0'}, 'tradesClosed': [{'tradeID': '1735', 'units': '-100000.0', 'price': 1.19029, 'realizedPL': '-0.8443', 'financing': '0.0', 'guaranteedExecutionFee': '0.0', 
'halfSpreadCost': '5.4606'}], 'halfSpreadCost': '10.9212'} *** GOING LONG *** {'id': '1739', 'time': '2020-08-19T14:48:15.529680632Z', 'userID': 13834683, 'accountID': '101-004-13834683-001', 'batchID': '1738', 'requestID': '42730659353297789', 'type': 'ORDER_FILL', 'orderID': '1738', 'instrument': 'EUR_USD', 'units': '200000.0', 'gainQuoteHomeConversionFactor': '0.835835944263', 'lossQuoteHomeConversionFactor': '0.844236305512', 'price': 1.1905, 'fullVWAP': 1.1905, 'fullPrice': {'type': 'PRICE', 'bids': [{'price': 1.19035, 'liquidity': '10000000'}], 'asks': [{'price': 1.1905, 'liquidity': '10000000'}], 'closeoutBid': 1.19035, 'closeoutAsk': 1.1905}, 'reason': 'MARKET_ORDER', 'pl': '-17.729', 'financing': '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0', 'accountBalance': '98489.1692', 'tradeOpened': {'tradeID': '1739', 'units': '100000.0', 'price': 1.1905, 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '6.3003', 'initialMarginRequired': '3330.0'}, 'tradesClosed': [{'tradeID': '1737', 'units': '100000.0', 'price': 1.1905, 'realizedPL': '-17.729', 'financing': '0.0', 'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '6.3003'}], 'halfSpreadCost': '12.6006'} *** CLOSING OUT *** {'id': '1741', 'time': '2020-08-19T14:49:11.976885485Z', 'userID': 13834683, 'accountID': '101-004-13834683-001', 'batchID': '1740', 'requestID': '42730659588338204', 'type': 'ORDER_FILL', 'orderID': '1740', 'instrument': 'EUR_USD', 'units': '-100000.0', 'gainQuoteHomeConversionFactor': '0.835730636848', 'lossQuoteHomeConversionFactor': '0.844129939731', 'price': 1.19051, 'fullVWAP': 1.19051, 'fullPrice': {'type': 'PRICE', 'bids': [{'price': 1.19051, 'liquidity': '10000000'}], 'asks': [{'price': 1.19064, 'liquidity': '10000000'}], 'closeoutBid': 1.19051, 'closeoutAsk': 1.19064}, 'reason': 'MARKET_ORDER', 'pl': '0.8357', 'financing': '0.0', 'commission': '0.0', 'guaranteedExecutionFee': '0.0', 'accountBalance': '98490.0049', 'tradesClosed': [{'tradeID': '1739', 'units': 
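    '-100000.0', 'price': 1.19051, 'realizedPL': '0.8357', 'financing': '0.0',
    'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '5.4595'}],
    'halfSpreadCost': '5.4595'}

Deploying an automated algorithmic trading strategy with real funds requires an appropriate infrastructure. Among others, the infrastructure should satisfy the following requirements:

Reliability

The infrastructure on which an algorithmic trading strategy is deployed should provide high availability (for example, 99.9% or higher) and should otherwise take care of reliability (automatic backups, redundancy of drives and network connections, and so on).

Performance

Depending on the amount of data being processed and the computational demand the algorithms generate, the infrastructure must have enough CPU cores, working memory (RAM), and storage (SSD). In addition, the network connections should be fast enough.

Security

The operating system and the applications running on it should be protected by strong passwords as well as SSL encryption and hard drive encryption. The hardware should be protected from fire, water, and unauthorized physical access.

Basically, these requirements can only be fulfilled by renting an appropriate infrastructure from a professional data center or a cloud provider. As a rule, only the bigger or even the biggest players in the financial markets can meet these requirements through their own investments in physical infrastructure.

From a development and testing point of view, even the smallest Droplet (cloud instance) from DigitalOcean is enough to get started. At the time of this writing, such a Droplet costs 5 USD per month, is billed by the hour, and can be created within minutes and destroyed within seconds.³

How to set up a Droplet with DigitalOcean is explained in detail in Chapter 2 (specifically in “Using Cloud Instances”), with Bash scripts that can be adjusted to reflect individual requirements regarding Python packages.

Although automated algorithmic trading strategies can be developed and tested on a local computer (desktop, notebook, or similar), a local computer is not appropriate for deploying automated strategies that trade real money. A simple loss of the network connection or a brief power outage might bring down the whole algorithm, leaving, for example, unintended open positions in the portfolio. As another example, it could cause real-time tick data to be missed, resulting in corrupted data sets that might lead to wrong signals and unintended trades and positions.

Assume now that the automated algorithmic trading strategy is to be deployed on a remote server (virtual cloud instance or dedicated server). Assume further that all required Python packages have been installed (see “Using Cloud Instances”) and that, for instance, Jupyter Lab is running securely (see Running a notebook server). What else needs to be considered from the algorithmic trader's point of view if they do not want to sit in front of the screen all day, logged in to the server?

This section addresses two important topics: logging and real-time monitoring. Logging persists information and events on disk for later inspection. It is standard practice in software application development and deployment. Here, however, the focus is more on the financial side: logging important financial data and event information for later inspection and analysis. The same holds true for real-time monitoring, which makes use of socket communication. Via sockets, a constant real-time stream of important financial aspects can be created that can then be retrieved and processed on a local computer, even if the deployment happens in the cloud.

“Automated Trading Strategy” presents a Python script that implements all these aspects and makes use of the code from “Online Algorithm”. The script puts the code into a form that allows, for example, the deployment of the algorithmic trading strategy, based on the persisted algorithm object, on a remote server. It also adds logging and monitoring capabilities based on a custom function that, among other things, uses ZeroMQ (see http://zeromq.org) for socket communication. In combination with the short script from “Strategy Monitoring”, this allows for remote real-time monitoring of the activity on a remote server.⁴

When the script from “Automated Trading Strategy” is executed, either locally or remotely, the output that is logged and sent via the socket looks as follows:

    2020-06-15 17:04:14.298653
    ================================================================================
    NUMBER OF TICKS: 147 | NUMBER OF BARS: 49

    ================================================================================
    MOST RECENT DATA
                         return_lag_1  return_lag_2  ...  max_lag_5  max_lag_6
    2020-06-15 15:04:06      0.026508     -0.125253  ...  -1.703276  -1.700746
    2020-06-15 15:04:08     -0.049373      0.026508  ...  -1.694419  -1.703276
    2020-06-15 15:04:10     -0.077828     -0.049373  ...  -1.694419  -1.694419
    2020-06-15 15:04:12      0.064448     -0.077828  ...  -1.705807  -1.694419
    2020-06-15 15:04:14     -0.020918      0.064448  ...  -1.710869  -1.705807

    [5 rows x 36 columns]

    ================================================================================
    features:
    [[-0.02091774  0.06444794 -0.07782834 -0.04937258  0.02650799 -0.12525265
      -2.06428556 -1.96568848 -2.16288147 -2.08071843 -1.94925692 -2.19574189
       0.92939697  0.92939697 -1.07368691  0.92939697 -1.07368691 -1.07368691
      -1.41861822 -1.42605902 -1.4294412  -1.42470615 -1.4274119  -1.42470615
      -1.05508516 -1.06879043 -1.06879043 -1.0619378  -1.06741991 -1.06741991
      -1.70580717 -1.70707253 -1.71339931 -1.7108686  -1.7108686  -1.70580717]]
    position: 1
    signal: 1

    2020-06-15 17:04:14.402154
    ================================================================================
    *** NO TRADE PLACED ***

    *** END OF CYCLE ***

    2020-06-15 17:04:16.199950
    ================================================================================

    ================================================================================
    *** GOING NEUTRAL ***

    {'id': '979', 'time': '2020-06-15T15:04:16.138027118Z', 'userID': 13834683,
    'accountID': '101-004-13834683-001', 'batchID': '978',
    'requestID': '60721506683906591', 'type': 'ORDER_FILL', 'orderID': '978',
    'instrument': 'EUR_USD', 'units': '-100000.0',
    'gainQuoteHomeConversionFactor': '0.882420762903',
    'lossQuoteHomeConversionFactor': '0.891289313284',
    'price': 1.12751, 'fullVWAP': 1.12751, 'fullPrice': {'type': 'PRICE',
    'bids': [{'price': 1.12751, 'liquidity': '10000000'}],
    'asks': [{'price': 1.12765, 'liquidity': '10000000'}],
    'closeoutBid': 1.12751, 'closeoutAsk': 1.12765}, 'reason': 'MARKET_ORDER',
    'pl': '-3.5652', 'financing': '0.0', 'commission': '0.0',
    'guaranteedExecutionFee': '0.0', 'accountBalance': '99259.7485',
    'tradesClosed': [{'tradeID': '975', 'units': '-100000.0',
    'price': 1.12751, 'realizedPL': '-3.5652', 'financing': '0.0',
    'guaranteedExecutionFee': '0.0', 'halfSpreadCost': '6.208'}],
    'halfSpreadCost': '6.208'}
    ================================================================================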
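Records of the kind shown above could also be written with Python's standard `logging` module instead of a hand-rolled text file. The following is only a minimal sketch under stated assumptions, not the book's implementation: the helper names `make_trade_logger` and `log_cycle`, the record format, and the file name `strategy_sketch.log` are all hypothetical, and no socket communication or order placement is involved.

```python
import logging
import os
import tempfile


def make_trade_logger(path):
    """Return a logger that appends timestamped records to `path` (hypothetical helper)."""
    logger = logging.getLogger("strategy_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the records out of the root logger
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(path, mode="a")
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(message)s"))
        logger.addHandler(handler)
    return logger


def log_cycle(logger, ticks, bars, position, signal):
    """Log one decision cycle and return the action (no order is placed here)."""
    logger.info("ticks=%d bars=%d position=%d signal=%d",
                ticks, bars, position, signal)
    # same position/signal logic as in the trading class above
    if position in (0, -1) and signal == 1:
        action = "GOING LONG"
    elif position in (0, 1) and signal == -1:
        action = "GOING SHORT"
    else:
        action = "NO TRADE PLACED"
    logger.info("*** %s ***", action)
    return action


# usage: two cycles written to a temporary log file
log_path = os.path.join(tempfile.mkdtemp(), "strategy_sketch.log")
logger = make_trade_logger(log_path)
actions = [log_cycle(logger, 147, 49, 1, 1),
           log_cycle(logger, 150, 50, 1, -1)]
with open(log_path) as f:
    lines = f.read().splitlines()
print(actions)
print(len(lines))
```

The `logging` module takes care of timestamps and log levels; a second handler could later forward the same records elsewhere without changing the calling code.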
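Running the script from “Strategy Monitoring” locally then allows for the real-time retrieval and processing of such information. Of course, it is easy to adjust the logging and streaming data to one's own requirements.⁵ Furthermore, the trading script and the overall logic can be adjusted to include elements such as stop losses or take profit targets programmatically.

Trading currency pairs and/or CFDs is associated with a number of financial risks. Implementing an algorithmic trading strategy for such instruments automatically leads to a number of additional risks. Among them are flaws in the trading and/or execution logic, as well as technical risks, including problems associated with socket communication, delayed retrieval, or even loss of tick data during deployment. Therefore, before deploying a trading strategy in automated fashion, one should make sure that all associated market, execution, operational, technical, and other risks have been identified, evaluated, and addressed appropriately. The code presented in this chapter is intended for technical illustration purposes only.

This final section provides a step-by-step overview in screenshots. While the previous sections are based on the FXCM trading platform, the visual overview is based on the Oanda trading platform.

Configuring the Oanda account

The first step is to set up an account with Oanda (or any other trading platform) and to set the correct leverage ratio for the account in accordance with the Kelly criterion, as shown in Figure 10-8.

Figure 10-8. Setting leverage on Oanda

Setting up the hardware

The second step is to create a DigitalOcean droplet, as shown in Figure 10-9.

Figure 10-9. DigitalOcean droplet

Setting up the Python environment

The third step is to get all the software onto the droplet (see Figure 10-10) in order to set up the infrastructure. When everything works, you can create a new Jupyter Notebook and start an interactive Python session (see Figure 10-11).

Figure 10-10. Installing Python and packages

Figure 10-11. Testing Jupyter Lab

Uploading the code

The fourth step is to upload the Python scripts for automated trading and real-time monitoring, as shown in Figure 10-12. The configuration file with the account credentials needs to be uploaded as well.

Figure 10-12. Uploading Python code files

Running the code

The fifth step is to run the Python script for automated trading, as shown in Figure 10-13. Figure 10-14 shows a trade that the Python script has initiated.

Figure 10-13. Running the Python script

Figure 10-14. A trade initiated by the Python script

Real-time monitoring

The final step is to run the monitoring script locally (provided you have set the correct IP in the local script), as shown in Figure 10-15. In practice, this means that you can monitor locally and in real time what is happening on your cloud instance.

Figure 10-15. Local real-time monitoring via socket

This chapter is about deploying an algorithmic trading strategy in automated fashion, based on a classification algorithm from machine learning. It covers capital management (based on the Kelly criterion), vectorized backtesting for performance and risk, the transformation of offline trading algorithms into online ones, an appropriate infrastructure for deployment, and logging and monitoring during deployment.

The topic of this chapter is complex and requires a broad skill set from the algorithmic trading practitioner. On the other hand, RESTful APIs for algorithmic trading, such as the one from Oanda, simplify the automation task considerably, since the core part boils down mainly to using the capabilities of the Python wrapper package tpqoa for tick data retrieval and order placement. Around this core, elements that mitigate operational and technical risks should be added as far as appropriate and possible.

Papers cited in this chapter:

Rotando, Louis, and Edward Thorp. 1992. “The Kelly Criterion and the Stock Market.” The American Mathematical Monthly 99 (10): 922-931.

Hung, Jane. 2010. “Betting with the Kelly Criterion.” http://bit.ly/betting_with_kelly.

This section contains the Python scripts used in this chapter.

Automated Trading Strategy

The following Python script contains the code for the automated deployment of the ML-based trading strategy, as discussed and backtested in this chapter:

    #
    # Automated ML-Based Trading Strategy for Oanda
    # Online Algorithm, Logging, Monitoring
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    #
    import zmq
    import tpqoa
    import pickle
    import numpy as np
    import pandas as pd
    import datetime as dt

    log_file = 'automated_strategy.log'

    # loads the persisted algorithm object
    algorithm = pickle.load(open('algorithm.pkl', 'rb'))

    # sets up the socket communication via ZeroMQ (here: "publisher")
    context = zmq.Context()
    socket = context.socket(zmq.PUB)

    # this binds the socket communication to all IP addresses of the machine
    socket.bind('tcp://0.0.0.0:5555')

    # recreating the log file
    with open(log_file, 'w') as f:
        f.write('*** NEW LOG FILE ***\n')
        f.write(str(dt.datetime.now()) + '\n\n\n')


    def logger_monitor(message, time=True, sep=True):
        ''' Custom logger and monitor function.
        '''
        with open(log_file, 'a') as f:
            t = str(dt.datetime.now())
            msg = ''
            if time:
                msg += '\n' + t + '\n'
            if sep:
                msg += 80 * '=' + '\n'
            msg += message + '\n\n'
            # sends the message via the socket
            socket.send_string(msg)
            # writes the message to the log file
            f.write(msg)


    class MLTrader(tpqoa.tpqoa):
        def __init__(self, config_file, algorithm):
            super(MLTrader, self).__init__(config_file)
            self.model = algorithm['model']
            self.mu = algorithm['mu']
            self.std = algorithm['std']
            self.units = 100000
            self.position = 0
            self.bar = '2s'
            self.window = 2
            self.lags = 6
            self.min_length = self.lags + self.window + 1
            self.features = ['return', 'vol', 'mom', 'sma', 'min', 'max']
            self.raw_data = pd.DataFrame()

        def prepare_features(self):
            self.data['return'] = np.log(
                self.data['mid'] / self.data['mid'].shift(1))
            self.data['vol'] = self.data['return'].rolling(self.window).std()
            self.data['mom'] = np.sign(
                self.data['return'].rolling(self.window).mean())
            self.data['sma'] = self.data['mid'].rolling(self.window).mean()
            self.data['min'] = self.data['mid'].rolling(self.window).min()
            self.data['max'] = self.data['mid'].rolling(self.window).max()
            self.data.dropna(inplace=True)
            self.data[self.features] -= self.mu
            self.data[self.features] /= self.std
            self.cols = []
            for f in self.features:
                for lag in range(1, self.lags + 1):
                    col = f'{f}_lag_{lag}'
                    self.data[col] = self.data[f].shift(lag)
                    self.cols.append(col)

        def report_trade(self, pos, order):
            ''' Prints, logs, and sends trade data.
            '''
            out = '\n\n' + 80 * '=' + '\n'
            out += '*** GOING {} *** \n'.format(pos) + '\n'
            out += str(order) + '\n'
            out += 80 * '=' + '\n'
            logger_monitor(out)
            print(out)

        def on_success(self, time, bid, ask):
            print(self.ticks, 20 * ' ', end='\r')
            df = pd.DataFrame({'bid': float(bid), 'ask': float(ask)},
                              index=[pd.Timestamp(time).tz_localize(None)])
            # pd.concat replaces DataFrame.append, removed in pandas 2.0
            self.raw_data = pd.concat((self.raw_data, df))
            self.data = self.raw_data.resample(
                self.bar, label='right').last().ffill()
            self.data = self.data.iloc[:-1]
            if len(self.data) > self.min_length:
                logger_monitor('NUMBER OF TICKS: {} | '.format(self.ticks) +
                               'NUMBER OF BARS: {}'.format(self.min_length))
                self.min_length += 1
                self.data['mid'] = (self.data['bid'] + self.data['ask']) / 2
                self.prepare_features()
                features = self.data[self.cols].iloc[-1].values.reshape(1, -1)
                signal = self.model.predict(features)[0]
                # logs and sends major financial information
                logger_monitor('MOST RECENT DATA\n' +
                               str(self.data[self.cols].tail()),
                               False)
                logger_monitor('features:\n' + str(features) + '\n' +
                               'position: ' + str(self.position) + '\n' +
                               'signal: ' + str(signal), False)
                if self.position in [0, -1] and signal == 1:
                    # going long?
                    order = self.create_order(self.stream_instrument,
                                    units=(1 - self.position) * self.units,
                                    suppress=True, ret=True)
                    self.report_trade('LONG', order)
                    self.position = 1
                elif self.position in [0, 1] and signal == -1:
                    # going short?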
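                    order = self.create_order(self.stream_instrument,
                                    units=-(1 + self.position) * self.units,
                                    suppress=True, ret=True)
                    self.report_trade('SHORT', order)
                    self.position = -1
                else:  # no trade
                    logger_monitor('*** NO TRADE PLACED ***')
                logger_monitor('*** END OF CYCLE ***\n\n', False, False)


    if __name__ == '__main__':
        mlt = MLTrader('../pyalgo.cfg', algorithm)
        mlt.stream_data('EUR_USD', stop=150)
        order = mlt.create_order(mlt.stream_instrument,
                                 units=-mlt.position * mlt.units,
                                 suppress=True, ret=True)
        mlt.position = 0
        mlt.report_trade('NEUTRAL', order)

Strategy Monitoring

The following Python script contains the code to remotely monitor the execution of the Python script from “Automated Trading Strategy”:

    #
    # Automated ML-Based Trading Strategy for Oanda
    # Strategy Monitoring via Socket Communication
    #
    # Python for Algorithmic Trading
    # (c) Dr. Yves J. Hilpisch
    #
    import zmq

    # sets up the socket communication via ZeroMQ (here: "subscriber")
    context = zmq.Context()
    socket = context.socket(zmq.SUB)

    # adjust the IP address to reflect the remote location
    socket.connect('tcp://134.122.70.51:5555')

    # local IP address used for testing
    # socket.connect('tcp://0.0.0.0:5555')

    # configures the socket to retrieve every message
    socket.setsockopt_string(zmq.SUBSCRIBE, '')

    while True:
        msg = socket.recv_string()
        print(msg)

¹ The exposition follows Hung (2010).

² It is a stylized empirical fact that getting the largest market movements right (that is, the biggest winning and losing moves) is of paramount importance for investment and trading performance. This aspect is neatly illustrated in Figure 10-5, which shows the trading strategy correctly capturing a large downward move in the underlying instrument, which leads to a large jump in the strategy's performance.

³ Use the link http://bit.ly/do_sign_up to get a 10 USD bonus on DigitalOcean when signing up for a new account.

⁴ The logging approach used here is quite simple and takes the form of a plain text file. It is easy to change the logging and persisting of, say, the relevant financial data to a database or an appropriate binary storage format, such as HDF5 (see Chapter 3).

⁵ Note that the socket communication, as implemented in the two scripts, is not encrypted and sends plain text over the network, which may represent a security risk in production.

Talk is cheap. Show me the code.
Linus Torvalds

Python has become a powerful programming language and has developed a vast ecosystem of helpful packages over the last couple of years. This appendix provides a concise overview of Python and of three major pillars of the so-called scientific or data science stack:

NumPy (see https://numpy.org)
matplotlib (see https://matplotlib.org)
pandas (see https://pandas.pydata.org)

NumPy provides performant array operations on large, homogeneous numerical data sets, while pandas is primarily designed to handle tabular data, such as financial time series data, efficiently.

Such an introductory appendix, which only addresses selected topics relevant to the rest of the book, cannot of course replace a thorough introduction to Python and the packages covered. However, if you are rather new to Python or to programming in general, you might get a first overview and a feeling for what Python is all about. If you are already using another language in quantitative finance (such as Matlab, R, C++, or VBA), you will see how Python handles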
typical data structures, programming paradigms, and idioms.

For a comprehensive overview of Python applied to finance, see Hilpisch (2018). Other, more general introductions to the language with a scientific and data analysis focus are VanderPlas (2017) and McKinney (2017).

This section introduces basic Python data types and structures, control structures, and some Python idioms.

Data Types

It is noteworthy that Python is, in general, a dynamically typed system, which means that the type of an object is inferred from its context rather than declared. Let us start with numbers:

    In [1]: a = 3  # (1)

    In [2]: type(a)  # (2)
    Out[2]: int

    In [3]: a.bit_length()  # (3)
    Out[3]: 2

    In [4]: b = 5.  # (4)

    In [5]: type(b)
    Out[5]: float

1. Assigns the variable name a the integer value 3.
2. Looks up the type of a.
3. Looks up the number of bits needed to represent the integer value.
4. Assigns the variable name b the floating point value 5.0.

Python can handle arbitrarily large integers, which is quite beneficial for number theoretical applications, for instance:

    In [6]: c = 10 ** 100  # (1)

    In [7]: c
    Out[7]: 100000000000000000000000000000000000000000000000000000000000000000000000
            00000000000000000000000000000

    In [8]: c.bit_length()  # (2)
    Out[8]: 333

1. Assigns a “huge” integer value.
2. Shows the number of bits used for the integer representation.

Arithmetic operations on these objects work as expected:

    In [9]: 3 / 5.  # (1)
    Out[9]: 0.6

    In [10]: a * b  # (2)
    Out[10]: 15.0

    In [11]: a - b  # (3)
    Out[11]: -2.0

    In [12]: b + a  # (4)
    Out[12]: 8.0

    In [13]: a ** b  # (5)
    Out[13]: 243.0

1. Division.
2. Multiplication.
3. Subtraction.
4. Addition.
5. Power.

Many commonly used mathematical functions are found in the math module, which is part of Python's standard library:

    In [14]: import math  # (1)

    In [15]: math.log(a)  # (2)
    Out[15]: 1.0986122886681098

    In [16]: math.exp(a)  # (3)
    Out[16]: 20.085536923187668

    In [17]: math.sin(b)  # (4)
    Out[17]: -0.9589242746631385

1. Imports the math module from the standard library.
2. Calculates the natural logarithm.
3. Calculates the exponential value.
4. Calculates the sine value.

Another important basic data type is the string object (str):

    In [18]: s = 'Python for Algorithmic Trading.'  # (1)
    In [19]: type(s)
    Out[19]: str

    In [20]: s.lower()  # (2)
    Out[20]: 'python for algorithmic trading.'

    In [21]: s.upper()  # (3)
    Out[21]: 'PYTHON FOR ALGORITHMIC TRADING.'

    In [22]: s[0:6]  # (4)
    Out[22]: 'Python'

1. Assigns a str object to the variable name s.
2. Transforms all characters to lowercase.
3. Transforms all characters to uppercase.
4. Selects the first six characters.

Such objects can also be combined using the + operator. The index value -1 refers to the last character of a string (or to the last element of a sequence in general). Because the end index of a slice is exclusive, s[-9:-1] leaves out the final period:

    In [23]: st = s[0:6] + s[-9:-1]  # (1)

    In [24]: print(st)  # (2)
    Python Trading

1. Combines subsets of the str object into a new one.
2. Prints out the result.

String replacements are often used to parametrize text output:

    In [25]: repl = 'My name is %s, I am %d years old and %4.2f m tall.'  # (1)

    In [26]: print(repl % ('Gordon Gekko', 43, 1.78))  # (2)
    My name is Gordon Gekko, I am 43 years old and 1.78 m tall.

    In [27]: repl = 'My name is {:s}, I am {:d} years old and {:4.2f} m tall.'  # (3)
    In [28]: print(repl.format('Gordon Gekko', 43, 1.78))  # (4)
    My name is Gordon Gekko, I am 43 years old and 1.78 m tall.

    In [29]: name, age, height = 'Gordon Gekko', 43, 1.78  # (5)

    In [30]: print(f'My name is {name:s}, I am {age:d} years old and '
                   f'{height:4.2f}m tall.')  # (6)
    My name is Gordon Gekko, I am 43 years old and 1.78m tall.

1. Defines a string template the “old” way.
2. Prints the template with the values replaced the “old” way.
3. Defines a string template the “new” way.
4. Prints the template with the values replaced the “new” way.
5. Defines variables for later use in the replacement.
6. Uses a so-called f-string for string replacement (introduced in Python 3.6).

Data Structures

tuple objects are lightweight data structures. They are immutable collections of other objects, constructed from objects separated by commas, with or without parentheses:

    In [31]: t1 = (a, b, st)  # (1)

    In [32]: t1  # (2)
    Out[32]: (3, 5.0, 'Python Trading')

    In [33]: type(t1)
    Out[33]: tuple

    In [34]: t2 = st, b, a  # (3)

    In [35]: t2
    Out[35]: ('Python Trading', 5.0, 3)

    In [36]: type(t2)
    Out[36]: tuple

1. Constructs a tuple object with parentheses.
2. Prints out the str representation.
3. Constructs a tuple object without parentheses.

Nested structures are also possible:

    In [37]: t = (t1, t2)  # (1)

    In [38]: t
    Out[38]: ((3, 5.0, 'Python Trading'), ('Python Trading', 5.0, 3))

    In [39]: t[0][2]  # (2)
    Out[39]: 'Python Trading'

1. Constructs a tuple object out of two other ones.
2. Accesses the third element of the first object.

list objects are mutable collections of other objects and are generally constructed by providing a comma-separated collection of objects in brackets:

    In [40]: l = [a, b, st]  # (1)

    In [41]: l
    Out[41]: [3, 5.0, 'Python Trading']

    In [42]: type(l)
    Out[42]: list

    In [43]: l.append(s.split()[3])  # (2)
    In [44]: l
    Out[44]: [3, 5.0, 'Python Trading', 'Trading.']

1. Generates a list object using brackets.
2. Appends a new element (the final word of s) to the list object.

Sorting is a typical operation on list objects, which can also be constructed using the list constructor (here applied to a tuple object):

    In [45]: l = list(('Z', 'Q', 'D', 'J', 'E', 'H', '5.', 'a'))  # (1)

    In [46]: l
    Out[46]: ['Z', 'Q', 'D', 'J', 'E', 'H', '5.', 'a']

    In [47]: l.sort()  # (2)

    In [48]: l
    Out[48]: ['5.', 'D', 'E', 'H', 'J', 'Q', 'Z', 'a']

1. Creates a list object from a tuple object.
2. Sorts all elements in place (that is, changes the object itself).

Dictionary (dict) objects are so-called key-value stores and are generally constructed with curly brackets:

    In [49]: d = {'int_obj': a,
                  'float_obj': b,
                  'string_obj': st}  # (1)

    In [50]: type(d)
    Out[50]: dict

    In [51]: d
    Out[51]: {'int_obj': 3, 'float_obj': 5.0, 'string_obj': 'Python Trading'}

    In [52]: d['float_obj']  # (2)
    Out[52]: 5.0

    In [53]: d['int_obj_long'] = 10 ** 20  # (3)

    In [54]: d
    Out[54]: {'int_obj': 3,
              'float_obj': 5.0,
              'string_obj': 'Python Trading',
              'int_obj_long': 100000000000000000000}

    In [55]: d.keys()  # (4)
    Out[55]: dict_keys(['int_obj', 'float_obj', 'string_obj', 'int_obj_long'])

    In [56]: d.values()  # (5)
    Out[56]: dict_values([3, 5.0, 'Python Trading', 100000000000000000000])

1. Creates a dict object using curly brackets and key-value pairs.
2. Accesses the value given a key.
3. Adds a new key-value pair.
4. Selects and shows all keys.
5. Selects and shows all values.

Control Structures

Iterations are very important operations in programming in general and in financial analytics in particular. Many Python objects are iterable, which proves rather convenient in many circumstances. Consider the special iterable object range:

    In [57]: range(5)  # (1)
    Out[57]: range(0, 5)

    In [58]: range(3, 15, 2)  # (2)
    Out[58]: range(3, 15, 2)

    In [59]: for i in range(5):  # (3)
                 print(i ** 2, end=' ')
    0 1 4 9 16

    In [60]: for i in range(3, 15, 2):  # (4)
                 print(i, end=' ')
    3 5 7 9 11 13

    In [61]: l = ['a', 'b', 'c', 'd', 'e']

    In [62]: for _ in l:  # (5)
                 print(_)
    a
    b
    c
    d
    e

    In [63]: s = 'Python Trading'

    In [64]: for c in s:  # (6)
                 print(c + '|', end='')
    P|y|t|h|o|n| |T|r|a|d|i|n|g|

1. Creates a range object given a single parameter (end value + 1).
2. Creates a range object with start, end, and step parameter values.
3. Iterates over a range object and prints the squared values.
4. Iterates over a range object using start, end, and step parameters.
5. Iterates over a list object.
6. Iterates over a str object.

while loops are similar to their counterparts in other languages:

    In [65]: i = 0  # (1)

    In [66]: while i < 5:  # (2)
                 print(i ** 0.5, end=' ')  # (3)
                 i += 1  # (4)
    0.0 1.0 1.4142135623730951 1.7320508075688772 2.0

1. Sets the counter value to 0.
2. As long as the value of i is smaller than 5…
3. …print the square root of i and…
4. …increase the value of i by 1.

Python Idioms

Python in many places relies on a number of special idioms. Let us start with a rather popular one, the list comprehension:

    In [67]: lc = [i ** 2 for i in range(10)]  # (1)

    In [68]: lc
    Out[68]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

    In [69]: type(lc)
    Out[69]: list

1. Creates a new list object based on the list comprehension syntax (a for loop in brackets).

So-called lambda or anonymous functions are useful helpers in many places:

    In [70]: f = lambda x: math.cos(x)  # (1)

    In [71]: f(5)  # (2)
    Out[71]: 0.2836621854632263

    In [72]: list(map(lambda x: math.cos(x), range(10)))  # (3)
    Out[72]: [1.0,
              0.5403023058681398,
              -0.4161468365471424,
              -0.9899924966004454,
              -0.6536436208636119,
              0.2836621854632263,
              0.9601702866503661,
              0.7539022543433046,
              -0.14550003380861354,
              -0.9111302618846769]

1. Defines a new function f via the lambda syntax.
2. Evaluates the function f for a value of 5.
3. Maps the function onto all elements of the range object and creates a list object with the results, which is then printed.

In general, one works with regular Python functions (as opposed to lambda functions), which are constructed as follows:

    In [73]: def f(x):  # (1)
                 return math.exp(x)  # (2)

    In [74]: f(5)
    Out[74]: 148.4131591025766

    In [75]: def f(*args):  # (3)
                 for arg in args:  # (4)
                     print(arg)  # (5)
                 return None  # (6)

    In [76]: f(l)  # (7)
    ['a', 'b', 'c', 'd', 'e']

1. Regular functions use the def statement for the definition.
2. The return statement defines what gets returned when the execution/evaluation is successful; multiple return statements are possible (for example, for different cases).
3. The * in *args allows an arbitrary number of positional arguments, which are collected in a tuple.
4. Iterates over the arguments.
5. Does something with every argument: here, printing.
6. Returns something: here, None; not necessary for a valid Python function.
7. Passes the list object l to the function f. Because l is not unpacked, it arrives as a single argument and is printed as a whole; f(*l) would pass each element as a separate argument.

Consider the following function definition, which returns different values/strings based on an if-elif-else control structure:

    In [77]: import random  # (1)

    In [78]: a = random.randint(0, 1000)  # (2)

    In [79]: print(f'Random number is {a}')  # (3)
    Random number is 188

    In [80]: def number_decide(number):
                 if number < 10:  # (4)
                     return "Number is single digit."
                 elif 10 <= number < 100:  # (5)
                     return "Number is double digit."
                 else:  # (6)
                     return "Number is triple digit."

    In [81]: number_decide(a)  # (7)
    Out[81]: 'Number is triple digit.'

1. Imports the random module to draw random numbers.
2. Draws a random integer between 0 and 1,000.
3. Prints the value of the drawn number.
4. Checks for a single digit number and, if False…
5. …checks for a double digit number; if also False…
6. …the only case that remains is the triple digit one (the upper bound 1,000, which randint can also return, is reported as triple digit too).
7. Calls the function with the random number value a.

Many operations in computational finance take place over large arrays of numerical data. NumPy is a Python package that allows the efficient handling of and operation on such data structures. Although NumPy is a powerful package with a wealth of functionality, covering its basics suffices for the purposes of this book. A neat online book about NumPy is From Python to NumPy. It shows in detail many important aspects that are omitted in the following sections.

Regular ndarray Object

The workhorse of NumPy is the ndarray class, which provides the data structure for n-dimensional array objects. You can generate an ndarray object, for instance, from a list object or, as in the following example, from a range object:

    In [82]: import numpy as np  # (1)

    In [83]: a = np.array(range(24))  # (2)

    In [84]: a  # (3)
    Out[84]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
                    17, 18, 19, 20, 21, 22, 23])

    In [85]: b = a.reshape((4, 6))  # (4)

    In [86]: b  # (5)
    Out[86]: array([[ 0,  1,  2,  3,  4,  5],
                    [ 6,  7,  8,  9, 10, 11],
                    [12, 13, 14, 15, 16, 17],
                    [18, 19, 20, 21, 22, 23]])

    In [87]: c = a.reshape((2, 3, 4))  # (6)

    In [88]: c  # (7)
    Out[88]: array([[[ 0,  1,  2,  3],
                     [ 4,  5,  6,  7],
                     [ 8,  9, 10, 11]],

                    [[12, 13, 14, 15],
                     [16, 17, 18, 19],
                     [20, 21, 22, 23]]])

    In [89]: b = np.array(b, dtype=float)  # (8)

    In [90]: b  # (9)
    Out[90]: array([[ 0.,  1.,  2.,  3.,  4.,  5.],
                    [ 6.,  7.,  8.,  9., 10., 11.],
                    [12., 13., 14., 15., 16., 17.],
                    [18., 19., 20., 21., 22., 23.]])

1. Imports NumPy as np by convention.
2. Instantiates an ndarray object from the range object; np.arange could also be used, for instance.
3. Prints out the values.
4. Reshapes the object to a two-dimensional one…
5. …and prints out the result.
6. Reshapes the object to a three-dimensional one…
7. …and prints out the result.
8. This changes the dtype of the object to float (the alias np.float found in older code was removed in NumPy 1.24) and…
9. …shows the new set of (now floating point) numbers.

Many Python data structures are designed to be quite general. An example is the mutable list object, which can be easily manipulated in many ways (adding and removing elements, storing other complex data structures, and so on). The strategy of NumPy with the regular ndarray object is to provide a more specialized data structure in which all elements are of the same atomic type and which, in turn, allows contiguous storage in memory. This makes the ndarray object much better suited to certain problems, such as operating on larger, or even large, numerical data sets. In the case of NumPy, this specialization brings convenience for the programmer on the one hand and often increased speed on the other.

Vectorized Operations

A major strength of NumPy is its vectorized operations:

    In [91]: 2 * b  # (1)
    Out[91]: array([[ 0.,  2.,  4.,  6.,  8., 10.],
                    [12., 14., 16., 18., 20., 22.],
                    [24., 26., 28., 30., 32., 34.],
                    [36., 38., 40., 42., 44., 46.]])

    In [92]: b ** 2  # (2)
    Out[92]: array([[  0.,   1.,   4.,   9.,  16.,  25.],
                    [ 36.,  49.,  64.,  81., 100., 121.],
                    [144., 169., 196., 225., 256., 289.],
                    [324., 361., 400., 441., 484., 529.]])

    In [93]: f = lambda x: x ** 2 - 2 * x + 0.5  # (3)

    In [94]: f(a)  # (4)
    Out[94]: array([  0.5,  -0.5,   0.5,   3.5,   8.5,  15.5,  24.5,  35.5,  48.5,
                     63.5,  80.5,  99.5, 120.5, 143.5, 168.5, 195.5, 224.5, 255.5,
                    288.5, 323.5, 360.5, 399.5, 440.5, 483.5])

1. Implements a scalar multiplication on the two-dimensional ndarray object b.
2. Calculates the square of every number in b in vectorized fashion.
3. Defines a function f via a lambda constructor.
4. Applies f to the ndarray object a using vectorization.

In many scenarios, only a (small) part of the data stored in an ndarray object is of interest. NumPy supports basic and advanced slicing and other selection features:

    In [95]: a[2:6]  # (1)
    Out[95]: array([2, 3, 4, 5])

    In [96]: b[2, 4]  # (2)
    Out[96]: 16.0

    In [97]: b[1:3, 2:4]  # (3)
    Out[97]: array([[ 8.,  9.],
                    [14., 15.]])

1. Selects the third to sixth elements.
2. Selects the element in the third row and fifth column.
3. Picks out the middle square from the b object.

Boolean Operations

Boolean operations are also supported in many places:

    In [98]: b > 10  # (1)
    Out[98]: array([[False, False, False, False, False, False],
                    [False, False, False, False, False,  True],
                    [ True,  True,  True,  True,  True,  True],
                    [ True,  True,  True,  True,  True,  True]])

    In [99]: b[b > 10]  # (2)
    Out[99]: array([11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23.])

1. Which numbers are greater than 10?
2. Returns all numbers greater than 10.

ndarray Methods and NumPy Functions

Furthermore, ndarray objects have multiple (convenience) methods already built in:

    In [100]: a.sum()  # (1)
    Out[100]: 276

    In [101]: b.mean()  # (2)
    Out[101]: 11.5

    In [102]: b.mean(axis=0)  # (3)
    Out[102]: array([ 9., 10., 11., 12., 13., 14.])

    In [103]: b.mean(axis=1)  # (4)
    Out[103]: array([ 2.5,  8.5, 14.5, 20.5])

    In [104]: c.std()  # (5)
    Out[104]: 6.922186552431729

1. The sum of all elements.
2. The mean of all elements.
3. The mean along the first axis.
4. The mean along the second axis.
5. The standard deviation over all elements.

Similarly, the NumPy package provides a wealth of so-called universal functions. They are universal in the sense that they can be applied both to NumPy ndarray objects and to standard numerical Python data types. For details, see Universal functions (ufunc):

    In [105]: np.sum(a)  # (1)
![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)Out[105]: 276In [106]: np.mean(b, axis=0) ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)Out[106]: array([ 9., 10., 11., 12., 13., 14.])In [107]: np.sin(b).round(2) ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)Out[107]: array([[ 0. , 0.84, 0.91, 0.14, -0.76, -0.96], [-0.28, 0.66, 0.99, 0.41, -0.54, -1. ], [-0.54, 0.42, 0.99, 0.65, -0.29, -0.96], [-0.75, 0.15, 0.91, 0.84, -0.01, -0.85]])In [108]: np.sin(4.5) ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png)Out[108]: -0.977530117665097 所有元素的总和。沿第一个轴的均值。对所有元素取正弦值并保留两位小数。 Python float 对象的正弦值。但是，您应该注意，将 NumPy 通用函数应用于标准的 Python 数据类型通常会带来显著的性能负担： In [109]: %time l = [np.sin(x) for x in range(1000000)] ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) CPU times: user 1.21 s, sys: 22.9 ms, total: 1.24 s Wall time: 1.24 sIn [110]: %time l = [math.sin(x) for x in range(1000000)] ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png) CPU times: user 215 ms, sys: 22.9 ms, total: 238 ms Wall time: 239 ms 在 Python float 对象上使用 NumPy 通用函数的列表推导。在 Python float 对象上使用 math 函数的列表推导。使用 NumPy 中的向量化操作对 ndarray 对象进行操作比前述生成 list 对象的两种方法更快。然而，速度优势通常是以更大甚至巨大的内存占用为代价的： In [111]: %time a = np.sin(np.arange(1000000)) ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png) CPU times: user 20.7 ms, sys: 5.32 ms, total: 26 ms Wall time: 24.6 msIn [112]: import sys ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)In [113]: sys.getsizeof(a) ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)Out[113]: 8000096In [114]: a.nbytes ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png)Out[114]: 8000000 使用 NumPy 
对正弦值进行向量化计算，这通常更快。导入具有许多与系统相关的功能的 sys 模块。显示内存中 a 对象的大小。显示存储在 a 对象中的数据所使用的字节数。向量化有时是编写简洁代码的非常有用的方法，通常也比 Python 代码快得多。但是，请注意向量化可能在与金融相关的许多场景中具有的内存占用。通常，还有替代算法实现可用，这些实现在内存效率上更高，并且通过使用性能库（如 Numba 或 Cython ）甚至可能更快。参见 Hilpisch (2018, 第十章)。 ndarray 创建在这里，我们使用 ndarray 对象构造函数 np.arange() ，它生成一个整数的 ndarray 对象。以下是一个简单的例子： In [115]: ai = np.arange(10) ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [116]: ai ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)Out[116]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])In [117]: ai.dtype ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)Out[117]: dtype('int64')In [118]: af = np.arange(0.5, 9.5, 0.5) ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png)In [119]: af ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png)Out[119]: array([0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5, 9. ])In [120]: af.dtype ![6](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/6.png)Out[120]: dtype('float64')In [121]: np.linspace(0, 10, 12) ![7](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/7.png)Out[121]: array([ 0. , 0.90909091, 1.81818182, 2.72727273, 3.63636364, 4.54545455, 5.45454545, 6.36363636, 7.27272727, 8.18181818, 9.09090909, 10. 
]) 通过 np.arange() 构造函数实例化一个 ndarray 对象。打印出数值。结果的 dtype 是 np.int64 。再次使用 arange() ，但这次带有 start 、 end 和 step 参数。打印出数值。结果的 dtype 是 np.float64 。使用 linspace() 构造函数，在 0 到 10 之间均匀分布 11 个间隔，返回一个具有 12 个值的 ndarray 对象。随机数在金融分析中，人们经常需要随机¹ 数字。 NumPy 提供了许多从不同分布中抽样的函数。在量化金融中经常需要的是标准正态分布和泊松分布。相应的函数位于子包 numpy.random 中： In [122]: np.random.standard_normal(10) ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)Out[122]: array([-1.06384884, -0.22662171, 1.2615483 , -0.45626608, -1.23231112, -1.51309987, 1.23938439, 0.22411366, -0.84616512, -1.09923136])In [123]: np.random.poisson(0.5, 10) ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)Out[123]: array([0, 1, 1, 0, 0, 1, 0, 0, 2, 0])In [124]: np.random.seed(1000) ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)In [125]: data = np.random.standard_normal((5, 100)) ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png)In [126]: data[:, :3] ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png)Out[126]: array([[-0.8044583 , 0.32093155, -0.02548288], [-0.39031935, -0.58069634, 1.94898697], [-1.11573322, -1.34477121, 0.75334374], [ 0.42400699, -1.56680276, 0.76499895], [-1.74866738, -0.06913021, 1.52621653]])In [127]: data.mean() ![6](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/6.png)Out[127]: -0.02714981205311327In [128]: data.std() ![7](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/7.png)Out[128]: 1.0016799134894265In [129]: data = data - data.mean() ![8](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/8.png)In [130]: data.mean() ![9](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/9.png)Out[130]: 3.552713678800501e-18In [131]: data = data / data.std() 
![10](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/10.png)In [132]: data.std() ![11](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/11.png)Out[132]: 1.0 抽取十个标准正态分布的随机数。抽取十个泊松分布的随机数。固定随机数生成器的种子值以便重复性。生成一个带有随机数的二维 ndarray 对象。打印一小部分数字。所有值的平均值接近于 0 ，但不完全是 0 。标准差接近于 1 ，但不完全是 1 。第一时刻以向量化的方式进行修正。现在的平均值“几乎等于” 0 。第二时刻以向量化的方式进行修正。现在标准差正好是 1 。此时，引入使用 matplotlib 进行绘图是有意义的，在 Python 生态系统中， matplotlib 是主要的绘图工具。我们始终使用另一个库的设置，即 seaborn ，这样可以得到更现代的绘图风格。以下代码生成图 A-1： In [133]: import matplotlib.pyplot as plt ![1](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/1.png)In [134]: plt.style.use('seaborn') ![2](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/2.png)In [135]: import matplotlib as mpl ![3](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/3.png)In [136]: mpl.rcParams['savefig.dpi'] = 300 ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png) mpl.rcParams['font.family'] = 'serif' ![4](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/4.png) %matplotlib inlineIn [137]: data = np.random.standard_normal((5, 100)) ![5](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/5.png)In [138]: plt.figure(figsize=(10, 6)) ![6](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/6.png) plt.plot(data.c*msum()) ![7](https://gitee.com/OpenDocCN/ibooker-quant-zh/raw/master/docs/py-algo-trd/img/7.png)Out[138]: [<matplotlib.lines.Line2D at 0x7faceaaeed30>] 导入主要的绘图库。设置新的绘图样式默认值。导入顶级模块。将分辨率设置为 300 DPI（用于保存），字体设置为 serif 。生成一个带有随机数的 ndarray 对象。实例化一个新的 figure 对象。首先计算 ndarray 对象所有元素的累积和，然后绘制结果。图 A-1. 
Line plot with matplotlib

Generating multiple line plots in a single figure object is also easy (see Figure A-2):

```python
In [139]: plt.figure(figsize=(10, 6));  # (1)
          plt.plot(data.T.cumsum(axis=0), label='line')  # (2)
          plt.legend(loc=0);  # (3)
          plt.xlabel('data point')  # (4)
          plt.ylabel('value');  # (5)
          plt.title('random series');  # (6)
```

1. Instantiates a new figure object and defines its size.
2. Plots five lines by calculating the cumulative sum along the first axis and defines a label.
3. Puts a legend in the best location (loc=0).
4. Adds a label to the x-axis.
5. Adds a label to the y-axis.
6. Adds a title to the figure.

Figure A-2. Plot with multiple lines

Other important plot types are histograms and bar charts. A histogram of all 500 values of the data object is shown in Figure A-3. In the code, the .flatten() method is used to generate a one-dimensional array from the two-dimensional one:

```python
In [140]: plt.figure(figsize=(10, 6))
          plt.hist(data.flatten(), bins=30);  # (1)
```

1. Plots the histogram with 30 bins (data groups).

Figure A-3. Histogram of random data

Finally, consider the bar chart in Figure A-4, generated by the following code:

```python
In [141]: plt.figure(figsize=(10, 6))
          plt.bar(np.arange(1, 12) - 0.25, data[0, :11], width=0.5);  # (1)
```

1. Plots a bar chart based on a small subset of the original data set.

Figure A-4. Bar chart of random data

To conclude the introduction to matplotlib, consider the ordinary least-squares (OLS) regression of the sample data displayed in Figure A-5. NumPy provides the two functions polyfit and polyval, which make it convenient to implement OLS regression based on simple monomials, $x, x^{2}, x^{3}, \ldots, x^{n}$. For illustration, consider linear, cubic, and ninth-degree OLS regression (see Figure A-5):

```python
In [142]: x = np.arange(len(data.cumsum()))  # (1)

In [143]: y = 0.2 * data.cumsum() ** 2  # (2)

In [144]: rg1 = np.polyfit(x, y, 1)  # (3)

In [145]: rg3 = np.polyfit(x, y, 3)  # (4)

In [146]: rg9 = np.polyfit(x, y, 9)  # (5)

In [147]: plt.figure(figsize=(10, 6))  # (6)
          plt.plot(x, y, 'r', label='data')  # (7)
          plt.plot(x, np.polyval(rg1, x), 'b--', label='linear')  # (8)
          plt.plot(x, np.polyval(rg3, x), 'b-.', label='cubic')  # (8)
          plt.plot(x, np.polyval(rg9, x), 'b:', label='9th degree')  # (8)
          plt.legend(loc=0);  # (9)
```

1. Creates an ndarray object for the x values.
2. Defines the y values as 0.2 times the squared cumulative sum of the data object.
3. Linear regression.
4. Cubic regression.
5. Ninth-degree regression.
6. The new figure object.
7. The base data.
8. The regression results visualized.
9. Places a legend.

Figure A-5.
Linear, cubic, and ninth-degree regression

pandas is a package for efficiently managing and operating on time series data and other tabular data structures. It allows even sophisticated data analytics tasks to be carried out on fairly large data sets in memory. While the focus lies on in-memory operations, there are also multiple options for out-of-memory (on-disk) operations. Although pandas provides a number of different data structures, embodied by powerful classes, the most commonly used one is the DataFrame class, which resembles a typical table of a relational (SQL) database and is used to manage, for instance, financial time series data. This is the focus of this section.

DataFrame Class

In its most basic form, a DataFrame object is characterized by an index, column names, and tabular data. To make this more specific, consider the following sample data set:

```python
In [148]: import pandas as pd  # (1)

In [149]: np.random.seed(1000)  # (2)

In [150]: raw = np.random.standard_normal((10, 3)).cumsum(axis=0)  # (3)

In [151]: index = pd.date_range('2022-1-1', periods=len(raw), freq='M')  # (4) pandas >= 2.2 prefers freq='ME'

In [152]: columns = ['no1', 'no2', 'no3']  # (5)

In [153]: df = pd.DataFrame(raw, index=index, columns=columns)  # (6)

In [154]: df  # (7)
Out[154]:                  no1       no2       no3
          2022-01-31 -0.804458  0.320932 -0.025483
          2022-02-28 -0.160134  0.020135  0.363992
          2022-03-31 -0.267572 -0.459848  0.959027
          2022-04-30 -0.732239  0.207433  0.152912
          2022-05-31 -1.928309 -0.198527 -0.029466
          2022-06-30 -1.825116 -0.336949  0.676227
          2022-07-31 -0.553321 -1.323696  0.341391
          2022-08-31 -0.652803 -0.916504  1.260779
          2022-09-30 -0.340685  0.616657  0.710605
          2022-10-31 -0.723832 -0.206284  2.310688
```

1. Imports the pandas package.
2. Fixes the seed value of the random number generator of NumPy.
3. Creates an ndarray object with random numbers.
4. Defines a DatetimeIndex object with some dates (month-end frequency).
5. Defines a list object containing the column names (labels).
6. Instantiates a DataFrame object.
7. Shows the str (HTML) representation of the new object.

DataFrame objects have a multitude of basic, advanced, and convenience methods built in, a few of which are illustrated in the following Python code:

```python
In [155]: df.head()  # (1)
Out[155]:                  no1       no2       no3
          2022-01-31 -0.804458  0.320932 -0.025483
          2022-02-28 -0.160134  0.020135  0.363992
          2022-03-31 -0.267572 -0.459848  0.959027
          2022-04-30 -0.732239  0.207433  0.152912
          2022-05-31 -1.928309 -0.198527 -0.029466

In [156]: df.tail()  # (2)
Out[156]:                  no1       no2       no3
          2022-06-30 -1.825116 -0.336949  0.676227
          2022-07-31 -0.553321 -1.323696  0.341391
          2022-08-31 -0.652803 -0.916504  1.260779
          2022-09-30 -0.340685  0.616657  0.710605
          2022-10-31 -0.723832 -0.206284  2.310688

In [157]: df.index  # (3)
Out[157]: DatetimeIndex(['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
                         '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
                         '2022-09-30', '2022-10-31'],
                        dtype='datetime64[ns]', freq='M')

In [158]: df.columns  # (4)
Out[158]: Index(['no1', 'no2', 'no3'], dtype='object')

In [159]: df.info()  # (5)
          <class 'pandas.core.frame.DataFrame'>
          DatetimeIndex: 10 entries, 2022-01-31 to 2022-10-31
          Freq: M
          Data columns (total 3 columns):
           #   Column  Non-Null Count  Dtype
          ---  ------  --------------  -----
           0   no1     10 non-null     float64
           1   no2     10 non-null     float64
           2   no3     10 non-null     float64
          dtypes: float64(3)
          memory usage: 320.0 bytes

In [160]: df.describe()  # (6)
Out[160]:              no1        no2        no3
          count  10.000000  10.000000  10.000000
          mean   -0.798847  -0.227665   0.672067
          std     0.607430   0.578071   0.712430
          min    -1.928309  -1.323696  -0.029466
          25%    -0.786404  -0.429123   0.200031
          50%    -0.688317  -0.202406   0.520109
          75%    -0.393844   0.160609   0.896922
          max    -0.160134   0.616657   2.310688
```

1. Shows the first five data rows.
2. Shows the last five data rows.
3. Prints the index attribute of the object.
4. Prints the columns attribute of the object.
5. Shows some metadata about the object.
6. Provides selected summary statistics about the data.

While NumPy provides a specialized data structure for multidimensional arrays (with numerical data in general), pandas takes specialization one step further, to tabular (two-dimensional) data, with the DataFrame class. In particular, pandas is strong in handling financial time series data, as subsequent examples illustrate.

Numerical Operations

Numerical operations on DataFrame objects are in general as easy as those on NumPy ndarray objects. They are also quite close in terms of syntax:

```python
In [161]: print(df * 2)  # (1)
                           no1       no2       no3
          2022-01-31 -1.608917  0.641863 -0.050966
          2022-02-28 -0.320269  0.040270  0.727983
          2022-03-31 -0.535144 -0.919696  1.918054
          2022-04-30 -1.464479  0.414866  0.305823
          2022-05-31 -3.856618 -0.397054 -0.058932
          2022-06-30 -3.650232 -0.673898  1.352453
          2022-07-31 -1.106642 -2.647393  0.682782
          2022-08-31 -1.305605 -1.833009  2.521557
          2022-09-30 -0.681369  1.233314  1.421210
          2022-10-31 -1.447664 -0.412568  4.621376

In [162]: df.std()  # (2)
Out[162]: no1    0.607430
          no2    0.578071
          no3    0.712430
          dtype: float64

In [163]: df.mean()  # (3)
Out[163]: no1   -0.798847
          no2   -0.227665
          no3    0.672067
          dtype: float64

In [164]: df.mean(axis=1)  # (4)
Out[164]: 2022-01-31   -0.169670
          2022-02-28    0.074664
          2022-03-31    0.077202
          2022-04-30   -0.123965
          2022-05-31   -0.718767
          2022-06-30   -0.495280
          2022-07-31   -0.511875
          2022-08-31   -0.102843
          2022-09-30    0.328859
          2022-10-31    0.460191
          Freq: M, dtype: float64

In [165]: np.mean(df)  # (5)
Out[165]: no1   -0.798847
          no2   -0.227665
          no3    0.672067
          dtype: float64
```

1. Scalar (vectorized) multiplication of all elements.
2. Calculates the column-wise standard deviation...
3. ...and mean value. With DataFrame objects, column-wise operations are the default.
4. Calculates the mean value per index value (that is, row-wise).
5. Applies a NumPy function to the DataFrame object. The column-wise result shown comes from the original run; newer pandas versions may instead return a single scalar over all values.

Data Selection

Data can be looked up via different mechanisms:

```python
In [166]: df['no2']  # (1)
Out[166]: 2022-01-31    0.320932
          2022-02-28    0.020135
          2022-03-31   -0.459848
          2022-04-30    0.207433
          2022-05-31   -0.198527
          2022-06-30   -0.336949
          2022-07-31   -1.323696
          2022-08-31   -0.916504
          2022-09-30    0.616657
          2022-10-31   -0.206284
          Freq: M, Name: no2, dtype: float64

In [167]: df.iloc[0]  # (2)
Out[167]: no1   -0.804458
          no2    0.320932
          no3   -0.025483
          Name: 2022-01-31 00:00:00, dtype: float64

In [168]: df.iloc[2:4]  # (3)
Out[168]:                  no1       no2       no3
          2022-03-31 -0.267572 -0.459848  0.959027
          2022-04-30 -0.732239  0.207433  0.152912

In [169]: df.iloc[2:4, 1]  # (4)
Out[169]: 2022-03-31   -0.459848
          2022-04-30    0.207433
          Freq: M, Name: no2, dtype: float64

In [170]: df.no3.iloc[3:7]  # (5)
Out[170]: 2022-04-30    0.152912
          2022-05-31   -0.029466
          2022-06-30    0.676227
          2022-07-31    0.341391
          Freq: M, Name: no3, dtype: float64

In [171]: df.loc['2022-3-31']  # (6)
Out[171]: no1   -0.267572
          no2   -0.459848
          no3    0.959027
          Name: 2022-03-31 00:00:00, dtype: float64

In [172]: df.loc['2022-5-31', 'no3']  # (7)
Out[172]: -0.02946577492329111

In [173]: df['no1'] + 3 * df['no3']  # (8)
Out[173]: 2022-01-31   -0.880907
          2022-02-28    0.931841
          2022-03-31    2.609510
          2022-04-30   -0.273505
          2022-05-31   -2.016706
          2022-06-30    0.203564
          2022-07-31    0.470852
          2022-08-31    3.129533
          2022-09-30    1.791130
          2022-10-31    6.208233
          Freq: M, dtype: float64
```

1. Selects a column by name.
2. Selects a row by index position.
3. Selects two rows by index position.
4. Selects two row values from one column by index positions.
5. Uses the dot lookup syntax to select a column.
6. Selects a row by index value.
7. Selects a single data point by index value and column name.
8. Implements a vectorized arithmetic operation.

Boolean Operations

Data selection based on Boolean operations is also a strength of pandas:

```python
In [174]: df['no3'] > 0.5  # (1)
Out[174]: 2022-01-31    False
          2022-02-28    False
          2022-03-31     True
          2022-04-30    False
          2022-05-31    False
          2022-06-30     True
          2022-07-31    False
          2022-08-31     True
          2022-09-30     True
          2022-10-31     True
          Freq: M, Name: no3, dtype: bool

In [175]: df[df['no3'] > 0.5]  # (2)
Out[175]:                  no1       no2       no3
          2022-03-31 -0.267572 -0.459848  0.959027
          2022-06-30 -1.825116 -0.336949  0.676227
          2022-08-31 -0.652803 -0.916504  1.260779
          2022-09-30 -0.340685  0.616657  0.710605
          2022-10-31 -0.723832 -0.206284  2.310688

In [176]: df[(df.no3 > 0.5) & (df.no2 > -0.25)]  # (3)
Out[176]:                  no1       no2       no3
          2022-09-30 -0.340685  0.616657  0.710605
          2022-10-31 -0.723832 -0.206284  2.310688

In [177]: df[df.index > '2022-5-15']  # (4)
Out[177]:                  no1       no2       no3
          2022-05-31 -1.928309 -0.198527 -0.029466
          2022-06-30 -1.825116 -0.336949  0.676227
          2022-07-31 -0.553321 -1.323696  0.341391
          2022-08-31 -0.652803 -0.916504  1.260779
          2022-09-30 -0.340685  0.616657  0.710605
          2022-10-31 -0.723832 -0.206284  2.310688

In [178]: df.query('no2 > 0.1')  # (5)
Out[178]:                  no1       no2       no3
          2022-01-31 -0.804458  0.320932 -0.025483
          2022-04-30 -0.732239  0.207433  0.152912
          2022-09-30 -0.340685  0.616657  0.710605

In [179]: a = -0.5  # (5)

In [180]: df.query('no1 > @a')  # (5)
Out[180]:                  no1       no2       no3
          2022-02-28 -0.160134  0.020135  0.363992
          2022-03-31 -0.267572 -0.459848  0.959027
          2022-09-30 -0.340685  0.616657  0.710605
```

1. Which values in column no3 are greater than 0.5?
2. Selects all rows for which the condition is True.
3. Combines two conditions with the & (bitwise and) operator; | is the bitwise or operator.
4. Selects all rows with index values greater than '2022-5-15' (the comparison value is given as a str object).
5. Uses the .query() method to select rows given conditions as str objects; a Python variable such as a is referenced with the @ prefix.

Plotting with pandas

pandas is well integrated with the matplotlib plotting package, which makes it convenient to plot data stored in DataFrame objects. In general, a single method call does the trick (see Figure A-6):

```python
In [181]: df.plot(figsize=(10, 6));  # (1)
```

1. Plots the data as a line plot (column-wise) and fixes the figure size.

Figure A-6.
Line plot with pandas

In this case, pandas takes care of formatting the index values, dates in this example. This only works for a DatetimeIndex. If the date-time information is available only in the form of str objects, the DatetimeIndex() constructor can be used to convert it easily:

```python
In [182]: index = ['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
                   '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
                   '2022-09-30', '2022-10-31']  # (1)

In [183]: pd.DatetimeIndex(index)  # (2)
Out[183]: DatetimeIndex(['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
                         '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
                         '2022-09-30', '2022-10-31'],
                        dtype='datetime64[ns]', freq=None)
```

1. Date-time index data as a list object of str objects.
2. Generates a DatetimeIndex object out of the list object. (The source listing passes df.index, which is already a DatetimeIndex; the list object is used here so that the call actually performs the conversion described.)

Histograms are generated in the same way. In both cases, pandas takes care of handling the individual columns and automatically generates single lines (with respective legend entries, see Figure A-6) and respective subplots with three different histograms (see Figure A-7):

```python
In [184]: df.hist(figsize=(10, 6));  # (1)
```

1. Generates a histogram for each column.

Figure A-7. Histograms with pandas

Input-Output Operations

Another strength of pandas is exporting and importing data to and from different data storage formats (see also Chapter 3). Consider the case of comma-separated values (CSV) files:

```python
In [185]: df.to_csv('data.csv')  # (1)

In [186]: with open('data.csv') as f:
              for line in f.readlines():
                  print(line, end='')  # (2)
          ,no1,no2,no3
          2022-01-31,-0.8044583035248052,0.3209315470898572,-0.025482880472072204
          2022-02-28,-0.16013447509799061,0.020134874302836725,0.363991673815235
          2022-03-31,-0.26757177678888727,-0.4598482010579319,0.9590271758917923
          2022-04-30,-0.7322393029842283,0.2074331059300848,0.15291156544935125
          2022-05-31,-1.9283091368170622,-0.19852705542997268,-0.02946577492329111
          2022-06-30,-1.8251162427820806,-0.33694904401573555,0.6762266000356951
          2022-07-31,-0.5533209663746153,-1.3236963728130973,0.34139114682415433
          2022-08-31,-0.6528026643843922,-0.9165042724715742,1.2607786860286034
          2022-09-30,-0.34068465431802875,0.6166567928863607,0.7106048210003031
          2022-10-31,-0.7238320652023266,-0.20628417055270565,2.310688189060956

In [187]: from_csv = pd.read_csv('data.csv',  # (3)
                                 index_col=0,  # (4)
                                 parse_dates=True)  # (5)

In [188]: from_csv.head()  # (6)
Out[188]:                  no1       no2       no3
          2022-01-31 -0.804458  0.320932 -0.025483
          2022-02-28 -0.160134  0.020135  0.363992
          2022-03-31 -0.267572 -0.459848  0.959027
          2022-04-30 -0.732239  0.207433  0.152912
          2022-05-31 -1.928309 -0.198527 -0.029466
```

1. Writes the data to disk as a CSV file.
2. Opens that file and prints the contents line by line.
3. Reads the data stored in the CSV file into a new DataFrame object.
4. Defines the first column to be the index column.
5. Date-time information in the index column shall be transformed to Timestamp objects.
6. Prints the first five rows of the new DataFrame object.

In general, you would store DataFrame objects on disk in more efficient binary formats such as HDF5. In this case, pandas wraps the functionality of the PyTables package. The constructor to be used is HDFStore:

```python
In [189]: h5 = pd.HDFStore('data.h5', 'w')  # (1)

In [190]: h5['df'] = df  # (2)

In [191]: h5  # (3)
Out[191]: <class 'pandas.io.pytables.HDFStore'>
          File path: data.h5

In [192]: from_h5 = h5['df']  # (4)

In [193]: h5.close()  # (5)

In [194]: from_h5.tail()  # (6)
Out[194]:                  no1       no2       no3
          2022-06-30 -1.825116 -0.336949  0.676227
          2022-07-31 -0.553321 -1.323696  0.341391
          2022-08-31 -0.652803 -0.916504  1.260779
          2022-09-30 -0.340685  0.616657  0.710605
          2022-10-31 -0.723832 -0.206284  2.310688

In [195]: !rm data.csv data.h5  # (7)
```

1. Opens an HDFStore object.
2. Writes the DataFrame object (the data) to the HDFStore.
3. Shows the structure/contents of the database file.
4. Reads the data into a new DataFrame object.
5. Closes the HDFStore object.
6. Shows the final five rows of the new DataFrame object.
7. Removes the CSV and HDF5 files.

When working with financial data, useful data import functions are available in the pandas package (see also Chapter 3). The following code uses the pd.read_csv() function to read historical daily data for the S&P 500 index and the VIX volatility index from a CSV file on a remote server:

```python
In [196]: raw = pd.read_csv('http://hilpisch.com/pyalgo_eikon_eod_data.csv',
                            index_col=0, parse_dates=True).dropna()  # (1)

In [197]: spx = pd.DataFrame(raw['.SPX'])  # (2)

In [198]: spx.info()  # (3)
          <class 'pandas.core.frame.DataFrame'>
          DatetimeIndex: 2516 entries, 2010-01-04 to 2019-12-31
          Data columns (total 1 columns):
           #   Column  Non-Null Count  Dtype
          ---  ------  --------------  -----
           0   .SPX    2516 non-null   float64
          dtypes: float64(1)
          memory usage: 39.3 KB

In [199]: vix = pd.DataFrame(raw['.VIX'])  # (4)

In [200]: vix.info()  # (5)
          <class 'pandas.core.frame.DataFrame'>
          DatetimeIndex: 2516 entries, 2010-01-04 to 2019-12-31
          Data columns (total 1 columns):
           #   Column  Non-Null Count  Dtype
          ---  ------  --------------  -----
           0   .VIX    2516 non-null   float64
          dtypes: float64(1)
          memory usage: 39.3 KB
```

1. Reads the historical data from the CSV file (data from the Refinitiv Eikon Data API) and drops rows with missing values.
2. Selects the S&P 500 stock index data into its own DataFrame object.
3. Shows the meta information for the resulting DataFrame object.
4. Selects the historical data for the VIX volatility index.
5. Shows the meta information for the resulting DataFrame object.

Let us combine the two closing-value columns into a single DataFrame object. There are multiple ways to accomplish this goal:

```python
In [201]: spxvix = pd.DataFrame(spx).join(vix)  # (1)

In [202]: spxvix.info()
          <class 'pandas.core.frame.DataFrame'>
          DatetimeIndex: 2516 entries, 2010-01-04 to 2019-12-31
          Data columns (total 2 columns):
           #   Column  Non-Null Count  Dtype
          ---  ------  --------------  -----
           0   .SPX    2516 non-null   float64
           1   .VIX    2516 non-null   float64
          dtypes: float64(2)
          memory usage: 139.0 KB

In [203]: spxvix = pd.merge(spx, vix,
                            left_index=True,  # merge on left index
                            right_index=True,  # merge on right index
                            )  # (2)

In [204]: spxvix.info()
          <class 'pandas.core.frame.DataFrame'>
          DatetimeIndex: 2516 entries, 2010-01-04 to 2019-12-31
          Data columns (total 2 columns):
           #   Column  Non-Null Count  Dtype
          ---  ------  --------------  -----
           0   .SPX    2516 non-null   float64
           1   .VIX    2516 non-null   float64
          dtypes: float64(2)
          memory usage: 139.0 KB

In [205]: spxvix = pd.DataFrame({'SPX': spx['.SPX'],
                                 'VIX': vix['.VIX']},
                                index=spx.index)  # (3)

In [206]: spxvix.info()
          <class 'pandas.core.frame.DataFrame'>
          DatetimeIndex: 2516 entries, 2010-01-04 to 2019-12-31
          Data columns (total 2 columns):
           #   Column  Non-Null Count  Dtype
          ---  ------  --------------  -----
           0   SPX     2516 non-null   float64
           1   VIX     2516 non-null   float64
          dtypes: float64(2)
          memory usage: 139.0 KB
```

1. Uses the join method to combine the relevant data subsets.
2. Uses the merge function for the combination.
3. Uses the DataFrame constructor in combination with a dict object.

Having the combined data available in a single object makes visual analysis straightforward (see Figure A-8):

```python
In [207]: spxvix.plot(figsize=(10, 6), subplots=True);  # (1)
```

1. Plots the two data subsets into separate subplots.

Figure A-8.
Historical end-of-day closing values for the S&P 500 and VIX

pandas also allows vectorized operations on whole DataFrame objects. The following code calculates the log returns for both columns of the spxvix object simultaneously in vectorized fashion. The shift method shifts the data set by the number of index values provided (in this case, by one trading day):

```python
In [208]: rets = np.log(spxvix / spxvix.shift(1))  # (1)

In [209]: rets = rets.dropna()  # (2)

In [210]: rets.head()  # (3)
Out[210]:                  SPX       VIX
          Date
          2010-01-05  0.003111 -0.035038
          2010-01-06  0.000545 -0.009868
          2010-01-07  0.003993 -0.005233
          2010-01-08  0.002878 -0.050024
          2010-01-11  0.001745 -0.032514
```

1. Calculates the log returns for the two time series in fully vectorized fashion.
2. Drops all rows containing NaN values ("not a number").
3. Shows the first five rows of the new DataFrame object.

Consider the plot in Figure A-9, which shows the VIX log returns against the S&P 500 log returns in a scatter plot together with a linear regression line. It illustrates the strong negative correlation between the two indices.

```python
In [211]: rg = np.polyfit(rets['SPX'], rets['VIX'], 1)  # (1)

In [212]: rets.plot(kind='scatter', x='SPX', y='VIX',
                    style='.', figsize=(10, 6))  # (2)
          plt.plot(rets['SPX'], np.polyval(rg, rets['SPX']), 'r-');  # (3)
```

1. Implements a linear regression on the two log return data sets.
2. Creates a scatter plot of the log returns.
3. Plots the linear regression line into the existing scatter plot.

Figure A-9.
Scatter plot of the S&P 500 and VIX log returns with a linear regression line

Having financial time series data stored in a pandas DataFrame object makes calculating typical statistics straightforward:

```python
In [213]: ret = rets.mean() * 252  # (1)

In [214]: ret
Out[214]: SPX    0.104995
          VIX   -0.037526
          dtype: float64

In [215]: vol = rets.std() * math.sqrt(252)  # (2)

In [216]: vol
Out[216]: SPX    0.147902
          VIX    1.229086
          dtype: float64

In [217]: (ret - 0.01) / vol  # (3)
Out[217]: SPX    0.642279
          VIX   -0.038667
          dtype: float64
```

1. Calculates the annualized mean return for the two indices (assuming 252 trading days per year).
2. Calculates the annualized standard deviation.
3. Calculates the Sharpe ratio for a risk-free short rate of 1%.

The maximum drawdown, which we calculate for the S&P 500 index only, is a bit more involved. For its calculation, we use the .cummax() method, which records the running historical maximum of the time series up to a certain date. Consider the following code, which generates the plot in Figure A-10:

```python
In [218]: plt.figure(figsize=(10, 6))  # (1)
          spxvix['SPX'].plot(label='S&P 500')  # (2)
          spxvix['SPX'].cummax().plot(label='running maximum')  # (3)
          plt.legend(loc=0);  # (4)
```

1. Instantiates a new figure object.
2. Plots the historical closing values of the S&P 500 index.
3. Calculates and plots the running maximum over time.
4. Places a legend on the canvas.

Figure A-10.
Historical closing prices of the S&P 500 index and running maximum

The absolute maximum drawdown is the largest difference between the running maximum and the current index level. In our particular case, it is about 580 index points. The relative maximum drawdown can sometimes be more meaningful. Here, it is about 20%:

```python
In [219]: adrawdown = spxvix['SPX'].cummax() - spxvix['SPX']  # (1)

In [220]: adrawdown.max()
Out[220]: 579.6500000000001

In [221]: rdrawdown = ((spxvix['SPX'].cummax() - spxvix['SPX']) /
                       spxvix['SPX'].cummax())  # (2)

In [222]: rdrawdown.max()
Out[222]: 0.1977821376780688
```

1. Derives the absolute maximum drawdown.
2. Derives the relative maximum drawdown.

The longest drawdown period is calculated as follows. The code below selects all data points at which the drawdown is zero (that is, where a new maximum is reached). It then calculates the difference between each pair of consecutive index values (trading dates) for which the drawdown is zero and takes the maximum. Given the data set we are analyzing, the longest drawdown period is 417 days:

```python
In [223]: temp = adrawdown[adrawdown == 0]  # (1)

In [224]: periods_spx = (temp.index[1:].to_pydatetime() -
                         temp.index[:-1].to_pydatetime())  # (2)

In [225]: periods_spx[50:60]  # (3)
Out[225]: array([datetime.timedelta(days=67), datetime.timedelta(days=1),
                 datetime.timedelta(days=1), datetime.timedelta(days=1),
                 datetime.timedelta(days=301), datetime.timedelta(days=3),
                 datetime.timedelta(days=1), datetime.timedelta(days=2),
                 datetime.timedelta(days=12), datetime.timedelta(days=2)],
                dtype=object)

In [226]: max(periods_spx)  # (4)
Out[226]: datetime.timedelta(days=417)
```

1. Picks out all index positions at which the drawdown is 0.
2. Calculates the timedelta values between all such consecutive index positions.
3. Shows a selection of these values.
4. Picks out the maximum value as the result.

This appendix provides a concise introduction to selected topics relevant to using Python, NumPy, matplotlib, and pandas in the context of algorithmic trading. It cannot, of course, replace thorough training and practical experience, but it helps those who want to get started quickly and who are willing to dive deeper into the details where necessary.

A valuable, free source for the topics covered in this appendix is the Scipy Lecture Notes, which are available in multiple electronic formats. Also freely available is the online book From Python to NumPy by Nicolas Rougier.

Books cited in this appendix:

Hilpisch, Yves. 2018. Python for Finance. 2nd ed. Sebastopol: O'Reilly.

McKinney, Wes. 2017. Python for Data Analysis. 2nd ed. Sebastopol: O'Reilly.

VanderPlas, Jake. 2017. Python Data Science Handbook.
Sebastopol: O’Reilly.

¹ Note that computers can only generate pseudorandom numbers as approximations of truly random numbers.
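As a recap of the drawdown calculations in this appendix, the following is a minimal sketch that bundles the three measures (absolute maximum drawdown, relative maximum drawdown, longest drawdown period) into a single helper. The function name `drawdown_statistics`, the dictionary keys, and the small price series are hypothetical and chosen only for illustration; they are not part of the book's code base. The sketch uses `Series.diff()` on the dates of the highs instead of the `to_pydatetime()` subtraction shown above, which is assumed to be equivalent for a `DatetimeIndex`.

```python
import pandas as pd


def drawdown_statistics(prices: pd.Series) -> dict:
    """Drawdown measures for a price series with a DatetimeIndex (illustrative sketch)."""
    running_max = prices.cummax()            # running historical maximum
    adrawdown = running_max - prices         # absolute drawdown series
    rdrawdown = adrawdown / running_max      # relative drawdown series
    # dates on which the drawdown is zero, i.e. the series sits at its running maximum
    highs = adrawdown[adrawdown == 0].index
    # time elapsed between consecutive highs
    gaps = highs.to_series().diff().dropna()
    longest = gaps.max() if len(gaps) > 0 else pd.Timedelta(0)
    return {
        "abs_max_drawdown": float(adrawdown.max()),
        "rel_max_drawdown": float(rdrawdown.max()),
        "longest_period": longest,
    }


# hypothetical daily closing prices, for illustration only
dates = pd.date_range("2020-01-01", periods=8, freq="D")
prices = pd.Series(
    [100.0, 110.0, 105.0, 99.0, 112.0, 108.0, 120.0, 90.0], index=dates
)
stats = drawdown_statistics(prices)
print(stats)
```

For this toy series, the absolute maximum drawdown is 30 index points, the relative maximum drawdown is 25%, and the longest gap between two highs is 3 days. Like the approach in the text, the sketch only measures the time between two highs, so a drawdown that is still open at the end of the sample is not counted toward the longest period.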