Introducing the second edition of Python for Finance Cookbook
What led me to write a second edition and what you can expect from reading it
Almost three years after the publication of my first book, Python for Finance Cookbook, the second edition has been published. To be perfectly honest, after the first edition came out, I did not expect to ever write another book. But in the end, I changed my mind, and here we are.
In this short article, I will briefly explain why I changed my mind and tell you a bit about what you can expect from reading the second edition.
The motivation for the second edition
As I mentioned, I was not really keen on writing another book. The reason was actually quite simple: writing a book takes a lot of time and effort. That, combined with a full-time job and quite a few side projects (articles, small freelance jobs, etc.), does not leave much spare time. However, a combination of different factors led to my change of heart.
The first and most important one was the amount of feedback I received from the readers of the first edition. I was contacted through various channels (LinkedIn, Twitter, GitHub, comments under my Medium posts, etc.), sometimes with a quick question, but most of the time with kind words about the readers' experience with my book. I must say it is incredibly rewarding to read such messages and to know that your book helped someone improve their skills and knowledge, and that they found it interesting to read.
Secondly, when I was considering starting the second edition, we were still in the middle of the COVID pandemic. This meant that I was still mostly working from home (no commuting) and did not have many opportunities for travel and other activities. Hence, it seemed like a good moment to take on such a challenging and time-consuming project.
And lastly, even though I managed to cover quite a lot of content in the first edition, I still had an extensive backlog of ideas I wanted to include in the book but simply did not have the time for. And even after the book was published, many new techniques, approaches, and libraries kept appearing. Over time, I just kept expanding the backlog with new ideas, intending to come back to them and potentially write articles about some of them. But as it turned out, I got the opportunity to put them into a book instead.
Taking all three of those factors into account, I decided to give it a shot and once again start working on a book project.
What’s different this time?
For the first edition of the book, I started from scratch by drafting the table of contents and trying to cover quite a lot of different topics. While planning the second edition, my publisher and I approached it a bit differently. Instead of going with gut feeling, we tried to gather as much feedback as possible and hear from the community what they would like to read about in the second edition. To do so, we combined feedback from multiple sources:
- the messages I have received since the publication of the first edition,
- a few surveys that were sent out to the readers of the first edition and people interested in reading a book on such a topic,
- 1-on-1 chats with some readers who wanted to share their opinion more in-depth,
- expert opinion from established authors in the field.
Using all of those inputs, I tried to craft an improved table of contents, one that not only added new topics I found interesting but also focused on the areas the readers struggled with most. The topics that came up most often were gathering good-quality data, preprocessing it, analyzing it, and implementing algorithmic trading strategies.
The first three definitely make a lot of sense, and I tried to expand upon those topics. The last one was by far the hardest to incorporate, for several reasons: for example, the content would depend heavily on the broker of choice, and not every reader would be able to use a given broker due to geographical or legal restrictions.
Let me quickly summarize the relationship and differences between the two editions:
- Most of the recipes from the first edition made it into the second one, but every single one of them was significantly expanded or modified.
- For all of the recipes, I tried to provide extensive reference lists that allow readers to dive deeper into the covered topics they find interesting.
- While writing the second edition, I tried to focus on leveraging existing Python libraries instead of implementing things ourselves using "lower-level" libraries. A good example is the deep learning chapter. In the first edition, I showed how to code a few neural network architectures used for time series forecasting (RNN, LSTM, GRU) in PyTorch, which resulted in quite lengthy recipes explaining the basics of working with PyTorch. In the second edition, I used dedicated Python libraries that already contain those and many more advanced NN architectures (implemented in either PyTorch or TensorFlow) and are ready to use in a few lines of code. This way, I could focus more on explaining how to use those models to gain insights instead of diving deep into the nitty-gritty details of coding the algorithms ourselves.
- Some of the recipes did not stand the test of time that well. For example, in the first edition, I showed how to use ipywidgets to create interactive dashboards in a Jupyter notebook. Nowadays, the go-to solution for such a task would be a Streamlit app, which is what I present in the second edition.
- Connected to the previous two points: while some of the recipes were replaced, this definitely does not mean that they are worthless or no longer relevant. It all depends on your background, your experience, and what you would like to learn in order to solve the tasks you might be facing. Hence, the second edition is not a full replacement for the first one, but more of an expansion and a supplementary read.
Altogether, the second edition grew considerably compared to the first one. While page count alone is definitely not an indication of quality, the second edition is almost twice as long as the first (740 vs. 432 pages).
Overview of the contents
In this section, I will briefly describe the contents of the book. Each chapter contains multiple recipes grouped by a common theme. In most cases, the recipes are self-contained and standalone; however, there are some exceptions.
Chapter 1, Acquiring Financial Data, covers a few of the most popular sources of high-quality financial data, including Yahoo Finance, Nasdaq Data Link (formerly Quandl), Intrinio, and Alpha Vantage. It focuses on leveraging dedicated Python libraries and processing the downloaded data (often available in various shapes and file formats) for further analysis.
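To give a flavor of how simple this can be, here is a minimal sketch (not the book's exact code) using the yfinance library:

```python
# A minimal sketch of pulling data from Yahoo Finance with the yfinance library.
import yfinance as yf

# Download daily OHLCV data for Apple over a fixed date range
df = yf.download("AAPL", start="2020-01-01", end="2020-12-31")
print(df.head())
```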
Chapter 2, Data Preprocessing, describes the crucial steps between obtaining the data and using it for building ML models or investigating trading strategies. As such, it covers topics such as converting prices to returns, adjusting them for inflation, imputing missing values, or aggregating trade data into various kinds of bars.
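For illustration, converting prices to simple and log returns with pandas might look like this (assuming `prices` is a pandas Series of adjusted close prices):

```python
import numpy as np

# `prices` is assumed to be a pandas Series of (adjusted) close prices
simple_returns = prices.pct_change().dropna()             # r_t = P_t / P_{t-1} - 1
log_returns = np.log(prices / prices.shift(1)).dropna()   # r_t = ln(P_t / P_{t-1})
```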
Chapter 3, Visualizing Financial Time Series, focuses on visualizing time series data, with a special emphasis on financial data. By plotting the time series, we can visually identify certain patterns, such as trends, seasonality, or changepoints. The insights gathered at this point can lead to making better decisions while choosing the modeling approach.
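As a simple illustration (again, not the book's exact code), overlaying a rolling mean on the `prices` series from before already helps to eyeball the trend:

```python
import matplotlib.pyplot as plt

# Plot the price series together with a rolling mean to inspect the trend visually
ax = prices.plot(title="Adjusted close price", label="price")
prices.rolling(window=21).mean().plot(ax=ax, label="21-day rolling mean")
ax.legend()
plt.show()
```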
Chapter 4, Exploring Financial Time Series Data, shows how to use various algorithms and statistical tests to automatically identify potential issues with time series data, such as the existence of outliers. Additionally, it covers analyzing data for the existence of trends or other patterns such as mean-reversion. Lastly, it explores the stylized facts of asset returns. Together, those concepts are crucial while working with financial data, as we want to make sure that the models/strategies we are building can accurately capture the dynamics of asset returns.
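For example, one very simple (and by no means the only) way to flag potential outliers is a rolling three-sigma filter, here applied to the `log_returns` series from the earlier sketch:

```python
# Flag returns that lie more than 3 rolling standard deviations
# away from the rolling mean
mu = log_returns.rolling(window=21).mean()
sigma = log_returns.rolling(window=21).std()
outliers = log_returns[(log_returns - mu).abs() > 3 * sigma]
```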
Chapter 5, Technical Analysis and Building Interactive Dashboards, explains the basics of technical analysis in Python by showing how to calculate some of the most popular indicators and automatically recognize patterns in candlestick data. It also demonstrates how to create and deploy a Streamlit-based web app, used to visualize and inspect the predefined TA indicators in an interactive fashion.
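As a teaser, a bare-bones Streamlit app plotting a price series together with a simple moving average fits in a dozen lines (a hypothetical minimal example, not the dashboard built in the book):

```python
# Save as app.py and run with: streamlit run app.py
import streamlit as st
import yfinance as yf

st.title("Simple TA dashboard")
ticker = st.text_input("Ticker", "AAPL")

# Download the price history and compute a 20-day simple moving average
data = yf.Ticker(ticker).history(start="2020-01-01")
data["SMA_20"] = data["Close"].rolling(window=20).mean()

st.line_chart(data[["Close", "SMA_20"]])
```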
Chapter 6, Time Series Analysis and Forecasting, introduces the basics of time series modeling. It starts by looking into the building blocks of time series and how to separate them using various decomposition methods (classic decomposition, STL, etc.). Then, it covers the concept of stationarity, how to test for it, and how to achieve it in case the original series is not stationary. Lastly, it shows how to use two of the most widely used statistical approaches to time series modeling: the exponential smoothing methods (ETS) and ARIMA class models. The chapter explains not only how to manually choose the hyperparameters of those models but also how to use automated approaches that find those values for us.
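To illustrate the flavor of this chapter, here is a minimal sketch using statsmodels to test for stationarity and fit a simple ARIMA model (reusing the `prices` series from before):

```python
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

# ADF test: the null hypothesis is that the series contains a unit root
# (i.e., is non-stationary)
adf_stat, p_value, *_ = adfuller(prices.dropna())
print(f"ADF statistic: {adf_stat:.2f}, p-value: {p_value:.4f}")

# Fit a simple ARIMA(1, 1, 1) model and forecast five steps ahead
model = ARIMA(prices, order=(1, 1, 1)).fit()
print(model.forecast(steps=5))
```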
Chapter 7, Machine Learning-Based Approaches to Time Series Forecasting, starts by covering a very important topic, that is, validating time series models. Then, it provides an overview of feature engineering approaches. It also introduces a tool for automatic feature extraction which generates hundreds or thousands of features with a few lines of code. Furthermore, the chapter explains the concept of reduced regression and how to use Meta’s popular Prophet algorithm. It concludes with an introduction to one of the popular AutoML frameworks for time series forecasting.
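For example, scikit-learn's TimeSeriesSplit implements the kind of forward-looking validation scheme discussed in the chapter:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # placeholder feature matrix

# Each validation fold contains only observations that come *after*
# its training window, so the model never peeks into the future
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, valid_idx in tscv.split(X):
    print(f"train: {train_idx[0]}-{train_idx[-1]}, "
          f"valid: {valid_idx[0]}-{valid_idx[-1]}")
```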
Chapter 8, Multi-Factor Models, covers estimating various factor models, starting with the simplest one-factor model (CAPM) and then extending it to the more advanced three-, four-, and five-factor models. It also shows how to test the stability of such models by estimating them in a rolling fashion.
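For illustration, estimating CAPM boils down to a simple linear regression; here is a sketch on synthetic data (the book works with real excess returns):

```python
import numpy as np
import statsmodels.api as sm

# Synthetic excess returns, purely for illustration
rng = np.random.default_rng(42)
market_excess = rng.normal(0.0005, 0.01, 500)
asset_excess = 0.0001 + 1.2 * market_excess + rng.normal(0, 0.005, 500)

# CAPM: regress the asset's excess returns on the market's excess returns
X = sm.add_constant(market_excess)
capm = sm.OLS(asset_excess, X).fit()
print(capm.params)  # intercept ~ alpha, slope ~ beta (close to 1.2 here)
```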
Chapter 9, Modeling Volatility with GARCH Class Models, focuses on volatility and the concept of conditional heteroskedasticity. It shows how to use univariate and multivariate GARCH models, which are among the most popular approaches to modeling and forecasting volatility. As there are no well-established libraries for estimating multivariate GARCH models in Python, we show how to leverage renowned R packages for that task by connecting Python and R sessions.
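On the univariate side, a GARCH(1,1) model can be fitted in a few lines with the arch library (assuming a Series of daily returns, such as `log_returns` from earlier):

```python
from arch import arch_model

# Scaling returns by 100 is a common trick to help the optimizer converge
am = arch_model(log_returns * 100, vol="GARCH", p=1, q=1, dist="normal")
res = am.fit(disp="off")
print(res.summary())
```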
Chapter 10, Monte Carlo Simulations in Finance, explains how to use Monte Carlo methods for various tasks, such as simulating stock prices, pricing derivatives with no closed-form solution (American/exotic options), or estimating the uncertainty of a portfolio by calculating its Value-at-Risk and Expected Shortfall.
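As a taste, here is a minimal sketch of simulating Geometric Brownian Motion paths and estimating Value-at-Risk from them (the parameter values are purely illustrative):

```python
import numpy as np

# Simulate Geometric Brownian Motion paths:
# S_t = S_0 * exp((mu - 0.5 * sigma^2) * t + sigma * W_t)
rng = np.random.default_rng(0)
S0, mu, sigma = 100.0, 0.05, 0.2       # initial price, annual drift, annual volatility
T, n_steps, n_paths = 1.0, 252, 10_000
dt = T / n_steps

shocks = rng.standard_normal((n_paths, n_steps))
log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * shocks
paths = S0 * np.exp(np.cumsum(log_increments, axis=1))

# 95% Value-at-Risk of the one-year return, estimated from the simulated paths
terminal_returns = paths[:, -1] / S0 - 1
var_95 = -np.percentile(terminal_returns, 5)
print(f"95% VaR: {var_95:.2%}")
```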
Chapter 11, Asset Allocation, starts by explaining the most basic asset allocation strategy (the 1/n portfolio) and uses it to show how to evaluate the performance of portfolios. Then it presents three different approaches to obtaining the efficient frontier. Lastly, we explore Hierarchical Risk Parity, a novel approach to asset allocation based on a combination of graph theory and machine learning.
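For illustration, the 1/n portfolio and its annualized Sharpe ratio take only a few lines (assuming `asset_returns` is a DataFrame of daily returns, one column per asset, and a zero risk-free rate for simplicity):

```python
import numpy as np

# Equal (1/n) weights across all assets
n_assets = asset_returns.shape[1]
weights = np.repeat(1 / n_assets, n_assets)
portfolio_returns = asset_returns.dot(weights)

# Annualized Sharpe ratio, assuming 252 trading days per year
sharpe = portfolio_returns.mean() / portfolio_returns.std() * np.sqrt(252)
```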
Chapter 12, Backtesting Trading Strategies, presents how to run backtests of various trading strategies using two approaches (vectorized and event-driven) with the help of popular Python libraries. To do so, we use a few examples of strategies built on the basis of popular technical indicators or utilizing mean-variance portfolio optimization.
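To show what "vectorized" means here, this is a minimal sketch of backtesting an SMA crossover strategy with plain pandas (not one of the library-based backtests from the book; `prices` is assumed to be a Series of daily close prices):

```python
sma_fast = prices.rolling(window=20).mean()
sma_slow = prices.rolling(window=50).mean()

# Go long when the fast SMA is above the slow one;
# shift(1) ensures we trade on the *next* day's return (no look-ahead bias)
position = (sma_fast > sma_slow).astype(int).shift(1)
strategy_returns = position * prices.pct_change()
equity_curve = (1 + strategy_returns.fillna(0)).cumprod()
```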
Chapter 13, Applied Machine Learning: Identifying Credit Default, shows how to approach a real-life machine learning task of predicting loan defaults. It covers the entire scope of a machine learning project, from gathering and cleaning data to building and tuning a classifier. As such, this chapter is in fact a sequence of related recipes, rather than a collection of standalone ones. An important takeaway from this chapter is understanding the general approach to ML projects, which can then be applied to many different tasks, be it churn prediction, customer segmentation, or estimating the price of new real estate on the market.
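To hint at what such a workflow looks like in code, here is a hypothetical scikit-learn pipeline sketch (the column names are made up; the book works with an actual credit default dataset):

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, for illustration only
numeric = ["limit_bal", "age"]
categorical = ["education", "marriage"]

# Impute and scale numeric features; impute and one-hot encode categorical ones
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer()),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(random_state=42)),
])
# model.fit(X_train, y_train) once the data is split
```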
Chapter 14, Advanced Concepts for Machine Learning Projects, builds on the workflow introduced in the preceding chapter and demonstrates possible extensions to the MVP stage of ML projects. It starts by presenting more advanced classifiers. Then, it covers alternative approaches to encoding categorical features and describes a few methods of dealing with imbalanced data. Furthermore, it shows how to create stacked ensembles of ML models and how to leverage Bayesian hyperparameter tuning to improve upon exhaustive grid search. It also explores various approaches to calculating feature importance and using it to select the most informative predictors. Lastly, it touches upon the rapidly developing field of explainable AI (XAI).
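For instance, a stacked ensemble can be put together in a few lines with scikit-learn (a minimal sketch on synthetic data, not the book's exact setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration
X, y = make_classification(n_samples=1_000, random_state=42)

# The base learners' out-of-fold predictions become the meta-learner's features
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("dt", DecisionTreeClassifier(random_state=42)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X, y)
```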
Chapter 15, Deep Learning in Finance, describes how to apply some of the recently developed neural network architectures to two possible use cases in the financial domain. For a classification task (predicting credit card default), we leverage fastai's Tabular learner and Google's TabNet. Then, we apply Amazon's DeepAR (together with its extension, DeepVAR) and NeuralProphet to a time series forecasting task.
Wrapping up
All things considered, I am happy that I took on the project of writing the second edition. Just as before, I learned a lot from the process, and it helped me grow both as a professional and as a person.
Just as with the first edition, writing a book is a huge undertaking, and it always takes some kind of toll on the author and their close ones. That is why I am all the happier that the book is now published and I can finally enjoy more free time to pursue other challenges and simply rest. I would also like to thank my family, friends, colleagues, and the team at Packt for all the support and words of encouragement I received along the way. It made a huge difference!
If you are interested in reading the book, you can find it on Amazon (disclaimer: referral link) or on Packt’s website. As always, I would greatly appreciate any feedback and suggestions. You can reach out to me on Twitter, LinkedIn, using the book’s GitHub repository, or here in the comments.