This week on the blog, we take a closer look at a group of segments in a series called The Skinny on Options Data Science.

The Skinny on Options Data Science episodes feature "Dr. Data" from the tastytrade research team, an expert in both statistics and predictive modeling.

If you are seeking to better understand the various methods of forecasting options prices, and the data used in this process, look no further than these compelling sessions!

In the first-ever data science segment, Tom, Tony and Dr. Data examined one of the most prevalent topics in the field - supervised learning models. Supervised learning models use historical information (data) to make predictions about the future.

The process generally employed in creating these models involves finding similar market environments from the past, learning from that history (through the model), and then deploying strategies that leverage the learnings.

Dr. Data noted on that first episode that supervised learning is often treated as a synonym for “machine learning” (strictly speaking, it is one major branch of it) - which is a critical piece of artificial intelligence.

As we are dealing with trading, and not science fiction, supervised learning in this context means regression modeling, not another Terminator sequel.

A regression model simply takes a group of variables as input, and its output allows one to estimate the (possible) relationships between them.

On the first segment, Dr. Data used historical implied volatility and profit (or loss) as the variables. The goal in using this data for modeling purposes was to understand if there was a discernible relationship between the two (e.g. whether higher implied volatility led to higher profit).

Dr. Data pointed out that a key element in data science is mining data that matches the current situation, and then parsing that data into a more digestible subset.

The graphic below from that same episode illustrates the process of trimming historical data into "training data," the latter being the subset of data actually used in analysis:
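
In code terms, the trimming step can be pictured as a simple filter over past observations. The Python sketch below is only an illustration under stated assumptions, not the method used on the show: the records are made up, the field names (`iv`, `pnl`) are hypothetical, and "matching the current situation" is assumed to mean "entry implied volatility within 5 points of today's level."

```python
from dataclasses import dataclass


@dataclass
class Trade:
    iv: float   # implied volatility at entry, in percent (hypothetical field)
    pnl: float  # realized profit or loss in dollars (hypothetical field)


def select_training_data(history, current_iv, tolerance=5.0):
    """Keep only past trades whose entry IV is within `tolerance` points of current_iv."""
    return [t for t in history if abs(t.iv - current_iv) <= tolerance]


# Made-up records for illustration only; this is not real market data.
history = [
    Trade(12.0, -40.0),
    Trade(18.0, 25.0),
    Trade(22.0, 60.0),
    Trade(24.0, 35.0),
    Trade(35.0, 150.0),
]

# Assumed "current situation": implied volatility of 21.
training = select_training_data(history, current_iv=21.0)
print([t.iv for t in training])
```

A wider tolerance keeps more history but makes the subset less similar to today's market, so the band itself is a judgment call.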

A regression is then run on the training data, which yields a theoretical forecast variable. In this example, that output can be used to help forecast what type of profit (or loss) one could theoretically expect at a specific level of implied volatility.
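
As a rough illustration of that step, here is a minimal Python sketch of a one-variable least-squares regression. It assumes a straight-line relationship between implied volatility and profit, which the episode does not necessarily use, and every number in it is made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data for illustration only; this is not real market data.
implied_vol = [15.0, 20.0, 25.0, 30.0, 35.0]   # percent
profit = [10.0, 30.0, 45.0, 70.0, 95.0]        # dollars

slope, intercept = fit_line(implied_vol, profit)

# Forecast the profit one might theoretically expect at an implied volatility of 28.
forecast = intercept + slope * 28.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

A positive slope would suggest that higher implied volatility went along with higher profit in the training data; it says nothing on its own about whether that pattern will persist.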

The forecasted results can then be compared to the actual historical results, and conclusions can be drawn about the accuracy of the model.
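
One common way to make that comparison concrete is with summary error measures. The Python sketch below computes two of them, mean absolute error and R-squared, on made-up numbers; the choice of these particular measures is an assumption for illustration, not something taken from the episode.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the forecast miss, in the same units as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of the variation in the actual values that the forecasts account for."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only; this is not real market data.
actual_profit = [10.0, 30.0, 45.0, 70.0, 95.0]
forecast_profit = [8.0, 29.0, 50.0, 71.0, 92.0]

mae = mean_absolute_error(actual_profit, forecast_profit)
r2 = r_squared(actual_profit, forecast_profit)
print(f"MAE={mae:.2f}, R^2={r2:.3f}")
```

Scoring the model on the same data it was fitted to tends to flatter it, so a stricter check is to hold back some history and compare forecasts against that instead.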

The graphic on the right illustrates that in this example the forecast variable was somewhat successful in matching the actual observed data.

An important takeaway from this session with Dr. Data is that historical data can help with future trading decisions. The example from this particular episode sought to discern whether certain levels of implied volatility translated to profit - certainly a potential discovery that could be leveraged in ongoing trading decisions.

The above emphasizes that using historical data and supervised learning models (like regression) can help traders make more educated decisions, especially as compared to the traditional "gut feel" approach.

Additionally, it’s important to keep in mind that the success of the model relies heavily on the data that it is built on - a topic that is discussed more thoroughly in Dr. Data’s second segment.


To learn more about how data science is used in options trading, we encourage you to review all of the episodes in “The Skinny on Options Data Science” series.

Also, if you have any questions for Dr. Data, please leave them below in the comments section.

As always, thank you for reading and being a part of the tastytrade community!