Predicting Bitcoin Price with AutoML Tables

Applying Artificial Intelligence (AI) frequently requires a surprising amount of (tedious) manual work. Tools that automate this work can make AI available to more people and help solve many more important challenges, faster. This blog post tests such a tool – AutoML Tables.

Figure 1 – Bitcoin price prediction – is it going up, down or sideways?

This blog post generates a data set from an API and applies automated AI – AutoML Tables for regression, i.e. predicting numbers – in this case the Bitcoin closing price for the next hour, based on data from the current hour.

1. Introduction

AutoML Tables can be used on tabular data (e.g. from databases or spreadsheets) for either classification (e.g. classifying whether a sword is made of Valyrian steel or not – as shown in Figure 2a) or regression (predicting a particular number, e.g. the reach of a Scorpion cannon aiming at dragons, as shown in Figure 2b).

Figure 2a – Classification Example – Is Arya's Needle a Sword of Valyrian Steel?
Figure 2b – Regression Example – reach of Euron's arrow

2. Choosing and Building Dataset

However, I didn’t find data sets that I could use for Valyrian steel classification or Scorpion arrow reach regression (let me know if such data exists). Instead I found a free API providing Bitcoin related data over time, and I assume Bitcoin is completely unrelated to Valyrian steel and Scorpions (I might be wrong about that, given that Valyrian steel furnaces might compete with Bitcoin mining for energy – perhaps a potential confounding variable explaining a relationship between the prices of Valyrian swords and Bitcoin?).

Scientific selection of Bitcoin data API: since I am not an expert in cryptocurrency, I simply searched for “free bitcoin api” (or something in that direction) and found/selected cryptocompare.com.

2.1 Python code to fetch and prepare API data

Materials & Methods

I used a Colab notebook (colab.research.google.com) to fetch and prepare the API data, in combination with the AutoML web UI and a few Google Cloud command line tools (gsutil, gcloud and bq). I also used BigQuery for storing results, and AutoML stored some evaluation-related output in BigQuery.

Imports and authentication (Google Cloud)
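A minimal sketch of what this cell might contain (the exact imports in the original notebook may differ):

```python
# Minimal sketch of imports and Colab authentication (assumed, not the
# original notebook cell verbatim).
import json
import time

import pandas as pd
import requests

# Authenticate this Colab session against Google Cloud (opens a login prompt).
from google.colab import auth
auth.authenticate_user()
```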

Method to fetch Bitcoin related trade data
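A sketch of such a method, assuming cryptocompare.com's histohour endpoint; treat the parameter and field names as assumptions to be checked against their documentation:

```python
def get_trade_data(to_timestamp=None, limit=2000):
    """Fetch hourly BTC/USD OHLCV rows from cryptocompare.com.

    Assumes the histohour endpoint, which returns at most 2000 rows per
    call; to_timestamp lets us page backwards in time.
    """
    url = "https://min-api.cryptocompare.com/data/histohour"
    params = {"fsym": "BTC", "tsym": "USD", "limit": limit}
    if to_timestamp is not None:
        params["toTs"] = to_timestamp
    response = requests.get(url, params=params)
    response.raise_for_status()
    return response.json()["Data"]  # list of dicts, one per hour
```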

Method to fetch Bitcoin related social & activity data
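A sketch along the same lines; the endpoint path, parameters and coin id are assumptions – check the current cryptocompare API documentation (an API key may also be required):

```python
def get_social_data(to_timestamp=None, limit=2000):
    """Fetch hourly social/activity stats for Bitcoin from cryptocompare.com.

    The endpoint path, parameters and coin id below are assumptions; check
    the current cryptocompare API docs (an api_key may also be required).
    """
    url = "https://min-api.cryptocompare.com/data/social/coin/histo/hour"
    params = {"coinId": 1182, "limit": limit}  # 1182 is assumed to be BTC
    if to_timestamp is not None:
        params["toTs"] = to_timestamp
    response = requests.get(url, params=params)
    response.raise_for_status()
    return response.json()["Data"]
```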

(code duplication isn’t wrong – or is it? – refactoring of this and the previous method is left as an exercise for the reader)

Method to combine the 2 types of API data
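A sketch of the combination step, assuming both feeds share an hourly unix timestamp field named time:

```python
def combine_data(trade_rows, social_rows):
    """Join trade and social rows on their shared hourly 'time' field."""
    social_by_time = {row["time"]: row for row in social_rows}
    combined = []
    for trade_row in trade_rows:
        social_row = social_by_time.get(trade_row["time"])
        if social_row is None:
            continue  # keep only hours present in both feeds
        merged = dict(social_row)
        merged.update(trade_row)  # let trade fields win on name clashes
        combined.append(merged)
    return combined
```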

Method for fetching and preprocessing data from API
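A sketch that pages backwards through the API in 2000-hour chunks until the requested number of hours has been collected, built on the helper methods above:

```python
def fetch_and_preprocess(num_hours):
    """Page backwards through the API in 2000-hour chunks and combine feeds."""
    all_rows = []
    to_timestamp = None  # None = start from the most recent hour
    while len(all_rows) < num_hours:
        trade_rows = get_trade_data(to_timestamp=to_timestamp)
        social_rows = get_social_data(to_timestamp=to_timestamp)
        all_rows = combine_data(trade_rows, social_rows) + all_rows
        to_timestamp = trade_rows[0]["time"] - 3600  # continue 1 hour earlier
        time.sleep(1)  # be gentle with the free API
    return all_rows[-num_hours:]
```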

2.2 Python Code to prepare Bitcoin data for BigQuery and AutoML Tables

Actually fetch some results (16000 hours = 1.82 years)
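For example, using the fetch_and_preprocess sketch from above:

```python
rows = fetch_and_preprocess(16000)
print(len(rows), rows[-1])  # sanity check: newest row last
```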

Write as 1 json per line to a file
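A minimal sketch:

```python
# One JSON object per line ("newline-delimited JSON") is the format both
# BigQuery and the schema generator below expect.
with open("bitcoindata.json", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```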

Set active Google Cloud project (a ! prefix in Colab means the line runs as a shell command)
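For example (the project id predicting is the one used in this post – see the note further down):

```python
!gcloud config set project predicting
```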

Creating a Google Cloud storage bucket to store data
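A sketch – the bucket name is a placeholder, since bucket names must be globally unique, and the region is an assumption:

```python
# us-central1 is assumed, since AutoML Tables ran there at the time of writing.
!gsutil mb -l us-central1 gs://your-bucket-name
```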

Create a BigQuery schema based on the API data fetched
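A sketch using the bigquery-schema-generator tool, which reads newline-delimited JSON on stdin and writes a BigQuery JSON schema on stdout:

```python
!pip install bigquery-schema-generator
!generate-schema < bitcoindata.json > bitcoindata.schema.json
```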

Note: bigquery-schema-generator was a nice tool, but I had to change INTEGER to FLOAT in the generated schema, in addition to preparing the data (ref the perl one-liner).
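The original one-liner isn't shown here, so the following is a hypothetical stand-in for the schema part of that fix:

```python
# Hypothetical one-liner in that spirit: rewrite INTEGER fields to FLOAT
# in place in the generated schema.
!perl -pi -e 's/INTEGER/FLOAT/g' bitcoindata.schema.json
```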

Generate (or fetch existing) BigQuery dataset & create BigQuery table
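A sketch with the bq command line tool (dataset, table and schema names as used above):

```python
# Create the dataset (if it doesn't already exist), then an empty table
# using the generated schema.
!bq mk bitcoindata
!bq mk --table predicting:bitcoindata.bitcoindata bitcoindata.schema.json
```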

Note: I used the project id ‘predicting’ – replace it with yours (ref the bq commands further down).

Load API data into the (new) BigQuery table
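Loading the newline-delimited JSON file with the adjusted schema might look like this:

```python
!bq load --source_format=NEWLINE_DELIMITED_JSON \
    bitcoindata.bitcoindata bitcoindata.json bitcoindata.schema.json
```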

Check that the table exists and query it
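For example, showing the table metadata and the five most recent rows (assuming the names above):

```python
!bq show bitcoindata.bitcoindata
!bq query --use_legacy_sql=false \
    'SELECT time, close FROM `predicting.bitcoindata.bitcoindata` ORDER BY time DESC LIMIT 5'
```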

Figure 3 – output from a SELECT query against the Bitcoin data in BigQuery

We have input features (x), but not yet a target feature (y) to predict(!)

A column to predict can be created by time-shifting an existing column: for a row at time t=0 we want the value from t=1 as its training target – the feature we want to predict is the Bitcoin close price one hour ahead (so not exactly quant/high-frequency trading, but a more soothing once-per-hour experience; if it works out ok it can be automated – for the risk taking?). This can be generated either in BigQuery with a SELECT using the LEAD() window function, or with a Python Pandas DataFrame shift – both approaches are shown underneath.
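Both approaches sketched underneath – the BigQuery variant as a commented query, the Pandas variant as runnable notebook code (assuming the rows list from earlier):

```python
# Approach 1 - BigQuery window function (run via bq or the BigQuery console):
#
#   SELECT *, LEAD(close) OVER (ORDER BY time) AS NEXTCLOSE
#   FROM `predicting.bitcoindata.bitcoindata`
#
# Approach 2 - Pandas shift on the rows already in the notebook:
df = pd.DataFrame(rows).sort_values("time")
df["NEXTCLOSE"] = df["close"].shift(-1)   # the close price one hour ahead
df = df.dropna(subset=["NEXTCLOSE"])      # the newest hour has no next close yet
```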

Prepare final data with NEXTCLOSE column (as csv) for AutoML and copy to Google Cloud bucket
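A sketch, assuming the Pandas DataFrame from the previous step and the placeholder bucket:

```python
df.to_csv("bitcoindata.csv", index=False)
!gsutil cp bitcoindata.csv gs://your-bucket-name/
```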

3. AutoML prediction

Now the data is ready for AutoML. (Note that the BigQuery step could have been skipped in this case, but it also opens another direction, since AutoML can import directly from BigQuery.) Underneath you can see an example of a created dataset in the AutoML Console.

Figure 4 – AutoML Console – with an example data set named bitcoindata

Creating a new dataset

Figure 5 – create new AutoML Tables dataset

Importing data from Google Cloud Bucket

Figure 6 – Import data to AutoML from Google Cloud Storage
Figure 7 – Importing data

Set target variable (NEXTCLOSE) and look at statistics of features

Figure 8 – select target column and data split (train/validation/testing)
Figure 9 – inspect correlation with target variable

Train Model with AutoML

Figure 10 – select budget for resources to generate model

Look at core metrics regarding accuracy

Figure 11 – Metrics from training (MAE, RMSE, R^2, MAPE) and important features

Deploy and Use the AutoML model – Batch Mode

Figure 12 – Batch based prediction
Figure 13 – online prediction
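The figures above show the web UI flow; for completeness, here is a hedged sketch of how batch prediction could be done with the AutoML Tables Python client library instead – the model name, bucket and region are placeholders for whatever was chosen in the console:

```python
# Sketch of batch prediction with the AutoML Tables client library
# (pip install google-cloud-automl). Model name, bucket and region are
# placeholders, not values from the original post.
from google.cloud import automl_v1beta1

client = automl_v1beta1.TablesClient(project="predicting", region="us-central1")
operation = client.batch_predict(
    model_display_name="bitcoindata_model",
    gcs_input_uris="gs://your-bucket-name/bitcoindata.csv",
    gcs_output_uri_prefix="gs://your-bucket-name/predictions",
)
operation.result()  # blocks until the batch prediction job completes
```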

Conclusion

I have shown an example of using AutoML – the main part was about getting data from an API and preparing it for use (section 2), then using it in AutoML to train a model and look into the evaluation. This model aims to predict the next hour’s Bitcoin closing price based on data from the current hour, but it can probably be extended in several ways – how would you extend it?

Best regards,

Amund Tveit

DISCLAIMER: this blog only represents my PERSONAL opinions and views
