Applying Artificial Intelligence (AI) frequently requires a surprising amount of (tedious) manual work. Tools for automation can make AI available to more people and help solve many more important challenges, faster. This blog post tests such a tool – AutoML Tables.

This blog post generates a data set from an API and applies automated AI – AutoML Tables for regression, i.e. predicting numbers – in this case the Bitcoin closing price for the next hour, based on data from the current hour.
1. Introduction
AutoML Tables can be used on tabular data (e.g. from databases or spreadsheets) for either classification (e.g. classifying whether something is a Valyrian steel sword or not – as shown in Figure 2a) or regression (predicting a particular number, e.g. the reach of a Scorpion cannon aiming at dragons, as shown in Figure 2b).


2. Choosing and Building the Dataset
However, I didn't find data sets I could use for Valyrian steel classification or Scorpion arrow reach regression (let me know if such data exists). Instead I found a free API for Bitcoin-related data over time, and I assume Bitcoin is completely unrelated to Valyrian steel and Scorpions (though I might be wrong about that, given that Valyrian steel furnaces might compete with Bitcoin for energy – perhaps a potential confounding variable explaining a relationship between the prices of Valyrian swords and Bitcoin?).
Scientific selection of a Bitcoin data API: since I am not an expert in cryptocurrency, I simply searched for "free bitcoin api" (or something in that direction) and found/selected cryptocompare.com.
2.1 Python code to fetch and prepare API data
Materials & Methods
I used a Colab notebook (colab.research.google.com) to fetch and prepare API data, in combination with the AutoML web UI and a few Google Cloud command line tools (gsutil and gcloud). I also used BigQuery for storing results, and AutoML stored some of its evaluation output in BigQuery.
Imports and authentication (Google Cloud)
CRYPTOCOMPARE_API_KEY = ""  # get your own on cryptocompare.com

import requests  # for API requests to cryptocompare.com
from datetime import datetime as dt
from google.colab import auth, drive
import json

# storing Bitcoin API data in BigQuery
from google.cloud import bigquery
from google.api_core.exceptions import BadRequest

auth.authenticate_user()  # authentication on Google Cloud
Method to fetch Bitcoin-related trade data
def apicall_prices(crypto_currency="BTC", currency="USD",
                   per_api_call_limit=1, to_timestamp=None,
                   headers={'authorization': 'Apikey %s' % (CRYPTOCOMPARE_API_KEY)}):
    params = {'fsym': crypto_currency,
              'tsym': currency,
              'limit': str(per_api_call_limit)}
    if to_timestamp is not None:
        params["toTs"] = to_timestamp
    print("price api", params)

    response = requests.get(
        'https://min-api.cryptocompare.com/data/histohour',
        params=params,
        headers=headers,
    )
    return response
Method to fetch Bitcoin-related social & activity data
(code duplication isn't wrong – or is it? – I leave refactoring of this and the previous method as an exercise for the reader; a possible sketch follows right after the next code block)
def apicall_socialsignals(crypto_currency_coin_id=7605, per_api_call_limit=1,
                          to_timestamp=None,
                          headers={'authorization': 'Apikey %s' % (CRYPTOCOMPARE_API_KEY)}):
    params = {'coinId': crypto_currency_coin_id,  # coinId=7605
              'limit': str(per_api_call_limit)}
    if to_timestamp is not None:
        params["toTs"] = to_timestamp
    print("social api", params)

    response = requests.get(
        'https://min-api.cryptocompare.com/data/social/coin/histo/hour',
        params=params,
        headers=headers,
    )
    return response
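If you want to try that exercise, here is a minimal sketch of one possible refactoring, reusing requests and CRYPTOCOMPARE_API_KEY from the imports above – the helper name cryptocompare_get is hypothetical and not used elsewhere in this post:

# Hypothetical helper shared by both API calls above (sketch only)
def cryptocompare_get(endpoint, params, to_timestamp=None,
                      headers={'authorization': 'Apikey %s' % (CRYPTOCOMPARE_API_KEY)}):
    if to_timestamp is not None:
        params["toTs"] = to_timestamp
    return requests.get('https://min-api.cryptocompare.com/data/' + endpoint,
                        params=params, headers=headers)

# apicall_prices(...) could then become roughly:
#   cryptocompare_get('histohour', {'fsym': 'BTC', 'tsym': 'USD', 'limit': '1'})
# and apicall_socialsignals(...) roughly:
#   cryptocompare_get('social/coin/histo/hour', {'coinId': 7605, 'limit': '1'})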
Method to combine the 2 types of API data
def join_pricing_and_social_signals(json_price_response, json_social_signal_response):
    price_data = json_price_response["Data"]
    social_signal_data = json_social_signal_response["Data"]
    joined_on_timestamp = {}

    for item in price_data:
        timestamp = item["time"]
        if timestamp not in joined_on_timestamp:
            joined_on_timestamp[timestamp] = item
    #print(joined_on_timestamp.keys())

    for item in social_signal_data:
        timestamp = item["time"]
        if timestamp not in joined_on_timestamp:
            print("SHOULDN'T OCCUR!", timestamp)
            #print(item)
        else:
            tmp = joined_on_timestamp[timestamp].copy()
            formatted_timestamp = dt.utcfromtimestamp(timestamp).strftime("%Y-%m-%d %H:%M:%S")
            item["formatted_timestamp"] = formatted_timestamp
            joined_on_timestamp[timestamp] = {**tmp, **item}
    return joined_on_timestamp
Method for fetching and preprocessing data from API
def fetch_and_preprocess(min_num_results=1, per_api_call_limit=1,
                         crypto_currency="BTC", currency="USD"):
    num_results = 0
    to_timestamp = None
    json_results = {}

    while num_results <= min_num_results:
        print("num results, min_num_results", num_results, min_num_results)
        price_response = apicall_prices(crypto_currency=crypto_currency,
                                        currency=currency,
                                        per_api_call_limit=per_api_call_limit,
                                        to_timestamp=to_timestamp)
        social_signal_response = apicall_socialsignals(per_api_call_limit=per_api_call_limit,
                                                       to_timestamp=to_timestamp)

        json_price_response = price_response.json()
        num_items = len(json_price_response["Data"])
        num_results += num_items

        # continue backwards in time from the oldest timestamp in this batch
        to_timestamp = int(json_price_response['TimeFrom'])
        formatted_timestamp = dt.utcfromtimestamp(to_timestamp).strftime("%Y/%m/%d %H:%M:%S")
        json_price_response["formatted_timestamp"] = formatted_timestamp

        json_social_signal_response = social_signal_response.json()
        json_social_signal_response["formatted_timestamp"] = formatted_timestamp

        combined_json = join_pricing_and_social_signals(json_price_response,
                                                        json_social_signal_response)
        for key in combined_json:
            json_results[key] = combined_json[key]

    return json_results
2.2 Python Code to prepare Bitcoin data for BigQuery and AutoML Tables
Actually fetch some results (16000 hours = 1.82 years)
json_results = fetch_and_preprocess(16000, per_api_call_limit=2000)
Write one JSON object per line to a file
fh = open("bitcoindata.json", "w")
for line in sorted(json_results):
    fh.write(json.dumps(json_results[line]) + "\n")
fh.close()
Set the active Google Cloud project (a leading ! in Colab runs the line as a shell command)
project_id = ""  # the project id you have in Google Cloud (Web UI)
!gcloud config set project {project_id}
Creating a Google Cloud storage bucket to store data
bucket_name = ""  # the Google Cloud bucket name you want
!gsutil mb -c regional -l us-central1 -p {project_id} gs://{bucket_name}
Create a BigQuery schema based on the API data fetched
Note: bigquery-schema-generator was a nice tool, but I had to change INTEGER to FLOAT in the generated schema as part of preparing the data (see the Perl one-liner below).
!pip install bigquery-schema-generator
!generate-schema bitcoindata.json > bitcoindata.schema
!perl -npi -e 's/INTEGER/FLOAT/gi' *.schema
!gsutil cp *.schema gs://{bucket_name}
Create (or fetch an existing) BigQuery dataset & create a BigQuery table
Note: I used the project id 'predicting' – replace it with your own (see the bq commands further down).
client = bigquery.Client(project=project_id)

dataset_id = 'bitcoindata'  # or whatever you choose
dataset_ref = client.dataset(dataset_id)
try:
    client.get_dataset(dataset_ref)
except:
    # dataset does not exist yet, so create it
    dataset = bigquery.Dataset(dataset_ref)
    dataset = client.create_dataset(dataset)
    print('Dataset {} created.'.format(dataset.dataset_id))
dataset_ref = client.dataset(dataset_id)
print(dataset_ref)
!bq mk --table --expiration 3600 --description "This is my table" --label predicting:bitcoindata bitcoindata.bctrainingdata bitcoindata.schema
Load the API data into the (new) BigQuery table
!bq load --source_format NEWLINE_DELIMITED_JSON \
  --ignore_unknown_values \
  bitcoindata.bctrainingdata \
  gs://predicting/bitcoindata.json \
  bitcoindata.schema
Check that the table exists and query it
!bq show predicting:bitcoindata
%%bigquery --project predicting bitcoindata
select * from bitcoindata.bctrainingdata

We have input features (x), but not yet a target feature (y) to predict(!)
A column to predict can be created by time-shifting an existing column: for a row at time t=0, training requires a feature from t=1 – the feature we want to predict is the Bitcoin close price for the next hour (so not exactly quant/high-frequency trading, but a more soothing once-per-hour experience; if it works out OK it can be automated – for the risk takers?). The target column can be generated either in BigQuery with a SELECT using the LEAD() window function, or with a Pandas DataFrame shift – both approaches are shown underneath.
%%bigquery --project predicting bitcoindata
select lead(close) OVER (ORDER BY formatted_timestamp) as NEXTCLOSE, *
from bitcoindata.bctrainingdata
ORDER by formatted_timestamp desc
# assumes df is sorted by formatted_timestamp descending (newest hour first),
# so shift(1) pulls each row's value from the following hour
df = df.assign(NEXTCLOSE=df.loc[:, 'close'].shift(1))
# ADDING THE TIMESTAMP CAN BE USEFUL FOR DEBUGGING, TO MAKE SURE YOU ARE SHIFTING IN THE RIGHT DIRECTION!
df = df.assign(NEXT_TIMESTAMP=df.loc[:, 'formatted_timestamp'].shift(1))
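To sanity-check the direction of the shift (assuming the dataframe is sorted with the newest hour first, as in the BigQuery query above), each row's NEXT_TIMESTAMP should be one hour after its formatted_timestamp – something like this makes that easy to eyeball:

# Quick visual check that NEXT_TIMESTAMP is one hour after formatted_timestamp
print(df[['formatted_timestamp', 'NEXT_TIMESTAMP', 'close', 'NEXTCLOSE']].head())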
Prepare the final data with the NEXTCLOSE column (as CSV) for AutoML and copy it to the Google Cloud bucket
csv_data = bitcoindata_w_prediction_column.to_csv(index=False)
fh = open("automl_import_data.csv", "w")
fh.write(csv_data)
fh.close()
!gsutil cp automl_import_data.csv gs://{bucket_name}
3. AutoML prediction
Now the data is ready for AutoML. (Note that the BigQuery step could have been avoided in this case, but it is also an alternative route, since AutoML Tables can import directly from BigQuery.) Underneath you can see an example of a created dataset in the AutoML console.

Creating a new dataset

Importing data from Google Cloud Bucket
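I did the dataset creation and import in the web UI (as shown in the screenshots), but the same steps can presumably be scripted. Here is a rough sketch with the Python client for AutoML Tables (google-cloud-automl, TablesClient in automl_v1beta1) – I did not run this for the post, so treat the names and arguments as assumptions:

from google.cloud import automl_v1beta1

# Sketch only: create an AutoML Tables dataset and import the CSV from the bucket
tables_client = automl_v1beta1.TablesClient(project=project_id, region='us-central1')
dataset = tables_client.create_dataset(dataset_display_name='bitcoindata')
import_operation = tables_client.import_data(
    dataset=dataset,
    gcs_input_uris=['gs://%s/automl_import_data.csv' % bucket_name])
import_operation.result()  # wait until the import has finished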


Set target variable (NEXTCLOSE) and look at statistics of features
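Setting the target column could also be done from code – again a hedged sketch using the same assumed TablesClient as above:

# Sketch only: make NEXTCLOSE the column AutoML Tables will learn to predict
tables_client.set_target_column(dataset=dataset,
                                column_spec_display_name='NEXTCLOSE')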


Train Model with AutoML
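For reference, training from code could look roughly like this (the budget of 1000 milli node hours, i.e. 1 node hour, and the model name are example values, not what I used in the UI):

# Sketch only: train a regression model on the dataset (blocks until training is done)
create_model_operation = tables_client.create_model(
    'bitcoin_nextclose_model',
    dataset=dataset,
    train_budget_milli_node_hours=1000)
create_model_operation.result()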

Look at core metrics regarding accuracy
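In the console this shows up as MAE, RMSE, R^2 and similar regression metrics; fetching the stored evaluations from code could presumably be done along these lines (same assumptions about the client as above):

# Sketch only: list the stored evaluations for the trained model
for evaluation in tables_client.list_model_evaluations(model_display_name='bitcoin_nextclose_model'):
    print(evaluation)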

Deploy and Use the AutoML model – Batch Mode
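I used the console for batch prediction too, but a scripted version could look roughly like this – the input and output URIs are placeholders (same client assumptions as above):

# Sketch only: batch predict on a CSV in the bucket and write results back to the bucket
batch_predict_operation = tables_client.batch_predict(
    model_display_name='bitcoin_nextclose_model',
    gcs_input_uris=['gs://%s/automl_import_data.csv' % bucket_name],
    gcs_output_uri_prefix='gs://%s/predictions' % bucket_name)
batch_predict_operation.result()  # wait for the batch prediction job to finish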


Conclusion
I have shown an example of using AutoML – the main part was about getting data from an API and preparing it for use (section 2), and then using it in AutoML to train a model and look into the evaluation. The model aims to predict the next hour's Bitcoin closing price based on data from the current hour, but it can probably be extended in several ways – how would you extend it?
Best regards,
DISCLAIMER: this blog only represents my PERSONAL opinions and views