Get Historical Crypto Price Data From Bitstamp in Python

Get Historical Crypto Price Data From Bitstamp in Python

In this post you'll learn how to make a re-usable script to pull historical crypto price data from the Bitstamp exchange for different currency pairs like BTC-USD, ETH-USD, etc. As an example I'll scrape the one-minute bars for the last year for BTC-USD. You don't need an account and it's completely free.

Intro to the Bitstamp API:

If you head on over to the Bitstamp API, you'll see a bunch of different endpoints that we can connect to, and data that we can receive. Visiting the ticker-endpoint will give you the current OHLC (open, high, low, close) data over the last 24 hours for the selected ticker.

In a nutshell, all we're doing here is visiting a specific URL, and interpreting the contents of that page. The URL for the ticker endpoint I just linked to is:

1https://www.bitstamp.net/api/v2/ticker/btcusd/

You could change the btcusd part of that to ethusd and get Ethereum prices instead. That's basically how we let Bitstamp know what kind of data we want. Python can automatically modify the URL for us using the requests package.

We're after historical data, so you'll want to scroll on down to the OHLC data endpoint, which in URL format looks like this:

1https://www.bitstamp.net/api/v2/ohlc/{currency_pair}/

Where you replace {currency_pair} with whatever pair you like. All of the options are given on the endpoint information, but the format is basically the crypto ticker then the currency ticker with no spaces. So btcusd, ethusd, etc.

Request a Single Batch of Candlesticks:

We can use the same trick of visiting the URL in our browser to get the OHLC data. The following will get you the last 5 one-minute candles for BTCUSD

1https://www.bitstamp.net/api/v2/ohlc/btcusd/?step=60&limit=5

Note how the extra parameters step and limit are passed in. Step tells us what time-frame the candles are in, in seconds. Limit tells us how many bars we want to retrieve, with a maximum of 1,000 per request (more on that later). The format of the response will be JSON, which is pretty difficult for us mere mortals to read but computers love it. More details about what values are valid can be found on the documentation.

So how do we get all this in Python? Let's start off by having python visit the URL, rather than us.

 1import json
 2import requests
 3import pandas as pd
 4import datetime
 5
 6currency_pair = "btcusd"
 7url = f"https://www.bitstamp.net/api/v2/ohlc/{currency_pair}/"
 8
 9params = {
10    "step":60,
11    "limit":5,
12  }
13
14data = requests.get(url, params = params)
15data = data.json()
16print(data)

We define our currency pair, then have python automatically insert that into the URL for us using an f-string. After that we create our dictionary of parameters and plug that into requests.get. Make sure that you call .json() on data before printing it out, otherwise you'll just get some nondescript python object.

If all goes well you should get the same printout as when you visited the URL. At this point data is basically a python dictionary, so you can index into it much like any other dictionary.

Performing Multiple Requests:

As we talked about before, you can only get a maximum of 1,000 bars from each request. That might work for you if you just want a years worth of daily data, but for our use case that won't cut it. What we need to do instead is make multiple requests and stitch them all together. So we could request the minute bars for today, then yesterday, etc., then glue them all together and save down to disk.

There are 1,440 minutes in a day, slightly too much for us given our 1,000 bar limit. So I'm going to request data in 6 hour slots just to be safe.

We feed in the times that we want as unix timestamps. We can generate a list of them with the following snippet:

1start = "2021-01-01"
2end = "2021-01-02"
3
4dates = pd.date_range(start, end, freq = "6H")
5dates  = [ int(x.value/10**9) for x in list(dates)]

We set our start and end dates in YYYY-MM-DD format (important!), and use the date_range function from pandas to generate some datetime objects, separated by the appropriate intervals.

We then use a list comprehension to convert those datetime objects into numbers that we can feed into a request to Bitstamp.

Putting it all together:

If we combine what we've just learned above, and wrap the request up in a for loop, we get the following:

 1import json
 2import requests
 3import pandas as pd
 4import datetime
 5
 6
 7currency_pair = "btcusd"
 8url = f"https://www.bitstamp.net/api/v2/ohlc/{currency_pair}/"
 9
10start = "2021-01-01"
11end = "2021-01-02"
12
13dates = pd.date_range(start, end, freq = "6H")
14dates  = [ int(x.value/10**9) for x in list(dates)]
15
16print(dates)
17
18master_data = []
19
20for first, last in zip(dates, dates[1:]):
21    print(first,last)
22
23    params = {
24            "step":60,
25            "limit":1000,
26            "start":first,
27            "end":last,
28            }
29
30    data = requests.get(url, params = params)
31
32
33    data = data.json()["data"]["ohlc"]
34
35    master_data += data

We iterate through the list of dates in pairs, making sure not to miss any intervals. We make a separate request for each time interval and append the results onto our master_data list of results. Make sure that you're indexing into data to extract only the bar data, and not any of the other meta data that is returned to us like the currency pair.

At this point when you print out master_data (after the loop has finished) it should just be a massive list of different dictionaries, with each dictionary representing one candlestick.

We can clean up the formatting of the candles by converting to a Pandas DataFrame, getting rid of duplicate data, and saving down to disk.

 1df = pd.DataFrame(master_data)
 2df = df.drop_duplicates()
 3
 4df["timestamp"] = df["timestamp"].astype(int)
 5df = df.sort_values(by="timestamp")
 6
 7df = df[ df["timestamp"] >= dates[0] ]
 8df = df[ df["timestamp"] < dates[-1] ]
 9
10print(df)
11
12df.to_csv("tutorial.csv", index=False)

I'm also applying a filter here to get rid of candlesticks that fall outside of my chosen date region, as the API sometimes returns data that is outside of the interval. After that you should have a .csv file with your price data, ready for modelling or backtesting.

1high,timestamp,volume,low,close,open
229022.01,1609459200,0.86157958,28999.14,29006.31,28999.63
329086.90,1609459260,14.56195084,29007.31,29083.47,29007.31
429073.02,1609459320,3.03030144,29028.14,29035.89,29069.80
529069.39,1609459380,2.18962055,29019.00,29048.13,29037.68
629057.73,1609459440,1.44653760,29035.61,29045.19,29048.13
7....
8....

Video Tutorial

If you'd prefer a video tutorial, check out this video from our youtube channel: