Scraping Crypto Price Data From Polygon.io
Polygon.io
Polygon is a great website for getting price data across crypto, options, forex and the stock market. It has a very simple pricing model that gives you unlimited access to whatever asset classes you sign up for.
Today we'll be using their free plan to get historical candle bars for cryptocurrencies, although the same techniques we use in this article will help you use any of their other REST API endpoints for other asset classes.
To get yourself up and running, you'll only need pandas and requests installed in a fresh Python venv.
Requesting from the API
I'm going to structure this script around a pull_1m_data function that will grab 1-minute bars for a certain symbol on a given day. I'll make it easily adjustable for other timeframes.
If you take a look at the Polygon.io docs, you can see all of the different parameters that we need to feed into the endpoint to get our data.
Here's what my attempt at the function looked like:
from datetime import datetime, timedelta

import requests


def pull_1m_data(symbol, date):
    """
    date is a python date object
    symbol is of the form XXXUSD
    """

    polygon_api_key = "YOUR_POLYGON_API_KEY"
    polygon_rest_baseurl = "https://api.polygon.io/v2/"

    # crypto tickers get an "X:" prefix, e.g. BTCUSD -> X:BTCUSD (also UNIUSD, ETHUSD)
    symbol = "X:" + symbol

    # 1-minute bars; change these two values for other timeframes
    multiplier = 1
    timespan = "minute"

    limit = 40000

    # newest data at the bottom
    sort = "asc"

    # midnight at the start of the requested day, and midnight the next day
    start_time = datetime.combine(date, datetime.min.time())
    end_time = start_time + timedelta(days=1)

    # unix timestamps in milliseconds; 1 ms is taken off the end so the
    # bar that opens at next midnight is not included
    start_time = int(start_time.timestamp() * 1000)
    end_time = int(end_time.timestamp() * 1000) - 1

    request_url = (
        f"{polygon_rest_baseurl}aggs/ticker/{symbol}/range/{multiplier}/"
        f"{timespan}/{start_time}/{end_time}?adjusted=true&sort={sort}&"
        f"limit={limit}&apiKey={polygon_api_key}"
    )

    data = requests.get(request_url, timeout=30).json()

    if "results" in data:
        return data["results"]
    else:
        raise Exception(f"Something went wrong: {data}")
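The long f-string URL is easy to break with a missing `&` or `/`. Below is a minimal sketch that builds the same aggregates URL from its parts with the standard library's `urllib.parse.urlencode`. The helper name `build_aggs_url` and its defaults are my own, and I've assumed the query parameters shown (`adjusted`, `sort`, `limit`, `apiKey`) are the ones you want. Check the Polygon docs for the full list.

```python
from urllib.parse import urlencode

# Base of Polygon's v2 aggregates endpoint, as used in pull_1m_data
BASE_URL = "https://api.polygon.io/v2/aggs/ticker"


def build_aggs_url(ticker, multiplier, timespan, start, end, api_key,
                   adjusted=True, sort="asc", limit=40000):
    """Hypothetical helper: build an aggregates URL from its components.

    start/end can be millisecond unix timestamps (as in pull_1m_data)
    or YYYY-MM-DD date strings.
    """
    # Path segments: /{ticker}/range/{multiplier}/{timespan}/{from}/{to}
    path = f"{BASE_URL}/{ticker}/range/{multiplier}/{timespan}/{start}/{end}"
    # urlencode escapes values and joins them with "&" in insertion order
    query = urlencode({
        "adjusted": str(adjusted).lower(),
        "sort": sort,
        "limit": limit,
        "apiKey": api_key,
    })
    return f"{path}?{query}"


print(build_aggs_url("X:BTCUSD", 1, "minute", 1609459200000, 1609545599999, "YOUR_KEY"))
```

Switching to hourly or daily bars then only means passing a different `multiplier` and `timespan` (for example `"hour"` or `"day"`), without touching the URL string itself.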
Obviously make sure that you replace the API key with your own from the Polygon dashboard.
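Rather than pasting the key into the script, where it's easy to commit or share by accident, you could read it from an environment variable. This is a small sketch. The variable name `POLYGON_API_KEY` and the helper `get_polygon_api_key` are hypothetical choices, not something Polygon requires.

```python
import os


def get_polygon_api_key(var_name="POLYGON_API_KEY"):
    """Hypothetical helper: read the Polygon API key from the environment.

    Assumes you've exported the key first, e.g. in your shell profile.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending a request with no key
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

Inside pull_1m_data you would then use `polygon_api_key = get_polygon_api_key()` in place of the hard-coded string.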
One interesting trick I'm using in there is this line:
start_time = datetime.combine(date, datetime.min.time())
It gives us a datetime at midnight for the date that I pass in, so that we can extract a unix timestamp from the datetime object. You'll also notice that I take 1 millisecond away from end_time. This is done to make sure that we don't include the bar starting at midnight the next day. Try it yourself and you'll see what I mean.
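One caveat: calling `.timestamp()` on a naive datetime (no timezone attached) interprets it in your machine's local timezone. Unless you're running in UTC, the "day" you request is then your local day. Here is a sketch of the same start/end arithmetic pinned to UTC midnight, assuming UTC day boundaries are what you want. `day_bounds_ms` is a hypothetical helper name.

```python
from datetime import date, datetime, timedelta, timezone


def day_bounds_ms(day):
    """Hypothetical helper: (start_ms, end_ms) for one UTC day, end inclusive."""
    # tzinfo=timezone.utc makes the datetime timezone-aware, so .timestamp()
    # no longer depends on the local timezone of the machine
    start = datetime.combine(day, datetime.min.time(), tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    start_ms = int(start.timestamp() * 1000)
    # Take 1 ms off so the bar opening at the next midnight falls outside the range
    end_ms = int(end.timestamp() * 1000) - 1
    return start_ms, end_ms


start_ms, end_ms = day_bounds_ms(date(2021, 1, 1))
print(start_ms, end_ms)  # 1609459200000 1609545599999
```

A bar that opens at the next midnight has `t = 1609545600000`, which is one millisecond past `end_ms`. That is exactly why the subtraction keeps it out.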
Other than that, the script follows a conventional pattern that you'll use over and over again when interacting with REST APIs.
At this point you can go ahead and call the function, running it through a for loop to get the number of days that you're after:
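import time
from datetime import date, timedelta

day = date(year=2021, month=1, day=1)

bars = []
days_of_data = 2

# walk backwards one day at a time from the start date
for _ in range(days_of_data):
    bars += pull_1m_data("BTCUSD", day)
    day -= timedelta(days=1)
    # pause between requests for the free plan's rate limit
    time.sleep(15)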
If you have a paid plan, you don't need to include the time.sleep; that's just to make sure that we don't go over the 5 requests per minute limit for free accounts.
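A fixed `time.sleep(15)` always waits the full 15 seconds, even when the request itself took a while. Below is a minimal sketch of a limiter that only sleeps for whatever remains of the minimum gap, which is 60 / 5 = 12 seconds for the free plan's 5 requests per minute. The `RateLimiter` class is hypothetical. It enforces a fixed spacing between calls rather than a true sliding window. The clock and sleep functions are injectable so it can be tested without waiting.

```python
import time


class RateLimiter:
    """Hypothetical helper: keep calls at least 60/requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the minimum gap since the previous call has passed."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


limiter = RateLimiter(requests_per_minute=5)
print(limiter.min_interval)  # 12.0
```

In the loop you would call `limiter.wait()` just before each `pull_1m_data(...)` in place of the fixed sleep. The first call goes straight through, and later calls only pause if they come too soon.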
Lastly, we can use pandas to clean up our data and save it to a CSV:
import pandas as pd

df = pd.DataFrame(bars)
df["date"] = pd.to_datetime(df["t"], unit="ms")
# select columns in the same order as the new names below (low before close)
df = df[["date", "o", "h", "l", "c", "v"]]
df.columns = ["time", "open", "high", "low", "close", "volume"]
df = df.sort_values("time")

df.to_csv("data.csv", index=False)

print(df)
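Assigning `df.columns` renames by position, so the selected columns and the new names have to line up exactly. Swapping `"c"` and `"l"`, for example, silently mislabels close as low. A sketch of an alternative that renames with a dict, so each Polygon key is tied to its name explicitly. The helper `bars_to_frame` and `COLUMN_MAP` are hypothetical, and I've assumed the bars carry the `t`, `o`, `h`, `l`, `c` and `v` keys used above.

```python
import pandas as pd

# Map Polygon's single-letter aggregate keys to readable column names
COLUMN_MAP = {"t": "time", "o": "open", "h": "high", "l": "low", "c": "close", "v": "volume"}


def bars_to_frame(bars):
    """Hypothetical helper: turn a list of Polygon bar dicts into a tidy DataFrame."""
    df = pd.DataFrame(bars)
    # Keep only the mapped keys (dropping any extras), then rename by key, not position
    df = df[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)
    # Polygon bar timestamps are unix milliseconds
    df["time"] = pd.to_datetime(df["time"], unit="ms")
    return df.sort_values("time").reset_index(drop=True)
```

Usage would be `bars_to_frame(bars).to_csv("data.csv", index=False)`. Because each rename is looked up by key, reordering the dict only changes the column order, never which data ends up under which name.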
Video Tutorial
If you'd prefer a video tutorial, you can check out this video from my channel: