When you realize that the limits of your local machine aren't enough to collect the amount of data you need, the first thing that comes to mind is getting some proxies and parallelizing your queries across them.
But the first solution isn't always the best one. Let's take a closer look at this problem.
I don't want to cover the problems that come with cheap proxies, like slow connections or blacklisted IPs. I want to focus on the features and limitations you get when using a proxy versus a Scraping API.
The most important advantage of a proxy is that it can be easily plugged into your scraping software: you can import a proxy list into almost every scraping program, and even custom code needs only a few lines, as the sketch below shows.
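Here is a minimal sketch of that idea in Python with the `requests` library; the proxy addresses are placeholders you would replace with the entries from your own list.

```python
import random

import requests

# Placeholder proxy addresses; substitute the entries from your own list.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

def fetch(url: str) -> requests.Response:
    """Fetch a URL through a randomly chosen proxy from the list."""
    proxy = random.choice(PROXIES)
    return requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        timeout=10,
    )

print(fetch("https://example.com").status_code)
```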
With a Scraping API, on the other hand, you'll need to do some coding on your side to make things work. The upside is that you get access to a much bigger pool of proxies that are rotated and health-checked on a regular basis.
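The coding involved usually boils down to a single HTTP call to the provider's endpoint. Below is a hedged sketch: the endpoint URL and the `api_key`/`url` parameter names are hypothetical, so check your provider's documentation for the real ones.

```python
import requests

# Hypothetical endpoint and key; real Scraping APIs differ in URL and
# parameter names, so consult your provider's documentation.
API_ENDPOINT = "https://api.scraping-provider.example/v1/scrape"
API_KEY = "YOUR_API_KEY"

def scrape(url: str) -> str:
    """Ask the Scraping API to fetch a page on our behalf.

    Proxy rotation, retries, and ban handling happen on the provider's
    side, so we only deal with the returned data.
    """
    response = requests.get(
        API_ENDPOINT,
        params={"api_key": API_KEY, "url": url},
        timeout=60,
    )
    response.raise_for_status()
    return response.text

print(len(scrape("https://example.com")))
```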
Another point in favor of proxies is when you need to perform a sequence of queries from the same, fixed IP address. For example, a development test may require the user to keep one IP for the whole session of interaction with the website.
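A minimal sketch of such a fixed-IP session, assuming a single static proxy (the address is again a placeholder): a `requests.Session` routed through one proxy keeps both the IP and the cookies stable across the whole interaction.

```python
import requests

# Placeholder static proxy; every request in the session goes out
# through this one address, so the site sees a single, stable IP.
STATIC_PROXY = "http://203.0.113.10:8080"

session = requests.Session()
session.proxies = {"http": STATIC_PROXY, "https": STATIC_PROXY}

# The shared session also keeps cookies, so the target website sees
# one consistent visitor across the whole sequence of requests.
session.get("https://example.com/login")
session.get("https://example.com/account")
```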
So, to summarize:
You should use proxies if:
1) You're unable to make changes in your scraping software and can only feed it a proxy list.
2) You need a static IP for a sequence of requests.
You should use a Scraping API if:
1) You want to pay less while getting access to a larger proxy pool.
2) You don't want to take care of updating the proxy list yourself.
3) You don't want to handle all the errors yourself and would rather focus on the data.