State of the art scraping technologies to scout the Instagram programmatically
Even thought it’s not technically illegal to scrape, Instagram still tries to prevent this behavior, by rate-limiting and blocking IP addresses. There are platform as a service solutions, that can tackle this (1,2). They run multiple instances of headless browsers to be able to render modern SPA applications and use multiple proxies to prevent blocking and detection of actual scrapers.
The downside is that they are pretty pricey even for the development purposes. For that reason I will try simulating their product using selenium
and http-request-randomizer
python libraries or their equivalents in a different language. It possible that the Instagram is going to reject requests from publicly known IP address lists. Buying access to a private one should be cheaper than PaaS
mentioned above.
By using proxies we could host the code on cloud platform developed by the company I work in called Zerops for free.