Non-standard R&D project: an automated solution for a CAPTCHA-protected service
Challenge
The client came to us needing to scrape public data from an online resource to enhance their marketing activities.
The list of individuals whose data should be enriched is conveniently organized in a Google spreadsheet. The target service checks for "human" traits with a CAPTCHA every time it is accessed. For such websites, collecting the data manually is not nearly as challenging as reaching it with a bot.
The typical scraping problems became the main challenges of the project:
- Region-specific restrictions and VPN blocking: the website is available only from within the US and Canada
- Bot detection by:
- Rate limiting per IP
- CAPTCHA protection (hCaptcha)
- Request headers analysis
- Residential IPs only
- Cost-effectiveness
Jason Reavis
Chief Operating Officer / Partner at J.D. Hawkins & Associates
Implex is a great partner. The program was well-built and does not require continual maintenance. They did a great job managing the project and providing a summary of each meeting. They managed their tasks well, delivered work on time, and promptly responded to our needs. Their professionalism and ability to deliver on their promises stood out.
Solution
There are several techniques and external services that allow you to tackle each of these problems. We used two CAPTCHA-solving services, which helped our traffic be identified as human so we could get the data. The data was then put into a spreadsheet in a usable format.
The customer fills the data into a Google spreadsheet in the agreed format. The scraping process is as easy as clicking one button inside the spreadsheet. Technically, the button triggers AWS Lambda through Apps Script; Lambda schedules an ECS task which scrapes the data and emails the result to everyone listed in the input spreadsheet. Currently there is no database connected.
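The Lambda side of that flow can be sketched as follows. This is a simplified illustration, not the project's actual code: the cluster, task definition, subnet, and container names are hypothetical placeholders, and the real handler would also validate its input:

```python
import json

# Illustrative placeholders, not the project's real AWS resources.
ECS_CLUSTER = "scraper-cluster"
TASK_DEFINITION = "scraper-task"
SUBNETS = ["subnet-0example"]

def build_run_task_params(sheet_id: str, recipients: list[str]) -> dict:
    """Parameters for ecs.run_task: which task to start, and how the
    input-spreadsheet id and notification list reach the container."""
    return {
        "cluster": ECS_CLUSTER,
        "taskDefinition": TASK_DEFINITION,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {"subnets": SUBNETS, "assignPublicIp": "ENABLED"}
        },
        "overrides": {
            "containerOverrides": [{
                "name": "scraper",
                "environment": [
                    {"name": "SHEET_ID", "value": sheet_id},
                    {"name": "RECIPIENTS", "value": ",".join(recipients)},
                ],
            }]
        },
    }

def lambda_handler(event, context):
    """Entry point invoked from the spreadsheet button via Apps Script."""
    import boto3  # imported lazily so the module loads without the AWS SDK

    body = json.loads(event.get("body", "{}"))
    params = build_run_task_params(body["sheet_id"], body.get("recipients", []))
    boto3.client("ecs").run_task(**params)
    return {"statusCode": 202, "body": json.dumps({"status": "scrape scheduled"})}
```

Scheduling a long-running ECS task from Lambda keeps the Lambda invocation short while the scrape itself can run for as long as it needs.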
Because CAPTCHA-solving solutions are quite slow and unstable, we paid particular attention to redundancy and fallback mechanisms, which significantly improved the final success rate of the scraping.
The tools and technologies we used included Google Sheets with Apps Script, AWS Lambda, Amazon ECS, residential proxies, and two third-party CAPTCHA-solving services.
Results
We automated the process of searching for specific individuals on a website.
As a result, Implex has built a custom data scraping solution for a healthcare consulting company.
The solution supports the client's marketing efforts: the data enrichment process, previously done manually, was accelerated by tens of times.
The project addressed the site's security measures against web scrapers and bots. Web scraping improved data collection while reducing the manual workload.
Key results:
- Automation of manual work, saving each employee about 25 hours per month, i.e. about 125 hours/month for a team of 5
- The ability for the entire team to use the tool collectively
- Avoided unnecessary expenses and created a system compatible with the client's preferred tools, such as Google Sheets
- A self-managed system that operates without supervision, reports errors in plain language the client understands, and lets them fix those errors themselves
- Cost-effectiveness: the system paid for itself within 3-4 months