Here is a list of web scraping/crawling apps and portals:
Free plan, plus a paid plan at $18.99 per month.
An open-source visual scraping tool that lets you scrape the web without coding, built by the creators of Scrapy.
Scrapinghub is the most advanced platform for deploying and running web crawlers (also known as “spiders”). It allows your organization to build crawlers easily, deploy them instantly and scale them on demand, without having to manage servers, backups or cron jobs. Everything is stored in our highly available database and retrievable from our API.
Scraper API handles proxies, browsers, and CAPTCHAs for you, so you can scrape any web page with a simple API call.
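For illustration, here is a minimal sketch of what such a call can look like in Python with the requests library. The api.scraperapi.com endpoint and the api_key/url parameters follow the pattern the vendor advertises, but the key and target URL are placeholders; verify against the current docs before relying on this:

```python
# Hedged sketch: proxying a page fetch through Scraper API.
import requests

payload = {
    "api_key": "YOUR_API_KEY",    # placeholder credential
    "url": "https://example.com", # the page you actually want to scrape
}
response = requests.get("http://api.scraperapi.com", params=payload)
response.raise_for_status()
print(response.text[:500])  # first 500 characters of the returned page
```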
http://80legs.com/ – FREE
80legs offers powerful web crawling. Extract data from web pages, images, and any other online content. Start crawling websites now faster, easier, and with unlimited reach.
https://dexi.io/ – $119 per month
EXTRACT: With our web data extraction and robotic process automation (RPA) tool (web scraping tool) you can extract and transform data from any source.
ENRICH: Use the visual data pipe tool to normalize, transform and enrich data and build an engine for handling all your data sources
CONNECT: Connect data from any data source to any destination with a few clicks. It is that easy!
About Dexi:
Based in Copenhagen, Denmark, dexi.io is a cloud-based, high-performance, feature-rich application with visual editors for building custom data extraction and refinement solutions. We are trusted by thousands around the world to extract, enrich, and connect the valuable data our clients depend on.
https://mydataprovider.com/runner
mydataprovider.com is a web-based platform that allows customers to extract data from the web. Using our API, you can access the data from your own applications. The service supports several export options: CSV, XML, Excel, JSON, and direct import into a database.
Link.fish lets you get web data with the ease of bookmarking. Just specify a URL and the app automatically parses the content and displays it in intuitive lists to help you get the specific data you need.
Main features of ScrapeHero:
– No Software, No Programming, No DIY tools
– Crawl complex websites with ease
– Never worry about data quality
– Perform complex data transformations
– Real-time Website Scraping API for Price Monitoring
ScrapeHero provides reliable web scraping and web crawling services for enterprises. It has the processes and the technology scalability to handle web scraping and crawling tasks that are complex and massive in scale – think millions of pages an hour. ScrapeHero helps companies extract structured data and get it delivered to their applications or databases without writing a single line of code or configuring a DIY tool and running it.
ScrapeHero provides a full service. It takes care of everything from setting up scrapers and running them to cleaning the data and performing quality checks, and makes sure the data is delivered to its users.
ScrapeHero data service integrates with FTP, Dropbox, Amazon S3, Google Drive, Box, Google Storage and many other services.
http://www.contentgrabber.com/
Web-scraping is the process of extracting data from websites and storing that data in a structured, easy-to-use format. The value of a web-scraping tool like Content Grabber is that you can easily specify and collect large amounts of source data that may be very dynamic (data that changes very frequently).
Usually, data available on the Internet has little or no structure and is only viewable with a web browser. Elements such as text, images, video, and sound are built into a web page so that they are presentable in a web browser. It can be very tedious to manually capture and separate this data, and it can require many hours of effort to complete. With Content Grabber, you can automate this process and capture website data in a fraction of the time that it would take using other methods.
Web-scraping software interacts with websites in the same way as you do when using your web browser. However, in addition to displaying the data in a browser on your screen, web-scraping software saves the data from the web page to a local file or database.
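Content Grabber itself is a point-and-click Windows tool, but the fetch-extract-store pattern described above can be sketched in a few lines of Python. Everything here (the URL, the CSS selectors, the table name) is hypothetical and only illustrates the idea, not Content Grabber's own workflow:

```python
# Generic fetch -> extract -> store pattern (illustration only).
import sqlite3

import requests
from bs4 import BeautifulSoup

# Hypothetical catalogue page and selectors, for illustration.
html = requests.get("https://example.com/products", timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Separate the presentable page into structured (title, price) rows.
rows = [
    (item.select_one("h2").get_text(strip=True),
     item.select_one(".price").get_text(strip=True))
    for item in soup.select(".product")
]

# Persist the structured result to a local database, as described above.
conn = sqlite3.connect("scraped.db")
conn.execute("CREATE TABLE IF NOT EXISTS products (title TEXT, price TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
conn.commit()
conn.close()
```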
Webhose.io is an advanced DaaS (Data as a Service) platform.
We provide data-driven companies with instant access to structured data from news sites, blogs and online forums in over 240 languages worldwide.
Among our clients are some of the biggest names in the fields of brand monitoring, media listening & analytics.
The idea behind Webhose.io is that when you need data from the web, you don’t necessarily have to build a crawler or use a scraper. Webhose.io has already done the heavy lifting for you.
We’ve developed a technology that enables us to collect web data quickly and efficiently.
The efficiency allows us to offer the data we collect at a fraction of the cost of running a crawling operation in-house.
We offer access to both historical & newly-generated web data, which is instantly available in a structured form & can be consumed via an API or a firehose.
Our data is high-quality and spam-free, coming from millions of reliable sources. Feel free to try it now with our free plan (up to 1,000 requests per month).
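As a rough illustration, here is how an API query against the free plan might look from Python. The filterWebContent endpoint and the token/q parameters reflect Webhose.io's documentation as I recall it, so treat them as assumptions and check the current API reference:

```python
# Hedged sketch of querying Webhose.io's REST API.
import requests

params = {
    "token": "YOUR_TOKEN",  # placeholder credential
    "format": "json",
    "q": 'language:english site_type:news "web scraping"',
}
resp = requests.get("http://webhose.io/filterWebContent", params=params)
resp.raise_for_status()
for post in resp.json().get("posts", []):
    print(post["title"], post["url"])
```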
Kantu is the most popular open-source web macro recorder. If there’s an activity you have to do repeatedly, just create a web macro for it. The next time you need to do it, the entire macro will run at the click of a button and do the work for you.
Kantu’s main focus is ease of use, good recording and reliable playback for all kinds of browser automation projects. It is a record & replay tool for automated testing and a “swiss army knife” for general web automation, automating file uploads and form filling.
The new versions of Kantu for Chrome and Kantu for Firefox bring a major update: they can now run automated visual UI tests inside the web browser, powered by WebAssembly. This makes Kantu one of the first (if not the first) browser extensions to use WebAssembly.
http://www.outwit.com/products/hub/
http://www.eliteproxyswitcher.com/ – 😉
Web Robots have several offers and tools:
– For users without programming skills: a Chrome extension which guesses where listing-type data sits on a web page and converts that data into a CSV or Excel file.
– For users with JavaScript programming skills: another Chrome extension that serves as an integrated development environment for writing and executing scraper robots. It lets users run fully featured robots on their own computers for free.
– For companies: Web Robots can provide a fully managed data scraping service, or license access to the whole platform, where clients can create and schedule robots and run them in the cloud.
Why Diffbot?
We’re focused exclusively on getting you better web data.
Some of the reasons hundreds of customers make (hundreds of) millions of calls every month:
#The Web’s Best Content Extractor:
Diffbot works automatically—without rules or training. There’s no better way to extract data from web pages. See how Diffbot stacks up to other content extraction methods:
Feature Comparison
Text-Extraction Quality Shootout
#Identify Pages Automatically:
Use the Analyze API to automatically find and extract all products, articles, discussions or images while crawling any site.
Analyze API
#Detailed product data:
The Product API automatically returns complete product info, including all pricing data, product IDs, brand and full specifications tables.
Product API
#Clean text and html:
Articles, discussion threads, product descriptions and image captions are returned in pure text and sanitized HTML.
Start testing today
#Structured Search:
Search structured content from any crawl on-the-fly using our Search API, returning only the matching results.
Plus…
¤ All APIs execute JavaScript, so content is parsed just as it is in a regular browser.
¤ Works on most non-English pages thanks to visual processing.
¤ Date normalization: Datestamps are normalized and presented in RFC 1123 (HTTP/1.1) standard format.
¤ Multipage articles are automatically joined together in a single API response.
¤ Entity extraction: automatic tagging identifies major topics and entities within article text.
¤ Fix any issues in real time with the API Toolkit.
¤ Bulk API allows the extraction of hundreds to hundreds-of-thousands of pages.
¤ Access Crawlbot and Bulk job data in full JSON or CSV formats.
¤ Optionally crawl using a diverse array of IP addresses.
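To make the API pattern above concrete, here is a hedged sketch of calling Diffbot's Article API from Python (the Product and Analyze APIs follow the same token-plus-url pattern). The v3 endpoint matches Diffbot's public REST API, but the token and target URL are placeholders:

```python
# Hedged sketch: extracting a clean article via Diffbot's Article API.
import requests

params = {
    "token": "YOUR_DIFFBOT_TOKEN",              # placeholder credential
    "url": "https://example.com/some-article",  # page to extract
}
resp = requests.get("https://api.diffbot.com/v3/article", params=params)
resp.raise_for_status()
for obj in resp.json().get("objects", []):
    print(obj.get("title"))
    print(obj.get("date"))   # normalized datestamp, per the list above
    print(obj.get("text", "")[:300])  # pure-text article body
```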
http://community.screen-scraper.com
http://www.ubotstudio.com/index7
Turn web page content into structured data, all without coding.
*Important* – Mozenda uses a Windows application that must be installed on Windows Vista or newer
Power your business intelligence decisions with real-time data. A cloud web scraping solution for SMBs and enterprises: leverage structured data from on-demand and scheduled scrapers to feed data into your business.
Scrapy is an open-source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
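Since Scrapy is a code-first framework, a short example says more than a blurb. This minimal spider targets quotes.toscrape.com, Scrapy's own demo site, and can be run with `scrapy runspider quotes_spider.py -o quotes.json`:

```python
# Minimal Scrapy spider: crawls the demo site and yields structured items.
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Each div.quote holds one quote; yield it as a structured record.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the pagination link until it runs out.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```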
Extracty can extract any web data and create an API to the webpage’s information.
ParseHub is a web scraping tool built to handle the modern web.
You can extract data from anywhere. ParseHub works with single-page apps, multi-page apps and just about any other modern web technology.
ParseHub can handle JavaScript, AJAX, cookies, sessions and redirects. You can easily fill in forms, loop through dropdowns, log in to websites, click on interactive maps and even deal with infinite scrolling.
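ParseHub projects are built visually, but the extracted data can also be pulled programmatically. The sketch below uses what I recall of ParseHub's v2 REST API; the endpoint, parameters, and tokens are assumptions to verify against the current docs:

```python
# Hedged sketch: fetch the latest data from a ParseHub run via its REST API.
import requests

RUN_TOKEN = "YOUR_RUN_TOKEN"  # placeholder run identifier
resp = requests.get(
    f"https://www.parsehub.com/api/v2/runs/{RUN_TOKEN}/data",
    params={"api_key": "YOUR_API_KEY", "format": "json"},  # placeholder key
)
resp.raise_for_status()
print(resp.json())  # the structured data the visual project extracted
```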
Diggernaut is a cloud-based service for web scraping, data extraction and other ETL tasks. Imagine spending hours a day manually collecting data from the websites you need. It’s very cumbersome and time-consuming. With Diggernaut, you can speed up the data collection process a thousandfold and save time for more important tasks. Our tiny diggers can do web scraping on your behalf and get data from websites for you. Just leave it up to Diggernaut to get your job done.
https://www.uipath.com/community
Robotic Process Automation Software.
Automate rule based business processes.
Train and design robots that drive the UI like a human.
Robotic Process Automation is an automation performed by a computer (software robot) to drive existing application software in the same way a user does.
UiPath enables business analysts to automate rule based business processes, train and design robots that drive the UI like a human.
A free, fully featured, and extensible tool for automating any web or desktop application. UiPath Studio Community is free for individual developers, small professional teams, and education and training purposes. UiPath enables organizations to configure software robots that automate manual, repetitive, rules-based tasks at a fraction of the cost of their human equivalent, and to integrate with legacy systems without disruption.
Desktop Automation
UiPath Studio introduces a visual, declarative way of describing how to automate a process, and you can use it in the same way you use a Visio diagram. When working with the presentation layer of other apps, you simply indicate on the screen what operation you need to perform. UiPath understands the UI at the logical control level and does not rely on the position of elements on screen. This makes automation much more reliable and independent of screen-size and resolution.
Most advanced Screen Scraping Technology
UiPath has pioneered the screen scraping of running desktop apps with 100% accuracy in under 16 milliseconds. Prior to UiPath, screen scraping had a low accuracy rate and was slow and fragile.
UiPath features an innovative technique for extracting text from running apps, even if they are hidden or covered by another app.
Web scraping is a premier feature of the screen-scraping landscape, as there are dedicated methods for extracting pattern-based data that spans multiple web pages.
Octoparse is modern visual web data extraction software. Both experienced and inexperienced users find it easy to use Octoparse to bulk-extract information from websites; no coding is needed for most scraping tasks. Users can extract data from 98% of open websites using our tools. With its point-and-click interface, Octoparse makes web scraping very easy to learn and understand. Use the extracted data to power your business intelligence or build up your customer database.
http://apifier.com (favs)
Apify is the easiest way to run headless Chrome jobs in the cloud. It comes with an advanced web crawler that enables the scraping of even the largest websites. Schedule your jobs using a cron-like service and store large amounts of data in specialized storages.
Reuse crawlers and acts built by others and publish your own for other people to use. Your source code can be hosted on GitHub, Docker Hub, an arbitrary URL or directly on Apify. The users of your service pay for the resources consumed, not you!
Your Apify services can be written in JavaScript or any other language, as long as they are bundled as Docker containers. Manage the platform using a web interface or REST API. Trigger your services with webhooks, directly from your code, or through integration platforms like Zapier or Keboola.
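As a hedged example, here is how triggering an Apify act over the REST API might look from Python. The /v2/acts/{act}/runs endpoint matches the "acts" terminology above, but the act ID, token, and input are placeholders, and the exact request and response shapes should be checked against Apify's API reference:

```python
# Hedged sketch: start an Apify act run via the REST API.
import requests

ACT_ID = "username~my-act"  # placeholder act identifier
resp = requests.post(
    f"https://api.apify.com/v2/acts/{ACT_ID}/runs",
    params={"token": "YOUR_APIFY_TOKEN"},  # placeholder credential
    json={"startUrls": [{"url": "https://example.com"}]},  # hypothetical input
)
resp.raise_for_status()
print(resp.json()["data"]["id"])  # run ID, usable to poll status or fetch results
```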
USE CASES
Some examples of how this can be utilized:
– Scrape eCommerce sites that sell your products, to check for price violations and review data.
– Build a broad crawler covering thousands of sites to automatically discover contact and profile information for a specific industry.
– Parse all shop locations for a number of big brands to provide a locator for users looking for a specific type of shop.
– Build a database of interesting candidates to hire by matching various sources of internet profiles against a series of filters that you or the HR team are interested in. I know people building boutique businesses on basic web scraping… like someone who uses our platform to offer a service that lets people monitor Amazon Kindle book pricing and get alerted when the price drops or the book goes on sale (a minimal sketch of such a monitor follows below). In effect, it brings Amazon’s data “back to the people” to allow them to make better choices.
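That price-drop idea is simple enough to sketch. The toy monitor below is generic Python with requests and BeautifulSoup; the watched URL, the .price selector, and the alert threshold are all hypothetical, and scraping a real retailer like Amazon would need one of the services above to handle proxies, CAPTCHAs, and JavaScript rendering:

```python
# Toy price-drop monitor (all targets and selectors are hypothetical).
import requests
from bs4 import BeautifulSoup

WATCHED = {"https://example.com/kindle-book": 9.99}  # url -> alert threshold

def current_price(url: str) -> float:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one(".price")  # hypothetical selector
    return float(tag.get_text(strip=True).lstrip("$"))

for url, threshold in WATCHED.items():
    price = current_price(url)
    if price <= threshold:
        print(f"ALERT: {url} dropped to ${price:.2f}")  # hook up email/Slack here
```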
What are your thoughts on residential proxy providers like Smartproxy? https://www.smartproxy.com/
I’m thinking of scraping eCommerce websites like Amazon, but I’m not even sure where to begin. It would be great if you wrote an article about that, thanks!