Octoparse is a cloud-based web data extraction solution that helps users extract relevant information from various types of websites. It enables users from a variety of industries to scrape unstructured data and save it in different formats including Excel, plain text and HTML.
Users can click an element on a web page to select the type of data to extract. Octoparse allows users to run multiple extraction tasks simultaneously. Tasks can be scheduled to run at regular intervals, or they can be run in real time. Users can also scrape product comments, reviews and social media channels to collect information on consumer sentiment.
Octoparse's Wizard mode provides users with step-by-step instructions for extracting data, while the Advanced mode provides additional features for more complicated web pages.
Octoparse offers its services on a monthly subscription basis, which includes support via email and through an online knowledge base.
Andy J. Industry: Commercial Real Estate. Company size: 501-1,000 employees
The software is much easier to use and visually appealing, and the ongoing customer support and tutorials have been created with the user in mind.
Octoparse Web Scraper
Experience: I had been looking for a professional web scraper for about two months. I tried so many software packages. Some were hit-and-miss, and most did not work at all. Then I ended up with the Octoparse web scraper. Wow! That cloud-based software was exactly what I was looking for! This software really works, even with some complex websites. I definitely recommend it! I use Octoparse on a daily basis at my organization, and there is no smoother way of web scraping! The software has never given me any issues. I think nobody can find better software for scraping data from the web. It works exactly as expected. Octoparse has an easy-to-use interface, so no experience scraping websites is needed, but it can do a lot. It has enabled me to ingest a large number of data points and focus my time on statistical analysis rather than data collection. It has saved me so much time! The same jobs used to take me hours, and now the data is collected in a few minutes! When I need a quick way to grab structured web data, Octoparse will be my first choice.
Easy to use.
It could be much cheaper, but it is good software!
It took me about a day to look into all the available web scrapers. In the end, I settled on Octoparse for a couple of reasons.
- It installs on Windows, so I could use a spare Windows Server for scraping. No Node.js learning or programming needed.
- The GUI was simple to understand: you can paste in a list of links that need to be scraped, select the content on the page that needs to go into an Excel spreadsheet, and click start. That's it; no need to select specific HTML divs or write regex code. I don't know how, but this was the only scraper that could analyze and grab specific text on the page without setting any rules. The other scrapers I tried had a hard time and required complicated rules.
- You can export to Excel, directly to a SQL, MySQL or Oracle database, or to a CSV, TXT or HTML file.
- You can also back up your scraped data to Octoparse; the backup is saved with your task.
- The configuration tool and the scraper run as separate programs. If one suddenly shuts down because of an error, other Octoparse tasks continue to work as if nothing had happened.
- I had a hard time adding a list of 50,000 links to the queue, but it is not a problem because you can have multiple tasks (30-40K links in my case); just divide the links between those tasks.
- It did not say anywhere that it was saving the tasks to their servers, which is probably why it has trouble with large tasks. On the other hand, this is also a pro, because you can create tasks on your computer and load them on your server just by restarting the app.
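The workaround for the 50,000-link queue, dividing the URL list across several tasks, can be prepared outside Octoparse before pasting each chunk into its own task. Below is a minimal Python sketch, assuming the links are already in a plain list and that around 30,000 links per task is a workable ceiling (the reviewer's figure). The helper `split_links` is hypothetical and is not part of Octoparse.

```python
import math


def split_links(links: list[str], max_per_task: int) -> list[list[str]]:
    """Divide a URL list into roughly equal chunks, one per Octoparse task.

    Hypothetical helper: Octoparse does not provide this function; the
    chunks are meant to be pasted into separate tasks by hand.
    """
    if max_per_task <= 0:
        raise ValueError("max_per_task must be positive")
    if not links:
        return []
    # Smallest number of tasks that keeps every chunk within the limit.
    task_count = math.ceil(len(links) / max_per_task)
    # Spread links evenly so no task is much larger than the others.
    base, extra = divmod(len(links), task_count)
    chunks, start = [], 0
    for i in range(task_count):
        size = base + (1 if i < extra else 0)
        chunks.append(links[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    urls = [f"https://example.com/page/{n}" for n in range(50_000)]
    for i, chunk in enumerate(split_links(urls, 30_000), start=1):
        print(f"task {i}: {len(chunk)} links")
```

Balancing the chunks, rather than filling each task to the limit, keeps the tasks finishing at roughly the same time when they run in parallel.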
You can have 2 active tasks running at the same time for free; if you want more, you can upgrade to a paid version. It takes about a second to open a page, so you can scrape roughly one page per second per task.
Overall this worked better than great. I did not have to ask our devs to write a scraper: the time I spent creating the scraper was about the same as the time I would have spent discussing with our devs how to scrape the content. And now the devs are asking me for stats on the scraped data, not the other way around.
If you do any marketing and want to gather data for stats, or just build your own database from any website, it is super easy to do. I recommend it.
Masita dwi mandini M. Industry: Research. Company size: 13-50 employees
The trial version seems designed to trap its users. They send a notification that the trial is expiring 7 days before and again 2 minutes before. To me, a 2-minute warning is crazy. So if you are the kind of person who tends to forget when a trial expires, don't ever try this app; it is not worth it.
They have some templates for crawling specific websites.
I tried to crawl some literature databases from Google Scholar using their template, but it did not work well because Google detected it as a robot. Crawling all the data was impossible; the system stopped before it finished crawling the entire dataset.
Howard H. Industry: Research. Company size: Self-employed
After searching and searching for a data service, I couldn't find one that would meet my needs. Then came Octoparse. At first I was hesitant to use it. However, after watching several tutorials, I could tell the Octoparse team had spent a lot of time making it easy to use. Octoparse is an extremely powerful tool that has optimized our data scraping efforts and pushed them to the next level. I would recommend this service to anyone. For the price, it provides a large return on the investment. With the free version, which works great, you can run at least 10 tasks at a time. However, these tasks run simultaneously in the background of the desktop application. In my opinion, buying any of the plans that let you use the cloud interface is very helpful and provides a good deal of flexibility. You can close the application and know that Octoparse is running on a server somewhere. My favorite feature is the multiple export options (CSV, Microsoft Excel old and new formats, TXT, HTML). Octoparse doesn't require any knowledge of coding, which is helpful. However, knowing simple x-code shortcuts makes Octoparse even more powerful, because you can create your own custom script that runs based on your needs. The walkthrough tutorials when you first start are, I think, what really changes the game for Octoparse.
Every once in a while I have issues with Octoparse shutting down or trouble exporting. However, Octoparse support is super quick and reliable. The billing is very trustworthy as well. Octoparse also offers proxy capabilities and the ability to turn off images to speed up the process. Another addition in the cloud settings is the ability to label and categorize tasks based on their status. I know this may seem like a no-brainer feature; however, it is super helpful to know the status of multiple tasks and when they will be completed, so I can plan the extraction. I have heard great things about the API capabilities, but I haven't had time to figure them out yet. That is next on my list. All in all, Octoparse offers a wide range of features, and I'm happy to have used it.
The amazing and never-ending features
Watch all tutorials
When we were looking for appropriate scraping software, we tried every scraping program available on the market. After some days of testing, it turned out that Octoparse is exactly what we were looking for. This great and powerful tool completely outclasses the competition in most hard tasks:
- it can load and pass through very complex and large websites
- you can set more complex logic, which is very useful even if rarely used
- there are practically no websites that Octoparse can't load, no matter what platform or system they were built on
- a friendly environment with an easy-to-use GUI
- great support from honest and kind people
- and all of this without a single line of script, which is great!
There are, of course, some cons too, because nothing is perfect and that's a normal thing.
Sometimes the program hangs on some sites. After some observation, it turned out that it is not a website problem. It happens on sites built similarly to, for example, Wikipedia, whose structure is practically the same under every link. It is probably not the workflow in the designer either, because the workflow is simple: loop through a list of URLs -> extract data. It happens especially when there are more links. Sometimes it shows that the site is still loading; sometimes it just hangs on one open page for hours and then moves on. Note that I tried all of the program's options, including the advanced ones such as the timeout limit, reloading the web page, and every other option. Nothing helped.
Finally, here are functionalities and enhancements that could be added to Octoparse in the future to improve it further. Example situation: we add 100 links to the program and 17 of them fail to execute. We found that the reload-website option is unclear. If, for example, we use a proxy and the connection fails, we can currently switch to another proxy or a few proxies, depending on the settings. It would be great if, when a proxy fails to connect within a specified number of attempts (say 5-15), Octoparse deleted it or marked it as malfunctioning and avoided it in the future, while still returning those 17 bad links in the results. It would be even better if the exported result list included the links that failed to execute, marked in a separate column, or with the website's status error code or something similar. This would help with retrying the websites that failed to load.
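The proposed behavior has two parts: retire a proxy after a set number of consecutive connection failures, and keep failed URLs in the export with a status column. Both can be sketched in a few lines of Python. This is a hypothetical illustration of the suggestion, not how Octoparse works internally. `ProxyPool`, `scrape`, `to_csv` and the injected `fetch` callable are all invented names, and `fetch` is assumed to return an HTTP status code, or `None` when the connection fails.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumed fetcher signature: fetch(url, proxy) -> HTTP status code, or None
# when the connection through that proxy fails.
Fetcher = Callable[[str, str], Optional[int]]


@dataclass
class ProxyPool:
    """Tracks consecutive connection failures and retires bad proxies."""
    proxies: list[str]
    max_failures: int = 5  # the review suggests a threshold of 5-15 attempts
    failures: dict[str, int] = field(default_factory=dict)
    retired: set[str] = field(default_factory=set)

    def active(self) -> list[str]:
        return [p for p in self.proxies if p not in self.retired]

    def report(self, proxy: str, ok: bool) -> None:
        if ok:
            self.failures[proxy] = 0  # a success resets the streak
            return
        self.failures[proxy] = self.failures.get(proxy, 0) + 1
        if self.failures[proxy] >= self.max_failures:
            self.retired.add(proxy)  # mark as malfunctioning and avoid it


def scrape(urls: list[str], pool: ProxyPool, fetch: Fetcher) -> list[dict]:
    """Try each URL through the active proxies; failed URLs stay in the results."""
    rows = []
    for url in urls:
        status: Optional[int] = None
        for proxy in pool.active():
            status = fetch(url, proxy)
            pool.report(proxy, status is not None)
            if status is not None:
                break  # connected; stop trying other proxies
        ok = status is not None and 200 <= status < 300
        rows.append({
            "url": url,
            "ok": ok,
            "status": status if status is not None else "connection failed",
        })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Export results with the failure status in its own column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "ok", "status"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    def fake_fetch(url: str, proxy: str) -> Optional[int]:
        # Simulated network: "proxy-a" never connects; URLs ending in 7 return 404.
        if proxy == "proxy-a":
            return None
        return 404 if url.endswith("7") else 200

    pool = ProxyPool(["proxy-a", "proxy-b"], max_failures=5)
    rows = scrape([f"https://example.com/{n}" for n in range(20)], pool, fake_fetch)
    print("retired proxies:", sorted(pool.retired))
    print(to_csv(rows))
```

Because the failed rows carry either the HTTP status or "connection failed", the exported list can be filtered and fed back in for a retry, which is the use case the review describes.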