Scraping as a Service | Orange Box Technologies

Scraping as a Service

I'm not a fan of the term "web scraping". It evokes images of web crawlers created by people who spend their lives automating the gathering of millions of email addresses so they can sell them to spammers who want to sell you all sorts of "enhancement" products.

The reality is that web scraping has all sorts of legitimate uses for real businesses. One of my recent tasks was to scrape a company's forum, as well as their competitor's forum, and then run the unstructured text data through an analytics engine to find text associations for different product offerings. The main question: what is it about our product that users perceive as different from, better than, or worse than the competition's product? Some people would have you believe that it's as simple as just reading the forum posts... yeah, right. Go ahead, do that legwork. Read the hundreds and hundreds of posts, and then, when you present your data to the marketing department, let them know it's because you believe a, b, and c.
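For the scraping half of that task, here is a minimal sketch of pulling post text out of a forum page using only Python's standard library. The `post` class name, the `PostExtractor` and `extract_posts` names, and the sample page are all assumptions made for illustration; a real forum will use its own markup, and fetching the pages is left out entirely.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect the text inside every <div class="post"> element.

    The "post" class is a hypothetical example; inspect the real forum's
    HTML to find the element that actually wraps each post.
    """

    def __init__(self):
        super().__init__()
        self.posts = []    # finished post texts
        self._depth = 0    # <div> nesting depth inside the current post
        self._chunks = []  # text fragments of the current post

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth > 0:
            self._depth += 1  # a nested <div> inside a post
        elif "post" in (dict(attrs).get("class") or "").split():
            self._depth = 1   # entering a new post
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:  # the post's own <div> just closed
                text = " ".join(self._chunks)
                self.posts.append(" ".join(text.split()))  # tidy whitespace

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def extract_posts(html):
    """Return a list with the plain text of each post found in `html`."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts


# Made-up page standing in for a downloaded forum thread.
SAMPLE_PAGE = """
<html><body>
  <div class="post"><p>The battery died after a week.</p><div class="sig">Posted by Al</div></div>
  <div class="sidebar">Ads</div>
  <div class="post">Love it, no complaints.</div>
</body></html>
"""

posts = extract_posts(SAMPLE_PAGE)
for post in posts:
    print(post)
```

The output is a plain list of strings, one per post, which is the kind of unstructured text an analytics engine expects as input.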

With a good analytics process, you can actually give the client numbers that they care about and can act on. This word appears in 5% of the documents where people are complaining, and there's a 70% chance that when this word appears, people also mention returning the product. Now that's actionable. That has meaning. The catch is that this isn't data that people just willingly give to you. Services like Radian6, Meltwater, and Trackur are fantastic, but they're expensive and not necessarily targeted; they do everything. Specific questions sometimes need specific answers, and scraping and text analytics can provide them.
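To make those two kinds of numbers concrete, here is a rough sketch of how they might be computed from a list of scraped posts. It is a simplified stand-in for a real analytics engine, not a description of any particular tool: the function names, the sample posts, and the words "hinge" and "return" are all invented for illustration.

```python
import re


def tokens(text):
    """Lowercase a document and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def document_frequency(docs, word):
    """Share of documents that mention `word` at least once."""
    return sum(word in tokens(d) for d in docs) / len(docs)


def cooccurrence_rate(docs, word, other):
    """Of the documents mentioning `word`, the share that also mention `other`."""
    with_word = [d for d in docs if word in tokens(d)]
    if not with_word:
        return 0.0
    return sum(other in tokens(d) for d in with_word) / len(with_word)


# Made-up posts standing in for scraped forum text.
POSTS = [
    "The hinge snapped so I am going to return it",
    "Hinge feels loose, thinking I will return this",
    "The hinge squeaks but I can live with it",
    "Great screen and fast shipping",
    "Battery life is fine",
]

df = document_frequency(POSTS, "hinge")
rate = cooccurrence_rate(POSTS, "hinge", "return")
print(f"'hinge' appears in {df:.0%} of posts")          # 60%
print(f"{rate:.0%} of those posts also mention 'return'")  # 67%
```

Exact word matching is the crudest possible approach; a real pipeline would likely add stemming, stop-word removal, and a filter that keeps only the complaint posts before counting.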

See Vancouverdata, a great blog that covers the basics of using RapidMiner as a scraping and text analytics tool. Or, you can always contact me.