Submitted by orangeboxtech on Sun, 07/17/2011 - 19:58
I'm not a fan of the term "Web Scraping". It evokes ideas of web crawlers created by people who spend their lives automating the gathering of millions of email addresses so they can sell them to spammers who want to sell you all sorts of "enhancement" products.
The reality is that web scraping has all sorts of real functionality for real businesses. One of my current tasks was to scrape a company's forum, as well as their competitors' forums, and then run the unstructured text data through an analytics engine to find text associations for different product offerings. The main question: what is it about our product that users perceive as different, better, or worse than the competition's product? Some people would have you believe that it's as simple as just reading the forum posts...yeah right. Go ahead, do that legwork. Read the hundreds and hundreds of posts, and then when you present your data to the marketing department, let them know it's because you believe a, b, and c.
The reality is that with a good analytics process you can actually give numbers that the client cares about and can act on. For example: this word appears in 5% of the documents where people are complaining, and there's a 70% chance that when this word appears, people are going to reference returning the product. Now that's actionable. That has meaning. The reality of the situation, though, is that this isn't data that people just willingly give to you. Services like Radian 6, Meltwater, and Trackur are fantastic, but they're expensive and not necessarily targeted; they do everything. Specific questions sometimes need specific answers, and scraping and text analytics can handle this.
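To make those two numbers concrete, here is a minimal Python sketch of how they could be computed. It is not the analytics engine used on the project. The posts, the word "battery", and the set of return-related words are all invented for illustration, and the posts are assumed to be already filtered down to complaints.

```python
import re

def tokenize(text):
    """Lowercase a post and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def term_stats(posts, term, outcome_terms):
    """Return (share of posts containing `term`,
    share of those posts that also contain any of `outcome_terms`)."""
    token_sets = [tokenize(p) for p in posts]
    with_term = [t for t in token_sets if term in t]
    if not token_sets or not with_term:
        return 0.0, 0.0
    doc_share = len(with_term) / len(token_sets)
    with_outcome = [t for t in with_term if t & outcome_terms]
    return doc_share, len(with_outcome) / len(with_term)

# Hypothetical forum posts, invented for illustration only.
posts = [
    "The battery died after a week so I am returning it",
    "Battery life is poor, asked for a refund",
    "Love the screen, no complaints",
    "The battery is fine but shipping was slow",
]

# Hypothetical word list standing in for "references returning the product".
return_words = {"returning", "return", "refund"}

share, p_return = term_stats(posts, "battery", return_words)
print(f"'battery' appears in {share:.0%} of posts")
print(f"{p_return:.0%} of those posts also mention a return")
```

A real pipeline would add stemming, stop-word handling, and a far larger sample, but the two ratios are the same idea: how often a word shows up, and how often it shows up alongside the outcome you care about.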
See Vancouverdata, a great blog that covers the basics of using RapidMiner as a scraping and text analytics tool. Or, you can always contact me.
Submitted by orangeboxtech on Tue, 05/18/2010 - 16:38
Business Intelligence is complicated.
IT people think that they have all of the answers for all of the problems, and that answer is "data". Just give people the ability to drill through whatever it is that they want, the ability to pick whatever they want out of the giant metric bin of values, and they'll solve problems, right?
In practice, it's never that easy. The IT department seems to forget at times that while they might deal with the data, they aren't the end user, and while they understand the warehouse, they don't use it to make directional decisions. The response of some is just to throw it all out there and have the user base sort through it, gleaning the pearls of wisdom that may or may not be there...don't worry about the fact that accountants, finance departments, managers, strategic planners, and executives have never used the tools before and might not be technically inclined; that's on them. Sometimes the myopia of the data managers leads to poorly implemented systems.
Even a simple data warehouse has hundreds of columns of data, and it’s not uncommon for more complex systems to have thousands of columns. When an end user is faced with a blank canvas, thousands of columns of data, and hundreds of accessible features, complexity is automatic. “Where do I begin?” is often the first question, shortly followed by “I don't have time for this,” or “I give up.”
It's important to remember that while we might be quick to trumpet the value of in-depth drill maps, predictive algorithms, and complex data mining tools, it's the end user who uses the data on a daily basis to make the strategic decisions that a company requires. As data warehouse and business intelligence developers, it's not just our job to build systems; it's also our job to take the time to ask the right questions. What works for the technically advanced might be useless for the 90% of people who are not, and who have neither the time nor the drive to learn systems that might be our lifeblood.
In the 1960s, Theodore Levitt wrote the industry-altering article Marketing Myopia, in which he criticized the marketing professionals of the time for their narrow understanding of the industries that they were in. Levitt concludes, "...the organization must think of itself not as producing goods or services, but as buying customers, as doing the things that will make people want to do business with it." As BI professionals, our customer is the end user. In addition to focusing on the warehouse life-cycle and data freshness, let's take the time to make sure that our user base wants to do business with us. We achieve that by starting the conversation:
Submitted by orangeboxtech on Fri, 04/16/2010 - 16:07
It's annoying, this idea that the only way you can produce a worthwhile report is to pull four different data sources (your sales volume, your customer data, your revenue, and your internet hits) into one file so that you can create the one report that shows you the series of metrics that are the most valuable to you...
...and it takes six hours.
There's usually an easier way, and it revolves around data warehousing. I'll avoid getting into the specifics of the reporting right now; that's the easy part. Data warehousing is the planned extracting, transforming, and loading (called ETL) of data from your various data sources into one relational database, where you can then join the data sets on common elements. What an ETL process does is gather the common elements from all your different sources (customers, for example) into one normalized list, and then attach that list to whatever transaction you might be looking at.
You want to know if your customer "Acme Widgets" has placed any orders on the web site and what their past sales have been? No problem...the warehouse has already taken care of the heavy lifting. All you need to do is put your reporting tool on top of it and make your fancy-looking reports or dashboards.
In short, a well-put-together data warehouse is the backbone of any good business analytics system.
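As a rough illustration of that heavy lifting, the sketch below runs a tiny ETL pass in Python, with an in-memory SQLite database standing in for the warehouse. The source rows, table names, and the normalize rule are all assumptions made up for the example; a real warehouse would need much more careful customer matching.

```python
import sqlite3

# Hypothetical source extracts; names and amounts are invented for illustration.
sales_system = [("ACME WIDGETS", 1200.0), ("Acme Widgets ", 800.0), ("Beta Corp", 300.0)]
web_orders = [("acme widgets", "2010-04-01"), ("Gamma Inc", "2010-04-02")]

def normalize(name):
    """Transform step: collapse case and whitespace so sources share one key."""
    return " ".join(name.split()).title()

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE sale (customer_id INTEGER REFERENCES customer(id), amount REAL);
    CREATE TABLE web_order (customer_id INTEGER REFERENCES customer(id), ordered_on TEXT);
""")

def customer_id(name):
    """Load step: add the customer to the shared list if new, then return its id."""
    clean = normalize(name)
    db.execute("INSERT OR IGNORE INTO customer (name) VALUES (?)", (clean,))
    return db.execute("SELECT id FROM customer WHERE name = ?", (clean,)).fetchone()[0]

# Attach every transaction from each source to the one normalized customer list.
for name, amount in sales_system:
    db.execute("INSERT INTO sale VALUES (?, ?)", (customer_id(name), amount))
for name, ordered_on in web_orders:
    db.execute("INSERT INTO web_order VALUES (?, ?)", (customer_id(name), ordered_on))

# Reporting query: total past sales and number of web orders for one customer.
row = db.execute("""
    SELECT c.name,
           (SELECT COALESCE(SUM(amount), 0) FROM sale WHERE customer_id = c.id),
           (SELECT COUNT(*) FROM web_order WHERE customer_id = c.id)
    FROM customer c WHERE c.name = ?
""", ("Acme Widgets",)).fetchone()
print(row)
```

The three spellings of the customer collapse into one row in the customer table, so the report only has to ask about "Acme Widgets" once. The two totals are computed as separate subqueries on the shared customer id, which keeps sales rows from being multiplied by web order rows.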
Submitted by orangeboxtech on Wed, 03/31/2010 - 23:00
Small businesses don't always have the time and capital to develop and use the more expensive enterprise reporting solutions. This is where Orange Box Technologies can help. We find ways to use your existing tools, free open source solutions, and inexpensive software to automate and report on the data that is important to you, in a timely and relevant way.