Some reading material for the course (other information is posted on Blackboard):
Introduction to Analytics Readings:
- DC Water case study
- Coca-Cola orange juice write-up in BusinessWeek
- GE Big Data
- Big Data and Cows
- 3-Minute Video from IBM on Optimization
- Read the IBM article in the Analytics Magazine
- Skim this site, the Science of Better
Software Tools:
- Weka data mining tool (This is what we’ll use in class. Go ahead and download and install it.)
- Alpine Data Labs for data mining (MSiA uses this, but it doesn’t work well enough on PCs for us to use it. Feel free to download it and try it out, though.)
- KNIME: another open source data mining tool. I recently heard good things about this product. We probably won’t use it in class, but it gives you another tool to test if you are interested.
- Tableau Public (the public version of Tableau; don’t put confidential data up here). I tested this on my blog. We’ll use this later in the class. For training on Tableau, check out the training videos.
- If you are curious about Python (I don’t think I’ll be able to teach it in the class, but a lot of data scientists use it), here is how to download Python 2.7 and get it installed, plus the popular Learn Python the Hard Way to learn how to program in it.
- If you are curious, here is R and R-Studio.
- If you are good with Excel and want to try some Business Intelligence with it, try PowerPivot. (hat tip, Patricio)
Reading on Data Mining and Machine Learning
- Elements of Statistical Learning. This is a key book used in NU’s MSiA program. It is a bit technical, but at least read the first chapter; I pulled some examples from there.
- A First Encounter with Machine Learning. Chapters 3 and 4 are worth a read. I pulled from this for my May 2nd lecture.
- Data Mining and Weka: a three-part series that shows some data mining with Weka (Part 1, Part 2, and Part 3). I’ll be using some data sets from this example.
- Top 10 Machine Learning Mistakes.
- Top 10 Machine Learning Algorithms. A bit technical, but a good reference.
- A Programmer’s Guide to Data Mining. This is where I pulled the recommendation systems material (Chapter 2) that we’ll talk about in Week 6 (May 9th).
Reading on Visualization:
- Michael Schrage of Harvard has a nice view on what visualization should do.
- This article, A Tour Through the Visualization Zoo, takes you through many different types of graphs and charts.
- The Functional Art has 3 chapters you can download for free. This gives you some good guidelines for good visualizations.
- These articles really attack pie charts: Andrew Gibson’s blog and an article saying Apple misled with a pie chart. (The Apple article goes on to talk about what to watch out for in Data Science, including a mention of the KNN algorithm.)
- It seems odd to me that HBR Blog would publish such a basic article on visualization. But, it tells me that a lot of people present a lot of data with ugly spreadsheets.
- HBR Blog makes up for the previous one with this one on visualizations from GE. Also, check out the link for the GE white paper on the industrial internet for more good visualizations.
- A great video on how visualization can tell a story. (hat tip to Patricio)
- Here is the link to John Snow’s updated map.
- FYI: Top 20 Visualization Tools
- “How Do I Say it With Charts”: a somewhat basic quick guide to creating better Excel charts. It is always good to pick up simple tips to make it easier for people to understand our charts.
Big Data Articles
- A definition of Big Data from Gartner
- IBM’s view on Big Data
- SAS’s view on Big Data
- How Big Data will change everything, an article from a Georgetown business professor (see page 40)
- Bill Franks, author of The Big Data Tidal Wave, has numerous blog posts on Big Data. Here are a few: Big Data is Coming, Driving Analytic Value from New Data, The Global Nature of Big Data, A Strategic Mistake with Big Data, and What is Big Data, Who Cares?
- A software company was skeptical, but now “loves” Big Data
- A more technical take on Big Data from an optimization expert
- The limits of Big Data
- A post claiming that Big Data is dead.
- Editorial on why data won’t replace thinking
Creating an Analytics Capability Inside an Organization
- Read the Optimization Edge, Part II, Chapters 4, 5, and 6
- Read the Davenport article, Competing on Analytics
- Davenport also has a few blog posts about organizational structure and analytics. Here are a few: Analytics Service Line, a two-part article on good structures (Part 1 and Part 2), and a write-up of Food Lion.
WSJ Article from May 9, 2013: “Sorry College Grads I Probably Won’t Hire You”
It is not a perfect article (we need to substitute “analytics” for “programming,” and you need to think about managing these projects), but here is a key slice of it:
I don’t mean that you need to become genius programmers, the kind who hack into NASA’s computers for fun. Coding at such a level is a very particular and rare skill, one that most of us—myself included—don’t possess, just as we don’t possess the athletic ability to play for the New York Knicks.
What we nonexperts do possess is the ability to know enough about how these information systems work that we can be useful discussing them with others. Consider this example: Suppose you’re sitting in a meeting with clients, and someone asks you how long a certain digital project is slated to take.
Unless you understand the fundamentals of what engineers and programmers do, unless you’re familiar enough with the principles and machinations of coding to know how the back end of the business works, any answer you give is a guess and therefore probably wrong. Even if your dream job is in marketing or sales or another department seemingly unrelated to programming, I’m not going to hire you unless you can at least understand the basic way my company works. And I’m not alone.
If you want a job in media, technology or a related field, make learning basic computer language your goal this summer. There are plenty of services—some free and others affordable—that will set you on your way.
Usually, the comments on these types of articles are strange. But, from a manager’s point of view, this one jumped out at me. I’ve found that managers who know the language of analytics and something about databases do a better job at managing. That is what I’m trying to do in this course. Here is the quote:
Have you ever worked for a “manager” who didn’t know how to program? It’s terrible. It’s so terrible I’d recommend new graduates make sure they don’t end up working for a Baby Boomer manager who can’t program or speak intelligently about database management, building reusable code or using modern web tools to manage projects.
Programming is simply part of the modern workplace – there are things people should know how to do regardless of whether they work in HR or in the engineering department. I think the advice is sound – learn to program in 2 languages, and at least dabble a bit in database management. You don’t need to be able to program well, but you should understand what can and cannot be done with programming.
Misc
- Technical article on using Twitter to predict flu outbreaks
- Link to Game Theory.
- WSJ article on how stats is a hot topic
- Links to good data science sources from a company that does data science training