Data Journalism Spring 2019

Course Objective

Students in this course will be expected to become familiar with the use of data in order to produce stories with impact, authority and distinction. Data — large sets of information and numbers — are becoming increasingly available from public, private and social media sources. As a result, journalism organizations are looking to make use of this rich pool of information — and to hire those who are able to do so. Fundamentally, you will leave this course with a data mindset — a skill that will help you in journalism and maybe even in life.

Students in this course will be introduced to the concept of data-driven journalism. They will explore ways to obtain data, use tools to analyze it and learn how to deploy it in their work. They will be introduced to basic concepts in downloading data, making public records requests for electronic data, using optical character recognition (OCR) scanning, and hand-building data sets. They will learn a basic command of spreadsheets and will be introduced to database and statistical tools and concepts.

Most importantly, students will develop a data mindset. Data analysis is not simply a reporting skill; it is a way of seeing the world. Students will be expected to leave this course with tools they can apply to journalism and to everyday life.

They will also be taught to avoid the many pitfalls of data, a kind of statistical version of Defense Against the Dark Arts. They will learn how to clean data, question its reliability and use it meaningfully. If we are exceptionally ambitious, we will touch upon future areas of study, including the use of statistical analysis, web-scraping and computer coding.

Students in this class must demonstrate flexibility. The field of data journalism is in constant flux. We will be teaching this course in an iterative, design-driven fashion, adapting to students’ needs and levels of competency, as well as to news as it develops and to new technology. Mid-course adjustments are to be expected. Audibles will be called. New directions will be enthusiastically embraced.

The focus will be emphatically on reporting, especially accountability reporting, through data-driven journalism. We are interested in math, statistics, and whiz-bang software only as they improve our ability to report, analyze and tell a take-no-prisoners story. There is no escaping that data-driven journalism is a skill learned through practice rather than theory.

It is a field where, surprisingly, you must get your hands dirty.

Readings

  • Huff, Darrell. How to Lie with Statistics. New York: Norton, 1993. Print.
  • Meyer, Philip. Precision Journalism: A Reporter's Introduction to Social Science Methods. Lanham, MD: Rowman & Littlefield, 2002. Print.

The above two books are simply bibles in the field. There will be selected readings from them, and you may acquire them from Amazon in Kindle or hard copy. In addition, we will be distributing stories that use data in journalism from a variety of sources and genres – including investigative reporting, entertainment reporting and sports reporting. These will be discussed in class, with participation expected and the occasional cold call not out of the realm of possibility.

Requirements

Students will be required to complete exercises involving the use of spreadsheets, databases and other tools, both during and outside of class. We will do a needs assessment at our first class to determine math skills and software knowledge. I would prefer that we use Microsoft Excel. If that’s not possible, we will try to complete our assignments using alternatives such as Google Sheets, the open-source database MySQL and the database client Navicat. We will also use tools including OpenRefine, DocumentCloud, the Excel Power Pivot add-on, and Fusion Tables. If we are ambitious, we will play with R, an open-source statistical language, or install a programming environment such as Python or Ruby to get an idea of what coding looks like and how regular expressions, or regexes, can help in data cleaning and scraping.

Other suggestions

A student membership in Investigative Reporters & Editors (IRE) is highly recommended. Membership is $25 per year and provides access to 30 government databases, free software programs such as Tableau and Cometdocs, and detailed road maps to 30 years of investigative stories. Full disclosure: I’m on the Board of Directors, but I receive no compensation or kickbacks of any kind. I just think it’s a great opportunity to connect with a marvelous organization.

Drills

Students will be assigned practice drills beginning in week three. These drills are essential to building your ability to query, analyze and clean data independently. There is no substitute for practice. I do not expect you to produce perfect SQL queries or elegant Excel functions. Failure is an option. What I want to see is your effort to understand and grapple with the exercises. I want you to go out into the wider world confident in your ability to grab data and incorporate it into your work. I want you to be able to tell your future employers, with a straight face, that you are a data ninja. Or at least a red belt. Collectively, these drills will contribute 50% of your grade.

Attendance and participation

Journalism is not a passive activity; it requires focus, curiosity and involvement. For that reason, I place great weight on your participation in class and your collaboration with fellow students. If you’re not talking, you’re not learning. Every week we will hand out readings and discuss professional work, writing and data issues, and I expect your comments, questions and other contributions to our class. None of this can happen if you don’t show up. Attendance and participation constitute 25% of your final grade.

Paper or Prezzie

Each student will have a choice: complete a written critique of professional work that makes extensive use of government data, or, for those who attend the annual National Institute for Computer-Assisted Reporting (NICAR) conference, give a presentation instead. The critique should be 500 to 750 words; the presentation should run three to five minutes. The critique or presentation will contribute 12.5% of your grade.

Honor Code

Students will be expected to strictly follow the university’s honor code, of course. But this is a journalism class preparing students for careers in journalism. In this particular career, plagiarism results not in student counseling or a lowered grade, but in public embarrassment and likely termination from employment. Suffice it to say, the honor code will be strictly enforced.

Goal

I expect many of these skills to be taught hands-on. Topics may include analysis of healthcare data, consumer safety data, corporate filings, Census statistics and other relevant data sets. Ideally, students would be connected with a long-term project at a large news organization to demonstrate the practical application of these techniques.

What this course is not

This is not a course in the production or visualization of data. You will learn to make some simple graphs and charts to aid your reporting. But this course is designed for the behind-the-scenes work of journalism, the reporting, not for the publication side, the presentation.

It is also not a course in coding. No Ruby or Python or Java. Or, God forbid, Lisp. You will learn some basic scripting concepts. But you should not expect to be able to build web apps, or web anything, from the skills you will be taught.

Instructor