Federal income tax returns are due with the Internal Revenue Service, or IRS. The IRS says about 35 million Americans have waited until the final week to submit their tax returns this year, so lots of us have taxes on the brain right now.
Being the taxman isn’t a job for the insecure. In 2013, 40 percent of Americans had an unfavorable view of the IRS. But love ‘em or hate ‘em, the IRS is a data geek’s dream source. By law, huge volumes of tax data are public record and available online. It’s yours to play with; after all, you paid for it.
If you want to dig into some of this data—and analyze and visualize it with BIRT—start with the 2011 ZIP Code Data from the IRS’s Statistics of Income (SOI) division. In a single 93-MB CSV file, you get data from all tax returns filed between January 1 and December 31, 2012 (the most recent completed year), tallied by ZIP Code.
That file contains the usual data that makes up tax returns, such as adjusted gross income (AGI), interest and dividend income, charitable contributions, mortgage interest paid, total tax liability, population, and much more. It’s aggregate data, so you can’t look up how much your next-door neighbor earns; still, it can give you a big-picture view of major forces in the American economy. And the data quality is very high; indeed, the IRS (and 13 other federal statistical agencies) have set demanding professional standards for themselves regarding data quality and scientific integrity.
What are some things you might do with IRS data? Here’s one example: You’re a developer for a brokerage firm, making an application to help your clients interpret their dividends. (At Actuate, we call that a customer-facing application, or CFA.) Your app could compare qualified dividends from the IRS data (an external data source) with those of your client (your internal data), and personalize that information by letting clients compare their dividends with those earned by others in their own ZIP Code or state. This personalized analytics capability can be embedded in both web-based and mobile apps.
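To make the dividend comparison concrete, here is a minimal sketch in plain Python (not BIRT). It assumes the ZIP Code file has columns named `zipcode`, `N00650` (number of returns reporting qualified dividends), and `A00650` (qualified dividend amount, in thousands of dollars); check those names and units against the documentation that ships with the file. The sample rows and the client figure are invented purely for illustration.

```python
import csv
import io

# Invented sample rows in the ASSUMED layout of the ZIP Code file:
# one row per ZIP Code and income bracket (agi_stub).
SAMPLE_CSV = """STATE,zipcode,agi_stub,N00650,A00650
CA,94105,1,100,50
CA,94105,2,300,450
TX,77002,1,200,100
"""


def average_qualified_dividends(csv_file, zipcode):
    """Average qualified dividends, in dollars, per return reporting them in one ZIP Code.

    Returns None when the ZIP Code has no returns with qualified dividends.
    """
    returns = 0.0
    amount_thousands = 0.0
    for row in csv.DictReader(csv_file):
        if row["zipcode"] == zipcode:
            returns += float(row["N00650"])
            amount_thousands += float(row["A00650"])
    if returns == 0:
        return None
    # Assumption: money amounts are reported in thousands of dollars.
    return amount_thousands * 1000 / returns


zip_average = average_qualified_dividends(io.StringIO(SAMPLE_CSV), "94105")
client_dividends = 1500.0  # hypothetical figure from your internal client data
difference = client_dividends - zip_average

print(f"ZIP Code average: ${zip_average:,.2f}")
print(f"Client: ${client_dividends:,.2f} (difference: ${difference:+,.2f})")
```

With the real file, you would pass an open file handle instead of the `io.StringIO` sample.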
And that's just one example. BIRT lets you group and aggregate data and create meaningful visualizations, including tables, charts, crosstabs, and maps. So you could create a map showing which states have the most farms or self-employed people, like so:
(Texans filed the most farm tax returns in 2012, according to IRS data.)
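The grouping behind a map like that can be sketched outside BIRT in a few lines of Python. This is only an illustration: it assumes the file has a `STATE` column and a `SCHF` column holding the number of farm returns, which you should verify against the file's documentation, and the sample numbers are made up.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the column names STATE and SCHF are ASSUMPTIONS
# about the ZIP Code file's layout.
SAMPLE_CSV = """STATE,zipcode,agi_stub,SCHF
TX,75001,1,40
TX,77002,2,60
IA,50001,1,70
CA,94105,1,10
"""


def farm_returns_by_state(csv_file):
    """Sum the farm-return counts of every row, grouped by state."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Parse via float() in case counts are stored with a decimal point.
        totals[row["STATE"]] += int(float(row["SCHF"]))
    return dict(totals)


totals = farm_returns_by_state(io.StringIO(SAMPLE_CSV))

# Rank states from most to fewest farm returns.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for state, count in ranked:
    print(state, count)
```

If the real file contains summary rows (for example, state totals), they would need to be filtered out first to avoid double counting.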
And because BIRT allows for multiple data sources, you could download the ZIP Code dataset from previous years (1998, 2001, 2004, 2005, 2006, 2007, and 2008 are available, and more are coming) and explore how different data points change over time.
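As a rough sketch of that kind of year-over-year comparison, the Python below totals adjusted gross income per ZIP Code for two years and computes the percent change. It assumes both files share the column names `zipcode` and `A00100` (AGI); older files may use a different layout and need renaming first. The figures are invented.

```python
import csv
import io

# Invented samples for two years, in an ASSUMED common layout.
SAMPLE_BY_YEAR = {
    2008: """zipcode,agi_stub,A00100
94105,1,1000
94105,2,3000
77002,1,2000
""",
    2011: """zipcode,agi_stub,A00100
94105,1,1500
94105,2,3500
77002,1,1800
""",
}


def agi_by_zip(csv_file):
    """Total AGI per ZIP Code, summed across income brackets."""
    totals = {}
    for row in csv.DictReader(csv_file):
        zipcode = row["zipcode"]
        totals[zipcode] = totals.get(zipcode, 0.0) + float(row["A00100"])
    return totals


def percent_change(earlier, later):
    """Percent change per ZIP Code, for ZIP Codes present in both years."""
    changes = {}
    for zipcode in earlier.keys() & later.keys():
        if earlier[zipcode] != 0:
            delta = later[zipcode] - earlier[zipcode]
            changes[zipcode] = delta * 100 / earlier[zipcode]
    return changes


agi_2008 = agi_by_zip(io.StringIO(SAMPLE_BY_YEAR[2008]))
agi_2011 = agi_by_zip(io.StringIO(SAMPLE_BY_YEAR[2011]))
changes = percent_change(agi_2008, agi_2011)

for zipcode in sorted(changes):
    print(f"{zipcode}: {changes[zipcode]:+.1f}%")
```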
BIRT data sources don’t have to be the same type, either; you could connect to government economic data (using the Department of Commerce’s Data API) and combine it with IRS tax data to look for correlations.
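Looking for a correlation between two sources boils down to joining them on a shared key and comparing the paired values. Here is a minimal Python sketch of that idea. Both dictionaries are hypothetical stand-ins (one for a state-level total from the IRS file, one for some state-level economic indicator from another source), and no real API is called.

```python
import math

# Hypothetical state-level figures; neither set is real data.
farm_returns = {"TX": 100, "IA": 70, "CA": 10}
economic_indicator = {"TX": 9.0, "IA": 7.5, "CA": 3.0, "NY": 4.0}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariance / math.sqrt(var_x * var_y)


# Join the two sources on the states they have in common.
shared_states = sorted(farm_returns.keys() & economic_indicator.keys())
xs = [farm_returns[state] for state in shared_states]
ys = [economic_indicator[state] for state in shared_states]

r = pearson(xs, ys)
print(f"States compared: {shared_states}, r = {r:.3f}")
```

A high coefficient on a handful of points proves little, of course; treat it as a prompt for a closer look rather than a finding.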
Who knows—you might discover a hidden connection or trend that can make or save money for you or your company.
I’d love to hear from developers who are using BIRT on projects that use government data. Leave a comment or send email to FSandsmark@actuate.com.
(Coins image from taxcredit.net)