Getting the Digital Goods

Clive Thompson tells me in the April 2011 edition of Wired that government should open its data catalogs in order to foster private-sector investment, business, and creativity. He tells me the story of how BrightScope built a multimillion-dollar business by getting the Department of Labor to “[cough] up the digital goods in bulk.” He then goes on to suggest that if private-sector companies lead the way, perhaps activists will get the goods, too.

I’d like to meet Clive Thompson. I’d like to tell him about the technical assistance work I was doing in 1987 for community organizations in Chicago with digital data on occupations, released on magnetic tape. What happened when the Illinois Department of Employment Security realized that we charged these organizations a small fee to run the data and help interpret it? They yanked the digital version (for everyone) and disseminated only hard copies.

So, I was there first, Clive. And I don’t say this only because I’m feeling left out. I’m saying this because I’m at the Association of Public Data Users (“PublicData”) conference at GWU in Washington DC, bemoaning the drop in funding for the federal statistical agencies, while the O’Reilly Strata Conference is happening in New York celebrating the era of Big Data, including the energetic Open Government data movement (“OpenGov”). And while it would seem that OpenGov might have the same agenda as PublicData, it doesn’t. And that makes me crabby.

See, the difference is that OpenGov isn’t necessarily interested in statistical data. Statistical data, the raison d’être of PublicData, are gathered in order to understand the characteristics of the US population and economy (think the US Census), mostly by federal statistical agencies (think Census Bureau; Bureau of Labor Statistics). Statistical data have been used for decades to draw electoral districts, set public policy and programming, disburse federal funding, and plan infrastructure investments like highways. (Do I see an OpenGov yawn?)

OpenGov primarily wants administrative data and operational data. Administrative data are data gathered as a result of governments administering programs or overseeing regulations: the 401(k) data used by BrightScope noted above, EPA data generated as a result of environmental regulations, and so on. Operational data are records generated as a result of government going about its own business: the White House visitor logs at the federal level, 311 calls at the local level. Neither is statistical data, the data that surveyors and researchers collect through observation and experimentation.

So while the Census Bureau’s budget is set to be slashed by an amount that means the end of the quinquennial Economic Census, Clive Thompson is telling me that “members of the Obama administration intervened” so that BrightScope could have (government) administrative data. So while I won’t be able to use statistical data to help local workforce agencies target job training more effectively by understanding growing and declining economic sectors, private-sector businesses can be built on administrative and operational data.

I know it isn’t this stark. I know that, from my nonprofit-sector vantage point, the loss of statistical data may simply shift me to other activities. I’ll be able to use 311 data to help target food pantry resources, for example, while a private-sector food delivery service might use the same data to beef up deliveries to seniors. And I’m all for increased private-sector economic activity.

But what is true is that I (and others) have been working with nonprofit organizations for decades to wedge better data out of governments, and the federal statistical system is currently under siege; yet the OpenGov movement seems to be flourishing with evident private-sector support (e.g., Google and Yahoo’s sponsorship of the Open Government Working Group meeting in Sebastopol, CA, in 2007, which resulted in the “8 Principles of Open Government Data”). And that makes me surly.

So yes to Clive. I’ll wait for “pushy start ups [to] pressure governments to release more info [so that] activists will get to use it too.” But in return, I ask that pushy start-ups understand that there are those who have gone before, and who are still standing in line.

Data, Data, Everywhere but Nary a Byte to Eat

Originally posted on July 22, 2011 by VL Carlson

In 1985 I started my “data scientist” career as the head of the DataBank at the Center for Urban Economic Development at UIC-Chicago. Ahh yes, those were the days when universities were investing in mainframe computers, Home Mortgage Disclosure Act data were becoming available, and the phrase “data-driven decision making” was entering our lexicon. Heady days. My first action item was to visit local community and public-sector agencies to find out what data they wanted and needed to run their programs. The primary ask was for city- or neighborhood-level data on health, employment, and housing: could the DataBank somehow find these data?

I’m now head of the Metro Chicago Information Center and am still part of the civic data movement, now in the era of Big Data and Open Government. MCIC is running the Apps Competition for Metro Chicago, Illinois, an unprecedented government partnership that deliberately reaches out to both developers and community organizations to learn their needs and connect them with each other. By design, MCIC is collecting data desires from coders and community groups: what do you wish you had?

The funny thing is that these folks want the same stuff the civic groups wanted in 1985. As I look at the data wish list compiled by our outreach folks at MCIC, the similarities to 1985 are striking: neighborhood housing data, local health indicators, more data on city businesses. The difference is that now there is a false belief that these data already exist somewhere, if only they could be delivered to potential users in a structured and consumable manner. Unlock the data!

But are there really more civic data to be easily accessed in the Big Data era? To a certain extent, yes: technology opens new data possibilities. But I believe we’ve also been lulled into a false sense of abundance. The reality is that timely data are not as available as one might think; the amount of data varies widely by subject, by governmental source, and by geography; and most data are not easily mash-able or app-able for quick digestion. Data emerge through a complex socio-operational context of which technological change is only a part. Most important, the operational needs of cities lie at the core of any determination of civic data availability. City governments collect data that help them operate cities.

Think about it. Data on housing vacancies don’t exist from city governments because homeowners have to file a certificate of “occupancy” but not a certificate of “non-occupancy.” Public health incident data aren’t available because myriad health facilities are run by nonprofits, federal agencies, and city health departments, and they very rarely share data. We don’t know the characteristics of businesses in cities because local governments generally don’t collect the kind of information we want. Yes, they do inspections and licensing as part of operations, but that doesn’t give us number of employees, or sales history, or lines of business. The (federal) Bureau of Labor Statistics DOES collect this information, but does NOT publish economic data for cities. Maybe that’s what we should be advocating for.

Coupling the energy of amazing civic coders with the multifaceted knowledge of data geeks is the best way to bring about real data liberation.

Open Government Data – Measuring Urban America

Home Depot made national news when it opened its Mill Basin, Brooklyn, store in 2002. Its urban format partially reflected a new realization that, for city neighborhoods, urban income densities were a better measure of potential retail demand than the overused measure of median income. The argument for measuring urban areas differently began in the late 1990s with research done by outfits such as MetroEdge, a subsidiary of ShoreBank Advisory Services in Chicago, and the national not-for-profit initiative Social Compact. I kicked off the Urban Markets Initiative (UMI) at Brookings with Pari Sabety by arguing that increasing the quality and reliability of information for urban and inner-city areas is essential to building healthy communities: Using Information to Drive Change: New Ways to Move Urban Markets (2004). There we pointed out that the relative homogeneity of rural and suburban areas makes them easier to measure than diverse urban landscapes. Cities tend to be “under-measured.”

Why? Characteristics of physical form contribute to the richness and hard-to-measure nature of cities. Housing stock is more irregular: boarding houses and older small apartment buildings may have no individual mailboxes, so official address lists will miss occupants; garages with “in-law” apartment additions are often off the record. Mixed-use spaces cause counts to miss people or economic activity: live/work units (a common way of repurposing old warehouses) are both a home and a business, and home-based retail businesses are more common.

Our task at UMI was to address this “urban data shortage” problem, primarily by making the case for better data from federal statistical agencies (although we had local initiatives as well). Why can’t economic data be made available at the city level, for example? What about a retail census that broke out cities from suburbs? Although we found our work challenging then, the situation for federal data looks even worse today. Recent reports that data transparency initiatives at the federal level are to be severely curtailed are coupled with an attack on long-standing federal statistical initiatives (such as the American Community Survey) that produce critical economic and demographic data.

Here is the historic opportunity for the open government movement. Ten years ago open gov was in its infancy (Malamud’s “8 Principles of Open Government Data” wasn’t published until 2007); now open data catalogs are appearing in cities across the US. What can’t be measured at the federal level, either because of a lack of political will or because of the inflexible nature of the federal statistical system, may be found in data collected locally.

This is my hope for the Apps4MetroChicago competition: not only fabulous apps, but apps that reveal the rich and diverse nature of the urban landscape. Build measures of the local food environment that incorporate permits, licensing, and inspections data as a way of tracking retail locations that would otherwise slip through the cracks. Put together building permit and occupancy data in a way that might serve as a leading indicator of economic activity. Show us hospital records in order to estimate the number of uninsured. Use 311 and crime data to describe neighborhoods: trendy, industrial, entertainment district, families.

In short, use government data to help us reveal the rich nature of urban areas.

About VL Carlson

Virginia Carlson is a data geek who believes that the right systems (physical and social) create optimal outcomes. She’s been a professor, a researcher, a storyteller, a photographer, and an architectural historian.