UM joins effort to preserve online government data

Kim Kozlowski
The Detroit News

The University of Michigan has joined scientists, librarians and citizens across the nation to save publicly available data on U.S. government websites for fear it could disappear during the presidency of Donald Trump.

Through an effort known as the DataRefuge project, volunteers have been enlisted to gather, capture and archive data from government websites, including those of NASA, the National Oceanic and Atmospheric Administration, and the U.S. Department of Energy, so that the information remains in the public domain even if it vanishes from those sites.

The work has just begun, organizers say. UM recently hosted hundreds of people at a data capture event prompted by fears of a Trump scrub, and school officials are planning more events, possibly in Detroit with Wayne State University. Others are planned at schools such as Yale University and Northeastern University, and in San Francisco and Madison, Wisconsin.

With Trump calling for the elimination of the Environmental Protection Agency and threatening to back out of a global climate change agreement, some say preservation of scientific data is warranted. They worry that regulations could be lost if scientific data is not readily available, because some could prioritize business over public health.

The DataRefuge project has homed in on environmental documents but will also encompass other areas the new administration has cast doubts on, such as studies on racial disparities in housing, said Justin Schell, director of the Shapiro Design Lab at the UM Library.

“This is a moment of urgency because some of these agencies have a target on their back because of statements from this administration,” Schell said.

The EPA said in a recent statement no substantive changes have been made on its website.

“As part of EPA’s standard process and our continuous efforts to review and refresh the website, EPA career staff updated a number of web pages in January, before the change in administration,” the statement said. “For example, staff edited the International Climate Partnerships page to reduce redundant text, and updated a number of pages to remove links from pages that would be impacted by the presidential transition (the links went to the previous administration’s White House web page, which has been archived).

“These updates were routine web maintenance and in line with the agency’s web guidelines. We did not remove any substantive information about climate change science or EPA programs.”

Other federal websites remain under heavy scrutiny. On Friday, two senators questioned Education Secretary Betsy DeVos in a letter over availability issues with a website addressing the Individuals with Disabilities Education Act, which they note has been online since President George W. Bush’s time in office.

“To that end, we are deeply concerned that prior to your confirmation and arrival at the department the centralized resource website for the IDEA became inaccessible to the public for more than a week, and is now redirecting people to a site for the Office of Special Education Programs (OSEP). The OSEP website lacks much of the information previously available,” wrote Washington Sens. Patty Murray and Maria Cantwell, two Democrats.

“The department’s failure to keep this critical resource operational makes it harder for parents, educators, and administrators to find the resources they need to implement this federal law and protect the rights of children with disabilities.”

The OSEP website, meanwhile, notes that “the servers hosting our website are experiencing technical issues. As we work to resolve this issue, information regarding the Individuals with Disabilities Education Act can be found below.”

Michelle Murphy, a professor and director of the Technoscience Research Unit at the University of Toronto, is worried about the possibility of data disappearing and thinks citizens should be, too. So much is at stake, she said, especially when it comes to environmental data on issues such as drinking water and the Great Lakes, which hold 84 percent of the surface fresh water in North America.

Murphy noticed similarities between the Trump administration’s attitudes toward science and those under former Canadian Prime Minister Stephen Harper. During Harper’s tenure, Murphy said, Canadian federal scientists were limited in talking with academics, the public and the press.

After Trump talked of ending the EPA, and appointed a transition team that included Myron Ebell, a climate denier, Murphy became more concerned. She organized the first archiving event in December at the University of Toronto with the Environmental Data and Governance Initiative (EDGI), a network of academics, librarians and nonprofits that promotes evidence-based science policies and public access to data and information.

“We saw here in Canada the things that could happen in the U.S., particularly to environmental science,” Murphy said. “The difference is Prime Minister Harper did not celebrate what he was doing. It was more secretive. In the U.S., there was a gleeful celebration to dismantle environmental regulations and to discredit environmental science. That was a sign what could happen in the U.S. could be much more severe because it’s not being hidden.”

Archivists have been harvesting documents from federal websites under U.S. presidents since 2008, near the end of President George W. Bush’s administration. The effort began after National Archives officials put the responsibility of archiving government data on websites onto federal agencies. There was concern among archivists that not all data would get preserved, said Jefferson Bailey, director of web archive at the Internet Archive, a San Francisco-based nonprofit digital library that preserves cultural heritage in digital formats. The UM group’s efforts are contributing to that archive.

That’s why the Internet Archive began partnering with the Library of Congress, a few universities and libraries to create the End of Term Web Archive to preserve all documents under a presidential administration.

The project begins with a compilation of every registered government website. Internet bots are dispatched to each site, where they download the pages and every resource required to play them back, such as JavaScript, then discover every link and crawl those as well, Bailey said.
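For readers curious about the mechanics, the crawl Bailey describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Internet Archive's actual software: the `FAKE_WEB` pages, the `agency.example` addresses and the `crawl` function are all hypothetical, and the sketch reads from an in-memory stand-in for the web rather than real government sites.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "https://agency.example/": '<a href="/data">Data</a><script src="/app.js"></script>',
    "https://agency.example/data": (
        '<a href="/">Home</a><a href="report.pdf">Report</a>'
        '<a href="https://other.example/x">Elsewhere</a>'
    ),
    "https://agency.example/app.js": "console.log('hi')",
    "https://agency.example/report.pdf": "%PDF-placeholder",
}


class LinkExtractor(HTMLParser):
    """Collects every href/src value: links plus resources such as scripts."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.found.append(value)


def crawl(seeds, fetch):
    """Breadth-first crawl from the seed list, staying on the seed hosts."""
    archive = {}  # url -> captured content
    queue = deque(seeds)
    allowed_hosts = {urlparse(seed).netloc for seed in seeds}
    while queue:
        url = queue.popleft()
        if url in archive:
            continue  # already captured
        body = fetch(url)
        if body is None:
            continue  # nothing to capture at this address
        archive[url] = body
        parser = LinkExtractor()
        parser.feed(body)
        for ref in parser.found:
            target = urljoin(url, ref)  # resolve relative links
            if urlparse(target).netloc in allowed_hosts and target not in archive:
                queue.append(target)
    return archive


archive = crawl(["https://agency.example/"], FAKE_WEB.get)
print(sorted(archive))
```

Starting from one seed address, the sketch captures the home page, the script it needs, the linked data page and the report, and it skips the link to an outside site. A production crawler would also have to handle rate limits, content generated by scripts and storage in an archival format, none of which is modeled here.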

The crawl starts two to three months before a sitting president leaves office and continues two to three months after his departure. By the time former President Barack Obama left office on Jan. 20, Bailey estimated, about 75 percent of the data on government websites had been collected.

But Bailey acknowledged a crawler is not getting 100 percent of everything that exists, and there is a growing volume of data every year with more social media.

It’s good to have so much interest in archiving governmental websites, Bailey added. Nominations for pages to archive have exploded in recent years, from 1,000 in 2012 to 12,000 this year.

“There’s the obvious reason — Trump — and people are concerned that he’ll delete data from the web that is different from their political viewpoint,” Bailey said. “But I also think people are more aware of the web as the primary communication platform of our era. Because of that, people are more aware of its historical importance, and how it changes, and how data disappears.”

Locally, those who have participated in the Ann Arbor Data Rescue, which has culled nearly 20,000 URLs for the Internet Archive, have been people interested in protecting environmental data, building community or promoting data transparency.

Among those who want to get involved is Pat Smolarski, who said she is upset about potential damage that could be done to the country without scientific data to shape policy.

“All they are thinking about is money,” said Smolarski, a nuclear medicine technologist at UM’s hospital, about decisions by the Trump administration in the first three weeks in office. “It’s as if they don’t care about future generations.”