More Information Resources:
Congressional Record Search
Status: Not yet started
First described: March 18, 2015, here
Last updated: November 2016
Description: The Congressional Record is an official source of information about congressional actions, and includes speeches made on the House and Senate floor. Unfortunately, the federal government’s search tool for the CR is cumbersome and yields many unhelpful results.
Build upon an open source tool that scrapes speeches from the Congressional Record to allow anyone to intelligently search congressional speeches. A user can focus on a particular member and look for a word or phrase.
This empowers staff, activists, and journalists to quickly locate member statements on issues of public importance.
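The member-plus-phrase search described above can be sketched in a few lines. This is only an illustration, assuming speeches have already been scraped into simple records; the `Speech` class, its field names, and `search_speeches` are hypothetical and do not come from any existing tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Speech:
    # Hypothetical record shape for one scraped floor speech.
    speaker: str
    day: str   # ISO date string, e.g. "2016-11-15"
    text: str


def _normalize(value: str) -> str:
    """Lower-case and collapse whitespace so line breaks do not hide a phrase."""
    return " ".join(value.lower().split())


def search_speeches(speeches: Iterable[Speech], phrase: str,
                    speaker: Optional[str] = None) -> List[Speech]:
    """Return speeches containing the phrase, optionally limited to one member."""
    needle = _normalize(phrase)
    hits = []
    for speech in speeches:
        if speaker is not None and _normalize(speech.speaker) != _normalize(speaker):
            continue
        if needle in _normalize(speech.text):
            hits.append(speech)
    return hits
```

A plain substring match like this is enough to show the idea; a real tool would probably want a full-text index so that searches stay fast across many years of the Record.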
Status: Largely Completed (but no work currently being done), available here
First described: June 2013 (here)
Description: A customizable daily or weekly email (and a single webpage) listing all congressional hearings and floor votes, including those of committees and subcommittees, each of which could be added to your calendar at the click of a button.
Without a paid subscription to a news service that gathers this info, it’s just about impossible to follow when Congress has scheduled a committee or subcommittee hearing or meeting. But over the last few years, the Senate and House have begun releasing meeting notices online in parsable formats. Unfortunately, there’s no publicly available central place to see all the notices from the different committees, and it’s not possible to sign up for official alerts for a particular subcommittee. All the data is there, but it isn’t being corralled. The page should contain basic meeting information in one place, with links out to the committee pages if you want to find more information (like witnesses, testimony for the record, etc.).
In addition to having a compilation of all hearings and markups, many people may find it useful to follow a few particular subcommittees, because information about actions by the others is distracting. For example, I pay attention to the Legislative Branch Appropriations Subcommittee, but don’t really care much about the other Appropriations Subcommittees. There should be a way to filter out the noise and just subscribe to particular subcommittees.
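As a rough sketch of the subscription filter, assume the meeting notices have already been parsed into records. The `Meeting` class, its fields, and `filter_meetings` are hypothetical names chosen for illustration, not part of any existing feed or tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Meeting:
    # Hypothetical record shape for one parsed meeting notice.
    committee: str
    subcommittee: str   # empty string for a full-committee meeting
    title: str
    day: date


def filter_meetings(meetings: Iterable[Meeting],
                    subscriptions: Iterable[Tuple[str, str]]) -> List[Meeting]:
    """Keep only meetings of subscribed (committee, subcommittee) pairs, oldest first."""
    wanted = {(c.lower(), s.lower()) for c, s in subscriptions}
    kept = [m for m in meetings
            if (m.committee.lower(), m.subcommittee.lower()) in wanted]
    return sorted(kept, key=lambda m: m.day)
```

Subscribing to `("Appropriations", "Legislative Branch")` would then drop notices from every other Appropriations subcommittee before the email or page is built.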
Also, it’d be great to see what’s scheduled for floor votes during a particular week. Pull data in from the House Whip notices and the Senate Majority/Minority Leader colloquy.
GovTrack’s implementation of this idea is impressive. A few additional touches can round out the project. They include:
CRS Report Freshness Ratings
First described: June 2013, here
Updated: November 2016
Available at EveryCRSReport.com
Description: The Congressional Research Service is a congressional think tank, and it issues reports on important issues of the day. Over time, CRS will update a report to reflect new facts or changing circumstances. Sometimes these changes are significant, but other times the update could be as minor as the addition of punctuation or removal of a citation. However, there’s no way for the reader to know whether the new report needs to be read closely or if there’s just been a cosmetic change.
CRS reports should have freshness ratings based on a comparison of the current text to the previous iteration. So, if the language is virtually identical except for the addition of a sentence, it would receive a low rating (e.g. 1% fresh), but if the report has been largely rewritten, it would receive a high rating (e.g. 80% fresh).
All CRS reports have a unique identifier on their front page as well as the date they were issued. For example, a report could have unique ID RL1234 and an issue date of May 1, 2012. If it is reissued, the unique ID stays the same, but it gets a new issue date of September 1, 2013. Alas, the reports are in PDF format, so it’s probably a non-trivial problem to show what text has changed. But using PDF-to-text, you can at least compare the output files to see whether there’s a trivial or significant difference.
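One way such a rating might be computed is sketched below. It assumes both versions of a report have already been converted from PDF to plain text, and it defines "fresh" as the share of words in the current version that do not line up with the previous one. That definition and the function name `freshness` are assumptions made for illustration, not an established method.

```python
import difflib


def freshness(previous_text: str, current_text: str) -> float:
    """Return the fraction (0.0 to 1.0) of the current text that is new.

    Words are compared rather than characters so that re-wrapped lines in
    the PDF-to-text output do not count as changes.
    """
    previous_words = previous_text.split()
    current_words = current_text.split()
    if not current_words:
        return 0.0
    # autojunk=False keeps frequent words from being ignored in long reports.
    matcher = difflib.SequenceMatcher(None, previous_words, current_words,
                                      autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - unchanged / len(current_words)
```

Multiplying the result by 100 gives a percentage in the spirit of the "1% fresh" and "80% fresh" examples. Word-level comparison can be slow on very long reports, so comparing paragraph by paragraph may be a reasonable alternative.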
Constituent CRS Report Request Tool
Status: not started
First described: April 2015, on this webpage
Description: Build a tool that allows congressional offices to receive requests to publish CRS reports and to host the publication of those reports. As an additional bonus, allow member offices to publish a list of all reports that a constituent or the public could request.
CRS Report Aggregator Website
Status: Largely completed
First described: many times, including in March 2015 here
Updated: November 2016
Available at: EveryCRSReport.com
Description: The Congressional Research Service is an internal think tank for Congress with a budget of $107 million. A part of the Library of Congress, CRS is a legislative branch agency, which renders it exempt from FOIA. Its research, produced on a non-partisan basis entirely at taxpayer expense, has significant influence on Congress’ work. CRS reports address virtually every public policy topic available, and are recognized as authoritative and without ideological bias. Indeed, the agency’s reports cover topics that are rarely studied (postal workers’ pension funding) as well as those subject to intense ideological debate (federal education policy). As such, our democracy would benefit by making the reports freely available to the public.
Unlike its sister agencies (GAO and CBO), CRS does not directly release its reports to the public. It instead makes them available to Congress and, upon request, to other branches of government. While many reports eventually become available to the public, they often must be purchased from third-party vendors and the most recent version is not always available.
Aggregate CRS reports from the wild and request them from congressional offices.
Could use the code behind Legisletters to gather CRS Reports.
Fix Pre-Introduction Legislation
Status: In progress by the OpenGov Foundation. See website, GitHub
First Described: June 2013, here
Working draft: October 2015, here (built by Quorum)
Description: It’s important to have plain text versions of bills, especially draft legislation. Why? Clean (non-PDF) versions can be compared against other iterations to see what has changed, and can be marked up so that you can easily make suggestions for improvements. Unfortunately, pre-introduction legislation is only made available to staff as a PDF, which is hardly useful to anyone. And sometimes even introduced legislation is available first as a PDF and only later as XML.
What would be helpful is a tool that ingests PDFs of draft legislation and returns plain text. But converting the PDF to text isn’t enough. It also would need to remove the line numbers, the headers (e.g. “F:\M13\ROYCE\ROYCE_005.XML”), the page numbers, and the footers (e.g. “F:\M13\ROYCE\ROYCE_005.XML f:\VHLC\022613\022613.176.xml (542138|23)”). By clearing out this additional stuff, you’re left with the text of the legislation only, which can then be used in many ways.
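The cleanup step might look something like the sketch below. It assumes the PDF has already been converted to text, that headers and footers are lines beginning with a Windows-style file path (as in the examples above), that page numbers sit alone on a line, and that each line of bill text starts with a one- or two-digit line number. Real drafts may not follow all of these assumptions, and the function name `clean_bill_text` is made up for illustration.

```python
import re

# Header and footer lines start with a drive-letter file path such as F:\...
PATH_LINE = re.compile(r"^\s*[A-Za-z]:\\\S+")
# A line holding nothing but digits is treated as a page number.
PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")
# A leading one- or two-digit number followed by text is treated as a line number.
LINE_NUMBER = re.compile(r"^\s*\d{1,2}\s+(?=\S)")


def clean_bill_text(raw: str) -> str:
    """Strip headers, footers, page numbers, and line numbers from bill text."""
    kept = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # drop blank lines left over from the PDF layout
        if PATH_LINE.match(line) or PAGE_NUMBER.match(line):
            continue  # drop header, footer, and page-number lines
        kept.append(LINE_NUMBER.sub("", line).rstrip())
    return "\n".join(kept)
```

One known weakness: a line of legislative text that happens to begin with a small number would lose that number, so a more careful version might check that the stripped numbers actually count upward on each page.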
Front Page for Law
Status: Not started
First Described: March 2015, here
Description: Thanks to the work of our coalition (and especially Joe Carmel), there is now a website, called LegisLink, where it is possible to look up every public law from the beginning of the country. However, it needs help to become more user-friendly.
LegisLink needs a simple Google-like front-end search bar that allows users to type in a federal law by citation and link to the results. As federal laws can be identified by several citation formats, this will require identifying and parsing the various citation formats and connecting users to the results, hiding all of the complexity.
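The citation-parsing step could start from something like the following sketch. It recognizes three common formats (public law numbers, Statutes at Large, and U.S. Code) and returns the parts a front end would need to build a link. The patterns are illustrative and far from exhaustive, and `parse_citation` and the field names are hypothetical; nothing here reflects how LegisLink itself works.

```python
import re
from typing import Dict, Optional

# Each entry: (citation type, names for the two captured parts, pattern).
FORMATS = [
    ("public_law", ("congress", "number"), re.compile(
        r"^(?:pub(?:lic)?\.?\s*l(?:aw)?\.?|p\.?\s*l\.?)\s*(?:no\.?\s*)?"
        r"(\d+)\s*-\s*(\d+)$", re.IGNORECASE)),
    ("statutes_at_large", ("volume", "page"), re.compile(
        r"^(\d+)\s+stat\.?\s+(\d+)$", re.IGNORECASE)),
    ("us_code", ("title", "section"), re.compile(
        r"^(\d+)\s+u\.?\s*s\.?\s*c\.?\s*(?:§+\s*|sec(?:tion|\.)?\s*)?"
        r"(\d+[a-z0-9-]*)$", re.IGNORECASE)),
]


def parse_citation(text: str) -> Optional[Dict[str, str]]:
    """Identify the citation format and split it into its parts, or return None."""
    cleaned = " ".join(text.split())  # collapse stray whitespace
    for kind, fields, pattern in FORMATS:
        match = pattern.match(cleaned)
        if match:
            return {"type": kind, **dict(zip(fields, match.groups()))}
    return None
```

With this in place, "Pub. L. 111-148", "P.L. 111-148", and "Public Law 111-148" all resolve to the same parsed result, which is the kind of complexity the search bar would hide from the user.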
The ability to read the law is essential to self-governance. Proposed laws often modify older provisions. Anyone who wishes to engage in the federal law-making process at a significant level of detail will sooner or later need to make use of this resource.
Congressional Correspondence Tracker
Status: Prototype: Legisletters, in progress
First described: Many times, but in May 2015 on this webpage. Description of work in progress here.
Description: Build a Congressional Correspondence Tracker that allows Member offices to track all communications to and from agencies, and to automatically publish and thread letters and responses for the public (with redactions as appropriate).
Congressional Record as Data
Status: Partially started, but no work is being pursued.
First described: Many times
Description: The Congressional Record is the central, unique source of nearly all things that take place in Congress. From speeches and votes to bill text and scheduling, the Record is a source of tremendous amounts of information. But it is published as PDFs and unstructured data, making it hard to use and analyse.
The Sunlight Foundation has built tools that pull out speeches, and internal congressional offices pull out other information (like judicial nominations), but the whole record should be transformed into structured data. Unfortunately, that means building parsers.
Constitution Annotated As Data
Status: Not started
First described: September 2009 here, but Cornell’s Legal Information Institute built a working version for 2002 here
Description: The Constitution Annotated is a handy encyclopedia that explains the U.S. Constitution as it has been interpreted by the U.S. Supreme Court. After a lot of nudging, it is published online by GPO and the Library of Congress as a PDF, updated annually (more or less). (The document is actually prepared as an XML file and updated regularly for congressional staff.)
The Constitution Annotated should be published in a structured-data format so the public can easily reuse the information and more people can benefit from the knowledge it contains. Structured data makes it easier to embed the information in Wikipedia, or create better websites on the Constitution, and so on. It also means we can do neat things with the contents, such as automatically classifying Supreme Court cases by topic simply by drawing upon the document’s structure.
Congressional Expenditures As Structured Data
Status: Started, but not continued
First described: December 2008, here
Updated: November 2016
House Expenditure Reports are now being published (going forward) as structured data
Description: Congressional spending information should be published as electronic spreadsheets. Spending reports contain vital information about lawmakers’ budgets. Greater usability of these reports enhances oversight of Congress. The quarterly House statement of disbursements and semi-annual Senate expenditure reports should be transformed from online PDFs into structured data. The Sunlight Foundation already is scraping staff lists and pay, but there’s much more data.
Map/App for Congressional Office Locations
Status: Not started
First described: here, June 2015
Description: Many tens of thousands of people visit Congress each year, many of them to see their own member of Congress. Unfortunately, it’s a bit confusing to find the right building, and then to find the member office within that building. (The layout is not intuitive.) Moreover, if a visitor is going from one office to the next, it’s useful to have help finding the most direct route.
There should be an app where you can type in your member’s office address and be guided there, making proper use of the elevators. It should also include the locations of bathrooms, dining halls, and the gift shop.
Note the Sunlight Foundation has shut down its labs. Here is where the tools and data sources have gone: http://sunlightfoundation.com/2016/11/01/sunlight-labs-update-nonprofits-step-up-to-preserve-tools-for-transparency/