Charity Leads: Robin Beveridge
Data Ambassadors: Eric Hannell, Abhay Bagai, Andy Lulham
People, Twitter and GitHub IDs:
Robin Beveridge: nechildpoverty
Andy Lulham: twitter andylolz, github andylolz
Michael O’Kane: github miokane
Pete Owlett: github peteowlett
Annabel Church: twitter @annabelchurch, github arc64
Tom Russell: github tomalrussell
Kostas Kokkas: KokkasKostas
Wayne Holt: twitter wayneholt, email email@example.com
Billy Wong: twitter @BillyWong_HnF, email firstname.lastname@example.org
About the North East Child Poverty Commission:
The North East Child Poverty Commission aims to raise awareness and prompt action on child poverty in the North East. It is a small, unconstituted body at the moment, in the process of registering as a charity (when its one part-time worker isn’t attending DataDives...).
If anyone wants to keep informed about what we are up to, we (I) produce a monthly-ish e-newsletter - sign up here: http://eepurl.com/RApgn
We have lots of data to show the extent of child poverty in the North East - see www.nechildpoverty.org.uk/data. BUT:
- much of it is not very local
- most of it is very OLD by the time it is released
And it can be quite hard for mere mortals to access, use and act upon.
What we’re going to do to solve it:
We have 2 projects we want help on:
1. Developing an interactive online data tool for child poverty indicators
2. Establishing a link between CAB data and Child Poverty to act as a current, local proxy indicator
Project 1: Our current site has a very static presentation of data, with graphs as images plus embedded tables: http://www.nechildpoverty.org.uk/local-child-poverty-incidators. We would like to create something that enables users to select their choice of geography and variable, something like this:
Project 2: CAB has lots of up-to-date, local data about families in need (especially debt). If we can identify a link between CAB data and official child poverty data in the past, we can use current CAB data as a proxy indicator for current levels of child poverty. This is relevant to the whole country, though we’d be doing it for the North East first.
Google Fusion links:
Project: Dashboard visualisation
Fork of DC Action for Kids:
https://github.com/DataKind-UK/child-poverty-commission-dashboard
The live link:
How can we improve what’s currently there?
Summary from the existing-visualizations team:
We will take the existing visualizations and improve upon them (try different ways to visualize the existing data, etc.)
DC Action for Kids
Oopsie! We used the wrong projection - use WGS84 and then remove the crs field from the GeoJSON file to get it to render correctly.
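The crs-stripping step above can be scripted. A minimal sketch (it assumes the coordinates have already been reprojected to WGS84, e.g. with ogr2ogr; it only removes the legacy `crs` member so renderers fall back to the default lon/lat interpretation):

```python
import json

def strip_crs(in_path, out_path):
    """Remove the legacy 'crs' member from a GeoJSON file so renderers
    fall back to the default WGS84 (lon/lat) interpretation."""
    with open(in_path) as f:
        geo = json.load(f)
    geo.pop("crs", None)  # harmless if the member is absent
    with open(out_path, "w") as f:
        json.dump(geo, f)
```

Current GeoJSON (RFC 7946) drops the `crs` member entirely and mandates WGS84, which is why leaving a stale one in confuses renderers.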
To create the geojson file we:
To create the csv data file we:
To plumb these two files into the app we changed visualisation.js from this:
To merge another column into the csv data file we:
To add the new data set into the app we need to add a new entry in the left-side pop out menu. Go and find the bit that looks like this:
And add a new line that looks like this:
Note that the column name in the CSV file needs to start with one of the following to select its colour:
The data sources were assembled into a single Excel file and stored in the repo at data/lsoa_data.xls. The spreadsheet has a tab at the front called "lsoa_data.csv" - we exported this as CSV into the same directory, where it’s automatically picked up by the app.
The data in the spreadsheet was constructed from the following sources:
Project: Predicting the prevalence of child poverty within a LSOA
Given CAB data and HMRC child poverty data, build a prediction model that uses CAB data to predict HMRC child poverty figures before they become officially available.
Links of interest
Population and population by age per North East LSOA in 2011
Local Authority Boundary GeoJSON: https://www.dropbox.com/s/30gqqp1xg9fwc9e/North%20East%20LAs.geojson?dl=0 (approx 2MB)
Transposed CAB Data:
CAB NE_Datadive_2011_Jan_to_Dec_BEN_DEB_LSOA dataset transposed by the text categories - where the value field is count of the original Clients column:
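The transpose described above (one row per area, one column per text category, values counting the original client records) can be sketched with the standard library alone. Column names here (`LSOA`, `text category`) are assumptions; adjust them to match the actual dataset:

```python
from collections import Counter

def transpose_counts(rows, area_field="LSOA", category_field="text category"):
    """Pivot long-format CAB rows into one row per area, with one column
    per text category holding the count of client records.
    Field names are assumptions; adjust to match the actual dataset."""
    counts = Counter((r[area_field], r[category_field]) for r in rows)
    areas = sorted({a for a, _ in counts})
    categories = sorted({c for _, c in counts})
    table = []
    for area in areas:
        row = {area_field: area}
        for cat in categories:
            row[cat] = counts.get((area, cat), 0)
        table.append(row)
    return table
```

If the `Clients` column actually holds counts to be summed rather than rows to be tallied, swap the `Counter` for a summing accumulator.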
Full set of input data as well as R-script for clean-up and merge task:
Final data set used for learning algorithm/model:
Predictions for Gateshead:
fullview-2014-feb-06.train.csv - contains all LAs except for Gateshead, was used to train an M5P regression tree
fullview-2014-feb-06.test.actual.csv - true values for Gateshead
fullview-2014-feb-06.test.actual.withcodes.csv - true values for Gateshead (with area codes)
fullview-2014-feb-06.test.predictions.csv - predictions for Gateshead
fullview-2014-feb-06.test.predictions.withcodes.csv - predictions for Gateshead (with area codes)
predictions.csv - area codes + true values for Gateshead + predicted values for Gateshead
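To sanity-check the Gateshead hold-out, a quick score for the actual-vs-predicted files is mean absolute error. A minimal sketch, assuming the two value columns have already been loaded as Python lists of numbers:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between true and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Run it on the value columns of fullview-2014-feb-06.test.actual.csv and fullview-2014-feb-06.test.predictions.csv (the `.withcodes` variants carry the area codes for joining back to the map).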
LSOA to Local Authority mapping:
Rural-Urban classification for LSOAs:
Merged data for different periods per LSOA
NE LA Shapefile: https://www.dropbox.com/sh/x8kegly2do0sudy/AABT_sWhJ2Q89PbDHX_q5JXSa?dl=0
Final addition from me. This took an inordinate amount of time, due to the backward design of the DfE’s website.
Once again inspired by the DC app’s basic overlay of schools information, I started putting together a JSON document describing all the schools in the region, through much abuse of awk/sed/grep/curl and Python.
Find it here https://www.dropbox.com/s/6l3l4q6gp2w1s84/schools.json?dl=0
You can find what the field names mean here http://education.gov.uk/schools/performance/metadata.html
There’s a wealth of stuff in there; absenteeism, school meals, absolute performance, performance vs expected performance, "value add", financials, age of the school etc. It also has some history in certain fields (mainly the financials). The main caveat at the moment is that it’s a single FOGB JSON file, so it’ll be tricky to crunch through. I tried converting it into a relational or denormalised tabular form but there are over 500 fields in play on the wider documents. ElasticSearch could be used in lieu of an RDBMS.
It’s also not quite yet perfect for geo-visualisation use; the only location data is a postcode. This should be readily converted to lat/lon, but I’ll leave that until the morning.
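For the postcode-to-lat/lon step, a possible stdlib sketch using a postcode lookup file. The column names (`pcds`, `lat`, `long`) follow the ONS Postcode Directory convention, but check them against whatever extract you actually use, and `postcode` as the field name in schools.json is an assumption too:

```python
import csv

def postcode_lookup(onspd_csv_path):
    """Build a postcode -> (lat, lon) dict from a postcode directory CSV.
    Column names follow the ONS Postcode Directory convention; verify
    them against the actual file."""
    lookup = {}
    with open(onspd_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            key = row["pcds"].replace(" ", "").upper()
            lookup[key] = (float(row["lat"]), float(row["long"]))
    return lookup

def add_latlon(schools, lookup):
    """Annotate each school dict (assumed to carry a 'postcode' field)
    with 'lat'/'lon' where the postcode is found in the lookup."""
    for school in schools:
        key = school.get("postcode", "").replace(" ", "").upper()
        if key in lookup:
            school["lat"], school["lon"] = lookup[key]
    return schools
```

Normalising the postcodes (strip spaces, uppercase) avoids the usual join misses between the two sources.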
One final note - the schools JSON for the north east alone is 12MB; were this app to be turned into a reproducible toolkit it’d need to be re-engineered with a real data layer, maybe Mongo, or whatever the hipsters approve of that month.
Exporting Shapefile polygons to CSV (for Tableau)
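Tableau draws polygons from a "flat" table: one vertex per row, with a shape ID and a point-order column it uses to redraw each outline. A minimal sketch of that export, assuming the polygon vertices have already been read out of the shapefile (e.g. with the pyshp library) into lists of (lon, lat) pairs:

```python
import csv
import io

def polygons_to_tableau_csv(polygons):
    """Flatten polygons into Tableau's polygon format: one vertex per row,
    with a shape ID and a point order Tableau uses to redraw the outline.
    `polygons` maps an area name to a list of (lon, lat) vertices."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["shape_id", "point_order", "longitude", "latitude"])
    for shape_id, vertices in polygons.items():
        for order, (lon, lat) in enumerate(vertices, start=1):
            writer.writerow([shape_id, order, lon, lat])
    return buf.getvalue()
```

In Tableau, put longitude/latitude on columns/rows, switch the mark type to Polygon, set `point_order` as the path, and `shape_id` as the detail.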
Predicting child poverty using tax credit data
Children in poverty can be divided into two groups
Instead of predicting the percentage of children in poverty, I scaled up the problem to the number of children in poverty using local demographics data (http://neighbourhood.statistics.gov.uk) in each LSOA. Then I applied a simple linear model.
a1, a2 are coefficients of the linear regression model.
The advantages of this model are that
The remaining task is to rerun the same model using data from different years to check whether a1, a2 and b are stable over the years. If they are, then we can safely use a model learned from 2011 data, plug in 2014 tax credit data, and predict 2014 child poverty.
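The model equation itself isn’t written out above, but from the description it is an ordinary least-squares fit of the form y = a1·x1 + a2·x2 + b. A minimal sketch with NumPy, where x1 and x2 stand in for the two per-LSOA predictors (e.g. the tax credit and demographic counts — names here are placeholders, not the actual columns used):

```python
import numpy as np

def fit_linear(x1, x2, y):
    """Least-squares fit of y = a1*x1 + a2*x2 + b.
    x1, x2 are the two per-LSOA predictors; returns (a1, a2, b)."""
    # Stack predictors with a column of ones so b is fitted as an intercept.
    X = np.column_stack([x1, x2, np.ones(len(y))])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    a1, a2, b = coeffs
    return a1, a2, b
```

Refitting this on each year's data and comparing the returned (a1, a2, b) triples is exactly the stability check described above.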
I chose not to use the CAB data to predict child poverty because I suspect (with no evidence!) that most people facing child poverty do not seek help from CAB, and the propensity to seek help is likely to be highly variable from location to location (e.g. the % of new immigrants who are not aware of CAB, the prominence of the CAB office in the local high street, etc.).
Feedback and contact details