DataKind UK Autumn DataDive 2014

North East Child Poverty Commission

Charity Leads: Robin Beveridge

Data Ambassadors: Eric Hannell, Abhay Bagai, Andy Lulham

People, twitter and GitHub IDs:

Robin Beveridge nechildpoverty

Andy Lulham twitter: andylolz github: andylolz

Michael O’Kane github: miokane

Pete Owlett github: peteowlett

Annabel Church, @annabelchurch, arc64

Tom Russell, github: tomalrussell

Michael Leonard

Kostas Kokkas KokkasKostas 

Wayne Holt twitter:wayneholt email:wayneholt@gmail.com

Billy Wong @BillyWong_HnF, billy.wong@hedgehogandfox.co.uk

About the North East Child Poverty Commission:

The North East Child Poverty Commission aims to raise awareness and prompt action on Child Poverty in the North East.  It is a small, unconstituted body at the moment, in the process of registering as a charity (when its one part-time worker isn’t attending DataDives...).

If anyone wants to keep informed about what we are up to, we (I) produce a monthly-ish e-newsletter - sign up here: http://eepurl.com/RApgn

The Problem:

We have lots of data to show the extent of child poverty in the North East - see www.nechildpoverty.org.uk/data.  BUT:

 - much of it is not very local

 - most of it is very OLD by the time it is released

 e.g. 

And it can be quite hard for mere mortals to access, use and act upon.

What we’re going to do to solve it:

We have 2 projects we want help on:

1. Developing an interactive online data tool for child poverty indicators

2. Establishing a link between CAB data and Child Poverty to act as a current, local proxy indicator

The Data: 

Project Suggestions:

Project 1: Our current site has a very static presentation of data, with graphs as images plus embedded tables: http://www.nechildpoverty.org.uk/local-child-poverty-incidators.  We would like to create something that lets users select their choice of geography and variable, something like this: 

http://atlas.chimat.org.uk/IAS/dataviews/report/fullpage?viewId=439&reportId=489&geoId=4&geoReportId=4238

http://www.phoutcomes.info/public-health-outcomes-framework#gid/1000049/par/E12000004

Project 2: CAB has lots of up-to-date, local data about families in need (especially debt).  If we can identify a link between CAB data and official child poverty data in the past, we can use current CAB data as a proxy indicator for current levels of child poverty.  This is relevant to the whole country, though we’d be doing it for the North East first.

Google Fusion links:

https://www.google.com/fusiontables/data?docid=13cjnA7TmQnops4lg27PJ1JJfyc06g7HVdlypa0K9#rows:id=1

Project: Dashboard visualisation

Fork of DC action for kids:

https://github.com/DataKind-UK/child-poverty-commission-dashboard

The live link:

http://bit.ly/datakind-children

Current storytelling

How can we improve what’s currently there?

Audience

Summary of already existing visualizations team:

We will take the existing visualizations and improve upon them (try different ways to visualize the existing data, etc.)

DC Action for Kids

  • Screenshots

  • Links
  • http://dcist.com/2014/05/four_maps_that_show_how_children_li.php

    http://www.datakind.org/mapping-poverty-to-beat-it/

    https://storify.com/mayurhpatel/data-passion-awesomeness-1?awesm=sfy.co_eO0

    http://datatools.dcactionforchildren.org/

  • Reverse engineering Progress
  • Oopsie! We used the wrong projection - use WGS84 and then remove the crs field from the GeoJSON file to get it to render correctly.
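A minimal sketch of the second half of that fix, assuming the file has already been reprojected to WGS84 with a separate tool. It drops the top-level "crs" member (the coordinate reference system entry that older GeoJSON exporters write, and which the current GeoJSON spec no longer uses). The sample document below is made up for illustration.

```python
import json


def strip_crs(geojson_text: str) -> str:
    """Return the GeoJSON document with its top-level "crs" member removed."""
    doc = json.loads(geojson_text)
    doc.pop("crs", None)  # no-op if the file has no crs member
    return json.dumps(doc)


# Made-up sample: an empty FeatureCollection carrying a crs member.
sample = json.dumps({
    "type": "FeatureCollection",
    "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"}},
    "features": [],
})

cleaned = json.loads(strip_crs(sample))
print(sorted(cleaned))
```

For a real file, read it with open(...).read(), pass the text through strip_crs, and write the result back out.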

  • Creating the data files
  • To create the geojson file we:

    To create the csv data file we:

    To plumb these two files into the app we changed visualisation.js from this:

    to this:

    To merge another column into the csv data file we:
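One way to do such a merge, sketched under assumptions: both files share a key column holding the LSOA code, and LSOAs with no match get a blank value. The column names (lsoa_code, child_poverty_pct, population) and the sample rows are hypothetical, not taken from the repo.

```python
import csv
import io


def merge_column(base_csv: str, extra_csv: str, key: str, column: str) -> str:
    """Left-join `column` from extra_csv onto base_csv, matching rows on `key`."""
    extra = {row[key]: row[column] for row in csv.DictReader(io.StringIO(extra_csv))}
    reader = csv.DictReader(io.StringIO(base_csv))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(reader.fieldnames) + [column], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row[column] = extra.get(row[key], "")  # blank when the LSOA has no match
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data; real files would be read from disk instead.
base = "lsoa_code,child_poverty_pct\nLSOA_A,21.5\nLSOA_B,30.1\n"
extra = "lsoa_code,population\nLSOA_A,1500\n"

merged = merge_column(base, extra, key="lsoa_code", column="population")
print(merged)
```

A left join keeps every row of the existing data file, so the map never loses an area just because the new source is missing it.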

    To add the new data set into the app we need to add a new entry in the left-side pop out menu. Go and find the bit that looks like this:

    And add a new line that looks like this:

    Note that the column name in the CSV file needs to start with one of the following to select its colour:

  • Data sources in DC App
  • The data sources were assembled into a single Excel file and stored in the repo at data/lsoa_data.xls. The spreadsheet has a tab at the front called "lsoa_data.csv" - we exported this as CSV into the same directory, where it’s automatically picked up by the app.

    The data in the spreadsheet was constructed from the following sources: 

    Project: Predicting the  prevalence of child poverty within a LSOA

  • Project Objectives
  • Given CAB data and HMRC child poverty data, build a prediction model that uses the CAB data to predict the HMRC child poverty figures before they officially become available.

     

  • Notes on data set:
  • Links of interest

    http://elearning.citizensadvice.org.uk/management-information/Interactive%20Maps/2013-14/Q1/County%20Durham/atlas.html

    Population and population by age per North East LSOA in 2011

    https://www.dropbox.com/s/ivgbqvesw0q96qc/North%20East%20LSOA%20population%20by%20age.tsv?dl=0

    source: http://www.ons.gov.uk/ons/rel/sape/soa-mid-year-pop-est-engl-wales-exp/mid-2011--census-based-/stb---super-output-area---mid-2011.html

    Local Authority Boundary GeoJSON: https://www.dropbox.com/s/30gqqp1xg9fwc9e/North%20East%20LAs.geojson?dl=0 (approx 2MB)

    Transposed CAB Data:

  • Michael: 
  • CAB NE_Datadive_2011_Jan_to_Dec_BEN_DEB_LSOA dataset transposed by the text categories - where the value field is count of the original Clients column:

    CAB2011_AllThreeLevelsTransposed.csv

    https://www.dropbox.com/sh/cnf9iky80vq2a0w/AADeYPSn_HqNjjxBPkRpkLSla?dl=0
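The transposition described above can be sketched as a pivot from long to wide form. This is an illustration, not the script that was actually used. It assumes each input row carries an LSOA, a text category and a Clients number, and that the value wanted is the sum of Clients per LSOA and category. The column names and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def pivot_clients(long_csv: str, row_key: str, category: str, value: str):
    """Pivot long rows into {row_key: {category: summed value}}."""
    table = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(long_csv)):
        table[row[row_key]][row[category]] += int(row[value])
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(cats) for key, cats in table.items()}


# Hypothetical sample in long form: one row per LSOA and advice category.
long_rows = (
    "lsoa,category,clients\n"
    "LSOA_A,Debt,3\n"
    "LSOA_A,Benefits,5\n"
    "LSOA_A,Debt,2\n"
    "LSOA_B,Benefits,1\n"
)

wide = pivot_clients(long_rows, "lsoa", "category", "clients")
print(wide)
```

Each outer key then becomes one row of the transposed file and each category becomes a column, with missing categories treated as zero.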

  • Erich: 
  • Full set of input data as well as R-script for clean-up and merge task:

    https://www.dropbox.com/sh/l14xsy91kgvfg7p/AAB6ba-3uvqQn7Nu8MZ74bEpa?dl=0

    Final data set used for learning algorithm/model: 

    fullview.csv

    Jacek:

    Predictions for Gateshead:

    https://www.dropbox.com/sh/291c9t8wpqlz64k/AACZKqWpfWLN8dJaFz7qLUqCa?dl=0

    fullview-2014-feb-06.train.csv - contains all LAs except for Gateshead, was used to train an M5P regression tree

    fullview-2014-feb-06.test.actual.csv - true values for Gateshead

    fullview-2014-feb-06.test.actual.withcodes.csv - true values for Gateshead (with area codes)

    fullview-2014-feb-06.test.predictions.csv - predictions for Gateshead

    fullview-2014-feb-06.test.predictions.withcodes.csv - predictions for Gateshead (with area codes)

    predictions.csv - area codes + true values for Gateshead + predicted values for Gateshead
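The hold-out scheme behind these files (train on every LA except Gateshead, then compare predictions with true values for Gateshead) can be sketched as follows. The M5P model itself, a Weka algorithm, is not reproduced here. The column names (lsoa, la, actual, predicted), the sample rows and the choice of mean absolute error as the score are all assumptions made for illustration.

```python
import csv
import io


def split_by_la(rows, la_column, held_out):
    """Split rows into (train, test), where test is the held-out local authority."""
    train = [r for r in rows if r[la_column] != held_out]
    test = [r for r in rows if r[la_column] == held_out]
    return train, test


def mean_absolute_error(actual, predicted):
    """Average absolute difference between true and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical rows; the real files hold one row per LSOA.
rows = list(csv.DictReader(io.StringIO(
    "lsoa,la,actual,predicted\n"
    "LSOA_A,Gateshead,20.0,18.0\n"
    "LSOA_B,Gateshead,30.0,33.0\n"
    "LSOA_C,Sunderland,25.0,25.0\n"
)))

train, test = split_by_la(rows, "la", "Gateshead")
mae = mean_absolute_error(
    [float(r["actual"]) for r in test],
    [float(r["predicted"]) for r in test],
)
print(len(train), len(test), mae)
```

Holding out a whole LA rather than random LSOAs tests whether the model transfers to an area it has never seen, which is closer to how it would be used.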

    LSOA to Local Authority mapping:

    https://www.dropbox.com/s/0e993af4vrs2k9j/LSOA%20to%20Local%20Authority%20Mapping.csv?dl=0

    Source:

    https://geoportal.statistics.gov.uk/geoportal/catalog/main/home.page

    Rural-Urban classification for LSOAs:

    https://www.dropbox.com/s/1h32n47z3mvofiu/LSOA%20Rural%20Urban%20Classification.csv?dl=0

    Source:

    https://geoportal.statistics.gov.uk/geoportal/catalog/main/home.page

    Merged data for different periods per LSOA

    Merged_Data.csv

    https://www.dropbox.com/sh/cnf9iky80vq2a0w/AADeYPSn_HqNjjxBPkRpkLSla?dl=0 

    NE LA Shapefile: https://www.dropbox.com/sh/x8kegly2do0sudy/AABT_sWhJ2Q89PbDHX_q5JXSa?dl=0

    Schools Data

    Final addition from me. This took an inordinate amount of time, due to the backward design of the DfE’s website. Once again inspired by the DC app’s basic overlay of schools information, I started putting together a JSON document describing all the schools in the region, through much abuse of awk/sed/grep/curl and Python.

    Find it here https://www.dropbox.com/s/6l3l4q6gp2w1s84/schools.json?dl=0

    You can find what the field names mean here http://education.gov.uk/schools/performance/metadata.html

    There’s a wealth of stuff in there; absenteeism, school meals, absolute performance, performance vs expected performance, "value add", financials, age of the school etc. It also has some history in certain fields (mainly the financials). The main caveat at the moment is that it’s a single FOGB JSON file, so it’ll be tricky to crunch through. I tried converting it into a relational or denormalised tabular form but there are over 500 fields in play on the wider documents. ElasticSearch could be used in lieu of an RDBMS.
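As a rough illustration of one way to make a nested document like this easier to crunch, the sketch below flattens each school record into a single-level row with dotted column names. The record shape and field names here are hypothetical; the real field names are the ones documented on the DfE metadata page linked above.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots.

    Lists and scalar values are kept as they are.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical school record with some per-year financial history.
school = {
    "name": "Example School",
    "postcode": "NE1 1AA",
    "finance": {"2012": {"income": 100}, "2013": {"income": 120}},
}

row = flatten(school)
print(row)
```

With 500-plus fields most rows would be sparse, so this is better suited to picking out a handful of columns than to building one giant table.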

    It’s also not quite yet perfect for geo-visualisation use; the only location data is a postcode. This should be readily converted to lat/lon, but I’ll leave that until the morning.

    One final note - the schools JSON for the north east alone is 12MB; were this app to be turned into a reproducible toolkit it’d need to be re-engineered with a real data layer, maybe Mongo, or whatever the hipsters approve of that month.

    Exporting Shapefile polygons to CSV (for Tableau)

    Predicting child poverty using tax credit data

    Children in poverty can be divided into two groups

    Instead of predicting the percentage of children in poverty, I scaled up the problem to the number of children in poverty using local demographics data (http://neighbourhood.statistics.gov.uk) in each LSOA. Then I apply a simple linear model

    Key:

    a1, a2  are coefficients of the linear regression model.

    The advantages of this model are that 

    The remaining task is to rerun the same model using data from different years to check whether a1, a2 and b are stable over the years. If they are, then we can safely use a model learned from 2011 data, plug in 2014 tax credit data, and predict 2014 child poverty.
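The stability check can be sketched as below. It assumes the model has the form y = a1*x1 + a2*x2 + b, where y is the number of children in poverty per LSOA and x1, x2 are two tax credit counts. The actual predictors are not recorded in these notes, so x1, x2 and all the numbers are made up for illustration. The fit is ordinary least squares solved through the normal equations in plain Python.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit(x1, x2, y):
    """Least-squares fit of y = a1*x1 + a2*x2 + b; returns (a1, a2, b)."""
    cols = [x1, x2, [1.0] * len(y)]
    # Normal equations: (X^T X) coeffs = X^T y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * t for p, t in zip(ci, y)) for ci in cols]
    return tuple(solve3(xtx, xty))


# Made-up predictors for four LSOAs, and made-up outcomes for two years.
x1 = [10.0, 20.0, 30.0, 40.0]
x2 = [5.0, 3.0, 8.0, 1.0]
y_2011 = [39.0, 53.0, 88.0, 87.0]    # generated as 2.0*x1 + 3.0*x2 + 4
y_other = [39.5, 54.7, 90.2, 90.9]   # generated as 2.1*x1 + 2.9*x2 + 4

coef_2011 = fit(x1, x2, y_2011)
coef_other = fit(x1, x2, y_other)
# Largest change in any coefficient between the two years.
drift = max(abs(p - q) for p, q in zip(coef_2011, coef_other))
print(coef_2011, coef_other, drift)
```

A small drift relative to the size of the coefficients would support reusing the 2011 fit on later tax credit data. What counts as small enough is a judgment call that these notes do not settle.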

    I chose not to use the CAB data to predict child poverty because I suspect (with no evidence!) that most people facing child poverty do not seek help from CAB, and the propensity to seek help is likely to be highly variable from location to location (e.g. the % of new immigrants who are not aware of CAB, the prominence of the CAB office in the local high street etc).

    Feedback and contact details


    Presentation:

    http://prezi.com/yf5tpfvvbfb3/?utm_campaign=share&utm_medium=copy&rc=ex0share