Google Refine extension
Google Refine uses a modular web framework called "Butterfly", which allows custom extensions to be built that make use of Refine's core functionality.
The LinkedGov extension modifies Refine's "index" page (Extension/index_page), mainly to include a metadata form, and its "project" page (Extension/project_page), mainly to include the "Typing" panel (Extension/typing panel).
The LinkedGov UI skin for Google Refine should exist as an extension in the /extensions folder.
The extension also relies on the RDF extension.
There are only two pages to work with in Google Refine.
The index page is the landing page once the Refine servlet has started - from here you are able to create a project, import a project or open an existing project.
The project page is where the data manipulation is carried out - much like a worksheet in spreadsheet software.
LinkedGov adds modifications to both pages.
The index page shows a particular panel depending on the "mode" parameter that is passed to the page in the URL from the menu page.
It is home to the "import" screen and the "resume" screen, which are used to begin a "project" (Refine's terminology).
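As a rough illustration of the mode-based switching described above, the sketch below reads a "mode" query parameter and picks a panel. The mode values "import" and "resume", the fallback to "import", and the function name panelForMode are assumptions made for illustration and are not taken from the extension's source.

```javascript
// Hypothetical sketch: choose which index-page panel to show from the URL.
// The mode names and the default are assumptions, not the extension's actual values.
function panelForMode(search) {
  // URLSearchParams accepts a query string with or without a leading "?".
  const mode = new URLSearchParams(search).get("mode");
  const knownPanels = ["import", "resume"];
  // Fall back to the import panel when the parameter is missing or unrecognised.
  return knownPanels.includes(mode) ? mode : "import";
}
```

In a browser this would typically be called as panelForMode(window.location.search).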
The "project" page in Refine is the page that is home to the data table, allowing data manipulation, transformation and so on.
The LinkedGov extension adds a number of additional "panels" on the left-hand side that allow the user to clean, link and label data.
- Download and install Google Refine (version 2.5 or higher): http://code.google.com/p/google-refine/source/checkout.
- Download the LinkedGov extension from GitHub: https://github.com/linkedgov/linkedgov-google-refine-extension.
- Download the RDF extension by DERI: http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/.
- Extract both of the extensions into their own folders named "linkedgov" and "rdf-extension", and place them inside Google Refine's "extensions" directory.
- Add 4 lines of 'ant' code (2 'build' lines and 2 'clean' lines) to the "build.xml" file that's found in the extensions folder:
<ant dir="rdf-extension/" target="build" />
<ant dir="linkedgov/" target="build" />
<ant dir="rdf-extension/" target="clean" />
<ant dir="linkedgov/" target="clean" />
- Rebuild Refine by running the terminal command "ant" from inside the main Refine directory (you should see a "Build Successful" message).
- Run Google Refine by running the terminal command "./refine" (or the equivalent for your operating system).
See Extension/modification regarding changes to Refine's default behaviour.
Styling the pages is fairly straightforward. The index.js and project.js files both add the class "lg" to the <body> element on each page. Each CSS file then styles any Refine elements or LinkedGov elements using "body.lg" as a prefix.
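The "lg" class convention can be sketched roughly as follows. This is a minimal illustration, not the code in index.js or project.js; the helper names addSkinClass and skinSelector are hypothetical.

```javascript
// Hypothetical sketch of the "lg" body class convention described above.
const SKIN_CLASS = "lg";

// Add the skin class to an element's className if it is not already present.
// Works on anything with a string className, such as document.body.
function addSkinClass(element) {
  const classes = element.className.split(/\s+/).filter(Boolean);
  if (!classes.includes(SKIN_CLASS)) {
    classes.push(SKIN_CLASS);
  }
  element.className = classes.join(" ");
  return element;
}

// Build the prefixed selector a stylesheet would use for a given element.
function skinSelector(selector) {
  return "body." + SKIN_CLASS + " " + selector;
}
```

In a browser, addSkinClass(document.body) would apply the class, and a rule for a made-up selector such as "div.panel" would then be written against skinSelector("div.panel"), that is "body.lg div.panel". Scoping every rule this way keeps the LinkedGov styles from applying when the class is absent.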
There's a mixture of CSS and LESS files for styles.
A feedback form is available throughout the Importer.
See Extension/feedback_form for more information.
RDF (Resource Description Framework) data is produced behind the scenes when interacting with the wizards and cleaning data. It's produced using functionality from the RDF Extension built by DERI.
Examples of the data & structure generated:
- RDF/Refine_output Output for a row in Refine
- RDF/Refine_metadata_output Output for dataset metadata
- RDF/Refine_dataset_output Example output of the whole dataset
A list of good and bad example datasets (locations, contents) has been compiled here: Extension/Example_datasets
- Personal data (telephone numbers, house addresses)
- Geographic coordinates other than WGS84, negative northings/eastings?
There are issues with some types of data when importing.
Browser Compatibility Issues
Our reported bugs
- Issue 439: Date values output as java.util.GregorianCalendar(...,...,...,...) instead of "2011-20-03T00:00:00Z"
- Issue 441: onError - "keep-original" / "store-blank" working oddly for value.toDate()
- Issue 442: Two column transforms to date on the same column turns the cells blank
- Issue 490: Unable to guess filetype is CSV when importing a simple CSV file
- Issue 491: Importing Excel files - blank columns and time formatting
- Issue 492: Table header cells misaligning with table cells
- Issue 493: Columnize by key/value leaving "null" columns in Refine's rowModel
- Issue 511: Refine automatically attempts to parse a string containing "E" as e notation
- Issue 545: "Clear reconciliation data" not clearing all reconciliation data
- Issue 548: escape(value, encoding) returns "null" for non-string values
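To illustrate the kind of value affected by Issue 511, the sketch below flags strings that happen to be valid e notation and so can be mistaken for numbers (for example a code such as "12E3", which reads as the number 12000). It is an illustration only: Refine itself is written in Java, and the function name looksLikeENotation and the sample values are hypothetical.

```javascript
// Illustrative only: this is not Refine's importer code. It shows which strings
// are valid e notation and could therefore be parsed as numbers.
function looksLikeENotation(text) {
  // Optional sign, digits with an optional decimal part, "e" or "E",
  // then an optionally signed integer exponent.
  return /^[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\d+$/.test(text.trim());
}
```

A check like this could be used to spot columns of identifiers that are at risk before importing them.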