Google Refine extension
From LinkedGov
Google Refine uses a modular web framework called "Butterfly" that allows custom extensions to be built on top of Refine's core functionality.
Extensions are typically a mixture of front-end code (JavaScript and UI modifications) and back-end code (Java commands and servlets). Each one lives in its own folder inside the "extensions" folder in Refine's root directory.
The LinkedGov extension modifies Refine's "index" page (Extension/index_page), mainly to include a metadata form, and its "project" page (Extension/project_page), mainly to include the "Typing" panel (Extension/typing panel).
Overview
https://github.com/linkedgov/linkedgov-google-refine-extension
The LinkedGov UI skin for Google Refine should exist as an extension in the /extensions folder.
The extension also relies on the RDF extension.
Folder structure
See Extension/folder structure
Code structure
Pages
There are only two pages to work with in Google Refine.
The index page is the landing page once the Refine servlet has started. From here you can create a project, import a project or open an existing project.
The project page is where the data manipulation is carried out, much like a worksheet in spreadsheet software.
LinkedGov adds modifications to both pages.
Index page
Shows a particular panel depending on the "mode" parameter passed to the page in the URL from the menu page.
Home to the "import" and "resume" screens, which are used to begin a "project" (Refine's terminology).
Project page
The "project" page in Refine hosts the data table and is where data is manipulated and transformed.
The LinkedGov extension adds several "panels" on the left-hand side that allow the user to clean, link and label data.
Installation
- Download and install Google Refine (version 2.5 or higher): http://code.google.com/p/google-refine/source/checkout.
- Download the LinkedGov extension from GitHub: https://github.com/linkedgov/linkedgov-google-refine-extension.
- Download the RDF extension by DERI: http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/.
- Extract the two extensions into their own folders, named "linkedgov" and "rdf-extension", and place both folders inside Google Refine's "extensions" directory.
- Add four lines of Ant code (two "build" lines and two "clean" lines) to the "build.xml" file in the extensions folder, putting the "build" lines inside its build target and the "clean" lines inside its clean target:
<ant dir="rdf-extension/" target="build" />
<ant dir="linkedgov/" target="build" />
and
<ant dir="rdf-extension/" target="clean" />
<ant dir="linkedgov/" target="clean" />
- Rebuild Refine by running "ant" in a terminal from the main Refine directory (you should see a "BUILD SUCCESSFUL" message).
- Start Google Refine by running "./refine" in the terminal (or the equivalent command for your operating system).
Modifying Refine
See Extension/modification regarding changes to Refine's default behaviour.
Styling
Styling the pages is fairly straightforward. The index.js and project.js files both add the class "lg" to the <body> element on each page. Each CSS file then styles Refine or LinkedGov elements using selectors prefixed with "body.lg", which also gives these rules a higher specificity than Refine's default ones.
The styles are written in a mixture of CSS and LESS files.
Dialogs
See Extension/Dialogs.
Feedback form
A feedback form is available throughout the importer.
See Extension/feedback_form for more information.
RDF output
See Extension/RDF & Extension/RDF Schema.
RDF (Resource Description Framework) data is generated behind the scenes as the user works through the wizards and cleans data, using functionality from the RDF extension built by DERI.
Examples of the data & structure generated:
- RDF/Refine_output: output for a single row in Refine
- RDF/Refine_metadata_output: output for the dataset metadata
- RDF/Refine_dataset_output: example output for the whole dataset
Example datasets
A list of good and bad example datasets (locations and contents) has been compiled at Extension/Example_datasets.
Unacceptable data
- Personal data (telephone numbers, house addresses)
- Geographic coordinates other than WGS84, negative northings/eastings?
- ...
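As a rough illustration of the WGS84 point above, the Java sketch below flags latitude/longitude pairs that fall outside the valid WGS84 degree ranges. The CoordinateCheck class and its method are hypothetical and not part of the extension. A range check like this can only rule values out (for example British National Grid eastings/northings, which are in metres); it cannot prove that in-range values really are WGS84.

```java
// Hypothetical heuristic, not LinkedGov code: rejects pairs that cannot be
// WGS84 latitude/longitude in decimal degrees.
public final class CoordinateCheck {

    /** True if latitude is in [-90, 90] and longitude in [-180, 180]; NaN fails both tests. */
    static boolean isPlausibleWgs84(double latitude, double longitude) {
        return latitude >= -90.0 && latitude <= 90.0
                && longitude >= -180.0 && longitude <= 180.0;
    }

    public static void main(String[] args) {
        // Central London in decimal degrees: plausible WGS84.
        System.out.println(isPlausibleWgs84(51.5074, -0.1278));   // true
        // British National Grid northing 180000 m / easting 530000 m
        // (roughly central London) misread as degrees: rejected.
        System.out.println(isPlausibleWgs84(180000.0, 530000.0)); // false
    }
}
```

Negative values are valid in WGS84 (southern latitudes and western longitudes), so the sign of a value alone does not reveal which coordinate system it uses.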
Importing issues
Some types of data cause problems when they are imported.
See Extension/importing issues.
Browser compatibility issues
See Extension/cross browser compatibility.
Bugs
See Extension/bugs.
Feedback
See Extension/feedback.
Our reported bugs
- Issue 439: Date values output as java.util.GregorianCalendar(...,...,...,...) instead of "2011-03-20T00:00:00Z"
- Issue 441: onError - "keep-original" / "store-blank" working oddly for value.toDate()
- Issue 442: Two column transforms to date on the same column turns the cells blank
- Issue 490: Unable to guess filetype is CSV when importing a simple CSV file
- Issue 491: Importing Excel files - blank columns and time formatting
- Issue 492: Table header cells misaligning with table cells
- Issue 493: Columnize by key/value leaving "null" columns in Refine's rowModel
- Issue 511: Refine automatically attempts to parse a string containing "E" as e notation
- Issue 545: "Clear reconciliation data" not clearing all reconciliation data
- Issue 548: escape(value, encoding) returns "null" for non-string values
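Issue 439 concerns dates reaching the output as a Java calendar object rather than as an ISO 8601 string. The sketch below shows one way to format a java.util.GregorianCalendar as a UTC timestamp of that shape using java.time. The IsoDate class and its methods are hypothetical; this is not how Refine or the extension actually resolves the bug.

```java
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper: formats a Calendar as an ISO 8601 UTC timestamp.
public final class IsoDate {

    // Literal 'Z' suffix; the formatter's zone is fixed to UTC so the suffix is accurate.
    private static final DateTimeFormatter ISO_UTC =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    /** Converts the calendar to an Instant and formats it in UTC (milliseconds are dropped). */
    static String toIso8601(Calendar cal) {
        return ISO_UTC.format(cal.toInstant());
    }

    /** Builds a GregorianCalendar for midnight UTC on the given date (month is zero-based). */
    static GregorianCalendar utcMidnight(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();               // unset every field so the time of day defaults to 00:00:00
        cal.set(year, month, day);
        return cal;
    }

    public static void main(String[] args) {
        GregorianCalendar cal = utcMidnight(2011, Calendar.MARCH, 20);
        // The default toString() is a long dump of internal fields, not ISO 8601.
        System.out.println(cal);
        System.out.println(toIso8601(cal)); // 2011-03-20T00:00:00Z
    }
}
```

Fixing the zone on the formatter, rather than relying on the JVM's default time zone, keeps the trailing "Z" truthful wherever the code runs.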
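Issue 511 describes text containing an "E", such as "3E2", being read as a number in E notation. Assuming type guessing behaves like Java's Double.parseDouble (Refine's real parser may differ), the sketch below contrasts that lenient check with a hypothetical stricter rule that accepts only plain decimals. The NumberGuess class and both methods are illustrations, not the extension's code.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of a lenient and a strict "is this a number?" check.
public final class NumberGuess {

    // Strict rule (an assumption): optional sign, digits, optional fraction, no exponent.
    private static final Pattern PLAIN_DECIMAL = Pattern.compile("[+-]?\\d+(\\.\\d+)?");

    /** Lenient guess: anything Double.parseDouble accepts counts as a number. */
    static boolean lenientIsNumber(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Strict guess: only plain decimals such as "42" or "-7.5" count as numbers. */
    static boolean strictIsNumber(String s) {
        return PLAIN_DECIMAL.matcher(s.trim()).matches();
    }

    public static void main(String[] args) {
        String code = "3E2"; // e.g. a code or identifier that is not meant as a number
        System.out.println(lenientIsNumber(code));    // true
        System.out.println(Double.parseDouble(code)); // 300.0
        System.out.println(strictIsNumber(code));     // false
        System.out.println(strictIsNumber("42.5"));   // true
    }
}
```

Double.parseDouble also accepts strings such as "NaN", "Infinity" and "1d", which the stricter rule rejects as well; the trade-off is that genuine scientific notation in a column would then stay as text.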
