The v1 DMAPI will be deprecated on 1/1/2016. If you have questions or need assistance migrating to the latest version, please contact us.

The SpatialKey team has recognized that creating the XML descriptor file is one of the most tedious and error-prone aspects of the Data Import API.  To help alleviate this, we have added a way to export the XML descriptor for a previously uploaded dataset (NOTE: you must be a dataset Creator/Editor to use this feature).

Generating the XML File

To easily generate an XML file, start by importing a CSV dataset into SpatialKey through the client application.  This CSV should be representative of the data you wish to import using the Data Import API.  This will serve two purposes:

  • First, it will allow you to import the dataset using the tools within SpatialKey (setting appropriate data types, validating that the data look correct, etc.) before trying it through the API.
  • Second, after importing a dataset into SpatialKey, generating the XML descriptor is just a few clicks away and removes the possibility of typos or mistakes.

Once the dataset is imported into SpatialKey, find your dataset in the Manage Data tab, click the gear icon to view data settings, and select the “Advanced Settings” option.  Within the advanced settings, there is an option to “Generate API v2 Config” within the “Data Import API” tab.


The following information is contained within the XML:

  • Dataset name
  • Geo-location information
    • Geocoding information (which columns should be used for geocoding)
    • Existing latitude and longitude fields
  • Column names
  • Data types of each column

You will need to create one XML file for each dataset that is managed through the API.  Once an XML file is created for a specific dataset, you should be able to save and reuse the XML for each update through the API.

Geocoding Instructions in XML

You must define the georeference method in your XML file: either “geocode” or “trust”.  Alternatively, if your data does not contain location information, you can use an empty <georeference/> in the XML.
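The three allowed forms can be told apart with a few lines of Python. This is a minimal sketch using only the standard library; the function name georeference_method is hypothetical, and treating a missing <georeference> element as an error is an assumption, not documented API behavior.

```python
import xml.etree.ElementTree as ET


def georeference_method(descriptor_xml):
    """Return "geocode", "trust", or None for an empty <georeference/>."""
    root = ET.fromstring(descriptor_xml)
    geo = root.find("./dataset/georeference")
    if geo is None:
        # Assumption: a descriptor without the element is treated as invalid.
        raise ValueError("descriptor has no <georeference> element")
    method = geo.get("method")
    if method is None and len(geo) == 0:
        return None  # empty <georeference/>: no location information
    if method in ("geocode", "trust"):
        return method
    raise ValueError(f"unexpected georeference method: {method!r}")


NO_LOCATION = "<umgMeta><dataset><name>SalesData</name><georeference/></dataset></umgMeta>"
print(georeference_method(NO_LOCATION))  # None
```

Running a check like this before an upload catches a misspelled method attribute early.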

“Geocode” georeference method

Use the “geocode” method when you have address information and would like SpatialKey to geocode your data for you.  Define as much address information as possible when using this method.  Each field element (streetField, cityField, stateField, etc.) should either be empty or contain the column name (the header name in the CSV file) of the corresponding data.  You do not need to fill in every field; use only the fields you have, keeping in mind that the more address information a record provides, the more accurate the geocoding will be.  The value elements are optional and let the API user “hard code” a specific value for that field type.  For example, if your data does not contain state information but you know that every record in the dataset is in California, you can place California or CA in the stateValue element.  The geocoding process then uses that value for each row it geocodes.  Within each field/value pair, only one element should be filled in; if both are specified, the field element takes precedence.
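The field/value precedence rule can be pictured with a short Python sketch. It only illustrates the behavior described above and is not SpatialKey's implementation; the function name resolve_component and the sample row are hypothetical.

```python
from typing import Mapping, Optional


def resolve_component(row: Mapping[str, str], field: str, value: str) -> Optional[str]:
    """Return the text used to geocode one address component of one CSV row.

    `field` is the content of e.g. <stateField>, `value` the content of
    <stateValue>. Either may be empty.
    """
    field = field.strip()
    value = value.strip()
    if field:
        # A column name was given: it wins, even if a value is also set.
        return row.get(field)
    if value:
        # No column name: the hard-coded value applies to every row.
        return value
    # Neither is set: this component is not used for geocoding.
    return None


row = {"company": "Acme", "city": "Sacramento"}  # hypothetical CSV row
print(resolve_component(row, "city", ""))        # Sacramento
print(resolve_component(row, "", "CA"))          # CA
print(resolve_component(row, "city", "Fresno"))  # Sacramento (field wins)
```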

Example using geocoding (city and state fields) to get latitudes and longitudes for the data:

<?xml version="1.0" encoding="UTF-8"?>
<umgMeta version="1.0" purpose="for importing">
  <dataset>
    <name>TechCrunch</name>
    <georeference method="geocode">
      <streetField></streetField>
      <streetValue></streetValue>
      <cityField>city</cityField>
      <cityValue></cityValue>
      <countyField></countyField>
      <countyValue></countyValue>
      <stateField>state</stateField>
      <stateValue></stateValue>
      <postalCodeField></postalCodeField>
      <postalCodeValue></postalCodeValue>
      <countryField></countryField>
      <countryValue></countryValue>
    </georeference>
  </dataset>
  <columns>
    <column>
      <position>1</position>
      <name>permalink</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>2</position>
      <name>company</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>3</position>
      <name>numemps</name>
      <type confidence="100">NUMBER/INTEGER</type>
    </column>
    <column>
      <position>4</position>
      <name>category</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>5</position>
      <name>city</name>
      <type confidence="100">STRING/CITY</type>
    </column>
    <column>
      <position>6</position>
      <name>state</name>
      <type confidence="100">STRING/STATE</type>
    </column>
    <column>
      <position>7</position>
      <name>fundeddate</name>
      <type confidence="100">DATE/DAY</type>
    </column>
    <column>
      <position>8</position>
      <name>raisedamt</name>
      <type confidence="100">NUMBER/INTEGER</type>
    </column>
    <column>
      <position>9</position>
      <name>raisedcurrency</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>10</position>
      <name>round</name>
      <type confidence="100">STRING</type>
    </column>
  </columns>
</umgMeta>

“Trust” georeference method

Use the “trust” method when your file already contains latitude and longitude coordinates and geocoding in SpatialKey isn’t necessary.  Both the latitude and longitude fields must be defined when using this method.

Example using existing latitude and longitude in the source data:

<?xml version="1.0" encoding="UTF-8"?>
<umgMeta version="1.0" purpose="for publishing">
  <dataset>
    <name>SalesData</name>
    <georeference method="trust">
      <latitudeField>Latitude</latitudeField>
      <longitudeField>Longitude</longitudeField>
    </georeference>
  </dataset>
  <columns>
    <column>
      <position>1</position>
      <name>Transaction_date</name>
      <type confidence="100">DATE/MINUTE</type>
    </column>
    <column>
      <position>2</position>
      <name>Product</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>3</position>
      <name>Price</name>
      <type confidence="100">NUMBER/INTEGER</type>
    </column>
    <column>
      <position>4</position>
      <name>Payment_Type</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>5</position>
      <name>Name</name>
      <type confidence="100">STRING</type>
    </column>
    <column>
      <position>6</position>
      <name>Latitude</name>
      <type confidence="100">LATITUDE</type>
    </column>
    <column>
      <position>7</position>
      <name>Longitude</name>
      <type confidence="100">LONGITUDE</type>
    </column>
  </columns>
</umgMeta>
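Because both coordinate fields are required for the “trust” method, a descriptor can be checked before it is submitted. The following is a rough sketch using Python's standard library. The function name check_trust_georeference is hypothetical, and the expectation that each field names a column declared under <columns> is an assumption drawn from the example above rather than a documented rule.

```python
import xml.etree.ElementTree as ET


def check_trust_georeference(descriptor_xml):
    """Return a list of problems with a descriptor's "trust" georeference."""
    root = ET.fromstring(descriptor_xml)
    geo = root.find("./dataset/georeference")
    if geo is None or geo.get("method") != "trust":
        return ["georeference method is not 'trust'"]

    declared = {(c.findtext("name") or "").strip()
                for c in root.findall("./columns/column")}
    problems = []
    for tag in ("latitudeField", "longitudeField"):
        name = (geo.findtext(tag) or "").strip()
        if not name:
            problems.append(f"<{tag}> is missing or empty")
        elif name not in declared:
            # Assumption: the field should name a declared column.
            problems.append(f"<{tag}> refers to undeclared column '{name}'")
    return problems


# Hypothetical descriptor with the longitude field left empty.
SAMPLE = """
<umgMeta version="1.0" purpose="for publishing">
  <dataset>
    <name>SalesData</name>
    <georeference method="trust">
      <latitudeField>Latitude</latitudeField>
      <longitudeField></longitudeField>
    </georeference>
  </dataset>
  <columns>
    <column><position>1</position><name>Latitude</name><type confidence="100">LATITUDE</type></column>
  </columns>
</umgMeta>
"""
print(check_trust_georeference(SAMPLE))
```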

Empty georeference

For data where no location information is available, use an empty <georeference/> in the XML.

Example with no location information:

<?xml version="1.0" encoding="UTF-8"?>
<umgMeta version="1.0" purpose="for publishing">
  <dataset>
    <name>SalesData</name>
    <georeference/>
  </dataset>
  <columns>
    <column>
      <position>1</position>
      <name>Column1</name>
      <type confidence="100">DATE/MINUTE</type>
    </column>
    <column>
      <position>2</position>
      <name>Column2</name>
      <type confidence="100">STRING</type>
    </column>
  </columns>
</umgMeta>

XML Data Requirements

Each column in your CSV should have a column element in the XML file defining its name, position and data type.

Sample column element:

<column>
    <position>1</position>
    <name>permalink</name>
    <type confidence="100">STRING</type>
</column>

Each column contains a <position>, <name>, and <type> element. The name should match the column header in the CSV file.  Similarly, the position should correspond to the order of the columns in the CSV: numbering starts at 1 and counts up to the total number of columns.  The last element is the data type.  Data types in SpatialKey follow a type/subtype pattern, where the subtype is the granularity; for example, NUMBER/INTEGER.  The first value (before the “/”) is the general data type, in this case a number.  The second value tells the system more about the data: here, INTEGER indicates that the data is not only a number but also contains no decimal places.
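The name and position rules lend themselves to an automated check against the CSV header row. Below is a minimal Python sketch using only the standard library; the function name check_columns and the sample data are hypothetical, and the check does not look for duplicate positions.

```python
import csv
import io
import xml.etree.ElementTree as ET


def check_columns(descriptor_xml, csv_text):
    """Compare each <column> with the CSV header row and return any problems."""
    header = [h.strip() for h in next(csv.reader(io.StringIO(csv_text)))]
    columns = ET.fromstring(descriptor_xml).findall(".//column")
    problems = []
    if len(columns) != len(header):
        problems.append(f"descriptor declares {len(columns)} columns, CSV has {len(header)}")
    for col in columns:
        name = (col.findtext("name") or "").strip()
        pos = (col.findtext("position") or "").strip()
        if not pos.isdecimal() or not 1 <= int(pos) <= len(header):
            problems.append(f"column '{name}': position '{pos}' is outside 1..{len(header)}")
        elif header[int(pos) - 1] != name:
            # Positions are 1-based, so position N maps to header index N - 1.
            problems.append(f"position {pos}: descriptor says '{name}', CSV header says '{header[int(pos) - 1]}'")
    return problems


# Hypothetical descriptor whose second column name does not match the CSV.
DESCRIPTOR = """
<umgMeta version="1.0" purpose="for importing">
  <columns>
    <column><position>1</position><name>permalink</name><type confidence="100">STRING</type></column>
    <column><position>2</position><name>numemps</name><type confidence="100">NUMBER/INTEGER</type></column>
  </columns>
</umgMeta>
"""
CSV_TEXT = "permalink,company\nacme,Acme Inc\n"
print(check_columns(DESCRIPTOR, CSV_TEXT))
```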

There are many different type/granularity combinations for your data. They are presented here with a definition of what each means:

  1. STRING – simple text, no granularity defined
  2. STRING/IPADDRESS – an IP address
  3. STRING/STREET – a street address
  4. STRING/CITY – a city
  5. STRING/STATE – a state (could be abbreviation, full state name, etc.)
  6. STRING/ZIP – zip or postal code (treated as text)
  7. STRING/COUNTY – a county
  8. STRING/COUNTRY – a country
  9. BOOLEAN – for columns with only two values (typically yes/no, 1/0, etc. but could be others)
  10. NUMBER – generic number, stored as a decimal; prefer one of the granularities below
  11. NUMBER/INTEGER – number with no decimal places
  12. NUMBER/DECIMAL – number containing decimal data
  13. NUMBER/CURRENCY – number with possible currency formatting
  14. NUMBER/HOURS – a number representing a number of hours
  15. NUMBER/MINUTES – a number representing a number of minutes
  16. NUMBER/SECONDS – a number representing a number of seconds
  17. NUMBER/MILLISECONDS – a number representing a number of milliseconds
  18. DATE/YEAR – date containing only a relevant year
  19. DATE/MONTH – a date accurate/relevant to the month
  20. DATE/DAY – a date
  21. DATE/HOUR – a date and time accurate to the hour
  22. DATE/MINUTE – a date and time accurate to the minute
  23. DATE/SECOND – a date and time accurate to the second
  24. LATITUDE – a latitude field
  25. LONGITUDE – a longitude field
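
A typo in a <type> value is easy to miss, so it can help to compare each one against the list above. This is a rough Python sketch: the names LISTED_TYPES and unlisted_types are hypothetical, and the exact, case-sensitive match is an assumption, since the list does not say whether the API accepts other spellings.

```python
import xml.etree.ElementTree as ET

# The 25 type/granularity combinations listed above.
LISTED_TYPES = {
    "STRING", "STRING/IPADDRESS", "STRING/STREET", "STRING/CITY",
    "STRING/STATE", "STRING/ZIP", "STRING/COUNTY", "STRING/COUNTRY",
    "BOOLEAN",
    "NUMBER", "NUMBER/INTEGER", "NUMBER/DECIMAL", "NUMBER/CURRENCY",
    "NUMBER/HOURS", "NUMBER/MINUTES", "NUMBER/SECONDS", "NUMBER/MILLISECONDS",
    "DATE/YEAR", "DATE/MONTH", "DATE/DAY", "DATE/HOUR", "DATE/MINUTE",
    "DATE/SECOND",
    "LATITUDE", "LONGITUDE",
}


def unlisted_types(descriptor_xml):
    """Return (column name, type) pairs whose type is not in LISTED_TYPES."""
    root = ET.fromstring(descriptor_xml)
    bad = []
    for col in root.iter("column"):
        type_text = (col.findtext("type") or "").strip()
        # Assumption: types must match the listed spelling exactly.
        if type_text not in LISTED_TYPES:
            bad.append(((col.findtext("name") or "").strip(), type_text))
    return bad


# Hypothetical descriptor fragment with a misspelled granularity.
TYPO_SAMPLE = """
<columns>
  <column><position>1</position><name>company</name><type confidence="100">STRING</type></column>
  <column><position>2</position><name>numemps</name><type confidence="100">NUMBER/INTERGER</type></column>
</columns>
"""
print(unlisted_types(TYPO_SAMPLE))  # [('numemps', 'NUMBER/INTERGER')]
```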