Quickly parse API responses

TODO

  • Merge back into “Get started with APIs”?

✅ Learning objectives

  • Parse nested lists with {tibblify}.
  • Use an API’s OpenAPI description to determine the expected format of responses.
  • Parse API responses with {tibblify} and the response description.
library(jsonlite)
library(tidyr)
library(dplyr)
library(tibblify)
library(yaml)
library(waldo)

Aside: Why rectangle?

  • Apps usually “think” in objects
  • Data scientists usually compare many objects at once
    • Preferred data: data frames
      • Columns of variables (of same class)
      • Rows of observations (~objects)
  • APIs are usually designed for the first model
  • Even data APIs tend to “think” in objects
    • Because most programmers do

The tibblify package

Typical JSON data

demo_json.json

url <- "https://dslc-io.github.io/club-wapir/slides/schemas/demo_json.json"
demo_json <- jsonlite::fromJSON(url)
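
To see why several unnesting steps are needed, it can help to look at the nesting first. The sketch below uses a small hand-written JSON string with the same layering as demo_json (API ID, then "versions", then a version label, then fields). The ID and values are invented for illustration.

```r
library(jsonlite)

# Made-up JSON shaped like demo_json; not real API data.
toy_txt <- '{
  "example.com": {
    "versions": {
      "1.0": {"added": "2020-01-01T00:00:00.000Z", "openapiVer": "3.0.0"}
    }
  }
}'
toy <- jsonlite::fromJSON(toy_txt)

names(toy)                            # level 1: API IDs
names(toy[["example.com"]]$versions)  # level 3: version labels
str(toy, max.level = 3)               # overview of the nesting
```

Each named level that carries information in its names (the API ID, the version label) is one that the rectangling code has to turn into a column.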

Rectangling demo_json manually

demo_json |> 
  tibble::enframe(name = "api_id") |>
  tidyr::unnest_longer(value, indices_include = FALSE) |> 
  tidyr::unnest_longer(value, indices_to = "version") |>
  tidyr::unnest_wider(value)
#> # A tibble: 3 × 8
#>   api_id        added updated swaggerUrl swaggerYamlUrl openapiVer link  version
#>   <chr>         <chr> <chr>   <chr>      <chr>          <chr>      <chr> <chr>  
#> 1 apis.guru     2015… 2023-0… https://a… https://api.a… 3.0.0      http… 2.2.0  
#> 2 fec.gov       2018… 2023-0… https://a… https://api.a… 3.0.0      http… 1.0    
#> 3 googleapis.c… 2020… 2023-0… https://a… https://api.a… 3.0.0      http… v3

Introducing tibblify

  • {tibblify} 📦 to auto-convert hierarchical data to tibbles
  • Super-charged tidyr::unnest_auto()
  • Tibbles all the way down
  • Experimental functionality for APIs

Rectangling demo_json with tibblify

dj_tibblified <- tibblify::tibblify(demo_json)
dj_tibblified
#> # A tibble: 3 × 2
#>   .names                           versions
#>   <chr>                  <list<tibble[,7]>>
#> 1 apis.guru                         [1 × 7]
#> 2 fec.gov                           [1 × 7]
#> 3 googleapis.com:youtube            [1 × 7]
dj_tibblified |> 
  dplyr::rename(api_id = ".names") |> 
  tidyr::unnest(versions)
#> # A tibble: 3 × 8
#>   api_id         .names added updated swaggerUrl swaggerYamlUrl openapiVer link 
#>   <chr>          <chr>  <chr> <chr>   <chr>      <chr>          <chr>      <chr>
#> 1 apis.guru      2.2.0  2015… 2023-0… https://a… https://api.a… 3.0.0      http…
#> 2 fec.gov        1.0    2018… 2023-0… https://a… https://api.a… 3.0.0      http…
#> 3 googleapis.co… v3     2020… 2023-0… https://a… https://api.a… 3.0.0      http…

Rectangling manually vs tibblify

dj_tidyr <- 
  demo_json |> 
  tibble::enframe(name = "api_id") |>
  tidyr::unnest_longer(
    value, indices_include = FALSE
  ) |> 
  tidyr::unnest_longer(
    value, indices_to = "version"
  ) |>
  tidyr::unnest_wider(value)
dj_tibblify <- 
  demo_json |>
  tibblify::tibblify() |> 
  dplyr::rename(api_id = ".names") |> 
  tidyr::unnest(versions) |> 
  dplyr::rename(version = ".names")


waldo::compare(
  dj_tibblify, dj_tidyr, 
  list_as_map = TRUE # Ignore column order
)
#> ✔ No differences

guess_tspec()

  • tibblify::tibblify() spec argument
    • “What should this look like?”
  • Guessed by default with tibblify::guess_tspec()
tibblify::guess_tspec(demo_json)
#> tspec_df(
#>   .names_to = ".names",
#>   tib_df(
#>     "versions",
#>     .names_to = ".names",
#>     tib_chr("added"),
#>     tib_chr("updated"),
#>     tib_chr("swaggerUrl"),
#>     tib_chr("swaggerYamlUrl"),
#>     tib_chr("openapiVer"),
#>     tib_chr("link"),
#>   ),
#> )
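
A minimal sketch of using the guess explicitly, with a made-up list of records rather than demo_json: store the guessed spec, inspect or edit it, then pass it as the spec argument so later calls do not depend on guessing again.

```r
library(tibblify)

# Made-up records; not from any real API.
toy <- list(
  list(id = 1L, name = "a"),
  list(id = 2L, name = "b")
)

# Guess once, keep the result, and reuse it explicitly.
toy_spec <- tibblify::guess_tspec(toy)
toy_tbl <- tibblify::tibblify(toy, toy_spec)
toy_tbl
```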

The OpenAPI Specification

Multiple Standards

  • Swagger 2.0 ➡️ OpenAPI 2.0
  • OpenAPI 3.x
  • OpenAPI 4.0 (in development)
  • Postman Collection
  • API Blueprint
  • Web Application Description Language (WADL)

YAML

JSON
(jsonlite::read_json())

{
  "info": {
    "contact": {
      "email": "mike.ralphson@gmail.com",
      "name": "APIs.guru",
      "url": "https://APIs.guru"
    },
    "title": "APIs.guru",
    "version": "2.2.0",
    "x-apisguru-categories": [
      "open_data",
      "developer_tools"
    ]
  },
  "security": []
}

YAML
(yaml::read_yaml())

info:
  contact:
    email: mike.ralphson@gmail.com
    name: APIs.guru
    url: https://APIs.guru
  title: APIs.guru
  version: 2.2.0
  x-apisguru-categories:
    - open_data
    - developer_tools
security: [] # No security needed
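
As a quick check that the two syntaxes carry the same data, the sketch below parses a shortened copy of the fragment from strings. It uses jsonlite::parse_json() and yaml::yaml.load(), the string counterparts of the file readers named above. Only a couple of fields are compared, since the two parsers can differ in how they represent some values in R.

```r
library(jsonlite)
library(yaml)

json_txt <- '{
  "info": {
    "title": "APIs.guru",
    "x-apisguru-categories": ["open_data", "developer_tools"]
  }
}'

yaml_txt <- '
info:
  title: APIs.guru
  x-apisguru-categories:
    - open_data
    - developer_tools
'

from_json <- jsonlite::parse_json(json_txt, simplifyVector = TRUE)
from_yaml <- yaml::yaml.load(yaml_txt)

# Same title and same categories from either syntax.
identical(from_json$info$title, from_yaml$info$title)
identical(
  unlist(from_json$info[["x-apisguru-categories"]]),
  unlist(from_yaml$info[["x-apisguru-categories"]])
)
```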

Exploring an API Description

openapi: 3.0.0 # Specification version number
info: # title, version, description, contact, license
servers: # One or more URLs + optional descriptions
tags: # Optional name, description, externalDocs of endpoint categories
externalDocs: # Optional url & description of additional documentation
security: # Optional list of named default security schemes
paths: # Endpoints of the API
webhooks: # (OpenAPI 3.1+) Requests the API can SEND to endpoints YOU provide
jsonSchemaDialect: # (OpenAPI 3.1+) URI of the default JSON Schema dialect for schemas in this document
x-whatever: # Extend with additional properties
components: # Reusable schemas, securitySchemes, reusable pieces of everything above
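
One way to explore a description in R is to read it into a list and follow a response's $ref into components. The sketch below assumes a tiny hand-written description: the /items path and the Items schema are hypothetical, and resolve_ref() is a helper written for this example (it handles only local "#/..." references).

```r
library(yaml)

# Hypothetical, hand-written description; not a real API.
desc_txt <- '
openapi: 3.0.0
paths:
  /items:
    get:
      responses:
        "200":
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Items"
components:
  schemas:
    Items:
      type: object
'
desc <- yaml::yaml.load(desc_txt)

# Follow a local reference such as "#/components/schemas/Items".
resolve_ref <- function(desc, ref) {
  parts <- strsplit(sub("^#/", "", ref), "/", fixed = TRUE)[[1]]
  # Walk down the list one key at a time, starting from the whole description.
  Reduce(function(node, key) node[[key]], parts, desc)
}

ok_response <- desc$paths[["/items"]]$get$responses[["200"]]
ref <- ok_response$content[["application/json"]]$schema[["$ref"]]
schema <- resolve_ref(desc, ref)
schema$type
```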

apis.guru APIs Schema

components:
  schemas:
    APIs:
      additionalProperties:
        $ref: "#/components/schemas/API"
      description: |
        List of API details.
        It is a JSON object with API IDs(`<provider>[:<service>]`) as keys.
      minProperties: 1
      type: object
# Run after tspec_api (defined in the API schema section below) exists.
tspec_apis <- tspec_df(
  .names_to = "api_id",
  tspec_api
)
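
The link between additionalProperties and .names_to can be tried on a toy object. In this sketch the keys, the field n, and the column name id are all made up. The point is only that an object keyed by arbitrary IDs becomes one row per key, with the key stored in the column named by .names_to.

```r
library(tibblify)

# Made-up object keyed by ID, like a schema that uses additionalProperties.
toy <- list(
  first = list(n = 1L),
  second = list(n = 2L)
)

toy_spec <- tibblify::tspec_df(
  .names_to = "id",        # keys ("first", "second") go into this column
  tibblify::tib_int("n")   # each value is an object with an integer field n
)

toy_tbl <- tibblify::tibblify(toy, toy_spec)
toy_tbl
```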

apis.guru API Schema

components:
  schemas:
    API:
      additionalProperties: false
      description: Meta information about API
      properties:
        versions:
          additionalProperties:
            $ref: "#/components/schemas/ApiVersion"
          description: List of supported versions of the API
          minProperties: 1
          type: object
      required:
        - versions
      type: object
# Run after tspec_api_version (defined in the ApiVersion tspec section below) exists.
tspec_api <- tspec_row(
  tib_df(
    "versions", 
    .names_to = "version", 
    tspec_api_version
  )
)

apis.guru ApiVersion Schema

components:
  schemas:
    ApiVersion:
      additionalProperties: false
      properties:
        added:
          description: Timestamp when the version was added
          format: date-time
          type: string
        link:
          description: Link to the individual API entry for this API
          format: url
          type: string
        openapiVer:
          description: The value of the `openapi` or `swagger` property of the source definition
          type: string
        swaggerUrl:
          description: URL to OpenAPI definition in JSON format
          format: url
          type: string
        swaggerYamlUrl:
          description: URL to OpenAPI definition in YAML format
          format: url
          type: string
        updated:
          description: Timestamp when the version was updated
          format: date-time
          type: string
      required:
        - added
        - updated
        - swaggerUrl
        - swaggerYamlUrl
        - openapiVer
      type: object

apis.guru ApiVersion tspec

tib_chr_datetime <- function(key, ..., required = TRUE) {
  tibblify::tib_scalar(
    key = key,
    ptype = vctrs::new_datetime(tzone = "UTC"),
    required = required,
    ptype_inner = character(),
    transform = \(x) as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC"),
    ...
  )
}
tspec_api_version <- tspec_row(
  tib_chr_datetime("added"),
  tib_chr_datetime("updated"),
  tib_chr("openapiVer"),
  tib_chr("swaggerUrl"),
  tib_chr("swaggerYamlUrl"),
  tib_chr("link", required = FALSE)
)
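
The transform inside tib_chr_datetime() can be checked on its own before it is wired into a spec. This sketch copies the same format string into a standalone helper (parse_ts is a name chosen here) and tries it on a sample timestamp in the date-time shape the schema describes, plus one string that does not match.

```r
# Same conversion as the transform above, as a standalone function.
parse_ts <- \(x) as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC")

ok <- parse_ts("2015-11-26T17:52:26.000Z")  # matches the format
bad <- parse_ts("26 Nov 2015")              # does not match, so it becomes NA

format(ok, "%Y-%m-%d %H:%M:%S")
is.na(bad)
```

Values that fail to parse become NA silently, so it may be worth checking for unexpected NA values after tibblifying a real response.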

Using tspecs: version

tibblify(demo_json$apis.guru$versions$`2.2.0`, tspec_api_version)
#> # A tibble: 1 × 6
#>   added               updated             openapiVer swaggerUrl   swaggerYamlUrl
#>   <dttm>              <dttm>              <chr>      <chr>        <chr>         
#> 1 2015-11-26 17:52:26 2023-04-05 13:10:14 3.0.0      https://api… https://api.a…
#> # ℹ 1 more variable: link <chr>

Using tspecs: api

tibblify(demo_json$apis.guru, tspec_api) |> tidyr::unnest(versions)
#> # A tibble: 1 × 7
#>   version added               updated             openapiVer swaggerUrl         
#>   <chr>   <dttm>              <dttm>              <chr>      <chr>              
#> 1 2.2.0   2015-11-26 17:52:26 2023-04-05 13:10:14 3.0.0      https://api.apis.g…
#> # ℹ 2 more variables: swaggerYamlUrl <chr>, link <chr>

Using tspecs: apis

tibblify(demo_json, tspec_apis) |> tidyr::unnest(versions)
#> # A tibble: 3 × 8
#>   api_id   version added               updated             openapiVer swaggerUrl
#>   <chr>    <chr>   <dttm>              <dttm>              <chr>      <chr>     
#> 1 apis.gu… 2.2.0   2015-11-26 17:52:26 2023-04-05 13:10:14 3.0.0      https://a…
#> 2 fec.gov  1.0     2018-11-20 00:04:28 2023-03-06 07:12:59 3.0.0      https://a…
#> 3 googlea… v3      2020-11-02 10:32:34 2023-04-21 23:09:23 3.0.0      https://a…
#> # ℹ 2 more variables: swaggerYamlUrl <chr>, link <chr>