Process other response types

✅ Learning objectives

  • Parse text responses.
  • Parse binary responses such as images and videos.
  • Handle HTTP error responses.

Content types

  • Content-Type header aka “MIME type”
    • “Multipurpose Internet Mail Extensions”
  • type/subtype;parameter=value
  • httr2::resp_content_type() gets type/subtype
  • httr2::resp_encoding() gets charset parameter
  • More at MDN MIME types
  • Even more at IANA Media Types registry
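
As a minimal sketch of the two helpers above, the following builds a mock response offline with httr2::response() and reads back the type/subtype and the charset parameter. The header value and body are made up for illustration.

```r
library(httr2)

# Mock response; the Content-Type value is an illustrative assumption.
resp <- response(
  status_code = 200,
  headers = list(`Content-Type` = "text/html; charset=ISO-8859-1"),
  body = charToRaw("<p>hello</p>")
)

content_type <- resp_content_type(resp) # type/subtype only
encoding <- resp_encoding(resp)         # charset parameter

print(content_type)
print(encoding)
```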

Text content types

MIME type          httr2 function       Description
application/json   resp_body_json()     By far most common
application/xml    resp_body_xml()      Briefly most common
text/html          resp_body_html()     Really a subclass of XML
text/plain         resp_body_string()   Text wildcard
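
One way to use this mapping is to pick the parser from the response's content type. The sketch below is an assumption about how you might wire it up: parse_text_body() is a hypothetical helper, not part of httr2, and the mock responses stand in for real API calls.

```r
library(httr2)

# Hypothetical helper: choose a resp_body_*() parser from the content type.
parse_text_body <- function(resp) {
  type <- resp_content_type(resp)
  # No Content-Type header at all: fall back to plain text.
  if (is.na(type)) {
    return(resp_body_string(resp))
  }
  switch(
    type,
    "application/json" = resp_body_json(resp),
    "application/xml" = ,
    "text/xml" = resp_body_xml(resp),
    "text/html" = resp_body_html(resp),
    resp_body_string(resp) # anything else: treat as plain text
  )
}

# Mock responses so the example runs offline.
json_resp <- response(
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw('{"a": 1}')
)
text_resp <- response(
  headers = list(`Content-Type` = "text/plain"),
  body = charToRaw("hello")
)

parsed_json <- parse_text_body(json_resp)
parsed_text <- parse_text_body(text_resp)
```

This matches exact types only. Suffix types such as application/ld+json would need extra cases.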

JSON responses

  • application/json or */json
  • httr2::resp_body_json() uses jsonlite::fromJSON()

JSON data

  • 4 scalars (length-1 vectors)
    • null ➡️ NA
    • string ➡️ character(1), always " (not ')
    • number ➡️ numeric(1), no Inf/-Inf/NaN
    • boolean ➡️ logical(1), true = TRUE, false = FALSE
  • array ≈ unnamed list()
    • []: [null, "a", 1, true] ➡️ list(NULL, "a", 1, TRUE)
  • object ≈ named list()
    • {}: {"a": 1, "b": [1, 2]} ➡️
      list(a = 1, b = list(1, 2))
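
To see these conversions directly, here is a small sketch using jsonlite::fromJSON() on literal JSON strings. It assumes simplifyVector = FALSE, which is the default that httr2::resp_body_json() passes along. The last call shows what changes when simplification is left on.

```r
library(jsonlite)

# simplifyVector = FALSE keeps arrays as lists.
arr <- fromJSON('[null, "a", 1, true]', simplifyVector = FALSE)
obj <- fromJSON('{"a": 1, "b": [1, 2]}', simplifyVector = FALSE)
str(arr)
str(obj)

# With jsonlite's own default (simplification on), an array of scalars
# becomes an atomic vector and null becomes NA.
simplified <- fromJSON('[null, 1, 2]')
print(simplified)
```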

XML responses

eXtensible Markup Language

  • application/xml, text/xml, or */xml
  • httr2::resp_body_xml() uses xml2::read_xml()

XML data

  • Tags as <tagname attribute="a">contents</tagname>
  • Everything nestable

XML example: raw

resp_xml <- req_template(request(example_url()), "/xml") |>
  req_perform()
resp_xml |> resp_body_string() |> cat()
#> <?xml version="1.0" encoding="UTF-8"?>
#> <root>
#>    <address>
#>       <city>New York</city>
#>       <postalCode>10021-3100</postalCode>
#>       <state>NY</state>
#>       <streetAddress>21 2nd Street</streetAddress>
#>    </address>
#>    <age>27</age>
#>    <children />
#>    <firstName>John</firstName>
#>    <isAlive>true</isAlive>
#>    <lastName>Smith</lastName>
#>    <phoneNumbers>
#>       <element>
#>          <number>212 555-1234</number>
#>          <type>home</type>
#>       </element>
#>       <element>
#>          <number>646 555-4567</number>
#>          <type>office</type>
#>       </element>
#>    </phoneNumbers>
#>    <spouse null="true" />
#> </root>

XML example: parsed

extracted_xml <- resp_body_xml(resp_xml)
class(extracted_xml)
#> [1] "xml_document" "xml_node"
# We'll see other ways to parse this in {rvest} chapter.
xml2::as_list(extracted_xml) |> str(max.level = 2)
#> List of 1
#>  $ root:List of 8
#>   ..$ address     :List of 4
#>   ..$ age         :List of 1
#>   ..$ children    : list()
#>   ..$ firstName   :List of 1
#>   ..$ isAlive     :List of 1
#>   ..$ lastName    :List of 1
#>   ..$ phoneNumbers:List of 2
#>   ..$ spouse      : list()
#>   .. ..- attr(*, "null")= chr "true"
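
As an alternative to converting the whole document to a list, individual values can be pulled out with XPath. This is only a sketch on a trimmed copy of the document above, using {xml2} functions directly. The XPath expressions are one possible choice.

```r
library(xml2)

# A trimmed copy of the example document.
doc <- read_xml('<root>
  <firstName>John</firstName>
  <phoneNumbers>
    <element><number>212 555-1234</number><type>home</type></element>
    <element><number>646 555-4567</number><type>office</type></element>
  </phoneNumbers>
  <spouse null="true" />
</root>')

# Text of a single node.
first_name <- xml_text(xml_find_first(doc, "/root/firstName"))
# Text of every matching node, as a character vector.
numbers <- xml_text(xml_find_all(doc, "//phoneNumbers/element/number"))
# Value of an attribute.
spouse_null <- xml_attr(xml_find_first(doc, "/root/spouse"), "null")
```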

HTML responses

HyperText Markup Language

  • text/html, rarely application/xhtml+xml
  • httr2::resp_body_html() uses xml2::read_html()
    • which uses xml2::read_xml(..., as_html = TRUE)
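
The practical difference is that the HTML parser tolerates markup that strict XML parsing rejects. A minimal sketch with a deliberately sloppy fragment (the fragment itself is made up):

```r
library(xml2)

# Unclosed <p> and <b>, and no <html> or <body> wrapper.
sloppy <- "<p>Hello <b>world"

# The HTML parser repairs the fragment into a full document.
html <- read_html(sloppy)
paragraph <- xml_text(xml_find_first(html, "//p"))
root_name <- xml_name(html)

# The same input fails as strict XML.
strict_ok <- tryCatch(
  {
    read_xml(sloppy)
    TRUE
  },
  error = function(e) FALSE
)
```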

Binary objects

MIME type       Subtype (*) examples                     Package
image/*         png, jpeg, svg+xml                       {magick}
audio/*         mpeg, wav, ogg                           {av} ?
video/*         mpeg, mp4, ogg                           {av} ?
application/*   octet-stream (catch-all), x-bzip, pdf    (various)

Images

resp_body_raw(resp) |> magick::image_read()
# requires {rsvg}
resp_body_raw(resp) |> magick::image_read_svg()
# requires {pdftools}
resp_body_raw(resp) |> magick::image_read_pdf()

Video

I haven’t found anything yet for working with “raw” audio/video!

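# Save the raw bytes to disk first, then convert the saved file.
resp_body_raw(resp) |> writeBin(path)
av::av_video_convert(path, output = "output.mp4", verbose = TRUE)
# Or trim a saved file with the ffmpeg command-line tool, if it is installed.
# input_path, start_time, end_time, and output_path must already be defined.
ffmpeg_cmd <- glue::glue(
  "ffmpeg -v quiet ",
  "-i {input_path} ",
  "-ss {start_time} -to {end_time} ",
  "-c copy {output_path}"
)
system(ffmpeg_cmd, ignore.stdout = TRUE)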

Audio

I haven’t found anything yet for working with “raw” audio/video!

# Save the raw bytes to disk first, then convert the saved file.
resp_body_raw(resp) |> writeBin(path)
av::av_audio_convert(
  path, output = "output.mp3", format = NULL,
  channels = NULL, sample_rate = NULL,
  start_time = NULL, total_time = NULL,
  verbose = TRUE
)

Raw data

Danger zone!

resp_body_raw(resp) |> writeBin(filename)
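
One reading of the warning is that the bytes go to disk exactly as received, with no validation. The sketch below assumes a mock response and a temporary file, and reads the bytes back to show nothing is changed on the way. The four payload bytes are placeholders, not a real file.

```r
library(httr2)

# Mock binary response; these bytes stand in for a real payload.
payload <- as.raw(c(0x89, 0x50, 0x4e, 0x47))
resp <- response(
  headers = list(`Content-Type` = "application/octet-stream"),
  body = payload
)

# Write to a temporary file rather than a path taken from the response.
path <- tempfile(fileext = ".bin")
resp_body_raw(resp) |> writeBin(path)

# Read the bytes back to confirm they match what was received.
round_trip <- readBin(path, what = "raw", n = file.size(path))
```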

Base64-encoded JSON data

  • Base64 transforms binary data into text
    • 6-bit blocks ➡️ 1 of 64 characters
raw_data <- resp_body_json(resp) |>
  _$b64_json |> # Or whatever the element is named
  jsonlite::base64_dec()

magick::image_read(raw_data) # Etc
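
To make the 6-bit arithmetic concrete, here is a small round trip with {jsonlite}. Three bytes are 24 bits, which split into four 6-bit blocks, so three bytes encode to four characters. The input string is arbitrary.

```r
library(jsonlite)

raw_in <- charToRaw("Man")     # 3 bytes = 24 bits
encoded <- base64_enc(raw_in)  # 24 bits / 6 bits per character = 4 characters
decoded <- base64_dec(encoded) # back to the original raw vector

print(encoded)
print(rawToChar(decoded))
```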

HTTP errors

HTTP status codes

Range   Description     Notes
1xx     Informational   Handled by {curl}
2xx     Successful      resp_*()
3xx     Redirection     Auto-followed
4xx     Client error    “Your fault”
5xx     Server error    “Server’s fault”
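
The range is just the first digit of the code, so it can be computed with integer division. status_class() below is a hypothetical base-R helper for illustration. In practice the code would come from httr2::resp_status().

```r
# Hypothetical helper: map a status code to its range description.
status_class <- function(status) {
  switch(
    as.character(status %/% 100),
    "1" = "Informational",
    "2" = "Successful",
    "3" = "Redirection",
    "4" = "Client error",
    "5" = "Server error",
    "Unknown"
  )
}

classes <- vapply(c(200, 301, 404, 503), status_class, character(1))
print(classes)
```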

Handling errors

req_error(req, is_error = NULL, body = NULL)

  • is_error = function to identify errors
  • body = function to turn error resp into message
# Never trigger R errors
req |> 
  req_error(is_error = \(resp) FALSE)
# Only trigger R errors for "Server error" responses
req |> 
  req_error(is_error = \(resp) resp_status(resp) >= 500)
# Include information from response in error message
req |> 
  req_error(
    body = function(resp) {
      resp_body_json(resp)$error_msg # Often more complicated than this
    }
  )
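
The patterns above can be tried offline with mocked responses. This sketch assumes httr2 1.0.0 or later (for with_mocked_responses() and response_json()). The URL and the error_msg field are made up, and real APIs name their error fields differently.

```r
library(httr2)

# Mock server: every request gets a 404 with a JSON error body.
mock_404 <- function(req) {
  response_json(status_code = 404, body = list(error_msg = "No such widget"))
}

req <- request("https://example.com/widgets/1")

# 1. Never trigger R errors: the 404 comes back as an ordinary response.
resp <- with_mocked_responses(
  mock_404,
  req |> req_error(is_error = \(resp) FALSE) |> req_perform()
)
status <- resp_status(resp)

# 2. Default is_error, custom body: the R error carries the API's message.
err <- tryCatch(
  with_mocked_responses(
    mock_404,
    req |>
      req_error(body = \(resp) resp_body_json(resp)$error_msg) |>
      req_perform()
  ),
  httr2_http_404 = function(cnd) cnd
)
msg <- conditionMessage(err)
```

Catching by class (httr2_http_404 here) lets you handle one status specifically while other errors still propagate.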

Errors and pagination

  • req_perform_iterative() has on_error = c("stop", "return")
    • Applies when a request throws an R error
    • “stop” = “throw an R error if any call throws an R error”
    • “return” = “stop iterating, return everything so far”
  • req_error(is_error = \(resp) FALSE) suppresses those R errors, so on_error never triggers
  • You may want to combine these for your use case
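
Here is a rough sketch of on_error = "return", again with mocked responses (assuming httr2 1.0.0 or later). The mock serves two good pages and then a 500. The URL, the "page" parameter, and the page contents are all invented for the example.

```r
library(httr2)

# Mock server: two good pages, then a 500 on the third call.
calls <- 0
mock_pages <- function(req) {
  calls <<- calls + 1
  if (calls < 3) {
    response_json(body = list(page = calls))
  } else {
    response_json(status_code = 500)
  }
}

req <- request("https://example.com/items")

results <- with_mocked_responses(
  mock_pages,
  req_perform_iterative(
    req,
    next_req = iterate_with_offset("page"),
    max_reqs = 5,
    on_error = "return",
    progress = FALSE
  )
)

ok <- resps_successes(results) # responses fetched before the error
bad <- resps_failures(results) # error object for the failed request
```

With on_error = "stop" the same run would throw, and the first two pages would not be returned.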

More on retries

httr2::req_retry(
  req,
  max_tries = NULL,
  max_seconds = NULL,
  is_transient = NULL,
  backoff = NULL,
  after = NULL
)
  • is_transient = function to decide whether to retry from resp
  • backoff = function to convert tries to wait_seconds
  • after = function to convert resp to wait_seconds
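
A sketch of what the three functions might look like. The status codes, the 60-second cap, and the assumption that Retry-After is given in seconds are all illustrative choices, not httr2 defaults. The request is built but not performed.

```r
library(httr2)

# Retry on "too many requests" and "service unavailable" (an assumption;
# use the statuses your API documents as temporary).
is_transient <- function(resp) resp_status(resp) %in% c(429, 503)

# Exponential backoff capped at 60 seconds: 2, 4, 8, ...
backoff <- function(tries) min(2^tries, 60)

# Prefer the server's Retry-After header when present (assumed to be in
# seconds); NA tells httr2 to fall back to backoff().
after <- function(resp) {
  retry_after <- resp_header(resp, "Retry-After")
  if (is.null(retry_after)) NA else as.numeric(retry_after)
}

req <- request("https://example.com") |>
  req_retry(
    max_tries = 3,
    is_transient = is_transient,
    backoff = backoff,
    after = after
  )

# Try the helpers on mock responses.
busy <- response(status_code = 429, headers = list(`Retry-After` = "7"))
down <- response(status_code = 503)
```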

Questions?