Tripod – Detect cars

3rd draft
Stijn Oomes
Wed 30 May 2018

This is the design of the 3D vision API for the ParkSpot system. It takes camera parameters and images from the camera as input. It detects the cars in view of the camera and gives their size, position, and attitude as output.

This page describes the overall design and the main assumptions. What will it do and what will it not do?

This API uses the Tripod framework, a 3D vision engine written in Swift.

The final goal is to run it on a Raspberry Pi directly connected to a camera. The idea is that no privacy-sensitive information can leave this system, only an abstract description of where the cars are in the parking space.

As a temporary solution, it will run on a server using Vapor, a web framework for Swift.
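To illustrate the kind of abstract description that could leave the system, here is a minimal sketch that serialises detected cars to JSON with Foundation's Codable support. The type names Size3D and Position3D come from the Output section below; the DetectedCar container and the two helper functions are hypothetical names introduced only for this example, and JSON itself is an assumption, since the design does not fix a wire format.

```swift
import Foundation

// Size3D and Position3D follow the Output section of this design.
struct Size3D: Codable, Equatable { var a, b, c: Double }
struct Position3D: Codable, Equatable { var x, y, z: Double }

// Hypothetical container for one detected car (not part of the design).
struct DetectedCar: Codable, Equatable {
    var size: Size3D
    var position: Position3D
}

// Encode a list of cars as JSON; no image data is involved.
func encodeCars(_ cars: [DetectedCar]) throws -> Data {
    return try JSONEncoder().encode(cars)
}

// Decode the same JSON back into cars, e.g. on the receiving side.
func decodeCars(_ data: Data) throws -> [DetectedCar] {
    return try JSONDecoder().decode([DetectedCar].self, from: data)
}
```

Only numbers describing the cars are encoded here, which matches the intent that no camera frames leave the device.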

Coordinate system

We define a coordinate system relative to the parking space. All distances are in meters.

We do not use GPS coordinates (longitude, latitude) because it is easier to represent the cars in a local coordinate system aligned with the street or parking lot.

[Figure: SceneKit coordinate system]

We use the axis convention used by many 3D graphics frameworks (the illustration is taken from the SceneKit documentation). The x- and z-axes determine the position of the car on the ground. The y-axis gives the height.

It seems natural to me to define the camera above the origin at (0,h,0) with height h.

[Figure: parking lot]

If the camera is mounted on a wall, it seems natural to define the x-axis parallel to that wall. If it is a rectangular parking lot, the x-axis can be defined along the longest side.
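The convention above can be sketched in a few lines of Swift. Position3D mirrors the type named in the Output section; cameraPosition(height:) and groundDistance(from:to:) are hypothetical helpers added for illustration and are not part of Tripod.

```swift
// A point in the local parking-space coordinate system, in meters.
// x and z span the ground plane; y is the height above the ground.
struct Position3D {
    var x: Double
    var y: Double
    var z: Double
}

// Camera placed directly above the origin at height h (meters).
func cameraPosition(height h: Double) -> Position3D {
    return Position3D(x: 0, y: h, z: 0)
}

// Distance between two points measured along the ground (ignores y).
func groundDistance(from p: Position3D, to q: Position3D) -> Double {
    let dx = q.x - p.x
    let dz = q.z - p.z
    return (dx * dx + dz * dz).squareRoot()
}

let camera = cameraPosition(height: 2.5)
let car = Position3D(x: 3, y: 0, z: 4)
print(groundDistance(from: camera, to: car))  // 5.0
```

Because the camera sits above the origin, the ground distance to a car depends only on the car's x and z coordinates.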

Input

  • Camera
    Camera(name: String, position: Position3D, attitude: Attitude3D, resolution: CGSize, focalLength: CGFloat)

    • name
    • position
      The point (0,h,0) is probably best, so only camera height is needed.
    • attitude
      The easiest way to represent this is to give the coordinates of the point that the centre of the image is looking at [suggestions?]
    • image resolution: width, height
    • focal length (in pixels)
  • Camera images
    • frame
      The JPEG image format is probably the best choice.
    • timestamp
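Two of the camera parameters above lend themselves to a short sketch: turning the "point the image centre is looking at" into a unit viewing direction, and relating the focal length in pixels to the field of view. This assumes an ideal pinhole camera without lens distortion. The function names are hypothetical, and plain Double values are used instead of CGSize and CGFloat to keep the example small.

```swift
import Foundation

struct Position3D { var x, y, z: Double }
struct Direction3D { var x, y, z: Double }

// Unit vector pointing from the camera towards the point it looks at.
// Returns nil when the two points coincide, since no direction exists.
func viewingDirection(from camera: Position3D,
                      lookingAt target: Position3D) -> Direction3D? {
    let dx = target.x - camera.x
    let dy = target.y - camera.y
    let dz = target.z - camera.z
    let length = (dx * dx + dy * dy + dz * dz).squareRoot()
    guard length > 0 else { return nil }
    return Direction3D(x: dx / length, y: dy / length, z: dz / length)
}

// Horizontal field of view in radians for a pinhole camera,
// given the image width and the focal length, both in pixels.
func horizontalFieldOfView(imageWidth: Double, focalLength: Double) -> Double {
    return 2 * atan(imageWidth / (2 * focalLength))
}
```

For example, a camera at (0, 3, 0) looking at the ground point (0, 0, 4) has a viewing direction of approximately (0, -0.6, 0.8), and a 1920-pixel-wide image with a focal length of 960 pixels corresponds to a horizontal field of view of 90 degrees.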

Output

  • 3D Size
    • Size3D(a: Double, b: Double, c: Double)
    • dimensions of the bounding box of the car: a, b, c (with a > b > c)
  • 3D Position
    • Position3D(x: Double, y: Double, z: Double)
    • coordinates of the car on the ground: (x, 0, z)
    • coordinates of the centre of the car: (x, c/2, z)
  • 3D Attitude
    • Attitude3D(d1: Direction3D, d2: Direction3D, d3: Direction3D)
    • with Direction3D(x: Double, y: Double, z: Double) as normalized vectors
    • orientation vector d1 parallel to the ground along the long axis a of the car
    • orientation vector d3 perpendicular to the ground
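The output types listed above could be declared roughly as follows. This is a sketch under stated assumptions, not the Tripod implementation: the helpers centre(onGround:size:), cross(_:_:) and attitude(heading:) are hypothetical, the car is assumed to stand on flat ground, the heading angle is assumed to be measured from the +x axis towards the +z axis, and d2 is assumed to complete a right-handed frame (d1 × d2 = d3), which the list above does not specify.

```swift
import Foundation

struct Size3D { var a, b, c: Double }
struct Position3D { var x, y, z: Double }
struct Direction3D { var x, y, z: Double }
struct Attitude3D { var d1, d2, d3: Direction3D }

// Centre of the bounding box: the ground position raised by half the height c.
func centre(onGround p: Position3D, size: Size3D) -> Position3D {
    return Position3D(x: p.x, y: size.c / 2, z: p.z)
}

// Cross product u × v of two direction vectors.
func cross(_ u: Direction3D, _ v: Direction3D) -> Direction3D {
    return Direction3D(x: u.y * v.z - u.z * v.y,
                       y: u.z * v.x - u.x * v.z,
                       z: u.x * v.y - u.y * v.x)
}

// Attitude of a car on flat ground for a heading angle in radians.
// Assumption: heading 0 points along +x and increases towards +z.
func attitude(heading: Double) -> Attitude3D {
    let d1 = Direction3D(x: cos(heading), y: 0, z: sin(heading)) // long axis a
    let d3 = Direction3D(x: 0, y: 1, z: 0)                       // up, perpendicular to ground
    let d2 = cross(d3, d1)  // assumption: chosen so that d1 × d2 = d3
    return Attitude3D(d1: d1, d2: d2, d3: d3)
}
```

With these definitions, a car at ground position (4, 0, 7) with size (4.5, 1.8, 1.5) has its centre at (4, 0.75, 7), and a heading of 0 gives d1 = (1, 0, 0), d2 = (0, 0, -1), d3 = (0, 1, 0).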

Discussion

So this API does not calculate the parking places itself; it only gives the relevant attributes of the cars that take up space. My thought is that determining the available parking space needs more contextual knowledge, which should be determined elsewhere in the system.