API

class gulpio2.GulpDirectory(output_dir, jpeg_decoder=<function jpeg_bytes_to_img>)[source]

Represents a directory containing *.gulp and *.gmeta files.

Parameters
  • output_dir (str) – Path to the directory containing the files.

  • jpeg_decoder (callable) – Takes a JPEG stored as bytes and returns the desired decoded image format (e.g. np.ndarray).
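To illustrate the jpeg_decoder parameter, here is a minimal sketch of a custom decoder that skips decoding and hands back the raw JPEG bytes after checking the JPEG start-of-image marker. The name passthrough_jpeg_decoder is hypothetical, and whether downstream code can work with undecoded bytes is an assumption to verify for your own pipeline.

```python
JPEG_SOI = b"\xff\xd8"  # every JPEG stream begins with this start-of-image marker


def passthrough_jpeg_decoder(jpeg_bytes):
    """Hypothetical decoder: validate the marker and return the bytes undecoded."""
    data = bytes(jpeg_bytes)
    if not data.startswith(JPEG_SOI):
        raise ValueError("input does not look like a JPEG stream")
    return data
```

Such a callable would be passed as GulpDirectory(output_dir, jpeg_decoder=passthrough_jpeg_decoder).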

all_meta_dicts

All meta dicts from all chunks as a list.

Type

list of dicts

chunk_lookup

Mapping element id to chunk index.

Type

dict: int -> str

chunk_objs_lookup

Mapping chunk index to the corresponding GulpChunk object.

Type

dict: int -> GulpChunk

merged_meta_dict

All meta dicts merged into a single dict.

Type

dict: id -> meta dict
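The relationship between these attributes can be sketched in plain Python. This is only an illustration of how an id-to-chunk map and a merged dict could be derived from the per-chunk meta dicts, not the library's implementation. The helper name build_lookups, the sample data, and the use of the list position as the chunk index are all assumptions.

```python
def build_lookups(all_meta_dicts):
    """Derive an id -> chunk index map and a merged dict from per-chunk meta dicts."""
    chunk_lookup = {}
    merged_meta_dict = {}
    for chunk_index, meta_dict in enumerate(all_meta_dicts):
        for item_id, item_meta in meta_dict.items():
            # Ids are expected to be unique across chunks (assumption of this sketch).
            if item_id in merged_meta_dict:
                raise ValueError(f"duplicate id: {item_id!r}")
            chunk_lookup[item_id] = chunk_index
            merged_meta_dict[item_id] = item_meta
    return chunk_lookup, merged_meta_dict


# Hypothetical sample: two chunks holding one item each.
all_meta_dicts = [{"a": {"label": 0}}, {"b": {"label": 1}}]
chunk_lookup, merged_meta_dict = build_lookups(all_meta_dicts)
```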

chunks()[source]

Return a generator over existing GulpChunk objects which are ready to be opened and read from.
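A typical read loop over chunks() might look like the sketch below. The helper count_items_per_chunk is hypothetical. It takes an already constructed GulpDirectory and uses only the chunks(), open() and iter_all() calls documented in this reference.

```python
def count_items_per_chunk(gulp_dir):
    """Return the number of items stored in each existing chunk."""
    counts = []
    for chunk in gulp_dir.chunks():
        # Each chunk has to be opened before it can be read from.
        with chunk.open("rb"):
            counts.append(sum(1 for _frames, _meta in chunk.iter_all()))
    return counts
```

Usage would be along the lines of count_items_per_chunk(GulpDirectory(output_dir)).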

new_chunks(total_new_chunks)[source]

Return a generator over freshly setup GulpChunk objects which are ready to be opened and written to.

Parameters

total_new_chunks (int) – The total number of new chunks to initialize.
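As a sketch of how new_chunks() could be combined with GulpChunk.append(), the hypothetical helper below spreads a list of items over freshly created chunks. It assumes that items is a list of (id_, meta_data, frames) tuples and that leaving the with block finalises the chunk. If that second assumption does not hold, call flush() before the block ends.

```python
import math


def write_items(gulp_dir, items, items_per_chunk):
    """Write (id_, meta_data, frames) tuples into new chunks; return the item count."""
    total_new_chunks = math.ceil(len(items) / items_per_chunk)
    written = 0
    for index, chunk in enumerate(gulp_dir.new_chunks(total_new_chunks)):
        start = index * items_per_chunk
        with chunk.open("wb"):
            for id_, meta_data, frames in items[start:start + items_per_chunk]:
                chunk.append(id_, meta_data, frames)
                written += 1
    return written
```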

class gulpio2.GulpChunk(data_file_path, meta_file_path, serializer=<gulpio2.fileio.JSONSerializer object>, jpeg_decoder=<function jpeg_bytes_to_img>)[source]

Represents a gulp chunk on disk.

Parameters
  • data_file_path (str) – Path to the *.gulp file.

  • meta_file_path (str) – Path to the *.gmeta file.

  • serializer (subclass of AbstractSerializer) – The type of serializer to use.

  • jpeg_decoder (callable that takes a JPEG stored as bytes and returns) – the desired decoded image format (e.g. np.ndarray)

append(id_, meta_data, frames)[source]

Append an item to the gulp.

Parameters
  • id_ (str) – The ID of the item.

  • meta_data (dict) – The meta-data associated with the item.

  • frames (list of numpy arrays) – The frames of the item as a list of numpy arrays consisting of image pixel values.

flush()[source]

Flush all buffers and write the meta file.

iter_all(accepted_ids=None, shuffle=False)[source]

Iterate over all frames in the gulp.

Parameters
  • accepted_ids (list of str) – A filter for accepted ids.

  • shuffle (bool) – Shuffle the items or not.

Returns

An iterator that yields a series of (frames, meta) tuples. See read_frames for details.

Return type

iterator
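The accepted_ids filter can be exercised as in the following sketch. The helper name frame_counts is hypothetical. It assumes the chunk must be open while the iterator is consumed, so the list is built inside the with block.

```python
def frame_counts(chunk, wanted_ids=None):
    """Return the number of frames of each accepted item, in iteration order."""
    with chunk.open("rb"):
        # Consume the iterator fully while the chunk is still open.
        return [
            len(frames)
            for frames, _meta in chunk.iter_all(accepted_ids=wanted_ids, shuffle=False)
        ]
```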

open(flag='rb')[source]

Open the gulp chunk for reading or writing, depending on flag.

Parameters

flag (str) – The mode to open the chunk in: 'rb' (read binary), 'wb' (write binary) or 'ab' (append to binary).

Notes

Works as a context manager but returns None.
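What "returns None" means in practice can be shown with a stand-in class. DemoChunk below is hypothetical and is not gulpio2.GulpChunk. It only mimics a context manager that yields nothing, so the name bound by `as` is None and the chunk object itself is what you keep working with.

```python
from contextlib import contextmanager


class DemoChunk:
    """Hypothetical stand-in that mimics an open() yielding nothing."""

    def __init__(self):
        self.is_open = False

    @contextmanager
    def open(self, flag="rb"):
        self.is_open = True
        try:
            yield  # nothing is yielded, so `as` binds None
        finally:
            self.is_open = False


chunk = DemoChunk()
with chunk.open("rb") as handle:
    handle_inside = handle        # None: keep using `chunk`, not `handle`
    open_inside = chunk.is_open   # the chunk is open only inside the block
```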

read_frames(id_, slice_=None)[source]

Read frames for a single item.

Parameters
  • id_ (str) – The ID of the item.

  • slice_ (slice or list of ints) – A slice or list of indices with which to select frames.

Returns

The frames of the item as a list of numpy arrays consisting of image pixel values, together with the item's metadata.

Return type

frames (list of numpy arrays), meta (dict)
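Selecting a subset of frames with slice_ might look like this sketch. The helper read_subsampled is hypothetical. It assumes the chunk has to be opened for reading before read_frames is called.

```python
def read_subsampled(chunk, item_id, step=2):
    """Read every `step`-th frame of one item, plus its metadata."""
    with chunk.open("rb"):
        frames, meta = chunk.read_frames(item_id, slice(None, None, step))
    return frames, meta
```

A list of indices such as [0, 5, 10] could be passed in place of the slice object.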

class gulpio2.GulpIngestor(adapter, output_folder, videos_per_chunk, num_workers)[source]

Ingest items from an adapter into gulp chunks.

Parameters
  • adapter (subclass of AbstractDatasetAdapter) – The adapter to ingest from.

  • output_folder (str) – The folder/directory to write to.

  • videos_per_chunk (int) – The total number of items per chunk.

  • num_workers (int) – The level of parallelism.
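The effect of videos_per_chunk can be illustrated with a little arithmetic. The sketch below is not the library's code. It shows one way the items of an adapter could be partitioned into slices, one per chunk, of the kind that ChunkWriter.write_chunk accepts as input_slice. The helper name chunk_slices is hypothetical.

```python
def chunk_slices(num_items, videos_per_chunk):
    """Partition range(num_items) into consecutive slices of at most videos_per_chunk."""
    if videos_per_chunk <= 0:
        raise ValueError("videos_per_chunk must be positive")
    return [
        slice(start, min(start + videos_per_chunk, num_items))
        for start in range(0, num_items, videos_per_chunk)
    ]
```

With 5 items and 2 items per chunk this gives three slices, the last of which holds a single item.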

class gulpio2.ChunkWriter(adapter)[source]

Can write from an adapter to a gulp chunk.

Parameters

adapter (subclass of AbstractDatasetAdapter) – The adapter to get items from.

write_chunk(output_chunk, input_slice)[source]

Write from an input slice in the adapter to an output chunk.

Parameters
  • output_chunk (GulpChunk) – The chunk to write to

  • input_slice (slice) – The slice to use from the adapter.