This page gives an introduction to getting started with ImageDataExtractor. It assumes you already have ImageDataExtractor and its requirements installed.
>>> import imagedataextractor as ide
ImageDataExtractor can be used to extract information from images directly. Alternatively, microscopy images can be automatically identified and extracted from HTML or XML documents, followed by particle extraction with ImageDataExtractor. The latter requires ChemDataExtractor to be installed.
You can view the example usage notebook here.
Provide a path to an image or a document, or to a directory of images and/or documents, together with an output directory where the results should be written. If the input image is a figure containing a panel of images, the panel is split and extraction is performed on each sub-image separately.
>>> data = ide.extract(input_path)
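As a rough illustration of the kinds of input described above (a single image, a single document, or a directory holding both), the following sketch sorts a path into image and document candidates. It is not how ImageDataExtractor handles its input internally; the helper name `collect_inputs` and the extension sets are assumptions made for this example.

```python
from pathlib import Path

# Assumed extensions for illustration; the formats the library accepts may differ.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
DOC_EXTS = {".html", ".xml"}


def collect_inputs(input_path):
    """Return (images, documents) found at input_path.

    input_path may point to a single file or to a directory of files.
    Hypothetical helper, not part of ImageDataExtractor.
    """
    path = Path(input_path)
    # A directory contributes every entry it contains; a file contributes itself.
    candidates = sorted(path.iterdir()) if path.is_dir() else [path]
    images = [p for p in candidates if p.suffix.lower() in IMAGE_EXTS]
    documents = [p for p in candidates if p.suffix.lower() in DOC_EXTS]
    return images, documents
```

Checking a directory this way before running extraction can help confirm that the expected files are actually present.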
This will return a list of EMData objects, each of which contains the image, the resulting segmentation, its uncertainty, scalebar information and the extracted quantitative data for each detected particle.
The resulting segmentation and its uncertainty can be accessed from an EMData object in the returned list:
>>> seg = data[0].segmentation  # data is a list; index one EMData object
>>> uncertainty = data[0].uncertainty
You can obtain a pandas DataFrame containing all extracted data from an EMData object:
>>> df = data[0].to_pandas()  # data is a list; index one EMData object
Extracted scalebar information can be accessed from the scalebar attribute of an EMData object:
>>> sb_text = data[0].scalebar.text  # data is a list; index one EMData object
>>> conversion = data[0].scalebar.conversion
>>> units = data[0].scalebar.units
>>> sb_contours = data[0].scalebar.scalebar_contour
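To show how a scalebar conversion factor is typically used, here is a small worked sketch that converts a particle area from pixels to physical units. It assumes that `conversion` is a physical length per pixel, which this page does not confirm; the helper names and the numbers are hypothetical.

```python
import math


def pixel_area_to_physical(area_px, conversion):
    """Convert an area in px^2 to physical units squared.

    Assumes `conversion` is a length per pixel, so areas scale with its square.
    """
    return area_px * conversion ** 2


def equivalent_diameter(area):
    """Diameter of the circle that has the given area."""
    return 2.0 * math.sqrt(area / math.pi)


conversion = 0.5  # hypothetical value: 0.5 nm per pixel
area_nm2 = pixel_area_to_physical(400, conversion)  # 400 px^2 -> 100.0 nm^2
diameter_nm = equivalent_diameter(area_nm2)
```

Because the factor is a length, it is squared for areas and applied once for lengths such as diameters.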
And that's it!
The segmentation model can be adjusted using the seg keyword arguments of ide.extract:
>>> data = ide.extract(input_path, seg_bayesian=True, seg_tu=0.0125, seg_n_samples=30, seg_device='cpu')
For optimal performance, particle segmentation is performed using Bayesian inference by default. Segmentation can also be performed discriminatively, although this is not recommended because the Bayesian version gives significant gains in accuracy and precision. Setting the seg_bayesian argument to True runs the segmentation model in the recommended Bayesian mode. The default is True.
False positives are filtered automatically using the uncertainties afforded by Bayesian inference. The threshold beyond which particles are filtered can be adjusted using the seg_tu parameter. The default is
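The idea of uncertainty-based filtering can be sketched as follows. This is a simplified illustration, not the library's implementation: it assumes each particle carries a single scalar uncertainty, represents particles as plain dictionaries, and uses a hypothetical helper `filter_by_uncertainty`. The threshold value 0.0125 is taken from the example call above.

```python
def filter_by_uncertainty(particles, tu=0.0125):
    """Keep particles whose uncertainty does not exceed the threshold tu.

    `particles` is a list of dicts with an "uncertainty" key (an assumed
    structure used only for this illustration).
    """
    return [p for p in particles if p["uncertainty"] <= tu]


particles = [
    {"id": 0, "uncertainty": 0.004},   # confident detection, kept
    {"id": 1, "uncertainty": 0.030},   # beyond the threshold, filtered out
    {"id": 2, "uncertainty": 0.0125},  # exactly at the threshold, kept
]
kept_ids = [p["id"] for p in filter_by_uncertainty(particles, tu=0.0125)]
print(kept_ids)  # [0, 2]
```

Lowering the threshold removes more candidate particles, trading recall for fewer false positives.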
Performing Bayesian inference by Monte Carlo sampling slows down the extraction process noticeably. The number of Monte Carlo samples used in inference can be set using the seg_n_samples argument. The default is
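The trade-off between sample count and runtime can be illustrated with a toy Monte Carlo estimate. The sketch below is not the segmentation model: `stochastic_predict` is a stand-in for one stochastic forward pass, and the mean and variance over repeated passes play the roles of the prediction and its uncertainty. All names and numbers here are assumptions for illustration.

```python
import random
import statistics


def stochastic_predict(rng, p_true=0.8, noise=0.1):
    """Stand-in for one stochastic forward pass: a noisy foreground probability."""
    return min(1.0, max(0.0, rng.gauss(p_true, noise)))


def mc_estimate(n_samples, seed=0):
    """Average n_samples stochastic predictions.

    Returns (mean, variance); the variance serves as a simple uncertainty.
    The cost grows linearly with n_samples, one forward pass per sample.
    """
    rng = random.Random(seed)
    samples = [stochastic_predict(rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)


mean, variance = mc_estimate(30)  # 30 samples, as in the example call above
```

Fewer samples run faster but give a noisier uncertainty estimate, which in turn makes threshold-based filtering less reliable.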
Extraction can be accelerated by using a Graphics Processing Unit (GPU). Specifying the seg_device argument as 'cuda' allows particle segmentation to be performed on a GPU, if one is available. This can speed up extraction significantly, particularly if extraction is being run in Bayesian mode. The default is
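One way to choose the device string at runtime is sketched below. It assumes that PyTorch is the backend that provides CUDA support, which this page does not state; the helper `pick_seg_device` is hypothetical and falls back to 'cpu' when PyTorch is missing or no GPU is visible.

```python
def pick_seg_device():
    """Return 'cuda' if PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # assumed backend; not confirmed by this page
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_seg_device()
```

The resulting string could then be passed as the device argument shown in the example call above.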