Convert a number of files to a single array store
In the previous notebooks, we’ve seen how to incrementally create a dataset and train models on it.
Once we have a dataset of validated files, we might want to concatenate them into one big array store.
This is what the CellxGene team did for the data in the CellxGene portal: a large number of .h5ad files were concatenated into a single array store.
This requires duplicating the data that’s present in the collection of .h5ad files, but provides the advantage that one can now query slices by arbitrary metadata across the whole dataset, rather than only retrieve individual files.
To see how this looks in practice, see cellxgene-census.