I mainly record my experiments using images. As time progresses I often want new or better measurements, which means re-analysing the data. Hence, I often wonder what the best way to store the data is, as analysing thousands of images takes some time. The requirements I found are:
- Speed: It shouldn’t take me a day to get an extra measurement
- Automation: as little manual input as possible for the analysis to happen
- Direct storage: no manual copying of data at any point
Currently, I investigate capsules travelling in channels. I send capsules down various flow geometries and take about 200-1000 images as they negotiate the obstacles. I repeat this between 5 and 20 times at each flow rate, for 5 to 10 flow rates. This gives me around 50’000 images to analyse.
So far I am using the following approach (but I would love to hear from someone with a better way!). Initially I run a script that takes measurements directly from each image with little or no processing, such as centroid position, width, height, Taylor deformation parameter and area, and writes them to a text file for every image in a run. Importantly, all values are in pixels.
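To make the first step concrete, here is a minimal sketch of what such an image-level script could look like. It is not my actual code: the names `measure_mask` and `write_run` are made up for illustration, the image is assumed to be already thresholded into rows of 0/1 values, and the Taylor deformation parameter is approximated from the bounding box as |width - height| / (width + height) rather than from fitted axes.

```python
import csv

# Column names are an assumption; any consistent set would do.
FIELDS = ["image", "cx", "cy", "width", "height", "area", "taylor_d"]


def measure_mask(mask):
    """Return raw pixel measurements for one binary mask (rows of 0/1)."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no capsule visible in this frame
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return {
        "cx": sum(xs) / len(xs),  # centroid, in pixels
        "cy": sum(ys) / len(ys),
        "width": width,
        "height": height,
        "area": len(xs),  # number of capsule pixels
        # Bounding-box approximation of the Taylor deformation parameter.
        "taylor_d": abs(width - height) / (width + height),
    }


def write_run(masks, path):
    """Write one row per image; masks is an iterable of (name, mask) pairs."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for name, mask in masks:
            row = measure_mask(mask)
            if row is not None:
                writer.writerow({"image": name, **row})
```

Keeping everything in pixels at this stage means the slow image pass never has to be repeated just because a calibration value changes.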
In a second step, I run a program over the text file and extract all measurements that I want for the specific run and write these to a new file. This is the results file for the whole experiment with a specific capsule. Here I add all relevant parameters, such as cut-off points in pixels. As this does not do any image processing, this is fast. I also write all parameters to a file so that I can re-run this automatically if I need to modify the way I measure something.
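As a rough illustration of this second step, the sketch below reduces each per-image text file to one row of a results file and saves the parameters next to it. The function names, the column names (`cx`, `width`, `taylor_d`), the parameter keys (`x_min_px`, `x_max_px`) and the particular run-level measurements are all assumptions chosen for the example, not what my program actually computes.

```python
import csv
import json


def summarise_run(rows, params):
    """Reduce per-image rows (dicts of pixel values) to one run-level row."""
    # Keep only images whose centroid lies between the cut-off points.
    inside = [
        r for r in rows
        if params["x_min_px"] <= float(r["cx"]) <= params["x_max_px"]
    ]
    if not inside:
        return None
    return {
        "n_images": len(inside),
        "max_taylor_d": max(float(r["taylor_d"]) for r in inside),
        "mean_width_px": sum(float(r["width"]) for r in inside) / len(inside),
    }


def process_experiment(run_files, params, results_path, params_path):
    """Write one results row per run and store the parameters used."""
    # Saving the parameters allows the step to be re-run unchanged later.
    with open(params_path, "w") as handle:
        json.dump(params, handle, indent=2)

    with open(results_path, "w", newline="") as out:
        writer = None
        for run_file in run_files:
            with open(run_file, newline="") as handle:
                summary = summarise_run(list(csv.DictReader(handle)), params)
            if summary is None:
                continue  # no image fell inside the cut-offs
            summary = {"run": run_file, **summary}
            if writer is None:
                writer = csv.DictWriter(out, fieldnames=list(summary))
                writer.writeheader()
            writer.writerow(summary)
```

Because this only touches small text files, changing a cut-off and regenerating the results file takes seconds rather than hours.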
In a final step, I read this data into a Data class that automatically loads the results and performs various steps of post-processing, fitting and plotting. At this step I add some final parameters, such as pixels/mm and the diameter of the capsule. This makes it easy to load the results of several different capsules and pass them to various plotting functions.
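A stripped-down sketch of such a class might look like the following. The class name `CapsuleData`, its method names and the assumption that the results file is a comma-separated text file with a header row are all hypothetical; fitting and plotting are left out to keep it short.

```python
import csv
from dataclasses import dataclass, field


@dataclass
class CapsuleData:
    """Results of one experiment (one capsule), loaded on construction."""

    results_path: str
    pixels_per_mm: float  # calibration, only applied at this last stage
    diameter_mm: float    # capsule diameter
    runs: list = field(init=False, default_factory=list)

    def __post_init__(self):
        # Read the run-level results file as soon as the object is created.
        with open(self.results_path, newline="") as handle:
            self.runs = list(csv.DictReader(handle))

    def column_mm(self, name):
        """Return a pixel-valued column converted to millimetres."""
        return [float(r[name]) / self.pixels_per_mm for r in self.runs]

    def column_relative(self, name):
        """Return a pixel-valued column in units of capsule diameters."""
        return [v / self.diameter_mm for v in self.column_mm(name)]
```

Several capsules can then be loaded as a list of such objects and handed to the same plotting function, with each object carrying its own calibration and diameter.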
So the three levels of analysis reflect the hierarchy of my data: image level – run level – experiment level.
Is there a better way?