Buxton R, Krouse D, Massiot C, Lawrence MJF. 2021. Interpreting borehole data with machine learning: a pilot study. Lower Hutt (NZ): GNS Science. 32 p. (GNS Science report; 2020/30). doi:10.21420/XEFZ-KG47.
Classifying and interpreting multi-sensor geophysical borehole (wireline) data is normally undertaken manually by a geologist and is extremely resource-intensive in terms of both skilled people and time. Machine learning techniques have been applied to wireline data overseas; however, those techniques are not necessarily directly applicable to New Zealand sedimentary rocks, and little investigation has been done in geothermal settings. This pilot study investigates machine learning algorithms and approaches for the automated processing of borehole data that can be applied to resource assessments in New Zealand. A series of supervised (with some level of prior interpretation, i.e. labelling) and unsupervised (no prior interpretation) approaches has been tested. Unsupervised machine learning techniques would be most desirable. Two publicly available wireline datasets that have been interpreted manually were used as a basis for the pilot study. The first is from Hole U1530A on the NW Caldera of Brothers Volcano, which intersected a sequence of lavas and pyroclastic rocks with hydrothermal alteration of varied mineralogy and intensity. The second dataset comprises a marine succession of mudstones to sandstones and shallow non-tropical carbonates in a well (Kauhauroa-5) drilled onshore on the east coast of the North Island of New Zealand. Based on the first-pass results of this pilot study, some of the more common unsupervised pattern recognition approaches struggle to attain the level of performance required to reduce the workload involved in manually classifying borehole data. Supervised approaches show that there are definitely patterns in the data that can be recognised by ML/AI algorithms. Self-Organising Maps with semi-supervised learning provided some encouraging results on one of the datasets, but these results could not be reproduced on the second. Similarly, generative modelling may provide some advantages but requires more study.
Lastly, an approach similar to the MIR max algorithm, utilising Information Theory as a similarity metric, may provide a more reliable unsupervised approach. With appropriate custom pre-processing, the segmentation method can be applied to a wide variety of data, such as decomposing wireline logs into specified shapes (e.g. ‘bell’, ‘funnel’) used by geologists. Overall, the study has indicated a number of avenues for future research. (auth)
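As an illustration of what using Information Theory as a similarity metric can mean in practice, the sketch below computes the mutual information between two binned log curves. It is a minimal example under stated assumptions, not the method used in the report: the function names `discretise` and `mutual_information`, the equal-width binning, and the sample values are all hypothetical.

```python
import math
from collections import Counter


def discretise(values, n_bins=4):
    """Map continuous readings to equal-width bin indices (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    # Fall back to a unit width when every reading is identical.
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(a, b):
    """Mutual information, in bits, between two equal-length label sequences."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
    return mi


# Made-up readings for two hypothetical logs over the same six depths.
gamma = [20.0, 22.0, 21.0, 80.0, 85.0, 83.0]
resistivity = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9]

similarity = mutual_information(discretise(gamma, 2), discretise(resistivity, 2))
```

A higher value means that knowing one binned curve tells you more about the other; a value of zero means the two binned curves are statistically independent over the interval.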