The objective of this study was to evaluate the potential of cloud computing technology for classifying protected tomato plants under different watering treatments. Two tomato varieties, HeZuo 903 and WanShiRuYi, were grown under protected cultivation for two seasons. Three water treatments were applied: normal watering, no watering during the first fruit-swelling period, and no watering during both fruit-swelling periods. The visible and near-infrared reflection spectra of the tomato canopies were collected during the fruiting period. Three spectral datasets were used: the original reflection spectra, the first derivative of the reflection spectra, and the absorbance spectra.
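The two derived datasets mentioned above can be illustrated with a minimal sketch. It assumes a finite-difference derivative and the conventional apparent absorbance A = log10(1/R); the abstract does not state which derivative or absorbance formula the authors used, and the function names are illustrative only.

```python
import math


def first_derivative(wavelengths, reflectance):
    """Finite-difference dR/d(lambda): central inside, one-sided at the ends.

    Assumption: the study's actual derivative method is not given in the abstract.
    """
    n = len(reflectance)
    if n != len(wavelengths) or n < 2:
        raise ValueError("need two or more paired wavelength/reflectance values")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)        # previous band, clamped at the first band
        hi = min(i + 1, n - 1)    # next band, clamped at the last band
        deriv.append(
            (reflectance[hi] - reflectance[lo]) / (wavelengths[hi] - wavelengths[lo])
        )
    return deriv


def absorbance(reflectance):
    """Apparent absorbance A = log10(1 / R) for reflectance values in (0, 1].

    Assumption: this is the common convention, not confirmed by the abstract.
    """
    return [math.log10(1.0 / r) for r in reflectance]
```

Both functions take one spectrum as a plain list, so each canopy measurement would be transformed separately before classification.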
The successive projections algorithm (SPA) was used to select six optimal wavebands (483, 557, 674, 783, 869, and 964 nm). The cloud computing platform was built using the Hadoop and Spark frameworks. The MLlib machine-learning library from the Spark framework was used to build a multilayer perceptron classifier (MLPC) and a one-vs.-rest classifier (ORC). These multi-class classifiers were applied to the spectral datasets (original, first derivative, and absorbance) for the two tomato varieties under different water treatments. For each classifier, 70% of the data was randomly selected for training and the remaining 30% was used for prediction.
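The waveband selection step can be illustrated with a minimal sketch of the projection phase of SPA, which repeatedly picks the band least collinear with the bands already chosen. The function name `spa_select`, the fixed starting band, and the fixed number of bands are assumptions for illustration. The full algorithm also chooses the starting band and the number of bands by validation error, which is omitted here, and the abstract does not describe the authors' implementation.

```python
import math


def spa_select(X, n_select, start=0):
    """Projection phase of SPA on X (rows = samples, columns = wavebands).

    Returns the indices of n_select columns, beginning with `start`.
    """
    n_cols = len(X[0])
    if not 1 <= n_select <= n_cols:
        raise ValueError("n_select must be between 1 and the number of columns")
    # Work on a column-wise copy so the caller's matrix is not modified.
    cols = [[row[j] for row in X] for j in range(n_cols)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm_sq = sum(v * v for v in ref)
        best_j, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            if ref_norm_sq > 0:
                # Remove the component of column j that lies along the last pick.
                coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_norm_sq
                cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm = math.sqrt(sum(v * v for v in cols[j]))
            if norm > best_norm:
                best_j, best_norm = j, norm
        # Keep the column with the largest remaining (orthogonal) norm.
        selected.append(best_j)
    return selected
```

With a list of band centres `wavelengths` aligned to the columns of `X`, the selected wavebands would be `[wavelengths[j] for j in spa_select(X, 6)]`.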
Training and prediction were conducted on the cloud computing platform. The MLPC had better classification accuracy than the ORC. Among the three spectral datasets, the first derivative of the spectra gave the best classification performance, while the reflection and absorbance spectra performed similarly. Using the full waveband spectrum provided higher classification accuracy than using only the optimal wavebands. Furthermore, the tomato canopy spectrum classification performance was better for the WanShiRuYi plants than for the HeZuo 903 plants. Moreover, the collected spectral dataset was enlarged to evaluate the operating efficiency of the cloud computing platform when processing 'big data.'
Operating efficiency improved significantly as the size of the spectral dataset or the number of nodes in the platform increased. Finally, Python and TensorFlow were used to implement a convolutional neural network (CNN), which was then used to classify and analyze the spectral datasets. The results showed that the MLPC and ORC algorithms had better classification performance than the CNN algorithm on the spectral data.
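For orientation, a one-dimensional CNN for spectral classification might be sketched in TensorFlow as follows. The abstract gives no architectural details, so the layer sizes, kernel widths, and optimizer here are purely hypothetical and should not be read as the authors' network.

```python
import tensorflow as tf


def build_spectral_cnn(n_bands, n_classes):
    """Small 1D CNN over a spectrum of n_bands values (hypothetical architecture)."""
    model = tf.keras.Sequential([
        # Each spectrum is treated as a sequence of n_bands steps with 1 channel.
        tf.keras.layers.Input(shape=(n_bands, 1)),
        tf.keras.layers.Conv1D(16, kernel_size=5, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Conv1D(32, kernel_size=5, padding="same", activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        # Softmax output: one probability per class (e.g. per water treatment).
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Training would then call `model.fit` on spectra shaped `(n_samples, n_bands, 1)` with integer labels, for example one label per water treatment.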
Xia, Ji'An; Zhang, WenYu; Zhang, WeiXin; Yang, Yuwang; Hu, GuangYong; Ge, DaoKuo; Liu, Hong; Cao, Hongxin (2021). A cloud computing-based approach using the visible near-infrared spectrum to classify greenhouse tomato plants under water stress. Computers and Electronics in Agriculture, 181, 105966. doi:10.1016/j.compag.2020.105966