Opuscula Math. 44, no. 1 (2024), 79-103
https://doi.org/10.7494/OpMath.2024.44.1.79
Opuscula Mathematica
Conditional mean embedding and optimal feature selection via positive definite kernels
Palle E.T. Jorgensen
Myung-Sin Song
James Tian
Abstract. Motivated by applications, we consider new operator-theoretic approaches to conditional mean embedding (CME). Our present results combine a spectral analysis-based optimization scheme with the use of kernels, stochastic processes, and constructive learning algorithms. For initially given non-linear data, we consider optimization-based feature selections. This entails the use of convex sets of kernels in a construction of optimal feature selection via regression algorithms from learning models. Thus, with initial inputs of training data (for a suitable learning algorithm), each choice of a kernel \(K\) in turn yields a variety of Hilbert spaces and realizations of features. A novel aspect of our work is the inclusion of a secondary optimization process over a specified convex set of positive definite kernels, resulting in the determination of "optimal" feature representations.
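For orientation only, and not necessarily the precise scheme developed in the paper, a commonly used regularized estimator of the conditional mean embedding from a training sample \((x_i, y_i)_{i=1}^{n}\) takes the form
\[
\hat{\mu}_{Y\mid X=x} \;=\; \sum_{i=1}^{n} \beta_i(x)\, L(y_i,\cdot),
\qquad
\beta(x) \;=\; \bigl(G_K + n\lambda I\bigr)^{-1}\bigl(K(x_1,x),\dots,K(x_n,x)\bigr)^{\top},
\]
where \(K\) and \(L\) are positive definite kernels on the input and output spaces, \(G_K = \bigl(K(x_i,x_j)\bigr)_{i,j=1}^{n}\) is the Gram matrix, and \(\lambda > 0\) is a regularization parameter. The secondary optimization mentioned in the abstract may then be pictured as ranging over a convex family \(K_t = \sum_j t_j K_j\), with \(t_j \ge 0\) and \(\sum_j t_j = 1\), each choice of \(t\) producing its own reproducing kernel Hilbert space and feature representation.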
Keywords: positive-definite kernels, reproducing kernel Hilbert space, stochastic processes, frames, machine learning, embedding problems, optimization.
Mathematics Subject Classification: 47N10, 47A52, 47B32, 42A82, 42C15, 62H12, 62J07, 65J20, 68T07, 90C20.
- Palle E.T. Jorgensen (corresponding author)
- Department of Mathematics, The University of Iowa, Iowa City, IA 52242-1419, U.S.A.
- Myung-Sin Song
- Department of Mathematics and Statistics, Southern Illinois University Edwardsville, Edwardsville, IL 62026, U.S.A.
- James Tian
- Mathematical Reviews, 416 4th Street, Ann Arbor, MI 48103-4816, U.S.A.
- Communicated by P.A. Cojuhari.
- Received: 2023-06-03.
- Accepted: 2023-07-05.
- Published online: 2023-10-27.