Opuscula Math. 44, no. 1 (2024), 79-103
https://doi.org/10.7494/OpMath.2024.44.1.79
Opuscula Mathematica
Conditional mean embedding and optimal feature selection via positive definite kernels
Palle E.T. Jorgensen
Myung-Sin Song
James Tian
Abstract. Motivated by applications, we consider new operator-theoretic approaches to conditional mean embedding (CME). Our present results combine a spectral analysis-based optimization scheme with the use of kernels, stochastic processes, and constructive learning algorithms. For initially given non-linear data, we consider optimization-based feature selections. This entails the use of convex sets of kernels in a construction of optimal feature selection via regression algorithms from learning models. Thus, with initial inputs of training data (for a suitable learning algorithm), each choice of a kernel \(K\) in turn yields a variety of Hilbert spaces and realizations of features. A novel aspect of our work is the inclusion of a secondary optimization process over a specified convex set of positive definite kernels, resulting in the determination of "optimal" feature representations.
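For orientation only, and not necessarily the precise scheme developed in the paper, a commonly used regularized estimator of the conditional mean embedding from a training sample \((x_i, y_i)_{i=1}^{n}\) takes the form
\[
\hat{\mu}_{Y\mid X=x} \;=\; \sum_{i=1}^{n} \beta_i(x)\, L(y_i,\cdot),
\qquad
\beta(x) \;=\; \bigl(G_K + n\lambda I\bigr)^{-1}\bigl(K(x_1,x),\dots,K(x_n,x)\bigr)^{\top},
\]
where \(K\) and \(L\) are positive definite kernels on the input and output spaces, \(G_K = \bigl(K(x_i,x_j)\bigr)_{i,j=1}^{n}\) is the Gram matrix, and \(\lambda > 0\) is a regularization parameter. The secondary optimization mentioned in the abstract may then be pictured as ranging over a convex family \(K_t = \sum_j t_j K_j\), with \(t_j \ge 0\) and \(\sum_j t_j = 1\), each choice of \(t\) producing its own reproducing kernel Hilbert space and feature representation.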
Keywords: positive-definite kernels, reproducing kernel Hilbert space, stochastic processes, frames, machine learning, embedding problems, optimization.
Mathematics Subject Classification: 47N10, 47A52, 47B32, 42A82, 42C15, 62H12, 62J07, 65J20, 68T07, 90C20.
- Palle E.T. Jorgensen (corresponding author)
- Department of Mathematics, The University of Iowa, Iowa City, IA 52242-1419, U.S.A.
- Myung-Sin Song
- Department of Mathematics and Statistics, Southern Illinois University Edwardsville, Edwardsville, IL 62026, U.S.A.
- James Tian
- Mathematical Reviews, 416 4th Street, Ann Arbor, MI 48103-4816, U.S.A.
- Communicated by P.A. Cojuhari.
- Received: 2023-06-03.
- Accepted: 2023-07-05.
- Published online: 2023-10-27.