Opuscula Math. 44, no. 1 (2024), 79-103
https://doi.org/10.7494/OpMath.2024.44.1.79

 

Conditional mean embedding and optimal feature selection via positive definite kernels

Palle E.T. Jorgensen
Myung-Sin Song
James Tian

Abstract. Motivated by applications, we consider new operator-theoretic approaches to conditional mean embedding (CME). Our present results combine a spectral analysis-based optimization scheme with the use of kernels, stochastic processes, and constructive learning algorithms. For initially given non-linear data, we consider optimization-based feature selections. This entails the use of convex sets of kernels in a construction of optimal feature selection via regression algorithms from learning models. Thus, with initial inputs of training data (for a suitable learning algorithm), each choice of a kernel \(K\) in turn yields a variety of Hilbert spaces and realizations of features. A novel aspect of our work is the inclusion of a secondary optimization process over a specified convex set of positive definite kernels, resulting in the determination of "optimal" feature representations.
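The two ingredients described in the abstract can be illustrated concretely. Below is a minimal NumPy sketch, not taken from the paper: it implements the standard regression form of the empirical CME (as in Grünewälder et al., reference 4), \(\hat\mu_{Y|X=x} = \sum_i \beta_i(x)\,\varphi(y_i)\) with \(\beta(x) = (K + n\lambda I)^{-1}k(x)\), and then a "secondary optimization" over a convex set of kernels \(\{tK_{\gamma_1} + (1-t)K_{\gamma_2} : t \in [0,1]\}\). The Gaussian kernel, the regularization parameter `lam`, and the held-out squared-error criterion are illustrative assumptions, a simple stand-in for the spectral-analysis-based scheme developed in the paper.

```python
import numpy as np

def rbf(X, Z, gamma):
    """Gaussian (RBF) positive definite kernel: K[i, j] = exp(-gamma * ||x_i - z_j||^2)."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * d2)

def cme_predict(Y, K, K_new, lam):
    """Empirical conditional mean embedding, evaluated on identity features of Y:
    coefficients beta(x) = (K + n*lam*I)^{-1} k(x), prediction = Y^T beta(x)
    (i.e. vector-valued kernel ridge regression)."""
    n = K.shape[0]
    beta = np.linalg.solve(K + n * lam * np.eye(n), K_new)  # (n, m) coefficients
    return Y.T @ beta                                       # (d_y, m) conditional means

def select_convex_kernel(X, Y, Xv, Yv, g1, g2, lam, ts):
    """Secondary optimization over the convex set {t*K_{g1} + (1-t)*K_{g2}: t in ts},
    scored here by held-out squared error (an illustrative criterion only)."""
    K1, K2 = rbf(X, X, g1), rbf(X, X, g2)       # Gram matrices of the two extreme kernels
    C1, C2 = rbf(X, Xv, g1), rbf(X, Xv, g2)     # cross-kernels to the validation inputs
    errs = []
    for t in ts:
        pred = cme_predict(Y, t * K1 + (1 - t) * K2, t * C1 + (1 - t) * C2, lam)
        errs.append(float(np.mean((pred.T - Yv) ** 2)))
    return ts[int(np.argmin(errs))], errs
```

With identity features on \(Y\), `cme_predict` reduces to kernel ridge regression, so on well-separated training points and small `lam` it nearly interpolates the training targets; each kernel in the convex set yields a different RKHS and hence a different feature realization, which is what the selection step ranks.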

Keywords: positive-definite kernels, reproducing kernel Hilbert space, stochastic processes, frames, machine learning, embedding problems, optimization.

Mathematics Subject Classification: 47N10, 47A52, 47B32, 42A82, 42C15, 62H12, 62J07, 65J20, 68T07, 90C20.


  1. N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950), no. 3, 337-404.
  2. J. Cerviño, J.A. Bazerque, M. Calvo-Fullana, A. Ribeiro, Multi-task reinforcement learning in reproducing kernel Hilbert spaces via cross-learning, IEEE Trans. Signal Process. 69 (2021), 5947-5962. https://doi.org/10.1109/TSP.2021.3122303
  3. N. Dunford, J.T. Schwartz, Linear Operators. Part II, Wiley Classics Library, John Wiley & Sons, Inc., New York, 1988.
  4. S. Grünewälder, G. Lever, L. Baldassarre, S. Patterson, A. Gretton, M. Pontil, Conditional mean embeddings as regressors, [in:] Proceedings of the 29th International Conference on Machine Learning (ICML'12), Omnipress, Madison, WI, USA, 2012, 1803-1810.
  5. D. He, J. Cheng, K. Xu, High-dimensional variable screening through kernel-based conditional mean dependence, J. Statist. Plann. Inference 224 (2023), 27-41. https://doi.org/10.1016/j.jspi.2022.10.002
  6. P. Jorgensen, F. Tian, Non-commutative Analysis, World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2017.
  7. P. Jorgensen, F. Tian, Decomposition of Gaussian processes, and factorization of positive definite kernels, Opuscula Math. 39 (2019), no. 4, 497-541. https://doi.org/10.7494/OpMath.2019.39.4.497
  8. P. Jorgensen, F. Tian, Realizations and factorizations of positive definite kernels, J. Theoret. Probab. 32 (2019), no. 4, 1925-1942. https://doi.org/10.1007/s10959-018-0868-3
  9. P. Jorgensen, F. Tian, Sampling with positive definite kernels and an associated dichotomy, Adv. Theor. Math. Phys. 24 (2020), no. 1, 125-154. https://doi.org/10.4310/ATMP.2020.v24.n1.a4
  10. P. Jorgensen, F. Tian, Reproducing kernels: harmonic analysis and some of their applications, Appl. Comput. Harmon. Anal. 52 (2021), 279-302. https://doi.org/10.1016/j.acha.2020.03.001
  11. P. Jorgensen, F. Tian, Infinite-dimensional Analysis -- Operators in Hilbert Space; Stochastic Calculus via Representations, and Duality Theory, World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2021.
  12. P. Jorgensen, F. Tian, Reproducing kernels and choices of associated feature spaces, in the form of \(L^2\)-spaces, J. Math. Anal. Appl. 505 (2022), no. 2, 125535. https://doi.org/10.1016/j.jmaa.2021.125535
  13. P.E.T. Jorgensen, M.-S. Song, J. Tian, Positive definite kernels, algorithms, frames, and approximations, (2021), arXiv:2104.11807.
  14. I. Klebanov, I. Schuster, T.J. Sullivan, A rigorous theory of conditional mean embeddings, SIAM J. Math. Data Sci. 2 (2020), no. 3, 583-606.
  15. T. Lai, Z. Zhang, Y. Wang, A kernel-based measure for conditional mean dependence, Comput. Statist. Data Anal. 160 (2021), Paper no. 107246. https://doi.org/10.1016/j.csda.2021.107246
  16. T. Lai, Z. Zhang, Y. Wang, L. Kong, Testing independence of functional variables by angle covariance, J. Multivariate Anal. 182 (2021), Paper no. 104711. https://doi.org/10.1016/j.jmva.2020.104711
  17. Y.J. Lee, C.A. Micchelli, J. Yoon, On multivariate discrete least squares, J. Approx. Theory 211 (2016), 78-84. https://doi.org/10.1016/j.jat.2016.07.005
  18. G. Lever, J. Shawe-Taylor, R. Stafford, C. Szepesvári, Compressed conditional mean embeddings for model-based reinforcement learning, AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (2016), 1779-1787.
  19. D.K. Lim, N.U. Rashid, J.G. Ibrahim, Model-based feature selection and clustering of RNA-seq data for unsupervised subtype discovery, Ann. Appl. Stat. 15 (2021), no. 1, 481-508. https://doi.org/10.1214/20-aoas1407
  20. C.-K. Lu, P. Shafto, Conditional deep Gaussian processes: Multi-fidelity kernel learning, Entropy 23 (2021), no. 11, 1545. https://doi.org/10.3390/e23111545
  21. E. Mehmanchi, A. Gómez, O.A. Prokopyev, Solving a class of feature selection problems via fractional 0-1 programming, Ann. Oper. Res. 303 (2021), 265-295. https://doi.org/10.1007/s10479-020-03917-w
  22. C.A. Micchelli, M. Pontil, Q. Wu, D.-X. Zhou, Error bounds for learning the kernel, Anal. Appl. (Singap.) 14 (2016), no. 6, 849-868.
  23. P. Niyogi, S. Smale, S. Weinberger, A topological view of unsupervised learning from noisy data, SIAM J. Comput. 40 (2011), no. 3, 646-663. https://doi.org/10.1137/09076293
  24. J. Park, K. Muandet, A measure-theoretic approach to kernel conditional mean embeddings, arXiv:2002.03689.
  25. S. Ray Chowdhury, R. Oliveira, F. Ramos, Active learning of conditional mean embeddings via Bayesian optimisation, [in:] J. Peters, D. Sontag (eds), Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), Proceedings of Machine Learning Research, vol. 124, PMLR, 2020, 1119-1128.
  26. S. Smale, Y. Yao, Online learning algorithms, Found. Comput. Math. 6 (2006), 145-170. https://doi.org/10.1007/s10208-004-0160-z
  27. S. Smale, D.-X. Zhou, Geometry on probability spaces, Constr. Approx. 30 (2009), 311-323. https://doi.org/10.1007/s00365-009-9070-2
  28. P. Xu, Y. Wang, X. Chen, Z. Tian, COKE: communication-censored decentralized kernel learning, J. Mach. Learn. Res. 22 (2021), Paper no. 196.
  29. Y. Zhang, Y.-C. Chen, Kernel smoothing, mean shift, and their learning theory with directional data, J. Mach. Learn. Res. 22 (2021), Paper no. 154.
  30. P. Zhao, L. Lai, Minimax rate optimal adaptive nearest neighbor classification and regression, IEEE Trans. Inform. Theory 67 (2021), no. 5, 3155-3182.
  • Palle E.T. Jorgensen (corresponding author)
  • Department of Mathematics, The University of Iowa, Iowa City, IA 52242-1419, U.S.A.
  • Myung-Sin Song
  • Department of Mathematics and Statistics, Southern Illinois University Edwardsville, Edwardsville, IL 62026, U.S.A.
  • James Tian
  • Mathematical Reviews, 416-4th Street Ann Arbor, MI 48103-4816, U.S.A.
  • Communicated by P.A. Cojuhari.
  • Received: 2023-06-03.
  • Accepted: 2023-07-05.
  • Published online: 2023-10-27.

Cite this article as:
Palle E.T. Jorgensen, Myung-Sin Song, James Tian, Conditional mean embedding and optimal feature selection via positive definite kernels, Opuscula Math. 44, no. 1 (2024), 79-103, https://doi.org/10.7494/OpMath.2024.44.1.79

