Active travel, namely walking and cycling, is an environmentally friendly and socially beneficial form of sustainable transportation. However, existing research on active travel has relied largely on limited survey data and generalized linear models. To fill this gap, our study combines large-scale trip data with data-driven machine learning to predict active travel flow and probability simultaneously. We employ SHapley Additive exPlanations (SHAP) to analyze the nonlinear effects of various characteristics (e.g., travel, socioeconomic, infrastructure, and environmental features) on active travel. Gradient Boosting Decision Tree (GBDT) performs best on both prediction tasks. Travel distance accounts for more than 50% of the model's overall feature importance. Features such as crow-fly distance, housing price, point-of-interest density, subway proximity, building area/road density, and urban greenery exhibit pronounced nonlinear effects. Local interpretability analysis identifies the determinants of individual trips, yielding implications for targeted optimization. Overall, our study uncovers the drivers and nonlinear effects underlying active travel behavior and informs sustainable transportation planning.
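SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory, so that the attributions sum to the gap between the prediction and a reference value. As a minimal illustration of that idea, the sketch below computes exact Shapley values by enumerating every feature coalition, filling "absent" features from a single baseline point. It is not the study's pipeline: the two features, the logistic toy model `toy_active_travel_prob`, and its coefficients are hypothetical, and brute-force enumeration stands in for the tree-specific TreeSHAP algorithm normally applied to a fitted GBDT.

```python
import math
from itertools import combinations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to one baseline point.

    A feature absent from a coalition takes its baseline value, so the
    attributions sum to predict(x) - predict(baseline).
    """
    n = len(x)

    def value(coalition):
        # Use x for coalition members and the baseline for all other features.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_active_travel_prob(z):
    """HYPOTHETICAL model: probability that one trip is made on foot or by bike."""
    distance_km, greenery = z
    # Made-up coefficients: longer trips lower the odds, greenery raises them.
    logit = 3.0 - 2.0 * distance_km + 1.5 * greenery
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    trip = [1.0, 0.6]       # hypothetical trip: 1 km, greenery index 0.6
    reference = [3.0, 0.2]  # hypothetical baseline trip
    phi = shapley_values(toy_active_travel_prob, trip, reference)
    print("distance:", phi[0], "greenery:", phi[1])
```

Because enumeration visits every coalition, its cost grows exponentially with the number of features; tree-specific methods such as TreeSHAP exploit the structure of tree ensembles like GBDT to avoid this.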