Buildings consume a large amount of energy; hence, building energy efficiency has attracted a great deal of attention in recent years. A key factor in achieving energy efficiency is occupancy information, which directly impacts energy-related building control systems. In this paper, we leverage environmental sensors, which are nonintrusive and cost-effective, for building occupancy estimation. Our approach relies on feature engineering and learning. Conventional feature engineering requires manually extracting relevant features without a clear guideline. This blind feature extraction is labor-intensive and may miss significant implicit features. To address this issue, we propose a convolutional deep bidirectional long short-term memory (CDBLSTM) approach that combines a convolutional network with a deep structure to automatically learn significant features from the sensory data without human intervention. Moreover, the long short-term memory networks capture temporal dependencies in the data, and the bidirectional structure takes both past and future contexts into consideration for the final identification of occupancy. We have conducted real-world experiments to evaluate the performance of the proposed CDBLSTM approach. Instead of estimating the exact number of occupants, we identify the range of occupants, i.e., zero, low, medium, or high, which is adequate for most building control systems. The experimental results indicate the effectiveness of the proposed approach compared with state-of-the-art methods.
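To illustrate how an exact occupant count maps onto the four range labels used as classification targets, the following is a minimal sketch. The boundaries `LOW_MAX` and `MEDIUM_MAX` and the function name `occupancy_range` are hypothetical, chosen only for illustration; the abstract does not state the thresholds actually used.

```python
# Hypothetical range boundaries (assumptions, not taken from the paper):
#   0                         -> "zero"
#   1 .. LOW_MAX              -> "low"
#   LOW_MAX+1 .. MEDIUM_MAX   -> "medium"
#   above MEDIUM_MAX          -> "high"
LOW_MAX = 5
MEDIUM_MAX = 15


def occupancy_range(count: int) -> str:
    """Map an exact occupant count to one of the four range labels."""
    if count < 0:
        raise ValueError("occupant count cannot be negative")
    if count == 0:
        return "zero"
    if count <= LOW_MAX:
        return "low"
    if count <= MEDIUM_MAX:
        return "medium"
    return "high"


# Example: convert a short sequence of ground-truth counts into labels.
counts = [0, 3, 9, 22]
labels = [occupancy_range(c) for c in counts]
```

Under these assumed boundaries, a classifier would be trained on the labels rather than on the raw counts, which turns occupancy estimation into a four-class problem.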