Activity identification is an essential step in measuring and monitoring the performance of earthmoving operations. Many vision-based methods have been developed to automatically capture and interpret activity information from image data, offering economic advantages and efficient analysis. However, previous methods did not consider the interactive operations among equipment, which limited their applicability to operation time estimation for productivity analysis. To address this drawback, this research developed a vision-based activity identification framework that incorporates the interactive aspects of earthmoving equipment’s operation. The framework consists of four main processes: equipment tracking, action recognition of individual equipment, interaction analysis, and post-processing. The interactions between excavators and dump trucks were examined because of their significant impact on earthmoving operations. First, the Tracking-Learning-Detection (TLD) algorithm was adapted to track the heavy equipment. Second, spatio-temporal reasoning and image differencing techniques were implemented to categorize individual actions. Third, interactions were interpreted using a knowledge-based system that evaluates equipment actions and the proximity between operating equipment. Lastly, outliers and noisy results were filtered out by considering work continuity. To validate the proposed framework, two experiments were performed: one with the interaction analysis and one without it. In total, 11,513 image frames from actual earthmoving sites were tested. The average precision of activity analysis improved from 75.68% to 91.27% when the interaction analysis was applied. In conclusion, this research contributes by identifying the critical elements that explain interactive operations, characterizing the vision-based activity identification framework, and improving the applicability of vision-based methods to the automated analysis of equipment operations.
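
The interaction analysis and post-processing steps can be illustrated with a minimal sketch. It is not the implementation used in this research: the action labels (`"moving"`, `"stopped"`), the activity names, the centre-distance proximity test, the 200-pixel threshold, and the majority-vote window are all assumptions chosen for illustration. The sketch shows how a rule might combine individual equipment actions with proximity to label an excavator's activity, and how a sliding-window vote might remove isolated outliers on the basis of work continuity.

```python
from collections import Counter
from math import hypot


def center(box):
    """Return the center of an (x, y, width, height) bounding box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def is_close(box_a, box_b, max_dist):
    """Assumed proximity test: distance between box centers, in pixels."""
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    return hypot(ax - bx, ay - by) <= max_dist


def excavator_activity(exc_action, truck_action, exc_box, truck_box,
                       max_dist=200.0):
    """Hypothetical rule: a moving excavator next to a stopped dump truck
    is labeled 'loading'; the labels and threshold are illustrative only."""
    if (exc_action == "moving" and truck_action == "stopped"
            and is_close(exc_box, truck_box, max_dist)):
        return "loading"
    if exc_action == "moving":
        return "other_work"
    return "idle"


def smooth(labels, window=5):
    """Sliding-window majority vote over per-frame labels, used here as a
    simple stand-in for filtering based on work continuity."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        segment = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(segment).most_common(1)[0][0])
    return smoothed
```

In this sketch, per-frame labels produced by `excavator_activity` would be passed to `smooth`, so that a single misclassified frame inside a long loading sequence is relabeled to match its neighbours. A wider window removes longer noisy runs at the cost of blurring genuine short activities.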