The load/compare and RMW to wait_count need protection. Using atomic
operations should resolve both issues.
NOTE:
The assumption that the user will call pthread_cond_signal /
pthread_cond_broadcast with the mutex given to pthread_cond_wait held is
simply not true. It MAY hold it, but it is not forced. Thus, using the
user space lock for protecting the wait counter as well is not valid!
The pthread_cond_signal() or pthread_cond_broadcast() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behaviour is required, then that mutex is locked by the thread calling pthread_cond_signal() or pthread_cond_broadcast().
[1] https://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_cond_signal.html