Wikipedia defines the p-value as “the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct”. Well, if we give this definition in, say, a presentation to a product or business team, we’re most probably going to receive piercing, puzzled looks. One of the major requirements for success in a data science role is the ability to break down complex technical concepts into information that all business stakeholders can easily understand.
So let’s try again: what does the p-value mean? Since I work in the cybersecurity industry, I’m going to pull out a security-related anecdote to explain this.
Say your laptop is suddenly extremely slow and sluggish, and you are thinking through the possible reasons for it. You highly suspect that this is due to malware loaded onto your machine through a spam email. Your null hypothesis (the default hypothesis, i.e. the usual or expected behavior in most cases) is that your laptop is not infected by malware. The p-value tells you how surprising the behavior you are seeing would be if the null hypothesis were true, so it works as a measure of evidence against the null hypothesis. The smaller the p-value, the stronger the evidence that your laptop’s sluggishness is due to malware, and if it is small enough you would reject the null hypothesis. On the other hand, if the p-value is large, i.e. what you are seeing is not unusual for a clean laptop (say, because you have all the anti-malware protection in place), you don’t have strong evidence of malware activity. You can’t reject the null hypothesis, and the sluggishness is quite plausibly due to other reasons not related to malware.
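To put some numbers on the anecdote, here is a minimal sketch in Python. Everything in it is made up for illustration: I'm assuming a clean laptop shows a CPU spike in about 10% of the hours you watch it, and that you saw spikes in 7 of the last 20 hours. The helper `binomial_tail` and the constants are hypothetical names, and the 0.05 cutoff is just the common convention, not a rule. The p-value is then the probability of seeing 7 or more spiky hours if the laptop really were clean.

```python
from math import comb


def binomial_tail(k_observed, n, p_null):
    """Return P(X >= k_observed) for X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(k_observed, n + 1)
    )


# Hypothetical numbers, chosen only to illustrate the idea.
HOURS_OBSERVED = 20         # hours we watched the laptop
SPIKE_RATE_IF_CLEAN = 0.10  # assumed chance of a spiky hour on a clean laptop (null hypothesis)
SPIKES_SEEN = 7             # spiky hours we actually observed
ALPHA = 0.05                # conventional significance threshold

# Probability of a result at least this extreme if the null hypothesis is true.
p_value = binomial_tail(SPIKES_SEEN, HOURS_OBSERVED, SPIKE_RATE_IF_CLEAN)
reject_null = p_value < ALPHA

print(f"p-value = {p_value:.4f}")
if reject_null:
    print("Reject the null hypothesis: this is unusual for a clean laptop.")
else:
    print("Cannot reject the null hypothesis: this is plausible for a clean laptop.")
```

With these made-up numbers the p-value comes out well below 0.05, so you would reject the null hypothesis. If you had seen only 2 spiky hours instead, the p-value would be large and you would not reject it.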
There you go: the p-value tells you how strong your evidence is that a particular variable has a significant effect on an outcome (or not).