Regression – Podcast No.874


In the Podcast on 6th September (No.874) I talked about Regression Analysis. I have written a few lines to clarify (hopefully) the elements I spoke about.


My approach is to have a question I wish to ask the data and then use the data to answer it.

For example “How do I find more second half goals in matches so I can trade the goal markets later in the game?”

Having asked that question, I then look at the data I would need to answer it, which may look something like this:

First Half Goal Data – Does this have any bearing on what happens in the second half (goal timings, number of goals in the first half, etc.)?

HT Score Data – Do specific HT scores have any bearing on second half goals? 0-0, for example, definitely seems to have a lower second half goal output.

Match Odds – Do a team's odds have any bearing on second half goal output, i.e. do odds-on Home Teams score more in the second half?

Goal Line Odds – Do the Over/Under markets provide any indicators of the number of second half goals?
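As a rough illustration of the HT Score question, here is a minimal Python sketch that averages second half goals by half-time score. The rows in `matches` are made-up numbers purely to show the shape of the calculation, and the function name `avg_second_half_goals` is my own; you would feed in your own match data.

```python
from collections import defaultdict

# Hypothetical sample rows: (half-time score, second half goals).
# These are made-up numbers, not real match data.
matches = [
    ("0-0", 1), ("0-0", 0), ("1-0", 2),
    ("1-0", 1), ("1-1", 3), ("0-0", 2),
]


def avg_second_half_goals(rows):
    """Return {half-time score: (match count, average second half goals)}."""
    totals = defaultdict(lambda: [0, 0])  # score -> [match count, goal total]
    for ht_score, sh_goals in rows:
        totals[ht_score][0] += 1
        totals[ht_score][1] += sh_goals
    return {score: (n, goals / n) for score, (n, goals) in totals.items()}


for score, (n, avg) in sorted(avg_second_half_goals(matches).items()):
    print(f"HT {score}: {n} matches, {avg:.2f} second half goals on average")
```

With only a handful of matches per score line the averages mean very little, so the match count is returned alongside each average to keep the sample size in view.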

I am very much number and data driven. That does not mean I am right. Other people like to watch games, have in-play alerts etc., and I am not saying they are right or wrong. You need to find the way that suits you and fits in with your lifestyle and mindset.

I spoke about the word Regression – Regression is defined in the dictionary as “a return to a former or less developed state”. Most gamblers will have experienced this as regression to the mean: you have a system that flies for a while and shoots up in profit, and then you see it fall off a cliff. This is not uncommon; it is the outcomes regressing to the mean expectancy. This is why it is so important not to dive straight in and to test forward. It will save you money and angst.
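Regression to the mean is easy to see in a simulation. The sketch below assumes a hypothetical system whose true strike rate is 50% and tracks the observed strike rate as bets accumulate. The function name, the 50% figure and the seed are all my own choices for illustration, not anything from a real system.

```python
import random


def running_strike_rate(true_prob, n_bets, seed=1):
    """Simulate n_bets independent bets; return the win rate after each bet."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    rates = []
    for i in range(1, n_bets + 1):
        if rng.random() < true_prob:
            wins += 1
        rates.append(wins / i)
    return rates


# Hypothetical system with a true 50% strike rate.
rates = running_strike_rate(true_prob=0.50, n_bets=5000)
for n in (20, 100, 1000, 5000):
    print(f"after {n:>4} bets: strike rate {rates[n - 1]:.3f}")
```

Over a short run the observed rate can sit well above or below the true figure; over thousands of bets it settles close to it. Changing `seed` shows how different the early runs can look for exactly the same underlying system.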

Regression analysis is a completely different term, and people get confused between the two. It is a quirk of the English language that we use the same word for completely different meanings.

Regression analysis is the process of determining the relationship between one dependent variable and one or more independent variables.

So in Football – Dependent variable – Team Winning

Independent variables may be Shots on Target – does a team having 8 shots on target have any bearing on that team winning? It could be possession, it could be xG, it could be how you rate a team lineup by giving it a strength rating, etc.

If we use more than one independent variable, it is called Multiple Regression analysis.
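To make the idea concrete, here is a minimal sketch of the single-variable case using ordinary least squares. Two assumptions to flag: I have swapped the dependent variable from "team winning" to goals scored, because a win/lose outcome would normally call for logistic regression rather than a straight line, and the shots and goals figures are made-up numbers. The function name `fit_line` is my own.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: shots on target (independent) and goals scored (dependent).
shots_on_target = [2, 4, 5, 6, 8]
goals = [0, 1, 1, 2, 3]

slope, intercept = fit_line(shots_on_target, goals)
print(f"goals = {slope:.2f} * shots_on_target + {intercept:.2f}")
```

The slope is the estimated change in goals for each extra shot on target. Multiple regression does the same job with several independent variables at once, which in practice is usually handed to a statistics library rather than coded by hand.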

So – What are the chances of Tottenham winning home to Chelsea? (very slim I know!)

We may use:

How many goals have Tottenham scored at home in x games?

How many goals have Tottenham conceded at home in x games?

How many games have Tottenham won when priced at x at home?

How many games have Tottenham won against teams above them in the league at home in the last x games?

We could keep going, adding similar data for Chelsea away.

I personally like to be Home and Away specific.

We could then compare this to league averages etc. and we have the basis of a model. We can calculate all this together to generate a predicted goal number for each side, and then insert those numbers into some probability distribution analysis to provide OUR likelihood of the different possible outcomes for all markets.
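One common way to do that last step is a Poisson distribution, and the sketch below assumes that choice; it is not the only option and not necessarily the one I use. Every number here (league averages, attack and defence ratings) is hypothetical, and the names are my own. Each rating is a team's figure divided by the relevant league average, so 1.10 means 10% above average.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is mean."""
    return mean ** k * exp(-mean) / factorial(k)


def match_probabilities(home_goals_exp, away_goals_exp, max_goals=10):
    """Sum independent Poisson scoreline probabilities into market outcomes."""
    home = draw = away = over_2_5 = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_goals_exp) * poisson_pmf(a, away_goals_exp)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
            if h + a > 2:
                over_2_5 += p
    return {"home": home, "draw": draw, "away": away, "over_2_5": over_2_5}


# Hypothetical inputs, home and away specific.
league_avg_home_goals = 1.5
league_avg_away_goals = 1.2
home_attack, home_defence = 1.10, 1.00  # home side, home games only
away_attack, away_defence = 1.20, 0.90  # away side, away games only

home_goals_exp = home_attack * away_defence * league_avg_home_goals
away_goals_exp = away_attack * home_defence * league_avg_away_goals

probs = match_probabilities(home_goals_exp, away_goals_exp)
for market, p in probs.items():
    print(f"{market}: {p:.3f} (fair odds {1 / p:.2f})")
```

The fair odds printed for each market are simply 1 divided by the probability, which gives you your own price to compare against the market. Treating the two sides' goals as independent is a simplification, and scorelines above `max_goals` are ignored, which costs a negligible amount of probability at normal goal expectations.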

The choice of statistics and data you use is the key, and having your own inputs is, for me, vital. We could all see that Tottenham’s last 6 games have gone over 2.5 goals, but that is a lazy stat and has no bearing on what could happen in the next game.


2 Responses

  1. The pods are great but when you have a more complex subject the written word is great too. Why? Because, otherwise, I have to jot down notes whilst you are speaking to refer back to later. This is error-prone: I mishear something, I misunderstand something, I miss an important point because I am still writing down the previous one, and because I write so quickly in order to keep up I can’t read what I have written anyway a few days later!

  2. Hi Ian – yes, this is useful for me because I prefer written information to spoken because I can go back, re-read and keep it to look back at anytime.
    I get the concept of regression and league averages, etc

