Power Pack: Pricing_(09)_problems_(Set1) [W-09-D]: Detecting exposure correlation
I'm wondering whether, instead of comparing each cell to the total exposures, comparing each cell to either the row total or the column total might be a better approach. In my experimenting, the patterns popped out much more clearly. I did check back to the original W&M text, but couldn't find anything specifically about detecting exposure correlation.
Comments
Can you post your calculations? I want to make sure I understand exactly what you're doing and how you're drawing your conclusions. (You can copy a group of Excel cells and just paste them in this window.)
This is what I got when comparing to the column totals:
Yes, notice how in exposure distribution data set 2, the percentages show that for each level of variable 1, variable 2 is uniformly distributed, showing no exposure correlation. The results are similar for row totals:
Again, here, for each level of variable 2, variable 1 is nearly uniformly distributed, showing no significant exposure correlation between the two variables.
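For anyone following along without the spreadsheet, here is a minimal Python sketch of the row-total and column-total comparisons described above. The exposure counts are made up purely for illustration (they are not the numbers from the problem set), and the function names are my own.

```python
# Hypothetical exposure counts, made up for illustration only.
# Rows are the levels of one rating variable, columns the levels of the other.
exposures = [
    [100, 200, 300],
    [50, 100, 150],
    [20, 40, 60],
]

def row_shares(table):
    """Divide each cell by its row total."""
    return [[cell / sum(row) for cell in row] for row in table]

def column_shares(table):
    """Divide each cell by its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[cell / col_totals[j] for j, cell in enumerate(row)] for row in table]

by_row = row_shares(exposures)
by_col = column_shares(exposures)

for label, shares in (("row", by_row), ("column", by_col)):
    print(f"Share of {label} total:")
    for line in shares:
        print("  " + "  ".join(f"{value:6.1%}" for value in line))
```

In this made-up table every row has the same percentage split across the columns, and every column has the same split across the rows, which is the pattern being read as "no exposure correlation" in the comments above.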
Ok, that might provide a faster check. Something about it bothers me though.
I haven't tried to create an example where your method wouldn't work, but I wonder if what you're doing is analogous to a univariate approach, whereas comparing each cell to the grand total of all cells is more like a multivariate approach. Comparing to grand totals, you might catch something that wouldn't manifest just by doing either row totals or column totals.
An exercise for you would be to try creating a matrix where both the row percentages and the column percentages don't show any variation separately but if you compare to the grand total, then some variation would be evident. If you can't do that then you might be able to prove mathematically that any variation that shows up by comparing to the grand total might HAVE to show up either in the row percentages or the column percentages separately.
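A rough way to explore that exercise numerically, offered as a sketch rather than a proof: if every row has the same percentage split p_j across the columns, then each cell equals its row total times p_j, so the cell's share of the grand total is (row total / grand total) times p_j, and summing down a column shows p_j must equal (column total / grand total). The code below measures both kinds of variation on two made-up tables. The function names and the data are hypothetical.

```python
def independence_gap(table):
    """Largest |cell share of grand total - (row share x column share)|."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return max(
        abs(cell / total - (row_totals[i] / total) * (col_totals[j] / total))
        for i, row in enumerate(table)
        for j, cell in enumerate(row)
    )

def row_share_spread(table):
    """Largest difference, within any column, between the row-percentage values."""
    shares = [[cell / sum(row) for cell in row] for row in table]
    return max(max(col) - min(col) for col in zip(*shares))

# Made-up tables: the first has proportional rows, the second does not.
proportional = [[100, 200, 300], [50, 100, 150]]
skewed = [[100, 200, 300], [150, 100, 50]]

for name, table in (("proportional", proportional), ("skewed", skewed)):
    print(name, round(independence_gap(table), 4), round(row_share_spread(table), 4))
```

On these two tables the measures agree: both are essentially zero for the proportional table and both are clearly nonzero for the skewed one, which is consistent with the argument above but is only a spot check on two examples.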
One other comment: Real data is never as tidy as these examples. There will virtually always be variations so then you have to decide on a tolerance level. I think for the purposes of the exam, a question like this will generally have a pretty obvious answer.
Final comment! This is a fun exploratory exercise, but unless you're already feeling well-prepared for the upcoming exam, I might suggest leaving this exploration until after the exam when you're not under pressure. 🙂
Thanks, Graham. I think I'd better leave it for now. I was doing a little Googling and came across the chi-square test for independence, which was familiar to me from a stats class. It involves computing an expected value for each cell: the cell's row total times its column total, divided by the grand total. Then the observed minus the expected, squared, divided by the expected, is computed for every cell, and the results are summed and compared to a chi-square table to judge whether the departure from independence is statistically significant. But now I'm really straying from the syllabus... so, I'll just leave it at that! 😊
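For reference, the test statistic described above can be sketched in a few lines of Python. This is a minimal illustration on a made-up 2x2 table, not something from the syllabus. The function name is my own, and the code stops at the statistic and degrees of freedom, leaving the table lookup to the reader.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Made-up counts for illustration only.
observed_counts = [[10, 20], [20, 10]]
stat, dof = chi_square_statistic(observed_counts)
print(f"chi-square = {stat:.3f} with {dof} degree(s) of freedom")
```

If SciPy is available, `scipy.stats.chi2_contingency` performs the same kind of test and also returns a p-value. Note that by default it applies a continuity correction to 2x2 tables, so its statistic can differ from the plain sum computed here.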
Good idea! You could really go down a rabbit hole here!