I have been given the assignment of investigating how students perform in their driving test. I will look at different factors and see what effect they have on the number of mistakes the driver makes. The data consists of 240 records, subdivided into the categories shown below.

1) Driver: The driver's name is not given; instead the name is replaced by a letter. This hides the identity of the student.
2) Gender: Shows if the driver is male or female. I could use the data to find out if males are better drivers than females.
3) Number of Lessons: I will investigate the number of lessons needed to pass the test. There is a big range in the data. This could be due to some students having extra lessons from friends, parents etc. Also, some people may be lacking in self-assurance, which could affect their concentration while driving.
4) Number of Mistakes: A pupil has failed if they have made over 15 mistakes.
5) Instructor: Instructor names have been replaced by the letters A, B, C and D to hide their identity. From the data it appears that instructors A and B have more students than C and D. This could be because instructors A and B are more popular, e.g. they may be better teachers than C and D.
6) Time of Test: The timing of a test may have an effect on how a student performs, e.g. the roads may be busier between 9 and 5.
7) Day of Test: The day the student takes their examination can affect their performance, e.g. busy roads Monday to Friday.

Firstly, I will tally up the number of male and female pupils for each instructor.

Instructor   Male   Female   Total
A            29     31       60
B            49     51       100
C            18     22       40
D            20     20       40

Because of the large amount of data, I will take a sample to make the calculations easier to manage. The sample should represent the complete data set, so I will take a sample of 60 (a quarter of 240).
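The share of the sample owed to each instructor and gender follows from the tally above. A minimal Python sketch of that calculation, assuming a proportional split; the function and variable names are my own illustrative choices, not part of the original working:

```python
from fractions import Fraction

# Tally of pupils per instructor and gender, copied from the table above.
tally = {
    "A": {"M": 29, "F": 31},
    "B": {"M": 49, "F": 51},
    "C": {"M": 18, "F": 22},
    "D": {"M": 20, "F": 20},
}


def proportional_allocation(tally, sample_size):
    """Return the exact (unrounded) share of the sample for each stratum."""
    population = sum(n for genders in tally.values() for n in genders.values())
    # Fraction keeps the sampling ratio exact (here 60/240 = 1/4).
    ratio = Fraction(sample_size, population)
    return {
        instructor: {gender: float(n * ratio) for gender, n in genders.items()}
        for instructor, genders in tally.items()
    }


allocation = proportional_allocation(tally, 60)
for instructor, genders in allocation.items():
    print(instructor, genders)
```

The shares are deliberately left unrounded. Fractional shares such as 4.5 still have to be rounded to whole pupils by hand, keeping the overall total at 60.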
I will ensure that the proportion from each instructor and of each gender is the same as in the complete data set.

Instructor   Male   Female   Total
A            7      8        15
B            12     13       25
C            5      5        10
D            5      5        10

I will use random numbers to choose the sample from the data. The data will be stratified by gender (male or female) and instructor (A, B, C and D).

If one of the records that I choose is incomplete, I will choose the nearest record, ensuring that the instructor and gender are the same.

Checking the Data

I will now check that my sample is a good representation of my full data set. I will compare the two data sets by using box plots, because box plots show the quartiles, median, maximum and minimum values, so the differences between two sets of data are easy to see. I will draw two sets of box plots for:

a) Number of lessons
b) Number of mistakes

I will then compare the sample with the complete set of data to make sure that it is representative.

Box plot (a) - Number of Lessons

The table below compares the figures for the complete data and the sample from the "number of lessons" box plot.

                 All data            Sample
Median           22                  21.5
Lower Quartile   16                  15.5
Upper Quartile   30                  29
IQR              14                  13.5
Range            35                  32
Skew             Slightly positive   Slightly positive

All the results are very close, so I am satisfied that this is a good match.

Box plot (b) - Number of Mistakes

                 All data            Sample
Median           16                  19
Lower Quartile   9                   11
Upper Quartile   24                  24.5
IQR              15                  13.5
Range            36                  36
Skew             Symmetrical         Slightly negative

Again the results are reasonably close, although the sample median is 3 mistakes higher than the median of the full data.

I am confident that the sample matches up to the complete data. I will now continue with the investigation.

Hypothesis 1

I will investigate the hypothesis: "The more lessons taken by a pupil the fewer mistakes they make in the test."

I predict that this hypothesis is correct, because the more you practise at something, generally the better you become at it; hence the saying "practice makes perfect".
Therefore I anticipate that there is a negative correlation between the number of lessons and the number of mistakes.

To investigate this hypothesis, I will draw a scatter graph of the number of mistakes against the number of lessons.

Scatter graph analysis:

The scatter graph shows a negative correlation between the number of mistakes and the number of lessons.

The equation of the line is y = -0.2118x + 22.694.

The gradient (m = -0.2118) means that for every 5 lessons taken the driver will make about 1 fewer mistake (a reduction of 0.2118 of a mistake per lesson).

To find how many lessons are required to pass the test, I use the formula y = mx + c with y = 15:

15 = -0.2118x + 22.694
0.2118x = 22.694 - 15
x = 7.694/0.2118
x = 36.33

This shows that, generally speaking, about 37 lessons are going to be required in order to pass the test.

By using the intercept (c), I can estimate how many mistakes the average driver makes without any lessons. The average person with no lessons makes about 23 mistakes. To pass the test, the number of mistakes would need to be reduced by eight (23 - 15). Using the rounded gradient of 0.2 of a mistake per lesson:

P = 8/0.2
P = 40

With these rounded figures, it would take on average 40 lessons for a student to pass their test. This suggests that lessons are barely worthwhile, because lessons can cost about £50 each. It would cost the average person learning to drive about £2000, which is a large sum of money, especially for students.

In the data, some of the drivers are missing their "number of mistakes". I will use my formula to fill in some of this missing data:

Driver   Gender   Lessons   Mistakes
63       M        32        -0.2×32 + 22.7 = 16.3
74       F        24        -0.2×24 + 22.7 = 17.9
118      F        39        -0.2×39 + 22.7 = 14.9
216      M        18        -0.2×18 + 22.7 = 19.1

The results show that, with my formula, drivers 63, 74 and 216 failed their test (more than 15 mistakes). Only driver 118 passed.

Hypothesis 1 ("The more lessons taken by a pupil the fewer mistakes they make in the test") is correct. This means that my prediction is also correct.

I feel that gender would be a big factor.
I think that females benefit more from lessons than males, because females, in general, are more nervous and lessons would boost their confidence more than males'. For hypothesis 2, I am going to investigate gender to see if females need more lessons than males.

Hypothesis 2

Females will benefit more from lessons than males.

I think that females will benefit from lessons more because females are more nervous and lessons would help them to become more confident. I will draw two graphs, one for males and the other for females, to work out which gender benefits from lessons the most. I am expecting that there will be a negative correlation in both scatter graphs and that the correlation will be stronger in the females' trend line than in the males' trend line.

I will sort my sample data into male and female drivers. I will produce a scatter graph for each gender so that I can compare my results easily.

Males

Here is a scatter graph showing the number of mistakes against the number of lessons for males who take their driving test. There is a steep negative correlation, which indicates that lessons benefit males by a large amount.

The equation of the line is y = -0.5013x + 27.099. This gradient means that for every 2 lessons taken the driver will make about 1 fewer mistake (a reduction of 0.5013 of a mistake per lesson).

The intercept means that with no lessons, an average male makes 27 mistakes.

To pass the test, the number of mistakes would need to be reduced by twelve (27 - 15).

P = 12/(1/2)
P = 24

This means that it would take the average male driver 24 lessons to pass their test. This seems reasonable; lessons are beneficial for males.

I will now use my formula to fill in some of the missing figures that exist in the records:

Driver   Gender   Lessons   Mistakes (1 d.p.)
63       M        32        -0.5×32 + 27 = 11.0
216      M        18        -0.5×18 + 27 = 18.0

Females

Above is a scatter graph showing the number of mistakes against the number of lessons for females who take their driving test.
There is a very slight negative correlation, which indicates that the number of lessons does not benefit females much.

The equation of the line is y = -0.0337x + 20.987. This gradient means that for every 100 lessons taken the driver will make about 3 fewer mistakes (a reduction of 0.0337 of a mistake per lesson). The intercept means that with no lessons, an average female makes 21 mistakes.

To pass the test, the number of mistakes would need to be reduced by six (21 - 15).

P = 6/(3/100)
P = 200

This means that it would take the average female driver 200 lessons to pass their test. This is a rather strange result. Lessons seem to have hardly any impact for the average female.

These results would suggest lessons are not worth paying for!

I will now use my formula to fill in some of the missing figures that exist in the records:

Driver   Gender   Lessons   Mistakes
74       F        24        -0.03×24 + 21 = 20.3
118      F        39        -0.03×39 + 21 = 19.8

The results from hypothesis 2 tell me that the average female needs more lessons to pass her driving test than the average male. This means that both the hypothesis and my prediction are incorrect.

Males may have performed better because, in general, males are more interested in mechanical things and are more confident than females.

Hypothesis 3

The instructor a pupil gets can have an impact on whether the pupil passes their driving test. The instructor's approach to the pupil can have a large effect on a pupil's performance. I predict that for the majority of the instructors, males will perform better than females. This is because males, in general, can be more outgoing and have more confidence than females.

I will sort the data by instructor and gender. I will test this hypothesis by presenting the information in histograms. By doing this I can examine my results easily. I will ignore students with missing data, as including them would make my results inaccurate.

I will group the information into 4 different classes because of the wide range of values.
The groups are shown below:

Number of mistakes   What each group means
0-10                 "Passed comfortably"
10-15                "Just passed"
15-20                "Just failed"
20-40                "Failed badly"

Histograms:

For instructor A, the two histograms look similar in shape. More males "passed comfortably" than females, and more females "just passed" than males. In total, 23 females passed and only 21 males passed. For both genders, 8 people failed. Instructor A taught 60 pupils in total: 31 females and 29 males.

For instructor B, the two histograms look very different in shape. More than double the number of males "passed comfortably" compared to females. More females "just passed" than males, and more females "just failed" than males. Double the number of females "failed badly" compared to males. In total, 14 females passed and 35 failed; 24 males passed and 20 failed. Instructor B taught 93 pupils.

For instructor C, the two histograms look quite similar in appearance. More females "failed badly", "just failed" and "just passed" than males. This could be because instructor C taught 20 females and only 16 males. One more male "passed comfortably" than females. In total, 8 females passed and 12 failed; 8 males also passed but only 8 failed. Instructor C taught 36 pupils.

For instructor D, the two histograms look unlike each other, though they have comparable results. No females passed and only 4 males passed their test. 19 females failed and 15 males failed. More females "just failed" than males, and fewer males "failed badly" than females. Instructor D taught 38 pupils in total.

Using the frequencies from my histograms, I will work out the probabilities by using relative frequency. I can estimate the probability of passing or the probability of failing for each gender with a certain instructor. I could also predict the average pass rate.
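The relative frequency of passing is the number who passed divided by the total number of pupils in that group. A minimal Python sketch using the pass and fail totals read from the histogram descriptions above; the names `counts`, `pass_rate` and `rates` are my own illustrative choices:

```python
# (passed, failed) totals per instructor and gender, as described above.
counts = {
    "A": {"M": (21, 8), "F": (23, 8)},
    "B": {"M": (24, 20), "F": (14, 35)},
    "C": {"M": (8, 8), "F": (8, 12)},
    "D": {"M": (4, 15), "F": (0, 19)},
}


def pass_rate(passed, failed):
    """Relative frequency of passing, expressed as a percentage."""
    return 100 * passed / (passed + failed)


# Percentage pass rate for every instructor and gender, to 1 decimal place.
rates = {
    instructor: {
        gender: round(pass_rate(passed, failed), 1)
        for gender, (passed, failed) in genders.items()
    }
    for instructor, genders in counts.items()
}

for instructor, genders in rates.items():
    print(instructor, genders)
```

Rates computed directly from the counts can differ in the last decimal place from figures obtained by adding class percentages that have already been rounded.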
With this method, my results will be far more accurate compared to just looking at the histograms.

To find the frequencies from the histograms, I will use the formula:

Frequency = Area
so
Frequency = width × frequency density

I will change each result into a percentage, so it is easy to compare results.

I will work out the frequency for the 4 classes and record my information in a table. Each entry is the frequency divided by the total number of pupils of that gender, multiplied by 100.

Instructor   Gender   0-10 mistakes    10-15 mistakes   15-20 mistakes   20-40 mistakes   % pass
A            M        13/29 = 44.8%    8/29 = 27.6%     5/29 = 17.2%     3/29 = 10.3%     72.4%
A            F        10/31 = 32.3%    13/31 = 41.9%    6/31 = 19.4%     2/31 = 6.5%      74.2%
B            M        19/44 = 43.2%    5/44 = 11.4%     8/44 = 18.2%     12/44 = 27.3%    54.5%
B            F        8/49 = 16.3%     6/49 = 12.2%     11/49 = 22.4%    24/49 = 49.0%    28.6%
C            M        6/16 = 37.5%     2/16 = 12.5%     1/16 = 6.25%     7/16 = 43.75%    50%
C            F        5/20 = 25%       3/20 = 15%       3/20 = 15%       9/20 = 45%       40%
D            M        2/19 = 10.5%     2/19 = 10.5%     1/19 = 5.3%      14/19 = 73.7%    21%
D            F        0/19 = 0%        0/19 = 0%        3/19 = 15.8%     16/19 = 84.2%    0%

Males have performed better than females for instructors B, C and D. Only for instructor A did females perform better than males. Could this be because instructor A is female, i.e. female pupils listen to a female instructor more than males do? Males only did worse than females by 1.8 percentage points for instructor A. For instructor B, the male pass rate was about 26 percentage points higher than the female pass rate. For instructor C, the difference between males and females was 10 percentage points. For instructor D, males performed better than females by 21 percentage points.

I will now rank the instructors 1st to 4th for both genders.

Rating   Instructor (Males)
1st      A
2nd      B
3rd      C
4th      D

Rating   Instructor (Females)
1st      A
2nd      C
3rd      B
4th      D

Instructor A seems to influence both males and females the most, because s/he has the highest pass rate. This means that instructor A would be the best instructor to have if you were learning to drive. If I had to recommend an instructor, I would recommend him or her.

Instructor D doesn't seem to help females at all, and s/he also influences males the least.
This is because s/he has the lowest pass rate for both genders. Instructor D would be the worst instructor to have if you were learning to drive. To be quite honest, it seems that s/he provides very little help, making him/her a waste of money. I would not recommend this instructor.

I was correct in saying that the instructor would have an effect on how a pupil performs in their driving test. My prediction, that for the majority of the instructors males will perform better than females, is correct. Males performed better than females for 3 out of 4 instructors. I thought this because males, in general, can be more outgoing and are more confident than females.

Conclusion

Firstly I made a sample that was stratified by gender and instructor. I did not think that choosing random data for a sample would be so accurate.

Hypothesis 1:

I decided to investigate the hypothesis "the more lessons taken by a pupil the fewer mistakes they make in the test." I predicted that this was accurate because generally, the more you practise at something, the better you become at it (hence the saying "practice makes perfect"). This turned out to be correct. I was surprised that there was not a strong negative correlation; as a result, it took about 40 lessons for the average pupil to pass their test.

When working with the number of mistakes and the number of lessons, I came up with the idea that gender might have an effect, so I brought forward hypothesis 2.

Hypothesis 2:

My 2nd hypothesis was that females will benefit from lessons more than males. I thought that females would benefit more from lessons because females are more nervous and lessons would help them to become more confident. This turned out to be incorrect. Males benefited more than females. I was shocked to find that it would take 200 lessons for the average female to pass the test.

Given that lessons had an impact on the number of mistakes, I considered whether the quality of the lessons, and hence the quality of the instructor, made a difference.
This is where I established hypothesis 3.

Hypothesis 3:

The instructor a pupil gets can have an impact on whether the pupil passes their driving test.

I concluded that hypothesis 3 was correct, and that I was correct in saying that for the majority of the instructors, males will perform better than females. Males performed better than females for 3 out of 4 instructors. I predicted this because males, in general, can be more outgoing and are more confident than females. Instructor A turned out to be the best instructor for both genders (72% of male pupils passed and 74% of females passed). However, instructor A was unusual because females had a better pass rate than males.

If I were to do this investigation again, I would use a larger amount of data to make the investigation more reliable. I would also add more specific details, such as "the gender of the instructor", "the month the driving test was taken", "the cost of lessons", "how long a lesson lasts" and "how long the driver practised for the test". These factors would help me investigate in more depth.
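The lesson estimates quoted for hypotheses 1 and 2 all come from rearranging the trend line y = mx + c for x at the pass mark of 15 mistakes. As a check, here is a minimal Python sketch of that rearrangement using the three trend lines from the scatter graphs; the function and variable names are my own illustrative choices, and the full-precision coefficients are used rather than the rounded ones in my working:

```python
import math

PASS_MARK = 15  # a pupil fails with more than 15 mistakes

# Trend lines from the scatter graphs: mistakes = gradient * lessons + intercept
trend_lines = {
    "all": (-0.2118, 22.694),
    "male": (-0.5013, 27.099),
    "female": (-0.0337, 20.987),
}


def lessons_to_reach(target, gradient, intercept):
    """Solve target = gradient * x + intercept for x (the number of lessons)."""
    return (target - intercept) / gradient


def predicted_mistakes(lessons, gradient, intercept):
    """Estimate the number of mistakes for a given number of lessons."""
    return gradient * lessons + intercept


for group, (m, c) in trend_lines.items():
    x = lessons_to_reach(PASS_MARK, m, c)
    # Round up, because a pupil cannot take a fraction of a lesson.
    print(f"{group}: {x:.1f} lessons, so at least {math.ceil(x)}")
```

With the unrounded coefficients the estimates shift from the rounded working: roughly 36 lessons overall and roughly 178 for females, rather than 40 and 200. The conclusions about which gender benefits more are unchanged.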