I was doing a little presentation to our research group and had to explain the difficulties of ‘collapsing’ longitudinal data into a single measure when the Y var is quite variable. The particular Y var of interest represents burden of disease, so a high value for a long time indicates higher risk than a low value for a similar time. Hence you have issues using the mean, or the AUC. There’s a lot more to it than that, but that’s the gist of this graph. Sharing the code because it might be useful to someone else at some point.
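To make the mean/AUC problem concrete, here is a minimal sketch in R. The three trajectories and the `auc` helper are made up for illustration and are not the data or code from the presentation.

```r
# Hypothetical trajectories: time in years, y = some burden-of-disease measure
traj <- list(
  high_short = data.frame(time = 0:2, y = rep(8, 3)),
  high_long  = data.frame(time = 0:8, y = rep(8, 9)),
  low_long   = data.frame(time = 0:8, y = rep(2, 9))
)

# Trapezoidal area under the curve
auc <- function(time, y) sum(diff(time) * (head(y, -1) + tail(y, -1)) / 2)

res <- data.frame(
  subject = names(traj),
  mean_y  = unname(sapply(traj, function(d) mean(d$y))),
  auc     = unname(sapply(traj, function(d) auc(d$time, d$y))),
  row.names = NULL
)
print(res)
```

In this toy case the mean cannot separate `high_short` from `high_long`, and the AUC cannot separate `high_short` from `low_long`, even though the three subjects plausibly carry quite different risk.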
Disclaimer, maybe the title should be ‘lame example’.
Nothing overly exciting here. Just posting because it took a little faffing about and someone else might like the idea. At my work (research institute) we (the social club committee) were organising an ‘Olympics event’ with a bunch of tasks for teams of 4, with a loose research theme. I thought a graphing race could work, and was trying to think how it could be made a little more fun than drawing a line.
Petrol prices adjusted for inflation (Perth, Western Australia)
The thought for this sprung to mind when I saw petrol drop below $1.20 per litre the other day, and it made me think, I remember paying that when I got to Australia 4.5 years ago. Fuel prices are listed online here, so I went to the site for a nosey at their archiving and they have a reasonably comprehensive archive of data – WA historic fuel prices. All that was left to do was hunt out the inflation (CPI) data, which was also readily available – Australian inflation data.
The fuel price data was available as a monthly average [for the Perth Metro area]. The inflation data was available quarterly. Given the fuel data started in 2001, I thought it made sense to show things in 2001 dollars, so I divided the indices by the 2001 indices to make this the baseline. For manipulation purposes I converted the month data into date data, using the 1st of the month for the day in all cases. Then the ‘closest’ inflation index was merged onto the fuel price, which was then converted into 2001 dollars, and graphed.
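The steps above can be sketched roughly as follows. The numbers are invented placeholders rather than the real fuel or CPI figures, the column names are hypothetical, and taking the average of the 2001 quarters as the baseline is my assumption about how the rebasing might be done.

```r
# Made-up numbers for illustration only; not the real WA fuel or CPI data
fuel <- data.frame(
  month = as.Date(c("2001-01-01", "2005-06-01", "2010-11-01")),
  price = c(90, 110, 125)            # nominal cents per litre
)
cpi <- data.frame(
  quarter = as.Date(c("2001-03-01", "2005-06-01", "2010-12-01")),
  index   = c(73, 83, 96)
)

# Rebase the index so that 2001 is the baseline (index = 1)
base <- mean(cpi$index[format(cpi$quarter, "%Y") == "2001"])
cpi$index2001 <- cpi$index / base

# For each fuel month, find the CPI quarter closest in time
nearest <- vapply(seq_along(fuel$month), function(i) {
  which.min(abs(as.numeric(cpi$quarter) - as.numeric(fuel$month[i])))
}, integer(1))

# Merge the closest index on and convert to 2001 dollars
fuel$index2001 <- cpi$index2001[nearest]
fuel$price2001 <- fuel$price / fuel$index2001
print(fuel)
```

With real data a proper merge on a year/quarter key would do the same job; the nearest-date lookup just mirrors the ‘closest’ matching described above.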
UPDATE: Based on the comment from ‘linuxizer’, I’ve updated this to stay in line with the S3 classes, something I didn’t have my head around at the time, and still don’t know inside out.
One thing I do often in my analysis is use things like ‘summary(mod)$coefficients’ and various derivatives of that to pull the results from models into a table, so I can write my results nicely to csv, which can then be circulated to colleagues. I take the time to do this so that when I go back and subset the data somehow, I can instantly generate the same nice, tailored results table. It is often the case that you will run an analysis multiple times, perhaps subsetting on years, ages, subjects etc.
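As a rough sketch of that workflow: the `results_table` helper below is hypothetical, and the built-in `mtcars` data and the simple `lm` model are stand-ins for whatever model you are actually fitting.

```r
# Hypothetical helper: fit the model and return a tidy results table
results_table <- function(data) {
  mod <- lm(mpg ~ wt + hp, data = data)
  co  <- summary(mod)$coefficients   # matrix of estimates, SEs, t and p values
  ci  <- confint(mod)                # 95% confidence intervals, same row order
  data.frame(
    term     = rownames(co),
    estimate = round(co[, "Estimate"], 3),
    lower95  = round(ci[, 1], 3),
    upper95  = round(ci[, 2], 3),
    p_value  = signif(co[, "Pr(>|t|)"], 3),
    row.names = NULL
  )
}

# Same table for the full data and for a subset, with no extra effort
tab_all  <- results_table(mtcars)
tab_4cyl <- results_table(subset(mtcars, cyl == 4))

# Write to csv for circulating (tempdir() is used here just to be tidy)
out_file <- file.path(tempdir(), "model_results.csv")
write.csv(tab_all, out_file, row.names = FALSE)
```

Wrapping the fit and the table building in one function is what makes rerunning on a subset a one-liner.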
This is a follow on from the previous post, with updated code.
There was an argument ‘groups’ in the ggplot(…) line of the code that was working but is now no longer working with the updated version of R/ggplot2 (I don’t know the full ins and outs of this sorry). So code is fixed and available below.
Credit for the bulk of this code is to Abhijit Dasgupta and the commenters on the original post here from earlier this year. I have made a few changes to the functionality of this which I think warrant sharing.
A brief intro, this function will use the output from a survival analysis fitted in R with ‘survfit’ from the ‘survival’ library, to plot a survival curve with the option to include a table with the numbers of those ‘at risk’ below the plot.
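For anyone unfamiliar with the ingredients, here is a minimal sketch of where the ‘at risk’ numbers come from. This is not the function from the post: it uses the `lung` example data bundled with the ‘survival’ package, arbitrary time points, and base graphics rather than ggplot2.

```r
library(survival)   # recommended package, included with standard R installs

# 'lung' is an example data set that comes with the survival package
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# Numbers at risk at chosen time points (days), one block per stratum
times <- seq(0, 1000, by = 250)
s <- summary(fit, times = times, extend = TRUE)
risk <- data.frame(strata = s$strata, time = s$time, n.risk = s$n.risk)
print(risk)

# Plain base-graphics survival curves for the same fit
plot(fit, col = c("blue", "red"), xlab = "Days", ylab = "Survival probability")
```

The `extend = TRUE` argument makes `summary` report every requested time point, even those beyond the last observed time in a stratum, which keeps the table rectangular.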
UPDATE – July 2017
The site I was originally using to store the script and data has since closed, so I have migrated to GitHub and updated the R code to remove deprecated ggplot code etc.
Apologies that the output isn’t showing well as far as spacing and alignment is concerned. Am working on fixing that!
Generally when we have a set of data, we have known groupings. Be that three different treatment groups, two sex groups, 4 ethnicity groups etc. There is also the possibility of unobserved groupings within your data, some examples (I’m clutching at straws here) are those who are vegetarians and those who aren’t, those who regularly exercise and those who don’t, or those who have a family history of a certain condition and those who don’t (assuming those data were not collected). There is an approach to look into this.