Treat Date as Continuous Variable R
Working with dates
           
        
Working with dates in R requires more attention than working with other object classes. Below, we offer some tools and examples to make this process less painful. Luckily, dates can be wrangled easily with practice, and with a set of helpful packages such as lubridate.
Upon import of raw data, R often interprets dates as character objects - this means they cannot be used for general date operations such as making time series and calculating time intervals. To make matters more difficult, there are many ways a date can be formatted and you must help R know which part of a date represents what (month, day, hour, etc.).
Dates in R are their own class of object: the Date class. There is also a class that stores objects with both date and time. Date-time objects are formally referred to as POSIXt, POSIXct, and/or POSIXlt classes (the difference is not important here). These objects are informally referred to as datetime classes.
- It is important to make R recognize when a column contains dates.
 
- Dates are an object class and can be tricky to work with.
 
- Here we present several ways to convert date columns to Date class.
Preparation
Load packages
This code chunk shows the loading of packages required for this page. In this handbook we emphasize p_load() from pacman, which installs the package if necessary and loads it for use. You can also load installed packages with library() from base R. See the page on R basics for more information on R packages.
# Checks if package is installed, installs if necessary, and loads package for current session
pacman::p_load(
  lubridate,  # general package for handling and converting dates
  linelist,   # has function to "guess" messy dates
  aweek,      # another option for converting dates to weeks, and weeks to dates
  zoo,        # additional date/time functions
  tidyverse,  # data management and visualization
  rio)        # data import/export
Import data
We import the dataset of cases from a simulated Ebola epidemic. If you want to download the data to follow along step-by-step, see the instructions in the Download handbook and data page. We assume the file is in the working directory, so no sub-folders are specified in this file path.
linelist <- import("linelist_cleaned.xlsx")
Current date
You can get the current "system" date or system datetime of your computer by doing the following with base R.
# get the system date - this is a DATE class
Sys.Date()
## [1] "2021-12-15"
# get the system time - this is a DATETIME class
Sys.time()
## [1] "2021-12-15 20:25:52 EST"
With the lubridate package these can also be returned with today() and now(), respectively. date() returns the current date and time with weekday and month names.
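To confirm which class each function returns, you can test the results directly. This is a small base R sketch; the printed dates and times will depend on when and where you run it.

```r
# Sys.Date() returns a Date; Sys.time() returns a date-time (POSIXct)
today_date <- Sys.Date()
now_time   <- Sys.time()

class(today_date)   # "Date"
class(now_time)     # "POSIXct" "POSIXt"
```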
Convert to Date
After importing a dataset into R, date column values may look like "1989/12/30", "05/06/2014", or "13 Jan 2020". In these cases, R is likely still treating these values as Character values. R must be told that these values are dates… and what the format of the date is (which part is Day, which is Month, which is Year, etc).
Once told, R converts these values to class Date. In the background, R will store the dates as numbers (the number of days from its "origin" date 1 Jan 1970). You will not interface with the date number often, but this allows R to treat dates as continuous variables and enables special operations such as calculating the distance between dates.
By default, values of class Date in R are displayed as YYYY-MM-DD. Later in this section we will discuss how to change the display of date values.
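To see what treating dates as continuous means in practice, the base R sketch below looks at the numbers stored behind a few Date values and then uses them in ordinary arithmetic and summaries. The dates themselves are arbitrary examples.

```r
# A Date is stored as the number of days since the origin, 1970-01-01
dates <- as.Date(c("1970-01-01", "1970-01-10", "2020-03-01"))
as.numeric(dates)   # 0 9 18322
dates               # but displayed as "1970-01-01" "1970-01-10" "2020-03-01"

# Because the values are numbers underneath, continuous operations work directly
onsets <- as.Date(c("2020-03-01", "2020-03-03", "2020-03-08"))
range(onsets)       # earliest and latest date
mean(onsets)        # returns a Date: "2020-03-04"
diff(onsets)        # gaps between consecutive dates, in days: 2 5
```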
Below we present two approaches to converting a column from character values to class Date.
TIP: You can check the current class of a column with the base R function class(), like class(linelist$date_onset).
base R
as.Date() is the standard, base R function to convert an object or column to class Date (note the capitalization of "D").
Use of as.Date() requires that:
- You specify the existing format of the raw character date, or the origin date if supplying dates as numbers (see section on Excel dates)
 
- If used on a character column, all date values must have the same exact format (if this is not the case, try guess_dates() from the linelist package)
First, check the class of your column with class() from base R. If you are unsure or confused about the class of your data (e.g. you see "POSIXct", etc.) it can be easiest to first convert the column to class Character with as.character(), and then convert it to class Date.
Second, within the as.Date() function, use the format = argument to tell R the current format of the character date components: which characters refer to the month, the day, and the year, and how they are separated. If your values are already in one of R's standard date formats ("YYYY-MM-DD" or "YYYY/MM/DD") the format = argument is not necessary.
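To illustrate why format = matters, the base R sketch below uses an arbitrary example date. Without format =, as.Date() only tries the two standard formats, and because trailing characters are ignored while parsing, a day-first string can be misread without any error or NA.

```r
# Standard formats need no format = argument
as.Date("1989-12-30")
as.Date("1989/12/30")

# A day-first string without format = is silently misread:
# it matches "%Y/%m/%d" as year 30, month 12, day 19, and the trailing "89" is ignored
misread <- as.Date("30/12/1989")
misread

# Supplying the current format gives the intended date
correct <- as.Date("30/12/1989", format = "%d/%m/%Y")
correct   # "1989-12-30"
```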
To format =, provide a character string (in quotes) that represents the current date format using the special "strptime" abbreviations below. For example, if your character dates are currently in the format "DD/MM/YYYY", like "24/04/1968", then you would use format = "%d/%m/%Y" to convert the values into dates. Putting the format in quotation marks is necessary. And don't forget any slashes or dashes!
# Convert to class date
linelist <- linelist %>%
  mutate(date_onset = as.Date(date_onset, format = "%d/%m/%Y"))
Most of the strptime abbreviations are listed below. You can see the complete list by running ?strptime.
%d = Day number of month (5, 17, 28, etc.)
%j = Day number of the year (Julian day 001-366)
%a = Abbreviated weekday (Mon, Tue, Wed, etc.)
%A = Full weekday (Monday, Tuesday, etc.)
%w = Weekday number (0-6, Sunday is 0)
%u = Weekday number (1-7, Monday is 1)
%W = Week number (00-53, Monday is week start)
%U = Week number (00-53, Sunday is week start)
%m = Month number (e.g. 01, 02, 03, 04)
%b = Abbreviated month (Jan, Feb, etc.)
%B = Full month (January, February, etc.)
%y = 2-digit year (e.g. 89)
%Y = 4-digit year (e.g. 1989)
%H = hours (24-hr clock)
%M = minutes
%S = seconds
%z = offset from GMT (e.g. -0800)
%Z = Time zone (character)
TIP: The format = argument of as.Date() is not telling R the format you want the dates to be, but rather how to identify the date parts as they are before you run the command.
TIP: Be sure that in the format = argument you use the date-part separator (e.g. /, -, or space) that is present in your dates.
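A base R sketch applying some of the abbreviations above to one arbitrary date, 24 April 1968, written in several ways. It also shows that a mismatched separator returns NA rather than an error. Month and weekday names (%a, %A, %b, %B) depend on your computer's locale, so the examples use numeric components only; the time zone "UTC" in the last step is an assumption chosen for the example.

```r
# Each format string describes how the raw text is currently written
same_date <- c(
  as.Date("24/04/1968", format = "%d/%m/%Y"),   # day/month/year with slashes
  as.Date("04-24-1968", format = "%m-%d-%Y"),   # month-day-year with dashes
  as.Date("19680424",   format = "%Y%m%d"),     # no separators at all
  as.Date("1968-115",   format = "%Y-%j")       # year and day of the year (day 115)
)
same_date   # all four are "1968-04-24"

# The separator in format = must match the data; a mismatch gives NA
mismatch <- as.Date("24/04/1968", format = "%d-%m-%Y")
mismatch    # NA

# Date-times combine date and time abbreviations
admit <- as.POSIXct("1968-04-24 16:20:40", format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
format(admit, "%H:%M")   # "16:20"
```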
Once the values are in class Date, R will by default display them in the standard format, which is YYYY-MM-DD.
lubridate
Converting character objects to dates can be made easier by using the lubridate package. This is a tidyverse package designed to make working with dates and times simpler and more consistent than in base R. For these reasons, lubridate is often considered the gold-standard package for dates and times, and is recommended whenever working with them.
The lubridate package provides several helper functions designed to convert character objects to dates in an intuitive and more lenient way than specifying the format in as.Date(). Each function expects the date components in a particular order, but allows a variety of separators and synonyms for dates (e.g. 01 vs Jan vs January). The functions are named after that order: y for year, m for month, and d for day.
# install/load lubridate
pacman::p_load(lubridate)
The ymd() function flexibly converts date values supplied as year, then month, then day.
# read date in year-month-day format
ymd("2020-10-11")
## [1] "2020-10-11"
The mdy() function flexibly converts date values supplied as month, then day, then year.
# read date in month-day-year format
mdy("10/11/2020")
## [1] "2020-10-11"
The dmy() function flexibly converts date values supplied as day, then month, then year.
# read date in day-month-year format
dmy("11 10 2020")
## [1] "2020-10-11"
If using piping, the conversion of a character column to dates with lubridate might look like this:
linelist <- linelist %>%
  mutate(date_onset = lubridate::dmy(date_onset))
Once complete, you can run class() to verify the class of the column.
# Check the class of the column
class(linelist$date_onset)
Once the values are in class Date, R will by default display them in the standard format, which is YYYY-MM-DD.
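To check the leniency described above, the sketch below (which assumes lubridate is installed) passes the same date, 11 October 2020, written with different separators to each helper; each call should return dates equal to 2020-10-11.

```r
library(lubridate)

# The same date, 11 October 2020, written with different separators
ymd_dates <- ymd(c("2020-10-11", "2020/10/11", "2020.10.11", "20201011"))
dmy_dates <- dmy(c("11-10-2020", "11/10/2020", "11.10.2020"))
mdy_dates <- mdy(c("10-11-2020", "10/11/2020", "10.11.2020"))

ymd_dates
dmy_dates
mdy_dates
```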
Note that the above functions work best with 4-digit years. 2-digit years can produce unexpected results, as lubridate attempts to guess the century.
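The century problem can be seen in base R as well. The %y abbreviation follows the POSIX rule described in ?strptime: two-digit years 00 to 68 are placed in the 2000s and 69 to 99 in the 1900s. The example dates are arbitrary.

```r
# %y: 00-68 become 20xx, 69-99 become 19xx
y68 <- as.Date("24/04/68", format = "%d/%m/%y")
y69 <- as.Date("24/04/69", format = "%d/%m/%y")

y68   # "2068-04-24": probably not what a 1968 record meant
y69   # "1969-04-24"
```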
To convert a 2-digit year into a 4-digit year (all in the same century) you can convert to class character and then combine the existing digits with a prefix using str_glue() from the stringr package (see Characters and strings). Then convert to date.
two_digit_years <- c("15", "15", "16", "17")
str_glue("20{two_digit_years}")
## 2015
## 2015
## 2016
## 2017
Combine columns
You can use the lubridate functions make_date() and make_datetime() to combine multiple numeric columns into one date column. For example, if you have numeric columns onset_day, onset_month, and onset_year in the data frame linelist:
linelist <- linelist %>%
  mutate(onset_date = make_date(year = onset_year, month = onset_month, day = onset_day))
Excel dates
In the background, most software stores dates as numbers. R stores dates from an origin of 1st January, 1970. Thus, if you run as.numeric(as.Date("1970-01-01")) you will get 0.
Microsoft Excel stores dates with an origin of either December 30, 1899 (Windows) or January 1, 1904 (Mac), depending on your operating system. See this Microsoft guidance for more information.
Excel dates often import into R as these numeric values instead of as characters. If the dataset you imported from Excel shows dates as numbers or characters like "41369"… use as.Date() (or lubridate's as_date() function) to convert, but instead of supplying a "format" as above, supply the Excel origin date to the argument origin =.
This will not work if the Excel date is stored in R as a character type, so make sure the values are class Numeric first!
NOTE: You should provide the origin date in R's default date format ("YYYY-MM-DD").
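A base R sketch of the Excel conversion, using the origin dates given above. The serial numbers are made-up examples, stored as text to mimic a column that imported as character.

```r
# Made-up Excel serial numbers that arrived as text
excel_serial <- c("41369", "41370")

# Convert to numeric first: as.Date() does not apply origin = to character strings
excel_num <- as.numeric(excel_serial)

# Windows Excel (1900 date system): origin 1899-12-30
windows_dates <- as.Date(excel_num, origin = "1899-12-30")
windows_dates   # "2013-04-05" "2013-04-06"

# Workbooks using the Mac 1904 date system: origin 1904-01-01
mac_dates <- as.Date(excel_num, origin = "1904-01-01")
```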
# An example of providing the Excel 'origin date' when converting Excel number dates
data_cleaned <- data %>%
  mutate(date_onset = as.numeric(date_onset)) %>%                    # ensure class is numeric
  mutate(date_onset = as.Date(date_onset, origin = "1899-12-30"))    # convert to date using Excel origin
Messy dates
The function guess_dates() from the linelist package attempts to read a "messy" date column containing dates in many different formats and convert the dates to a standard format. You can read more online about guess_dates(). If guess_dates() is not yet available on CRAN for R 4.0.2, try installing it via pacman::p_load_gh("reconhub/linelist").
For example, guess_dates() would take a vector of the character dates "03 Jan 2018", "07/03/1982", and "08/20/85" and convert them to class Date as 2018-01-03, 1982-03-07, and 1985-08-20.
linelist::guess_dates(c("03 Jan 2018", "07/03/1982", "08/20/85"))
## [1] "2018-01-03" "1982-03-07" "1985-08-20"
Some optional arguments for guess_dates() that you might include are:
- error_tolerance: the proportion of entries that cannot be identified as dates that will be tolerated (defaults to 0.1, or 10%)
- last_date: the last valid date (defaults to the current date)
 
- first_date: the first valid date. Defaults to fifty years before the last_date.
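The general idea of handling several formats can be sketched in base R: attempt each candidate format in turn and keep, for every value, the first one that parses. This is not how guess_dates() is implemented; the helper parse_multi_format() and its list of formats are hypothetical. The order of formats matters, because an ambiguous value such as "07/03/1982" is resolved by whichever matching format comes first, and four-digit-year formats should come before two-digit ones.

```r
# Hypothetical helper: try each format in order, filling only values still missing
parse_multi_format <- function(x, formats) {
  out <- as.Date(rep(NA_real_, length(x)), origin = "1970-01-01")  # all-NA Dates
  for (f in formats) {
    still_missing <- is.na(out)
    if (!any(still_missing)) break
    out[still_missing] <- as.Date(x[still_missing], format = f)
  }
  out
}

messy <- c("2018-01-03", "07/03/1982", "08/20/85")

# Four-digit-year formats come first: "%y" would read only "19" from "1982"
parsed <- parse_multi_format(messy, formats = c("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%y"))
parsed   # "2018-01-03" "1982-03-07" "1985-08-20"
```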
# An example using guess_dates on the column date_onset
linelist <- linelist %>%                 # the dataset is called linelist
  mutate(
    date_onset = linelist::guess_dates(  # the guess_dates() from package "linelist"
      date_onset,
      error_tolerance = 0.1,
      first_date = "2016-01-01"
    )
  )
Working with date-time class
As previously mentioned, R also supports a datetime class: a column that contains date and time information. As with the Date class, these often need to be converted from character objects to datetime objects.
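Like Date values, date-time (POSIXct) values are numbers underneath: seconds since 1970-01-01 00:00:00 UTC. The base R sketch below shows this, builds a date-time by pasting separate date and time text together, and takes a difference in hours. The dates, times, and the choice of tz = "UTC" are assumptions for the example.

```r
# A date-time is stored as seconds since 1970-01-01 00:00:00 UTC
one_minute <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
as.numeric(one_minute)   # 60

# Paste separate date and time text together, then parse with an explicit format
admission <- as.POSIXct(paste("2020-01-01", "16:30"), format = "%Y-%m-%d %H:%M", tz = "UTC")
discharge <- as.POSIXct("2020-01-03 10:00", format = "%Y-%m-%d %H:%M", tz = "UTC")

# Differences between date-times, with the units stated explicitly
stay <- difftime(discharge, admission, units = "hours")
stay   # Time difference of 41.5 hours
```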
Convert dates with times
A standard datetime object is formatted with the date first, followed by a time component, for example 01 Jan 2020, 16:30. As with dates, there are many ways this can be formatted, and there are numerous levels of precision (hours, minutes, seconds) that can be supplied.
Luckily, lubridate helper functions also exist to help convert these strings to datetime objects. These functions are extensions of the date helper functions, with _h (only hours supplied), _hm (hours and minutes supplied), or _hms (hours, minutes, and seconds supplied) appended to the end (e.g. dmy_hms()). These can be used as shown:
Convert datetime with only hours to datetime object
ymd_h("2020-01-01 16hrs")
## [1] "2020-01-01 16:00:00 UTC"
Convert datetime with hours and minutes to datetime object
dmy_hm("01 January 2020 16:20")
## [1] "2020-01-01 16:20:00 UTC"
Convert datetime with hours, minutes, and seconds to datetime object
dmy_hms("01 January 2020, 16:20:40")
## [1] "2020-01-01 16:20:40 UTC"
You can supply a time zone in the string, but it is ignored. See the section later in this page on time zones.
dmy_hms("01 January 2020, 16:20:40 PST")
## [1] "2020-01-01 16:20:40 UTC"
When working with a data frame, time and date columns can be combined to create a datetime column using str_glue() from the stringr package and an appropriate lubridate function. See the page on Characters and strings for details on stringr.
In this example, the linelist data frame has a column in format "hours:minutes". To convert this to a datetime we follow a few steps:
- Create a "clean" time of admission column with missing values filled in with the column median. We do this because a missing time would otherwise produce a missing (NA) datetime when parsed. Combine it with the column date_hospitalisation, and then use the function ymd_hm() to convert.
# packages
pacman::p_load(tidyverse, lubridate, stringr)

# time_admission is a column in hours:minutes
linelist <- linelist %>%
  # when time of admission is not given, assign the median admission time
  mutate(
    time_admission_clean = ifelse(
      is.na(time_admission),                                           # if time is missing
      # assign the median: middle value of the sorted, non-missing times
      # (assumes zero-padded "HH:MM" text, so sorting is chronological)
      sort(time_admission)[ceiling(sum(!is.na(time_admission)) / 2)],
      time_admission                                                   # if not missing keep as is
    )
  ) %>%
  # use str_glue() to combine date and time columns to create one character column
  # and then use ymd_hm() to convert it to datetime
  mutate(
    date_time_of_admission = str_glue("{date_hospitalisation} {time_admission_clean}") %>%
      ymd_hm()
  )
Convert times alone
If your data contain only a character time (hours and minutes), you can convert and manipulate them as times using strptime() from base R. For example, to get the difference between two of these times:
# raw character times
time1 <- "13:45"
time2 <- "15:20"

# Times converted to a datetime class
time1_clean <- strptime(time1, format = "%H:%M")
time2_clean <- strptime(time2, format = "%H:%M")

# Difference is of class "difftime"; units = "hours" fixes the unit before converting to numeric
as.numeric(difftime(time2_clean, time1_clean, units = "hours"))   # difference in hours
## [1] 1.583333
Note, however, that without a date value provided, strptime() assumes the date is today. To combine a string date and a string time, see how to use stringr in the section just above. Read more about strptime() here.
To convert single-digit numbers to double-digits (e.g. to "pad" hours or minutes with leading zeros to achieve 2 digits), see this "Pad length" section of the Characters and strings page.
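In base R, sprintf() can do the zero padding as well. The sketch below builds "HH:MM" strings from hypothetical numeric hour and minute values, parses them with strptime(), and takes the difference in minutes with difftime().

```r
# Hypothetical numeric hour and minute values
hours   <- c(9L, 13L)
minutes <- c(5L, 45L)

# "%02d" pads each number to two digits with a leading zero
clock <- sprintf("%02d:%02d", hours, minutes)
clock   # "09:05" "13:45"

# strptime() attaches today's date to both times, so only the clock time matters here
times <- strptime(clock, format = "%H:%M")
gap_minutes <- as.numeric(difftime(times[2], times[1], units = "mins"))
gap_minutes   # 280
```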
Working with dates
lubridate can also be used for a variety of other tasks, such as extracting aspects of a date/datetime, performing date arithmetic, or calculating date intervals.
Here we define a date to use for the examples:
# create object of class Date
example_date <- ymd("2020-03-01")
Date math
You can add a certain number of days or weeks using the respective functions from lubridate.
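Because a Date is a count of days, base R arithmetic gives the same results for days and weeks without lubridate; the lubridate helpers mainly make the units explicit. A sketch using the same example date:

```r
example_date <- as.Date("2020-03-01")

# Adding a plain number to a Date adds that many days
example_date + 3           # "2020-03-04"
example_date + 7 * 7 - 2   # 7 weeks minus 2 days: "2020-04-17"

# Regular sequences of dates
weekly <- seq(example_date, by = "week", length.out = 3)
weekly   # "2020-03-01" "2020-03-08" "2020-03-15"
```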
# add 3 days to this date
example_date + days(3)
## [1] "2020-03-04"
# add 7 weeks and subtract two days from this date
example_date + weeks(7) - days(2)
## [1] "2020-04-17"
Date intervals
The difference between dates can be calculated as follows:
- Ensure both dates are of class Date
 
- Use subtraction to return the "difftime" difference between the two dates
 
- If necessary, convert the result to numeric class to perform subsequent mathematical calculations
Below, the interval between two dates is calculated and displayed. You can find intervals by using the subtraction "minus" symbol on values that are class Date. Note, however, that the class of the returned value is "difftime", as displayed below, and it must be converted to numeric for further calculations.
# find the interval between this date and Feb 20 2020
output <- example_date - ymd("2020-02-20")
output    # print
## Time difference of 10 days
class(output)
## [1] "difftime"
To do subsequent operations on a "difftime", convert it to numeric with as.numeric().
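A base R sketch of the same interval, showing how to get a plain number out of a "difftime" and how difftime() lets you choose the units explicitly.

```r
example_date <- as.Date("2020-03-01")
onset <- as.Date("2020-02-20")

delay <- example_date - onset
delay               # Time difference of 10 days
as.numeric(delay)   # 10, an ordinary number

# difftime() lets you choose the units instead of relying on the automatic choice
delay_weeks <- as.numeric(difftime(example_date, onset, units = "weeks"))
delay_weeks         # 10 / 7, about 1.43
```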
This can all be brought together to work with data - for example:
pacman::p_load(lubridate, tidyverse)   # load packages

linelist <- linelist %>%
  # convert date of onset and hospitalisation from character to Date by specifying dmy format
  mutate(date_onset = dmy(date_onset),
         date_hospitalisation = dmy(date_hospitalisation)) %>%
  # keep only cases with onset in March
  filter(month(date_onset) == 3) %>%
  # find the difference in days between onset and hospitalisation
  mutate(days_onset_to_hosp = date_hospitalisation - date_onset)
In a data frame context, if either of the above dates is missing, the operation will fail for that row. This will result in an NA instead of a numeric value. When using this column for calculations, be sure to set the na.rm = argument to TRUE. For example:
# calculate the median number of days to hospitalisation for all cases where data are available
median(linelist$days_onset_to_hosp, na.rm = TRUE)
Date display
Once dates are the correct class, you often want them to display differently, for example to display as "Friday 05 January" instead of "2018-01-05". You may also want to adjust the display in order to then group rows by the date elements displayed, for example to group by month-year.
format()
Adjust date display with the base R function format(). This function accepts a character string (in quotes) specifying the desired output format in the "%" strptime abbreviations (the same syntax as used in as.Date()). Below are most of the common abbreviations.
Note: using              format()              will convert the values to class Character, so this is generally used towards the end of an analysis or for display purposes only! You can see the complete list by running              ?strptime.
%d = Day number of month (5, 17, 28, etc.)
%j = Day number of the year (Julian day 001-366)
%a = Abbreviated weekday (Mon, Tue, Wed, etc.)
%A = Full weekday (Monday, Tuesday, etc.)
%w = Weekday number (0-6, Sunday is 0)
%u = Weekday number (1-7, Monday is 1)
%W = Week number (00-53, Monday is week start)
%U = Week number (00-53, Sunday is week start)
%m = Month number (e.g. 01, 02, 03, 04)
%b = Abbreviated month (Jan, Feb, etc.)
%B = Full month (January, February, etc.)
%y = 2-digit year (e.g. 89)
%Y = 4-digit year (e.g. 1989)
%H = hours (24-hr clock)
%M = minutes
%S = seconds
%z = offset from GMT (e.g. -0800)
%Z = Time zone (character)
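A base R sketch of format() on one arbitrary date. Only the numeric codes are checked here, since weekday and month names depend on the locale.

```r
d <- as.Date("2018-01-05")

format(d, "%d/%m/%Y")   # "05/01/2018"
format(d, "%Y-%j")      # "2018-005": year and day of the year
format(d, "%u")         # "5": a Friday (Monday is 1)
format(d, "%A %d %B")   # "Friday 05 January" in an English locale

# The result is text, not a Date
class(format(d, "%d/%m/%Y"))   # "character"
```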
An example of formatting today's date:
# today's date, with formatting
format(Sys.Date(), format = "%d %B %Y")
## [1] "15 December 2021"
# easy way to get full date and time (default formatting)
date()
## [1] "Wed Dec 15 20:25:53 2021"
# formatted combined date, time, and time zone using str_glue() function
str_glue("{format(Sys.Date(), format = '%A, %B %d %Y, %z  %Z, ')}{format(Sys.time(), format = '%H:%M:%S')}")
## Wednesday, December 15 2021, +0000  UTC, 20:25:53
# year and week number (weeks starting on Monday)
format(Sys.Date(), "%Y Week %W")
## [1] "2021 Week 50"
Note that with str_glue(), any strings inside the surrounding double quotes must use single quotes (as above).
Month-Year
To convert a Date column to Month-Year format, we suggest you use the function as.yearmon() from the zoo package. This converts the date to class "yearmon" and retains the proper chronological ordering. In contrast, using format(column, "%Y %B") will convert to class character and will order the values alphabetically (incorrectly).
Below, a new column yearmonth is created from the column date_onset, using the as.yearmon() function. The default (correct) ordering of the resulting values is shown in the table.
# create new column
test_zoo <- linelist %>%
  mutate(yearmonth = zoo::as.yearmon(date_onset))

# print table
table(test_zoo$yearmonth)
##
## Apr 2014 May 2014 Jun 2014 Jul 2014 Aug 2014 Sep 2014 Oct 2014 Nov 2014 Dec 2014 Jan 2015 Feb 2015 Mar 2015 Apr 2015
##        7       64      100      226      528     1070     1112      763      562      431      306      277      186
In contrast, you can see how using only format() achieves the desired display format, but not the correct ordering.
# create new column
test_format <- linelist %>%
  mutate(yearmonth = format(date_onset, "%b %Y"))

# print table
table(test_format$yearmonth)
##
## Apr 2014 Apr 2015 Aug 2014 Dec 2014 Feb 2015 Jan 2015 Jul 2014 Jun 2014 Mar 2015 May 2014 Nov 2014 Oct 2014 Sep 2014
##        7      186      528      562      306      431      226      100      277       64      763     1112     1070
Note: if you are working within a ggplot() and only want to adjust how dates are displayed, it may be sufficient to provide a strptime format to the date_labels = argument in scale_x_date(), e.g. "%b %Y" or "%Y %b". See the ggplot tips page.
zoo also offers the function as.yearqtr(), and you can use scale_x_yearmon() when using ggplot().
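A small self-contained sketch of the ordering difference, using a few hypothetical onset dates in place of the linelist. It assumes the zoo package is installed and an English locale for the "%b" month abbreviations.

```r
library(zoo)

# hypothetical onset dates, deliberately out of order
dates <- as.Date(c("2015-01-03", "2014-05-10", "2014-12-20"))

# class "yearmon" sorts chronologically: May 2014, Dec 2014, Jan 2015
ym <- as.yearmon(dates)
sort(ym)

# format() returns character, which sorts alphabetically: "Dec 2014" "Jan 2015" "May 2014"
sort(format(dates, "%b %Y"))

# quarters work the same way with as.yearqtr(): 2015 Q1, 2014 Q2, 2014 Q4
as.yearqtr(dates)
```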
Epidemiological weeks
lubridate
See the page on Grouping data for more extensive examples of grouping data by date. Below we briefly describe grouping data by weeks.
We generally recommend using the floor_date() function from lubridate, with the argument unit = "week". This rounds the date down to the "start" of the week, as defined by the argument week_start =. Unless the option lubridate.week.start is set, the default week start is 7 (Sunday), but you can specify any day of the week as the start (e.g. 1 for Mondays). floor_date() is versatile and can be used to round down to other time units by setting unit = to "second", "minute", "hour", "day", "month", or "year".
The returned value is the start date of the week, in Date class. Date class is useful when plotting the data, as it will be easily recognized and ordered correctly by ggplot().
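A minimal sketch of floor_date() on three hypothetical dates, showing how week_start = changes the result (1 = Monday, 7 = Sunday) and how the same function rounds to months. It assumes lubridate is installed.

```r
library(lubridate)

# hypothetical dates: Wed 15 Dec, Sun 19 Dec, Mon 20 Dec 2021
d <- as.Date(c("2021-12-15", "2021-12-19", "2021-12-20"))

floor_date(d, unit = "week", week_start = 1)  # Monday weeks: 2021-12-13, 2021-12-13, 2021-12-20
floor_date(d, unit = "week", week_start = 7)  # Sunday weeks: 2021-12-12, 2021-12-19, 2021-12-19
floor_date(d, unit = "month")                 # all round down to 2021-12-01
class(floor_date(d, unit = "week"))           # still "Date"
```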
If you are only interested in adjusting dates to display by week in a plot, see the section in this page on Date display. For example, when plotting an epicurve you can format the date display by providing the desired strptime "%" nomenclature: use "%Y-%W" or "%Y-%U" to return the year and week number (given Monday or Sunday week start, respectively).
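If only the displayed label matters, format() with "%W" or "%U" gives the week number directly, using base R only. The sketch reuses hypothetical dates; note how Sunday 19 December falls in a different week depending on the convention.

```r
# hypothetical dates: Wed 15 Dec, Sun 19 Dec, Mon 20 Dec 2021
d <- as.Date(c("2021-12-15", "2021-12-19", "2021-12-20"))

format(d, "%Y-%W")  # Monday week start: "2021-50" "2021-50" "2021-51"
format(d, "%Y-%U")  # Sunday week start: "2021-50" "2021-51" "2021-51"
```

Unlike floor_date(), these return character labels, so they are best used for display rather than for grouping or ordering.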
Weekly counts
See the page on Grouping data for a thorough explanation of grouping data with count(), group_by(), and summarise(). A brief example is below.
- Create a new 'week' column with mutate(), using floor_date() with unit = "week"
 
- Get counts of rows (cases) per week with count(); filter out any cases with missing date
 
- Finish with complete() from tidyr to ensure that all weeks appear in the data, even those with no rows/cases. By default the count values for any "new" rows are NA, but you can make them 0 with the fill = argument, which expects a named list (below, n is the name of the counts column).
# Make aggregated dataset of weekly case counts
weekly_counts <- linelist %>%
  drop_na(date_onset) %>%                # remove cases missing onset date
  mutate(weekly_cases = floor_date(      # make new column, week of onset
    date_onset,
    unit = "week")) %>%
  count(weekly_cases) %>%                # group data by week and count rows per group (creates column 'n')
  tidyr::complete(                       # ensure all weeks are present, even those with no cases reported
    weekly_cases = seq.Date(             # re-define the "weekly_cases" column as a complete sequence,
      from = min(weekly_cases),          # from the minimum date
      to = max(weekly_cases),            # to the maximum date
      by = "week"),                      # by weeks
    fill = list(n = 0))                  # fill-in NAs in the n counts column with 0
Here are the first rows of the resulting data frame:
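Since the linelist itself is not included on this page, the sketch below runs the same pipeline on a small hypothetical data frame so the effect of complete() can be checked. It assumes dplyr, tidyr and lubridate are installed, and it sets week_start = 1 (Monday weeks) explicitly rather than relying on the default.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# hypothetical toy linelist: one row per case, one missing onset date
toy <- data.frame(
  date_onset = as.Date(c("2021-11-01", "2021-11-03", NA, "2021-11-22", "2021-11-24"))
)

weekly <- toy %>%
  drop_na(date_onset) %>%                                                   # remove cases missing onset date
  mutate(week = floor_date(date_onset, unit = "week", week_start = 1)) %>%  # Monday-start weeks
  count(week) %>%                                                           # creates column 'n'
  complete(week = seq.Date(from = min(week), to = max(week), by = "week"),
           fill = list(n = 0))                                              # empty weeks get n = 0

weekly
# weeks starting 2021-11-01, 11-08, 11-15, 11-22 with n = 2, 0, 0, 2
```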
Epiweek alternatives
Note that lubridate also has the functions week(), epiweek(), and isoweek(), each of which has slightly different start dates and other nuances. Generally speaking, though, floor_date() should be all that you need. Read the details for these functions by entering ?week into the console or by reading the lubridate documentation.
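To see how these functions differ, the sketch below applies them to 1 January 2021 (a Friday), a hypothetical but informative date because it falls at a year boundary. It assumes lubridate is installed; isoyear() and epiyear() are the lubridate companions that return the year each week belongs to.

```r
library(lubridate)

d <- as.Date("2021-01-01")  # a Friday

week(d)     # 1: complete 7-day periods since 1 January, plus one
isoweek(d)  # 53: ISO week (Monday start), so this day belongs to the last week of 2020
isoyear(d)  # 2020
epiweek(d)  # 53: epidemiological week (Sunday start), also the last week of 2020
epiyear(d)  # 2020
```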
You might consider using the package aweek to set epidemiological weeks. You can read more about it on the RECON website. It has the functions date2week() and week2date(), in which you can set the week start day with week_start = "Monday". This package is easiest if you want "week"-style outputs (e.g. "2020-W12"). Another advantage of aweek is that when date2week() is applied to a date column with factor = TRUE, the returned column (week format) is of class factor and includes levels for all weeks in the time span (this avoids the extra step of complete() described above). However, aweek does not have the functionality to round dates to other time units such as months, years, etc.
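A minimal sketch of date2week() on two hypothetical dates, assuming the aweek package is installed. The floor_day = TRUE argument drops the day-of-week suffix so only the week label remains, and factor = TRUE adds levels for the empty weeks in between.

```r
library(aweek)

# hypothetical dates: Wed 1 Dec and Mon 20 Dec 2021
d <- as.Date(c("2021-12-01", "2021-12-20"))

# week labels with Monday as the week start
date2week(d, week_start = "Monday", floor_day = TRUE)  # "2021-W48" "2021-W51"

# factor = TRUE: levels span every week in the range, including W49 and W50
wk <- date2week(d, week_start = "Monday", floor_day = TRUE, factor = TRUE)
levels(wk)
```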
Another alternative for time series, which also works well to show a "week" format ("2020 W12"), is yearweek() from the package tsibble, as demonstrated in the page on Time series and outbreak detection.
Converting dates/time zones
When data is present in different time zones, it can often be important to standardise it to a single time zone. This can present a further challenge, as the time zone component of data must be coded manually in most cases.
In R, each datetime object has a time zone component. By default, all datetime objects will carry the local time zone of the computer being used. This is generally specified by location rather than by a fixed time zone name, because the offset at a given location often changes with daylight saving time. It is not possible to accurately compensate for time zones when a date has no time component: the event that a Date column represents cannot be attributed to a specific time of day, so shifts measured in hours cannot reasonably be accounted for.
To deal with time zones, there are a number of helper functions in lubridate that can be used to change the time zone of a datetime object from the local time zone to a different time zone. Time zones are set by assigning a valid tz database time zone to the datetime object. A list of these can be found at https://en.wikipedia.org/wiki/List_of_tz_database_time_zones. If the location you are using data from is not on this list, nearby large cities in the same time zone are available and serve the same purpose.
# assign the current time to an object
time_now <- Sys.time()
time_now
## [1] "2021-12-15 20:25:53 EST"

# use with_tz() to assign a new time zone to the object, while CHANGING the clock time
time_london_real <- with_tz(time_now, "Europe/London")

# use force_tz() to assign a new time zone to the object, while KEEPING the clock time
time_london_local <- force_tz(time_now, "Europe/London")

# note that as long as the computer that was used to run this code is NOT set to London time,
# there will be a difference in the times
# (the number of hours difference from the computer's time zone to London)
time_london_real - time_london_local
## Time difference of 5 hours
This may seem largely abstract, and is often not needed if the user isn't working across time zones.
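Because Sys.time() depends on when and where the code runs, the sketch below uses a fixed, hypothetical New York timestamp so the difference between with_tz() and force_tz() can be checked. It assumes lubridate is installed and the system has the standard tz database.

```r
library(lubridate)

# hypothetical timestamp recorded in New York (EST, UTC-5, in December)
t_ny <- ymd_hms("2021-12-15 20:25:53", tz = "America/New_York")

# with_tz(): same instant, clock time shown in London (GMT in December)
t_same_instant <- with_tz(t_ny, "Europe/London")
format(t_same_instant, "%Y-%m-%d %H:%M:%S")  # "2021-12-16 01:25:53"

# force_tz(): same clock reading, now interpreted as London time (a different instant)
t_same_clock <- force_tz(t_ny, "Europe/London")
format(t_same_clock, "%Y-%m-%d %H:%M:%S")    # "2021-12-15 20:25:53"

t_same_instant == t_ny                                   # TRUE: the same moment
difftime(t_same_instant, t_same_clock, units = "hours")  # Time difference of 5 hours
```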
Lagging and leading calculations
lead() and lag() are functions from the dplyr package which help find previous (lagged) or subsequent (leading) values in a vector, typically a numeric or date vector. This is useful when calculating change/difference between time units.
Let's say you want to calculate the difference in cases between a current week and the previous one. The data are initially provided in weekly counts as shown below.
When using lag() or lead(), the order of rows in the data frame is very important! Pay attention to whether your dates/numbers are ascending or descending.
First, create a new column containing the value of the previous (lagged) week.
- Control the number of units back/forward with n = (must be a non-negative integer)
 
- Use default = to define the value placed in non-existing rows (e.g. the first row, for which there is no lagged value). By default this is NA.
 
- Use order_by = with your reference column (e.g. the date column) if the rows are not already ordered by it
counts <- counts %>%
  mutate(cases_prev_wk = lag(cases_wk, n = 1))
Next, create a new column which is the difference between the two cases columns:
counts <- counts %>%
  mutate(cases_prev_wk = lag(cases_wk, n = 1),
         case_diff = cases_wk - cases_prev_wk)
You can read more about lead() and lag() in the dplyr documentation or by entering ?lag in your console.
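Because the weekly counts table is not reproduced here, the sketch below builds a small hypothetical counts data frame whose rows are deliberately not in date order, and uses order_by = so that lag() still picks the previous week by date. It assumes dplyr is installed; the week column name is an assumption for illustration.

```r
library(dplyr)

# hypothetical weekly counts, deliberately NOT sorted by week
counts <- data.frame(
  week = as.Date(c("2021-11-15", "2021-11-01", "2021-11-08")),
  cases_wk = c(12, 5, 9)
)

counts <- counts %>%
  mutate(
    cases_prev_wk = lag(cases_wk, n = 1, order_by = week),  # previous week's count, by date order
    case_diff = cases_wk - cases_prev_wk                    # change from the previous week
  )

counts
# 2021-11-15: prev 9, diff 3; 2021-11-01: prev NA, diff NA; 2021-11-08: prev 5, diff 4
```

An equivalent approach is to sort first with arrange(week) and then call lag() without order_by =, which also makes the output easier to read.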
Source: https://epirhandbook.com/en/working-with-dates.html