R for Data Science (2e)

Day 3. Data Transformation

By Chí Trung HÀ

Introduction

This chapter will introduce you to data transformation using the dplyr package and a new dataset on flights that departed from New York City in 2013.

Prerequisites

# install.packages("nycflights13")
library(nycflights13)
library(tidyverse)

nycflights13

This dataset contains all 336,776 flights that departed from New York City in 2013. The data comes from the US Bureau of Transportation Statistics, and is documented in ?flights. For short:

 Data frame with columns year, month, day Date of departure.

 dep_time, arr_time Actual departure and arrival times (format HHMM or HMM), local tz.

 sched_dep_time, sched_arr_time Scheduled departure and arrival times (format HHMM or HMM), local tz.
flights
A tibble: 336776 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
.........................................................
201393021232125 -222232247-24EV5489N712EVLGACHO 45 30521252013-09-30 21:00:00
201393021272129 -223142323 -9EV3833N16546EWRCLT 72 52921292013-09-30 21:00:00
201393021282130 -223282359-31B6 97N807JBJFKDEN213162621302013-09-30 21:00:00
201393021292059 3022302232 -2EV5048N751EVLGARIC 45 29220592013-09-30 20:00:00
201393021312140 -922252255-30MQ3621N807MQJFKDCA 36 21321402013-09-30 21:00:00
201393021402140 0 10 40-30AA 185N335AAJFKLAX298247521402013-09-30 21:00:00
201393021422129 1322502239 11EV4509N12957EWRPWM 47 28421292013-09-30 21:00:00
201393021452145 0 115 140-25B61103N633JBJFKSJU192159821452013-09-30 21:00:00
201393021472137 10 30 27 3B61371N627JBLGAFLL139107621372013-09-30 21:00:00
201393021492156 -722452308-23UA 523N813UAEWRBOS 37 20021562013-09-30 21:00:00
201393021502159 -922502306-16EV3842N10575EWRMHT 39 20921592013-09-30 21:00:00
201393021591845194234420301949E3320N906XJJFKBUF 50 30118452013-09-30 18:00:00
201393022032205 -223392331 8EV5311N722EVLGABGR 61 37822 52013-09-30 22:00:00
201393022072140 2722572250 7MQ3660N532MQLGABNA 97 76421402013-09-30 21:00:00
201393022112059 7223392242 57EV4672N12145EWRSTL120 87220592013-09-30 20:00:00
201393022312245-1423352356-21B6 108N193JBJFKPWM 48 27322452013-09-30 22:00:00
201393022332113 80 112 30 42UA 471N578UAEWRSFO318256521132013-09-30 21:00:00
201393022352001154 592249130B61083N804JBJFKMCO123 94420 12013-09-30 20:00:00
201393022372245 -823452353 -8B6 234N318JBJFKBTV 43 26622452013-09-30 22:00:00
201393022402245 -523342351-17B61816N354JBJFKSYR 41 20922452013-09-30 22:00:00
201393022402250-102347 7-20B62002N281JBJFKBUF 52 30122502013-09-30 22:00:00
201393022412246 -52345 1-16B6 486N346JBJFKROC 47 26422462013-09-30 22:00:00
201393023072255 1223592358 1B6 718N565JBJFKBOS 33 18722552013-09-30 22:00:00
201393023492359-10 325 350-25B6 745N516JBJFKPSE196161723592013-09-30 23:00:00
2013930 NA1842 NA NA2019 NAEV5274N740EVLGABNA NA 76418422013-09-30 18:00:00
2013930 NA1455 NA NA1634 NA9E3393NA JFKDCA NA 21314552013-09-30 14:00:00
2013930 NA2200 NA NA2312 NA9E3525NA LGASYR NA 19822 02013-09-30 22:00:00
2013930 NA1210 NA NA1330 NAMQ3461N535MQLGABNA NA 76412102013-09-30 12:00:00
2013930 NA1159 NA NA1344 NAMQ3572N511MQLGACLE NA 41911592013-09-30 11:00:00
2013930 NA 840 NA NA1020 NAMQ3531N839MQLGARDU NA 431 8402013-09-30 08:00:00
glimpse(flights)
Rows: 336,776
Columns: 19
$ year           <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2~
$ month          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
$ day            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
$ dep_time       <int> 517, 533, 542, 544, 554, 554, 555, 557, 557, 558, 558, ~
$ sched_dep_time <int> 515, 529, 540, 545, 600, 558, 600, 600, 600, 600, 600, ~
$ dep_delay      <dbl> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2, -2, -1~
$ arr_time       <int> 830, 850, 923, 1004, 812, 740, 913, 709, 838, 753, 849,~
$ sched_arr_time <int> 819, 830, 850, 1022, 837, 728, 854, 723, 846, 745, 851,~
$ arr_delay      <dbl> 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -3, 7, -1~
$ carrier        <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV", "B6", "~
$ flight         <int> 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79, 301, 4~
$ tailnum        <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN", "N394~
$ origin         <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR", "LGA",~
$ dest           <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL", "IAD",~
$ air_time       <dbl> 227, 227, 160, 183, 116, 150, 158, 53, 140, 138, 149, 1~
$ distance       <dbl> 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 944, 733, ~
$ hour           <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6~
$ minute         <dbl> 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0~
$ time_hour      <dttm> 2013-01-01 05:00:00, 2013-01-01 05:00:00, 2013-01-01 0~

dplyr basics

  1. The first argument is always a data frame.
  2. The subsequent arguments typically describe which columns to operate on, using the variable names (without quotes).
  3. The output is always a new data frame.

|>: the pipe to combining multiple verbs (functions): the pipe takes the thing on its left and passes it along to the function on its right so that x |> f(y) is equivalent to f(x, y), and x |> f(y) |> g(z) is equivalent to g(f(x, y), z). The easiest way to pronounce the pipe is “then”.

dplyr’s verbs are organized into four groups based on what they operate on: rows, columns, groups, or tables. In the following sections you’ll learn the most important verbs for rows, columns, and groups, then we’ll come back to the join verbs that work on tables in Chapter 19.

# |>: the pipe to combining multiple verbs (functions)
flights |>
  filter(dest == "IAH") |> 
  group_by(year, month, day) |> 
  summarize(
    arr_delay = mean(arr_delay, na.rm = TRUE)
  )
`summarise()` has grouped output by 'year', 'month'. You can override using the
`.groups` argument.
A grouped_df: 365 x 4
yearmonthdayarr_delay
<int><int><int><dbl>
20131 1 17.850000
20131 2 7.000000
20131 3 18.315789
20131 4 -3.200000
20131 5 20.230769
20131 6 9.277778
20131 7 -7.736842
20131 8 7.789474
20131 9 18.055556
2013110 6.684211
2013111-22.526316
2013112-19.923077
2013113 14.833333
2013114 17.421053
2013115 26.105263
2013116 40.105263
2013117 -7.894737
2013118 -5.210526
2013119 2.153846
2013120 10.352941
2013121 19.789474
2013122 -5.055556
2013123 -6.947368
2013124-15.894737
2013125-10.368421
2013126 -8.384615
2013127-16.055556
2013128-15.157895
2013129-20.842105
2013130 20.842105
............
201312 2-6.5000000
201312 3-8.4545455
201312 4-3.1363636
201312 551.7142857
201312 614.8181818
201312 718.8666667
201312 829.4705882
201312 934.5454545
2013121060.3529412
20131211 3.4090909
20131212 5.3333333
20131213-3.0000000
2013121454.6666667
20131215 6.6470588
20131216-8.1818182
2013121733.4500000
20131218 8.4090909
20131219-6.6190476
2013122020.1000000
2013122150.1250000
2013122228.3333333
2013122328.8181818
20131224 2.3125000
20131225 7.5833333
20131226-0.4444444
20131227 6.1666667
20131228 9.5000000
2013122923.6111111
2013123023.6842105
20131231-4.9333333

Rows

Some important verbs that operate on rows:

  • filter(): allows you to keep rows based on the values of the columns without changing their order
  • arrange(): changes the order of the rows based on the value of the columns (without changing which are present)

Both functions only affect the rows, and the columns are left unchanged.

  • distinct() which finds rows with unique values but unlike arrange() and filter() it can also optionally modify the columns.

filter()

flights |> filter(dep_delay > 1000)
A tibble: 5 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
20131 9 641 9001301124215301272HA 51N384HAJFKHNL6404983 9 02013-01-09 09:00:00
2013110112116351126123918101109MQ3695N517MQEWRORD111 71916352013-01-10 16:00:00
2013615143219351137160721201127MQ3535N504MQJFKCMH 74 48319352013-06-15 19:00:00
2013722 8451600100510441815 989MQ3075N665MQJFKCVG 96 58916 02013-07-22 16:00:00
2013920113918451014145722101007AA 177N338AAJFKSFO354258618452013-09-20 18:00:00
# Flights that departed on January 1
flights |> filter(month==1 & day==1)
A tibble: 842 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
.........................................................
20131121402135 5 210 224-14B6 701N284JBJFKSJU189159821352013-01-01 21:00:00
20131121572155 2 43 41 2B6 43N537JBJFKMCO140 94421552013-01-01 21:00:00
20131121582200 -222542307-13EV4103N14998EWRBWI 36 16922 02013-01-01 22:00:00
20131122051720285 462040246AA1999N5DNAAEWRMIA146108517202013-01-01 17:00:00
20131122092145 24 58 37 21B6 35N608JBJFKPBI143102821452013-01-01 21:00:00
20131122092155 1424002337 23B61109N216JBJFKRDU 86 42721552013-01-01 21:00:00
20131122112145 2623392311 28B6 104N228JBJFKBUF 64 30121452013-01-01 21:00:00
20131122172229-12 249 315-26B6 713N547JBJFKSJU191159822292013-01-01 22:00:00
20131122172130 47 140 27 73B6 21N516JBJFKTPA163100521302013-01-01 21:00:00
2013112221200014123312124127EV4462N13566EWRBUF 56 28220 02013-01-01 20:00:00
20131122242200 2423242316 8EV4206N16561EWRPWM 47 28422 02013-01-01 22:00:00
20131122292159 30 149 100 49B6 11N531JBJFKFLL153106921592013-01-01 21:00:00
20131122402245 -523402356-16B6 608N279JBJFKPWM 44 27322452013-01-01 22:00:00
20131122502255 -523522359 -7B61018N521JBJFKBOS 37 18722552013-01-01 22:00:00
20131123022200 6223422253 49EV4276N13903EWRBDL 24 11622 02013-01-01 22:00:00
20131123062245 21 28 5 23B6 30N281JBJFKROC 59 26422452013-01-01 22:00:00
20131123072245 22 322357 35B6 128N178JBJFKBTV 59 26622452013-01-01 22:00:00
20131123102255 15 24 15 9B6 112N646JBJFKBUF 57 30122552013-01-01 22:00:00
20131123122000192 212110191EV4312N13958EWRDCA 44 19920 02013-01-01 20:00:00
20131123232200 83 222313 69EV4257N13538EWRBTV 44 26622 02013-01-01 22:00:00
20131123262130116 131 18 73B6 199N594JBJFKLAS290224821302013-01-01 21:00:00
20131123272250 37 322359 33B6 22N639JBJFKSYR 45 20922502013-01-01 22:00:00
20131123431724379 3141938456EV4321N21197EWRMCI222109217242013-01-01 17:00:00
20131123532359 -6 425 445-20B6 739N591JBJFKPSE195161723592013-01-01 23:00:00
20131123532359 -6 418 442-24B6 707N794JBJFKSJU185159823592013-01-01 23:00:00
20131123562359 -3 425 437-12B6 727N588JBJFKBQN186157623592013-01-01 23:00:00
201311 NA1630 NA NA1815 NAEV4308N18120EWRRDU NA 41616302013-01-01 16:00:00
201311 NA1935 NA NA2240 NAAA 791N3EHAALGADFW NA138919352013-01-01 19:00:00
201311 NA1500 NA NA1825 NAAA1925N3EVAALGAMIA NA109615 02013-01-01 15:00:00
201311 NA 600 NA NA 901 NAB6 125N618JBJFKFLL NA1069 6 02013-01-01 06:00:00
# A shorter way to select flights that departed in January or February
flights |> 
  filter(month %in% c(1, 2))
  # the same as filter(month==1 | month==2)
  # but not filter(month == 1 | 2) - error!
A tibble: 51955 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
.........................................................
20132282157200711023442206 98EV4572N29906EWRGSP 93 59420 72013-02-28 20:00:00
201322821581910168 192215124AA2075N4YSAAEWRDFW182137219102013-02-28 19:00:00
201322821582145 13 49 34 15B6 35N708JBJFKPBI151102821452013-02-28 21:00:00
201322822012155 6 59 41 18B6 43N579JBJFKMCO149 94421552013-02-28 21:00:00
201322822052045 8023042216 489E3395N916XJJFKDCA 38 21320452013-02-28 20:00:00
201322822052159 6 106 56 10B6 11N554JBJFKFLL158106921592013-02-28 21:00:00
201322822182150 28 115 42 33B6 325N504JBJFKTPA154100521502013-02-28 21:00:00
201322822232023120 252 114 98UA1071N37298EWRBQN188158520232013-02-28 20:00:00
201322822262029117 1152333102UA 771N560UAJFKLAX313247520292013-02-28 20:00:00
201322822292230 -1 258 312-14B6 713N523JBJFKSJU193159822302013-02-28 22:00:00
201322822392245 -623392354-15B6 608N197JBJFKPWM 44 27322452013-02-28 22:00:00
201322822422250 -82350 5-15B6 30N351JBJFKROC 51 26422502013-02-28 22:00:00
201322822452245 0 52356 9B6 128N184JBJFKBTV 48 26622452013-02-28 22:00:00
201322822542258 -4 10 19 -9B6 112N655JBJFKBUF 53 30122582013-02-28 22:00:00
201322822572140 7723382241 57EV4276N12567EWRBDL 24 11621402013-02-28 21:00:00
201322822572255 224002357 3B61018N807JBJFKBOS 35 18722552013-02-28 22:00:00
201322823232251 32 182357 21B6 22N279JBJFKSYR 40 20922512013-02-28 22:00:00
201322823422352-10 413 437-24B6 739N656JBJFKPSE193161723522013-02-28 23:00:00
201322823562358 -2 423 438-15B6 707N646JBJFKSJU192159823582013-02-28 23:00:00
201322823592359 0 443 438 5B6 727N641JBJFKBQN192157623592013-02-28 23:00:00
2013228 NA 600 NA NA 722 NAEV5739N833ASLGAIAD NA 229 6 02013-02-28 06:00:00
2013228 NA1820 NA NA2003 NAEV5409N740EVLGAMSN NA 81218202013-02-28 18:00:00
2013228 NA1154 NA NA1447 NAB6 27N274JBJFKTPA NA100511542013-02-28 11:00:00
2013228 NA 900 NA NA1020 NAUS2120NA LGABOS NA 184 9 02013-02-28 09:00:00
2013228 NA 605 NA NA 805 NAMQ4401N730MQLGADTW NA 502 6 52013-02-28 06:00:00
2013228 NA 850 NA NA1035 NAMQ4558N737MQLGACLE NA 419 8502013-02-28 08:00:00
2013228 NA 905 NA NA1115 NAMQ4478N722MQLGADTW NA 502 9 52013-02-28 09:00:00
2013228 NA1115 NA NA1310 NAMQ4485N725MQLGACMH NA 47911152013-02-28 11:00:00
2013228 NA 830 NA NA1205 NAUA1480NA EWRSFO NA2565 8302013-02-28 08:00:00
2013228 NA 840 NA NA1147 NAUA 443NA JFKLAX NA2475 8402013-02-28 08:00:00

arrange()

flights |> arrange(year, month, day, dep_time)
A tibble: 336776 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
.........................................................
2013123121552039 76 2532355 NAB61205N627JBJFKPDX NA245420392013-12-31 20:00:00
2013123121552150 5 110 51 19B61901N729JBJFKFLL164106921502013-12-31 21:00:00
2013123121592155 4 55 46 9B62053N593JBJFKPBI155102821552013-12-31 21:00:00
2013123122062110 56 442339 65B6 775N184JBJFKMSY195118221102013-12-31 21:00:00
2013123122112159 12 100 45 15B61183N715JBJFKMCO148 94421592013-12-31 21:00:00
2013123122182219 -1 315 304 11B61203N625JBJFKSJU202159822192013-12-31 22:00:00
2013123122352245-1023512355 -4B6 234N355JBJFKBTV 49 26622452013-12-31 22:00:00
2013123122452250 -523592356 3B61816N318JBJFKSYR 51 20922502013-12-31 22:00:00
2013123123102255 15 72356 11B6 718N279JBJFKBOS 40 18722552013-12-31 22:00:00
2013123123212250 31 46 8 38B62002N179JBJFKBUF 66 30122502013-12-31 22:00:00
2013123123282330 -2 412 409 3B61389N651JBEWRSJU198160823302013-12-31 23:00:00
2013123123322245 47 58 3 55B6 486N334JBJFKROC 60 26422452013-12-31 22:00:00
2013123123552359 -4 430 440-10B61503N509JBJFKSJU195159823592013-12-31 23:00:00
2013123123562359 -3 436 445 -9B6 745N665JBJFKPSE200161723592013-12-31 23:00:00
20131231 NA1520 NA NA1705 NAAA 341N568AALGAORD NA 73315202013-12-31 15:00:00
20131231 NA2025 NA NA2205 NAAA 371N482AALGAORD NA 73320252013-12-31 20:00:00
20131231 NA1932 NA NA2305 NAB6 161N516JBJFKSMF NA252119322013-12-31 19:00:00
20131231 NA1505 NA NA1725 NAEV4181N24103EWRMCI NA109215 52013-12-31 15:00:00
20131231 NA1000 NA NA1252 NAUA1124NA EWREGE NA172510 02013-12-31 10:00:00
20131231 NA 840 NA NA1205 NAUA1151NA EWRSEA NA2402 8402013-12-31 08:00:00
20131231 NA 754 NA NA1118 NAUA1455NA EWRLAX NA2454 7542013-12-31 07:00:00
20131231 NA2000 NA NA2146 NAUA1482NA EWRORD NA 71920 02013-12-31 20:00:00
20131231 NA1500 NA NA1817 NAUA1483NA EWRAUS NA150415 02013-12-31 15:00:00
20131231 NA1430 NA NA1750 NAUA1493NA EWRLAX NA245414302013-12-31 14:00:00
20131231 NA 855 NA NA1142 NAUA1506NA EWRJAC NA1874 8552013-12-31 08:00:00
20131231 NA 705 NA NA 931 NAUA1729NA EWRDEN NA1605 7 52013-12-31 07:00:00
20131231 NA 825 NA NA1029 NAUS1831NA JFKCLT NA 541 8252013-12-31 08:00:00
20131231 NA1615 NA NA1800 NAMQ3301N844MQLGARDU NA 43116152013-12-31 16:00:00
20131231 NA 600 NA NA 735 NAUA 219NA EWRORD NA 719 6 02013-12-31 06:00:00
20131231 NA 830 NA NA1154 NAUA 443NA JFKLAX NA2475 8302013-12-31 08:00:00
flights |> arrange(desc(dep_delay))
A tibble: 336776 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
2013 1 9 641 9001301124215301272HA 51N384HAJFKHNL6404983 9 02013-01-09 09:00:00
2013 615143219351137160721201127MQ3535N504MQJFKCMH 74 48319352013-06-15 19:00:00
2013 110112116351126123918101109MQ3695N517MQEWRORD111 71916352013-01-10 16:00:00
2013 920113918451014145722101007AA 177N338AAJFKSFO354258618452013-09-20 18:00:00
2013 722 8451600100510441815 989MQ3075N665MQJFKCVG 96 58916 02013-07-22 16:00:00
2013 41011001900 96013422211 931DL2391N959DLJFKTPA139100519 02013-04-10 19:00:00
2013 3172321 810 911 1351020 915DL2119N927DALGAMSP1671020 8102013-03-17 08:00:00
2013 627 9591900 89912362226 850DL2007N3762YJFKPDX313245419 02013-06-27 19:00:00
2013 7222257 759 898 1211026 895DL2047N6716CLGAATL109 762 7592013-07-22 07:00:00
201312 5 7561700 89610582020 878AA 172N5DMAAEWRMIA149108517 02013-12-05 17:00:00
2013 5 311332055 87812502215 875MQ3744N523MQEWRORD112 71920552013-05-03 20:00:00
2013 1 1 8481835 85310011950 851MQ3944N942MQJFKBWI 41 18418352013-01-01 18:00:00
2013 2102243 830 853 1001106 834F9 835N203FRLGADEN2331620 8302013-02-10 08:00:00
2013 519 7131700 85310071955 852AA 257N3HEAAJFKLAS323224817 02013-05-19 17:00:00
20131219 7341725 84910462039 847DL1223N375NCEWRSLC290196917252013-12-19 17:00:00
20131217 7051700 84510262020 846AA 172N5EMAAEWRMIA145108517 02013-12-17 17:00:00
20131214 8301845 82512102154 856DL2391N939DLJFKTPA173100518452013-12-14 18:00:00
2013 419 9121940 81212282247 821DL1435N900DELGATPA174101019402013-04-19 19:00:00
2013 627 7531830 803 9372015 802AA2019N571AALGASTL134 88818302013-06-27 18:00:00
2013 31810202100 8001336 32 784DL2363N624AGJFKLAX335247521 02013-03-18 21:00:00
201311 3 6031645 798 8291913 796DL2042N990ATEWRATL109 74616452013-11-03 16:00:00
2013 419 6171700 797 8581955 783AA 257N3GJAAJFKLAS313224817 02013-04-19 17:00:00
2013 627 6151705 790 8532004 769DL 503N372DAJFKSAN312244617 52013-06-27 17:00:00
2013 21923241016 788 1141227 767DL2319N324USLGAMSP136102010162013-02-19 10:00:00
2013 627 7321825 787 9322032 780DL1715N335NBLGAMSY160118318252013-06-27 18:00:00
2013 2241921 615 7862135 842 773DL 575N348NWEWRATL111 746 6152013-02-24 06:00:00
2013 419 6061725 761 9232020 783AA1901N3DGAAJFKIAH222141717252013-04-19 17:00:00
2013 419 7581925 75310492225 744DL1485N927DALGAMCO149 95019252013-04-19 19:00:00
2013 216 7571930 74710132149 7449E3798N8940EJFKCLT 85 54119302013-02-16 19:00:00
201310142042 900 70222551127 688DL 502N943DLEWRATL 98 746 9 02013-10-14 09:00:00
.........................................................
2013924NA1625NANA1750NAMQ3622N524MQLGABNANA 76416252013-09-24 16:00:00
2013925NA1259NANA1507NAEV5207N615QXLGACLTNA 54412592013-09-25 12:00:00
2013925NA 845NANA1018NAEV5286N615QXLGABTVNA 258 8452013-09-25 08:00:00
2013925NA1755NANA1932NAEV5287N722EVLGAMSNNA 81217552013-09-25 17:00:00
2013925NA 600NANA 716NAEV5716N877ASJFKIADNA 228 6 02013-09-25 06:00:00
2013925NA 836NANA 944NAB62280N258JBEWRBOSNA 200 8362013-09-25 08:00:00
2013925NA1300NANA1409NAUS2148NA LGABOSNA 18413 02013-09-25 13:00:00
2013925NA1900NANA2014NAUS2160NA LGABOSNA 18419 02013-09-25 19:00:00
2013925NA1300NANA1450NAMQ3388N817MQLGACMHNA 47913 02013-09-25 13:00:00
2013925NA1655NANA1840NAMQ3411N735MQLGARDUNA 43116552013-09-25 16:00:00
2013925NA1559NANA1719NAMQ3748N530MQEWRORDNA 71915592013-09-25 15:00:00
2013926NA 915NANA1141NAEV5109N748EVLGACHSNA 641 9152013-09-26 09:00:00
2013926NA1400NANA1512NAUS2183NA LGADCANA 21414 02013-09-26 14:00:00
2013926NA1240NANA1525NAWN4720N691WNEWRHOUNA141112402013-09-26 12:00:00
2013927NA 600NANA 730NAAA 301N584AALGAORDNA 733 6 02013-09-27 06:00:00
2013927NA2100NANA2211NAUS2164NA LGABOSNA 18421 02013-09-27 21:00:00
2013927NA1329NANA1444NAMQ3760N505MQEWRORDNA 71913292013-09-27 13:00:00
2013927NA1600NANA1739NAUA 269NA LGAORDNA 73316 02013-09-27 16:00:00
2013928NA1803NANA1927NAEV5563N724EVLGABTVNA 25818 32013-09-28 18:00:00
2013928NA 910NANA1220NAAA 1N320AAJFKLAXNA2475 9102013-09-28 09:00:00
2013928NA1635NANA1827NAUS 581NA EWRCLTNA 52916352013-09-28 16:00:00
2013929NA2054NANA2302NAEV4536N13988EWRCVGNA 56920542013-09-29 20:00:00
2013929NA1830NANA2010NAMQ3134N508MQEWRORDNA 71918302013-09-29 18:00:00
2013929NA 700NANA 833NAUA 331NA LGAORDNA 733 7 02013-09-29 07:00:00
2013930NA1842NANA2019NAEV5274N740EVLGABNANA 76418422013-09-30 18:00:00
2013930NA1455NANA1634NA9E3393NA JFKDCANA 21314552013-09-30 14:00:00
2013930NA2200NANA2312NA9E3525NA LGASYRNA 19822 02013-09-30 22:00:00
2013930NA1210NANA1330NAMQ3461N535MQLGABNANA 76412102013-09-30 12:00:00
2013930NA1159NANA1344NAMQ3572N511MQLGACLENA 41911592013-09-30 11:00:00
2013930NA 840NANA1020NAMQ3531N839MQLGARDUNA 431 8402013-09-30 08:00:00

distinct()

# Remove duplicate rows, if any
flights |> 
  distinct()
A tibble: 336776 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
.........................................................
201393021232125 -222232247-24EV5489N712EVLGACHO 45 30521252013-09-30 21:00:00
201393021272129 -223142323 -9EV3833N16546EWRCLT 72 52921292013-09-30 21:00:00
201393021282130 -223282359-31B6 97N807JBJFKDEN213162621302013-09-30 21:00:00
201393021292059 3022302232 -2EV5048N751EVLGARIC 45 29220592013-09-30 20:00:00
201393021312140 -922252255-30MQ3621N807MQJFKDCA 36 21321402013-09-30 21:00:00
201393021402140 0 10 40-30AA 185N335AAJFKLAX298247521402013-09-30 21:00:00
201393021422129 1322502239 11EV4509N12957EWRPWM 47 28421292013-09-30 21:00:00
201393021452145 0 115 140-25B61103N633JBJFKSJU192159821452013-09-30 21:00:00
201393021472137 10 30 27 3B61371N627JBLGAFLL139107621372013-09-30 21:00:00
201393021492156 -722452308-23UA 523N813UAEWRBOS 37 20021562013-09-30 21:00:00
201393021502159 -922502306-16EV3842N10575EWRMHT 39 20921592013-09-30 21:00:00
201393021591845194234420301949E3320N906XJJFKBUF 50 30118452013-09-30 18:00:00
201393022032205 -223392331 8EV5311N722EVLGABGR 61 37822 52013-09-30 22:00:00
201393022072140 2722572250 7MQ3660N532MQLGABNA 97 76421402013-09-30 21:00:00
201393022112059 7223392242 57EV4672N12145EWRSTL120 87220592013-09-30 20:00:00
201393022312245-1423352356-21B6 108N193JBJFKPWM 48 27322452013-09-30 22:00:00
201393022332113 80 112 30 42UA 471N578UAEWRSFO318256521132013-09-30 21:00:00
201393022352001154 592249130B61083N804JBJFKMCO123 94420 12013-09-30 20:00:00
201393022372245 -823452353 -8B6 234N318JBJFKBTV 43 26622452013-09-30 22:00:00
201393022402245 -523342351-17B61816N354JBJFKSYR 41 20922452013-09-30 22:00:00
201393022402250-102347 7-20B62002N281JBJFKBUF 52 30122502013-09-30 22:00:00
201393022412246 -52345 1-16B6 486N346JBJFKROC 47 26422462013-09-30 22:00:00
201393023072255 1223592358 1B6 718N565JBJFKBOS 33 18722552013-09-30 22:00:00
201393023492359-10 325 350-25B6 745N516JBJFKPSE196161723592013-09-30 23:00:00
2013930 NA1842 NA NA2019 NAEV5274N740EVLGABNA NA 76418422013-09-30 18:00:00
2013930 NA1455 NA NA1634 NA9E3393NA JFKDCA NA 21314552013-09-30 14:00:00
2013930 NA2200 NA NA2312 NA9E3525NA LGASYR NA 19822 02013-09-30 22:00:00
2013930 NA1210 NA NA1330 NAMQ3461N535MQLGABNA NA 76412102013-09-30 12:00:00
2013930 NA1159 NA NA1344 NAMQ3572N511MQLGACLE NA 41911592013-09-30 11:00:00
2013930 NA 840 NA NA1020 NAMQ3531N839MQLGARDU NA 431 8402013-09-30 08:00:00
# # Find all unique origin and destination pairs
flights |> 
  distinct(origin, dest)
A tibble: 224 x 2
origindest
<chr><chr>
EWRIAH
LGAIAH
JFKMIA
JFKBQN
LGAATL
EWRORD
EWRFLL
LGAIAD
JFKMCO
LGAORD
JFKPBI
JFKTPA
JFKLAX
EWRSFO
LGADFW
JFKBOS
EWRLAS
LGAFLL
EWRPBI
LGAMSP
LGADTW
EWRMIA
JFKATL
JFKSFO
JFKRSW
JFKSJU
EWRATL
EWRPHX
LGAMIA
EWRMSP
......
JFKACK
LGABGR
LGAMSN
LGAORF
JFKIAH
JFKMCI
LGAOMA
LGADSM
LGAGSP
JFKABQ
LGAILM
LGASYR
JFKMVY
LGASBN
JFKSTL
LGALEX
EWRSBN
LGAMHT
LGACAE
JFKJAC
JFKSDF
LGACHO
JFKMKE
LGAAVL
JFKBHM
LGATVC
LGAMYR
EWRTVC
EWRANC
EWRLGA
# keep other columns when filtering for unique rows
flights |> 
  distinct(origin, dest, .keep_all = TRUE)
A tibble: 224 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
201311622630-810171014 3US 245N807AWEWRPHX34221336302013-01-01 06:00:00
20131162361013 920 915 5AA1837N3EMAALGAMIA15310966102013-01-01 06:00:00
201311624630-6 909 840 29EV4626N11107EWRMSP19010086302013-01-01 06:00:00
.........................................................
201310 111491159-1012451259-14B61191N346JBJFKACK 39 19911592013-10-01 11:00:00
201310 11338110515314461245121EV5309N713EVLGABGR 57 37811 52013-10-01 11:00:00
201310 113451350 -514571526-29EV5181N612QXLGAMSN115 81213502013-10-01 13:00:00
201310 115431545 -217021721-19EV5293N398CALGAORF 46 29615452013-10-01 15:00:00
201310 116581700 -220021955 7AA 211N3FVAAJFKIAH214141717 02013-10-01 17:00:00
201310 117281736 -819102006-569E3310N907XJJFKMCI143111317362013-10-01 17:00:00
201310 117321730 219251953-28EV5298N709EVLGAOMA153114817302013-10-01 17:00:00
201310 118571900 -320442117-339E3445N906XJLGADSM146103119 02013-10-01 19:00:00
201310 119301900 3021292119 109E3854N8884ELGAGSP 86 61019 02013-10-01 19:00:00
201310 119552001 -622132248-35B6 65N554JBJFKABQ230182620 12013-10-01 20:00:00
201310 120512110-1922192306-47EV4885N752EVLGAILM 66 50021102013-10-01 21:00:00
201310 121502159 -922462308-229E3525N922XJLGASYR 34 19821592013-10-01 21:00:00
201310 314141350 2415141453 21B61338N368JBJFKMVY 36 17313502013-10-03 13:00:00
2013101818201745 3520302011 19EV5624N748EVLGASBN104 65117452013-10-18 17:00:00
201311 215191459 2016541702 -8DL 567N329NBJFKSTL129 89214592013-11-02 14:00:00
2013112420262035 -922272249-229E3669N8604CLGALEX 90 60420352013-11-24 20:00:00
201312 112411215 2614311431 0EV3264N16976EWRSBN 94 63712152013-12-01 12:00:00
2013121910561103 -712101219 -9EV5252N848ASLGAMHT 39 19511 32013-12-19 11:00:00
2013121919141915 -121302144-14EV5567N848ASLGACAE 97 61719152013-12-19 19:00:00
20131221 802 800 211151102 13DL2379N6705YJFKJAC2811894 8 02013-12-21 08:00:00
2013122116101610 018391855-169E2941N8930EJFKSDF114 66216102013-12-21 16:00:00
2013 3 120462045 122062220-14EV5343N197PQLGACHO 48 30520452013-03-01 20:00:00
2013 3 216061610 -417231758-359E3437N927XJJFKMKE109 74516102013-03-02 16:00:00
2013 414 937 945 -811301159-299E3632N8936ALGAAVL 93 599 9452013-04-14 09:00:00
2013 52510071000 711291148-19EV5621N751EVJFKBHM117 86510 02013-05-25 10:00:00
2013 61417561800 -420042005 -1MQ3402N722MQLGATVC101 65518 02013-06-14 18:00:00
2013 61515171520 -316561730-34EV5163N605QXLGAMYR 74 56315202013-06-15 15:00:00
2013 7 513551400 -515501620-30EV4137N11164EWRTVC 93 64414 02013-07-05 14:00:00
2013 7 616291615 1419541953 1UA 887N587UAEWRANC418337016152013-07-06 16:00:00
2013 727 NA 106 NA NA 245 NAUS1632NA EWRLGA NA 17 1 62013-07-27 01:00:00

If you want to find the number of occurrences instead, you’re better off swapping distinct() for count(), and with the sort = TRUE argument you can arrange them in descending order of number of occurrences.

flights |>
  count(origin, dest, sort = TRUE)
A tibble: 224 x 3
origindestn
<chr><chr><int>
JFKLAX11262
LGAATL10263
LGAORD 8857
JFKSFO 8204
LGACLT 6168
EWRORD 6100
JFKBOS 5898
LGAMIA 5781
JFKMCO 5464
EWRBOS 5327
EWRSFO 5127
LGADTW 5040
EWRCLT 5026
EWRATL 5022
EWRMCO 4941
EWRLAX 4912
LGADFW 4858
JFKSJU 4752
LGADCA 4716
LGABOS 4283
JFKFLL 4254
LGAFLL 4008
JFKLAS 3987
EWRIAH 3973
EWRFLL 3793
LGAMSP 3713
LGADEN 3704
LGAMCO 3677
JFKBUF 3582
LGARDU 3581
.........
JFKEGE103
LGAOMA 95
LGAIND 87
LGATVC 77
LGASAV 68
EWRMYR 56
LGACHO 52
EWRPHL 49
JFKSDF 46
LGAGRR 46
EWRBZN 36
EWRTVC 24
EWRJAC 23
JFKPSP 19
LGAEYW 17
EWRHDN 15
EWRMTJ 15
LGABWI 15
LGACAE 12
LGAAVL 10
EWRANC 8
LGASBN 6
EWRSBN 4
LGAMYR 3
JFKJAC 2
EWRLGA 1
JFKBHM 1
JFKMEM 1
JFKSTL 1
LGALEX 1

Exercises

3.2.1. Single pipeline:

  • Had an arrival delay of two or more hours
  • Flew to Houston (IAH or HOU)
  • Were operated by United, American, or Delta
  • Departed in summer (July, August, and September)
  • Arrived more than two hours late, but didn’t leave late
  • Were delayed by at least an hour, but made up over 30 minutes in flight

3.2.2. Sort flights to find the flights with longest departure delays. Find the flights that left earliest in the morning.

3.2.3. Sort flights to find the fastest flights. (Hint: Try including a math calculation inside of your function.)

3.2.4. Was there a flight on every day of 2013?

3.2.5. Which flights traveled the farthest distance? Which traveled the least distance?

3.2.6. Does it matter what order you used filter() and arrange() if you’re using both? Why/why not? Think about the results and how much work the functions would have to do.

#3.2.1. Had an arrival delay of two or more hours

#3.2.1. Flew to Houston (IAH or HOU)

#3.2.1. Were operated by United, American, or Delta
#
#3.2.1. Departed in summer (July, August, and September)

#3.2.1. Arrived more than two hours late, but didn’t leave late

#3.2.1. Were delayed by at least an hour, but made up over 30 minutes in flight


#3.2.2. Sort flights to find the flights with longest departure delays. Find the flights that left earliest in the morning.


#3.2.3. Sort flights to find the fastest flights. (Hint: Try including a math calculation inside of your function.)


#3.2.4. Was there a flight on every day of 2013?

#
#3.2.5. Which flights traveled the farthest distance? Which traveled the least distance?


#3.2.6. Does it matter what order you used filter() and arrange() if you’re using both? Why/why not? 
# Think about the results and how much work the functions would have to do.

Columns

There are four important verbs that affect the columns without changing the rows:

  • mutate() creates new columns that are derived from the existing columns
  • select() changes which columns are present
  • rename() changes the names of the columns
  • relocate() changes the positions of the columns.

mutate()

By default, mutate() adds new columns on the right hand side of your dataset, which makes it difficult to see what’s happening here. We can use the .before argument to instead add the variables to the left hand side.

flights |> 
  mutate(
    gain = dep_delay - arr_delay,       # gain, how much time a delayed flight made up in the air
    speed = distance / air_time * 60,   # the speed in miles per hour
    .after = day                         # try: .before = 1
  )
A tibble: 336776 x 21
yearmonthdaygainspeeddep_timesched_dep_timedep_delayarr_timesched_arr_time...carrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><dbl><dbl><int><int><dbl><int><int>...<chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311 -9370.0441517515 2 830 819...UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311-16374.2731533529 4 850 830...UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311-31408.3750542540 2 923 850...AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311 17516.7213544545-110041022...B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311 19394.1379554600-6 812 837...DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311-16287.6000554558-4 740 728...UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311-24404.4304555600-5 913 854...B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311 11259.2453557600-3 709 723...EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311 5404.5714557600-3 838 846...B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311-10318.6957558600-2 753 745...AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311 0413.9597558600-2 849 851...B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311 1381.6456558600-2 853 856...B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311 -9430.4348558600-2 924 917...UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311 12426.3158558600-2 923 937...UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311-32324.2802559600-1 941 910...AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311 4255.0000559559 0 702 706...B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311 7396.4985559600-1 854 902...UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311 7424.7368600600 0 851 858...B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311-12341.1940600600 0 837 825...MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311 7417.5510601600 1 844 850...B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311 0360.0000602610-8 812 820...DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311-19286.8571602605-3 821 805...MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311 8428.2895606610-4 858 910...AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311 4356.2500606610-4 837 845...DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311 17414.6497607607 0 858 915...UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311-24310.3597608600 8 807 735...MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
201311 -3423.934461160011 945 931...UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311 -1368.2286613610 3 925 921...B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311 21526.8132615615 010391100...B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311 9373.0000615615 0 833 842...DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
............................................................
2013930 22406.666721232125 -222232247...EV5489N712EVLGACHO 45 30521252013-09-30 21:00:00
2013930 7440.833321272129 -223142323...EV3833N16546EWRCLT 72 52921292013-09-30 21:00:00
2013930 29458.028221282130 -223282359...B6 97N807JBJFKDEN213162621302013-09-30 21:00:00
2013930 32389.333321292059 3022302232...EV5048N751EVLGARIC 45 29220592013-09-30 20:00:00
2013930 21355.000021312140 -922252255...MQ3621N807MQJFKDCA 36 21321402013-09-30 21:00:00
2013930 30498.322121402140 0 10 40...AA 185N335AAJFKLAX298247521402013-09-30 21:00:00
2013930 2362.553221422129 1322502239...EV4509N12957EWRPWM 47 28421292013-09-30 21:00:00
2013930 25499.375021452145 0 115 140...B61103N633JBJFKSJU192159821452013-09-30 21:00:00
2013930 7464.460421472137 10 30 27...B61371N627JBLGAFLL139107621372013-09-30 21:00:00
2013930 16324.324321492156 -722452308...UA 523N813UAEWRBOS 37 20021562013-09-30 21:00:00
2013930 7321.538521502159 -922502306...EV3842N10575EWRMHT 39 20921592013-09-30 21:00:00
2013930 0361.20002159184519423442030...9E3320N906XJJFKBUF 50 30118452013-09-30 18:00:00
2013930-10371.803322032205 -223392331...EV5311N722EVLGABGR 61 37822 52013-09-30 22:00:00
2013930 20472.577322072140 2722572250...MQ3660N532MQLGABNA 97 76421402013-09-30 21:00:00
2013930 15436.000022112059 7223392242...EV4672N12145EWRSTL120 87220592013-09-30 20:00:00
2013930 7341.250022312245-1423352356...B6 108N193JBJFKPWM 48 27322452013-09-30 22:00:00
2013930 38483.962322332113 80 112 30...UA 471N578UAEWRSFO318256521132013-09-30 21:00:00
2013930 24460.487822352001154 592249...B61083N804JBJFKMCO123 94420 12013-09-30 20:00:00
2013930 0371.162822372245 -823452353...B6 234N318JBJFKBTV 43 26622452013-09-30 22:00:00
2013930 12305.853722402245 -523342351...B61816N354JBJFKSYR 41 20922452013-09-30 22:00:00
2013930 10347.307722402250-102347 7...B62002N281JBJFKBUF 52 30122502013-09-30 22:00:00
2013930 11337.021322412246 -52345 1...B6 486N346JBJFKROC 47 26422462013-09-30 22:00:00
2013930 11340.000023072255 1223592358...B6 718N565JBJFKBOS 33 18722552013-09-30 22:00:00
2013930 15495.000023492359-10 325 350...B6 745N516JBJFKPSE196161723592013-09-30 23:00:00
2013930 NA NA NA1842 NA NA2019...EV5274N740EVLGABNA NA 76418422013-09-30 18:00:00
2013930 NA NA NA1455 NA NA1634...9E3393NA JFKDCA NA 21314552013-09-30 14:00:00
2013930 NA NA NA2200 NA NA2312...9E3525NA LGASYR NA 19822 02013-09-30 22:00:00
2013930 NA NA NA1210 NA NA1330...MQ3461N535MQLGABNA NA 76412102013-09-30 12:00:00
2013930 NA NA NA1159 NA NA1344...MQ3572N511MQLGACLE NA 41911592013-09-30 11:00:00
2013930 NA NA NA 840 NA NA1020...MQ3531N839MQLGARDU NA 431 8402013-09-30 08:00:00
flights |> 
  mutate(
    gain = dep_delay - arr_delay,
    hours = air_time / 60,
    gain_per_hour = gain / hours,
    .keep = "used"
  )
A tibble: 336776 x 6
dep_delayarr_delayair_timegainhoursgain_per_hour
<dbl><dbl><dbl><dbl><dbl><dbl>
2 11227 -93.7833333 -2.3788546
4 20227-163.7833333 -4.2290749
2 33160-312.6666667-11.6250000
-1-18183 173.0500000 5.5737705
-6-25116 191.9333333 9.8275862
-4 12150-162.5000000 -6.4000000
-5 19158-242.6333333 -9.1139241
-3-14 53 110.8833333 12.4528302
-3 -8140 52.3333333 2.1428571
-2 8138-102.3000000 -4.3478261
-2 -2149 02.4833333 0.0000000
-2 -3158 12.6333333 0.3797468
-2 7345 -95.7500000 -1.5652174
-2-14361 126.0166667 1.9944598
-1 31257-324.2833333 -7.4708171
0 -4 44 40.7333333 5.4545455
-1 -8337 75.6166667 1.2462908
0 -7152 72.5333333 2.7631579
0 12134-122.2333333 -5.3731343
1 -6147 72.4500000 2.8571429
-8 -8170 02.8333333 0.0000000
-3 16105-191.7500000-10.8571429
-4-12152 82.5333333 3.1578947
-4 -8128 42.1333333 1.8750000
0-17157 172.6166667 6.4968153
8 32139-242.3166667-10.3597122
11 14366 -36.1000000 -0.4918033
3 4175 -12.9166667 -0.3428571
0-21182 213.0333333 6.9230769
0 -9120 92.0000000 4.5000000
..................
-2-24 45 220.750000029.333333
-2 -9 72 71.2000000 5.833333
-2-31213 293.5500000 8.169014
30 -2 45 320.750000042.666667
-9-30 36 210.600000035.000000
0-30298 304.9666667 6.040268
13 11 47 20.7833333 2.553191
0-25192 253.2000000 7.812500
10 3139 72.3166667 3.021583
-7-23 37 160.616666725.945946
-9-16 39 70.650000010.769231
194194 50 00.8333333 0.000000
-2 8 61-101.0166667-9.836066
27 7 97 201.616666712.371134
72 57120 152.0000000 7.500000
-14-21 48 70.8000000 8.750000
80 42318 385.3000000 7.169811
154130123 242.050000011.707317
-8 -8 43 00.7166667 0.000000
-5-17 41 120.683333317.560976
-10-20 52 100.866666711.538462
-5-16 47 110.783333314.042553
12 1 33 110.550000020.000000
-10-25196 153.2666667 4.591837
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA
NA NA NA NA NA NA

select()

There are a number of helper functions you can use within select():

  • starts_with("abc"): matches names that begin with “abc”.
  • ends_with("xyz"): matches names that end with “xyz”.
  • contains("ijk"): matches names that contain “ijk”.
  • num_range("x", 1:3): matches x1, x2 and x3.

See ?select for more details

flights |> 
  select(year, month, day)
A tibble: 336776 x 3
yearmonthday
<int><int><int>
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
.........
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
flights |> 
  select(year:day)
A tibble: 336776 x 3
yearmonthday
<int><int><int>
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
201311
.........
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
2013930
flights |> 
  select(!year:day)
A tibble: 336776 x 16
dep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
61160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
................................................
21232125 -222232247-24EV5489N712EVLGACHO 45 30521252013-09-30 21:00:00
21272129 -223142323 -9EV3833N16546EWRCLT 72 52921292013-09-30 21:00:00
21282130 -223282359-31B6 97N807JBJFKDEN213162621302013-09-30 21:00:00
21292059 3022302232 -2EV5048N751EVLGARIC 45 29220592013-09-30 20:00:00
21312140 -922252255-30MQ3621N807MQJFKDCA 36 21321402013-09-30 21:00:00
21402140 0 10 40-30AA 185N335AAJFKLAX298247521402013-09-30 21:00:00
21422129 1322502239 11EV4509N12957EWRPWM 47 28421292013-09-30 21:00:00
21452145 0 115 140-25B61103N633JBJFKSJU192159821452013-09-30 21:00:00
21472137 10 30 27 3B61371N627JBLGAFLL139107621372013-09-30 21:00:00
21492156 -722452308-23UA 523N813UAEWRBOS 37 20021562013-09-30 21:00:00
21502159 -922502306-16EV3842N10575EWRMHT 39 20921592013-09-30 21:00:00
21591845194234420301949E3320N906XJJFKBUF 50 30118452013-09-30 18:00:00
22032205 -223392331 8EV5311N722EVLGABGR 61 37822 52013-09-30 22:00:00
22072140 2722572250 7MQ3660N532MQLGABNA 97 76421402013-09-30 21:00:00
22112059 7223392242 57EV4672N12145EWRSTL120 87220592013-09-30 20:00:00
22312245-1423352356-21B6 108N193JBJFKPWM 48 27322452013-09-30 22:00:00
22332113 80 112 30 42UA 471N578UAEWRSFO318256521132013-09-30 21:00:00
22352001154 592249130B61083N804JBJFKMCO123 94420 12013-09-30 20:00:00
22372245 -823452353 -8B6 234N318JBJFKBTV 43 26622452013-09-30 22:00:00
22402245 -523342351-17B61816N354JBJFKSYR 41 20922452013-09-30 22:00:00
22402250-102347 7-20B62002N281JBJFKBUF 52 30122502013-09-30 22:00:00
22412246 -52345 1-16B6 486N346JBJFKROC 47 26422462013-09-30 22:00:00
23072255 1223592358 1B6 718N565JBJFKBOS 33 18722552013-09-30 22:00:00
23492359-10 325 350-25B6 745N516JBJFKPSE196161723592013-09-30 23:00:00
NA1842 NA NA2019 NAEV5274N740EVLGABNA NA 76418422013-09-30 18:00:00
NA1455 NA NA1634 NA9E3393NA JFKDCA NA 21314552013-09-30 14:00:00
NA2200 NA NA2312 NA9E3525NA LGASYR NA 19822 02013-09-30 22:00:00
NA1210 NA NA1330 NAMQ3461N535MQLGABNA NA 76412102013-09-30 12:00:00
NA1159 NA NA1344 NAMQ3572N511MQLGACLE NA 41911592013-09-30 11:00:00
NA 840 NA NA1020 NAMQ3531N839MQLGARDU NA 431 8402013-09-30 08:00:00
#Select all columns that are characters:

flights |> 
  select(!where(is.character))
A tibble: 336776 x 15
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delayflightair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><int><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11154522714005152013-01-01 05:00:00
201311533529 4 850 830 20171422714165292013-01-01 05:00:00
201311542540 2 923 850 33114116010895402013-01-01 05:00:00
201311544545-110041022-18 72518315765452013-01-01 05:00:00
201311554600-6 812 837-25 461116 7626 02013-01-01 06:00:00
201311554558-4 740 728 121696150 7195582013-01-01 05:00:00
201311555600-5 913 854 19 50715810656 02013-01-01 06:00:00
201311557600-3 709 723-145708 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8 79140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8 301138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2 4914910286 02013-01-01 06:00:00
201311558600-2 853 856 -3 7115810056 02013-01-01 06:00:00
201311558600-2 924 917 7 19434524756 02013-01-01 06:00:00
201311558600-2 923 937-14112436125656 02013-01-01 06:00:00
201311559600-1 941 910 31 70725713896 02013-01-01 06:00:00
201311559559 0 702 706 -41806 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8118733722276 02013-01-01 06:00:00
201311600600 0 851 858 -7 37115210766 02013-01-01 06:00:00
201311600600 0 837 825 124650134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6 34314710236 02013-01-01 06:00:00
201311602610-8 812 820 -8191917010206102013-01-01 06:00:00
201311602605-3 821 805 164401105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12189515210856102013-01-01 06:00:00
201311606610-4 837 845 -81743128 7606102013-01-01 06:00:00
201311607607 0 858 915-17107715710856 72013-01-01 06:00:00
201311608600 8 807 735 323768139 7196 02013-01-01 06:00:00
20131161160011 945 931 14 30336625866 02013-01-01 06:00:00
201311613610 3 925 921 4 13517510746102013-01-01 06:00:00
201311615615 010391100-21 70918215986152013-01-01 06:00:00
201311615615 0 833 842 -9 575120 7466152013-01-01 06:00:00
.............................................
201393021232125 -222232247-245489 45 30521252013-09-30 21:00:00
201393021272129 -223142323 -93833 72 52921292013-09-30 21:00:00
201393021282130 -223282359-31 97213162621302013-09-30 21:00:00
201393021292059 3022302232 -25048 45 29220592013-09-30 20:00:00
201393021312140 -922252255-303621 36 21321402013-09-30 21:00:00
201393021402140 0 10 40-30 185298247521402013-09-30 21:00:00
201393021422129 1322502239 114509 47 28421292013-09-30 21:00:00
201393021452145 0 115 140-251103192159821452013-09-30 21:00:00
201393021472137 10 30 27 31371139107621372013-09-30 21:00:00
201393021492156 -722452308-23 523 37 20021562013-09-30 21:00:00
201393021502159 -922502306-163842 39 20921592013-09-30 21:00:00
201393021591845194234420301943320 50 30118452013-09-30 18:00:00
201393022032205 -223392331 85311 61 37822 52013-09-30 22:00:00
201393022072140 2722572250 73660 97 76421402013-09-30 21:00:00
201393022112059 7223392242 574672120 87220592013-09-30 20:00:00
201393022312245-1423352356-21 108 48 27322452013-09-30 22:00:00
201393022332113 80 112 30 42 471318256521132013-09-30 21:00:00
201393022352001154 5922491301083123 94420 12013-09-30 20:00:00
201393022372245 -823452353 -8 234 43 26622452013-09-30 22:00:00
201393022402245 -523342351-171816 41 20922452013-09-30 22:00:00
201393022402250-102347 7-202002 52 30122502013-09-30 22:00:00
201393022412246 -52345 1-16 486 47 26422462013-09-30 22:00:00
201393023072255 1223592358 1 718 33 18722552013-09-30 22:00:00
201393023492359-10 325 350-25 745196161723592013-09-30 23:00:00
2013930 NA1842 NA NA2019 NA5274 NA 76418422013-09-30 18:00:00
2013930 NA1455 NA NA1634 NA3393 NA 21314552013-09-30 14:00:00
2013930 NA2200 NA NA2312 NA3525 NA 19822 02013-09-30 22:00:00
2013930 NA1210 NA NA1330 NA3461 NA 76412102013-09-30 12:00:00
2013930 NA1159 NA NA1344 NA3572 NA 41911592013-09-30 11:00:00
2013930 NA 840 NA NA1020 NA3531 NA 431 8402013-09-30 08:00:00
# rename variable while select
flights |> 
  select(tail_num = tailnum)
A tibble: 336776 x 1
tail_num
<chr>
N14228
N24211
N619AA
N804JB
N668DN
N39463
N516JB
N829AS
N593JB
N3ALAA
N793JB
N657JB
N29129
N53441
N3DUAA
N708JB
N76515
N595JB
N542MQ
N644JB
N971DL
N730MQ
N633AA
N3739P
N53442
N9EAMQ
N532UA
N635JB
N794JB
N326NB
...
N712EV
N16546
N807JB
N751EV
N807MQ
N335AA
N12957
N633JB
N627JB
N813UA
N10575
N906XJ
N722EV
N532MQ
N12145
N193JB
N578UA
N804JB
N318JB
N354JB
N281JB
N346JB
N565JB
N516JB
N740EV
NA
NA
N535MQ
N511MQ
N839MQ

rename()

If you want to keep all the existing variables and just want to rename a few, you can use rename() instead of select():

also try: janitor::clean_names()

flights |> 
  rename(tail_num = tailnum)
A tibble: 336776 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttail_numorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
.........................................................
201393021232125 -222232247-24EV5489N712EVLGACHO 45 30521252013-09-30 21:00:00
201393021272129 -223142323 -9EV3833N16546EWRCLT 72 52921292013-09-30 21:00:00
201393021282130 -223282359-31B6 97N807JBJFKDEN213162621302013-09-30 21:00:00
201393021292059 3022302232 -2EV5048N751EVLGARIC 45 29220592013-09-30 20:00:00
201393021312140 -922252255-30MQ3621N807MQJFKDCA 36 21321402013-09-30 21:00:00
201393021402140 0 10 40-30AA 185N335AAJFKLAX298247521402013-09-30 21:00:00
201393021422129 1322502239 11EV4509N12957EWRPWM 47 28421292013-09-30 21:00:00
201393021452145 0 115 140-25B61103N633JBJFKSJU192159821452013-09-30 21:00:00
201393021472137 10 30 27 3B61371N627JBLGAFLL139107621372013-09-30 21:00:00
201393021492156 -722452308-23UA 523N813UAEWRBOS 37 20021562013-09-30 21:00:00
201393021502159 -922502306-16EV3842N10575EWRMHT 39 20921592013-09-30 21:00:00
201393021591845194234420301949E3320N906XJJFKBUF 50 30118452013-09-30 18:00:00
201393022032205 -223392331 8EV5311N722EVLGABGR 61 37822 52013-09-30 22:00:00
201393022072140 2722572250 7MQ3660N532MQLGABNA 97 76421402013-09-30 21:00:00
201393022112059 7223392242 57EV4672N12145EWRSTL120 87220592013-09-30 20:00:00
201393022312245-1423352356-21B6 108N193JBJFKPWM 48 27322452013-09-30 22:00:00
201393022332113 80 112 30 42UA 471N578UAEWRSFO318256521132013-09-30 21:00:00
201393022352001154 592249130B61083N804JBJFKMCO123 94420 12013-09-30 20:00:00
201393022372245 -823452353 -8B6 234N318JBJFKBTV 43 26622452013-09-30 22:00:00
201393022402245 -523342351-17B61816N354JBJFKSYR 41 20922452013-09-30 22:00:00
201393022402250-102347 7-20B62002N281JBJFKBUF 52 30122502013-09-30 22:00:00
201393022412246 -52345 1-16B6 486N346JBJFKROC 47 26422462013-09-30 22:00:00
201393023072255 1223592358 1B6 718N565JBJFKBOS 33 18722552013-09-30 22:00:00
201393023492359-10 325 350-25B6 745N516JBJFKPSE196161723592013-09-30 23:00:00
2013930 NA1842 NA NA2019 NAEV5274N740EVLGABNA NA 76418422013-09-30 18:00:00
2013930 NA1455 NA NA1634 NA9E3393NA JFKDCA NA 21314552013-09-30 14:00:00
2013930 NA2200 NA NA2312 NA9E3525NA LGASYR NA 19822 02013-09-30 22:00:00
2013930 NA1210 NA NA1330 NAMQ3461N535MQLGABNA NA 76412102013-09-30 12:00:00
2013930 NA1159 NA NA1344 NAMQ3572N511MQLGACLE NA 41911592013-09-30 11:00:00
2013930 NA 840 NA NA1020 NAMQ3531N839MQLGARDU NA 431 8402013-09-30 08:00:00

relocate()

By default relocate() moves variables to the front:

flights |> 
  relocate(time_hour, air_time)

# flights |> 
#   relocate(year:dep_time, .after = time_hour)
# flights |> 
#   relocate(starts_with("arr"), .before = dep_time)
A tibble: 336776 x 19
time_hourair_timeyearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestdistancehourminute
<dttm><dbl><int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl>
2013-01-01 05:00:00227201311517515 2 830 819 11UA1545N14228EWRIAH1400515
2013-01-01 05:00:00227201311533529 4 850 830 20UA1714N24211LGAIAH1416529
2013-01-01 05:00:00160201311542540 2 923 850 33AA1141N619AAJFKMIA1089540
2013-01-01 05:00:00183201311544545-110041022-18B6 725N804JBJFKBQN1576545
2013-01-01 06:00:00116201311554600-6 812 837-25DL 461N668DNLGAATL 7626 0
2013-01-01 05:00:00150201311554558-4 740 728 12UA1696N39463EWRORD 719558
2013-01-01 06:00:00158201311555600-5 913 854 19B6 507N516JBEWRFLL10656 0
2013-01-01 06:00:00 53201311557600-3 709 723-14EV5708N829ASLGAIAD 2296 0
2013-01-01 06:00:00140201311557600-3 838 846 -8B6 79N593JBJFKMCO 9446 0
2013-01-01 06:00:00138201311558600-2 753 745 8AA 301N3ALAALGAORD 7336 0
2013-01-01 06:00:00149201311558600-2 849 851 -2B6 49N793JBJFKPBI10286 0
2013-01-01 06:00:00158201311558600-2 853 856 -3B6 71N657JBJFKTPA10056 0
2013-01-01 06:00:00345201311558600-2 924 917 7UA 194N29129JFKLAX24756 0
2013-01-01 06:00:00361201311558600-2 923 937-14UA1124N53441EWRSFO25656 0
2013-01-01 06:00:00257201311559600-1 941 910 31AA 707N3DUAALGADFW13896 0
2013-01-01 05:00:00 44201311559559 0 702 706 -4B61806N708JBJFKBOS 187559
2013-01-01 06:00:00337201311559600-1 854 902 -8UA1187N76515EWRLAS22276 0
2013-01-01 06:00:00152201311600600 0 851 858 -7B6 371N595JBLGAFLL10766 0
2013-01-01 06:00:00134201311600600 0 837 825 12MQ4650N542MQLGAATL 7626 0
2013-01-01 06:00:00147201311601600 1 844 850 -6B6 343N644JBEWRPBI10236 0
2013-01-01 06:00:00170201311602610-8 812 820 -8DL1919N971DLLGAMSP1020610
2013-01-01 06:00:00105201311602605-3 821 805 16MQ4401N730MQLGADTW 5026 5
2013-01-01 06:00:00152201311606610-4 858 910-12AA1895N633AAEWRMIA1085610
2013-01-01 06:00:00128201311606610-4 837 845 -8DL1743N3739PJFKATL 760610
2013-01-01 06:00:00157201311607607 0 858 915-17UA1077N53442EWRMIA10856 7
2013-01-01 06:00:00139201311608600 8 807 735 32MQ3768N9EAMQEWRORD 7196 0
2013-01-01 06:00:0036620131161160011 945 931 14UA 303N532UAJFKSFO25866 0
2013-01-01 06:00:00175201311613610 3 925 921 4B6 135N635JBJFKRSW1074610
2013-01-01 06:00:00182201311615615 010391100-21B6 709N794JBJFKSJU1598615
2013-01-01 06:00:00120201311615615 0 833 842 -9DL 575N326NBEWRATL 746615
.........................................................
2013-09-30 21:00:00 45201393021232125 -222232247-24EV5489N712EVLGACHO 3052125
2013-09-30 21:00:00 72201393021272129 -223142323 -9EV3833N16546EWRCLT 5292129
2013-09-30 21:00:00213201393021282130 -223282359-31B6 97N807JBJFKDEN16262130
2013-09-30 20:00:00 45201393021292059 3022302232 -2EV5048N751EVLGARIC 2922059
2013-09-30 21:00:00 36201393021312140 -922252255-30MQ3621N807MQJFKDCA 2132140
2013-09-30 21:00:00298201393021402140 0 10 40-30AA 185N335AAJFKLAX24752140
2013-09-30 21:00:00 47201393021422129 1322502239 11EV4509N12957EWRPWM 2842129
2013-09-30 21:00:00192201393021452145 0 115 140-25B61103N633JBJFKSJU15982145
2013-09-30 21:00:00139201393021472137 10 30 27 3B61371N627JBLGAFLL10762137
2013-09-30 21:00:00 37201393021492156 -722452308-23UA 523N813UAEWRBOS 2002156
2013-09-30 21:00:00 39201393021502159 -922502306-16EV3842N10575EWRMHT 2092159
2013-09-30 18:00:00 50201393021591845194234420301949E3320N906XJJFKBUF 3011845
2013-09-30 22:00:00 61201393022032205 -223392331 8EV5311N722EVLGABGR 37822 5
2013-09-30 21:00:00 97201393022072140 2722572250 7MQ3660N532MQLGABNA 7642140
2013-09-30 20:00:00120201393022112059 7223392242 57EV4672N12145EWRSTL 8722059
2013-09-30 22:00:00 48201393022312245-1423352356-21B6 108N193JBJFKPWM 2732245
2013-09-30 21:00:00318201393022332113 80 112 30 42UA 471N578UAEWRSFO25652113
2013-09-30 20:00:00123201393022352001154 592249130B61083N804JBJFKMCO 94420 1
2013-09-30 22:00:00 43201393022372245 -823452353 -8B6 234N318JBJFKBTV 2662245
2013-09-30 22:00:00 41201393022402245 -523342351-17B61816N354JBJFKSYR 2092245
2013-09-30 22:00:00 52201393022402250-102347 7-20B62002N281JBJFKBUF 3012250
2013-09-30 22:00:00 47201393022412246 -52345 1-16B6 486N346JBJFKROC 2642246
2013-09-30 22:00:00 33201393023072255 1223592358 1B6 718N565JBJFKBOS 1872255
2013-09-30 23:00:00196201393023492359-10 325 350-25B6 745N516JBJFKPSE16172359
2013-09-30 18:00:00 NA2013930 NA1842 NA NA2019 NAEV5274N740EVLGABNA 7641842
2013-09-30 14:00:00 NA2013930 NA1455 NA NA1634 NA9E3393NA JFKDCA 2131455
2013-09-30 22:00:00 NA2013930 NA2200 NA NA2312 NA9E3525NA LGASYR 19822 0
2013-09-30 12:00:00 NA2013930 NA1210 NA NA1330 NAMQ3461N535MQLGABNA 7641210
2013-09-30 11:00:00 NA2013930 NA1159 NA NA1344 NAMQ3572N511MQLGACLE 4191159
2013-09-30 08:00:00 NA2013930 NA 840 NA NA1020 NAMQ3531N839MQLGARDU 431 840

3.3. Exercises

3.3.1 Compare dep_time, sched_dep_time, and dep_delay. How would you expect those three numbers to be related?


3.3.2. Brainstorm as many ways as possible to select dep_time, dep_delay, arr_time, and arr_delay from flights.


3.3.3. What happens if you specify the name of the same variable multiple times in a select() call?


3.3.4. What does the any_of() function do? Why might it be helpful in conjunction with this vector?

variables <- c("year", "month", "day", "dep_delay", "arr_delay")

3.3.5. Does the result of running the following code surprise you? How do the select helpers deal with upper and lower case by default? How can you change that default?

flights |> select(contains("TIME"))
A tibble: 336776 x 6
dep_timesched_dep_timearr_timesched_arr_timeair_timetime_hour
<int><int><int><int><dbl><dttm>
517515 830 8192272013-01-01 05:00:00
533529 850 8302272013-01-01 05:00:00
542540 923 8501602013-01-01 05:00:00
544545100410221832013-01-01 05:00:00
554600 812 8371162013-01-01 06:00:00
554558 740 7281502013-01-01 05:00:00
555600 913 8541582013-01-01 06:00:00
557600 709 723 532013-01-01 06:00:00
557600 838 8461402013-01-01 06:00:00
558600 753 7451382013-01-01 06:00:00
558600 849 8511492013-01-01 06:00:00
558600 853 8561582013-01-01 06:00:00
558600 924 9173452013-01-01 06:00:00
558600 923 9373612013-01-01 06:00:00
559600 941 9102572013-01-01 06:00:00
559559 702 706 442013-01-01 05:00:00
559600 854 9023372013-01-01 06:00:00
600600 851 8581522013-01-01 06:00:00
600600 837 8251342013-01-01 06:00:00
601600 844 8501472013-01-01 06:00:00
602610 812 8201702013-01-01 06:00:00
602605 821 8051052013-01-01 06:00:00
606610 858 9101522013-01-01 06:00:00
606610 837 8451282013-01-01 06:00:00
607607 858 9151572013-01-01 06:00:00
608600 807 7351392013-01-01 06:00:00
611600 945 9313662013-01-01 06:00:00
613610 925 9211752013-01-01 06:00:00
615615103911001822013-01-01 06:00:00
615615 833 8421202013-01-01 06:00:00
..................
2123212522232247 452013-09-30 21:00:00
2127212923142323 722013-09-30 21:00:00
21282130232823592132013-09-30 21:00:00
2129205922302232 452013-09-30 20:00:00
2131214022252255 362013-09-30 21:00:00
21402140 10 402982013-09-30 21:00:00
2142212922502239 472013-09-30 21:00:00
21452145 115 1401922013-09-30 21:00:00
21472137 30 271392013-09-30 21:00:00
2149215622452308 372013-09-30 21:00:00
2150215922502306 392013-09-30 21:00:00
2159184523442030 502013-09-30 18:00:00
2203220523392331 612013-09-30 22:00:00
2207214022572250 972013-09-30 21:00:00
22112059233922421202013-09-30 20:00:00
2231224523352356 482013-09-30 22:00:00
22332113 112 303182013-09-30 21:00:00
22352001 5922491232013-09-30 20:00:00
2237224523452353 432013-09-30 22:00:00
2240224523342351 412013-09-30 22:00:00
224022502347 7 522013-09-30 22:00:00
224122462345 1 472013-09-30 22:00:00
2307225523592358 332013-09-30 22:00:00
23492359 325 3501962013-09-30 23:00:00
NA1842 NA2019 NA2013-09-30 18:00:00
NA1455 NA1634 NA2013-09-30 14:00:00
NA2200 NA2312 NA2013-09-30 22:00:00
NA1210 NA1330 NA2013-09-30 12:00:00
NA1159 NA1344 NA2013-09-30 11:00:00
NA 840 NA1020 NA2013-09-30 08:00:00

3.3.6. Rename air_time to air_time_min to indicate units of measurement and move it to the beginning of the data frame.


3.3.7. Why doesn’t the following work, and what does the error mean?

flights |> 
  select(tailnum) |> 
  arrange(arr_delay)
Error in `arrange()`:
i In argument: `..1 = arr_delay`.
Caused by error:
! object 'arr_delay' not found
Traceback:


1. arrange(select(flights, tailnum), arr_delay)

2. arrange.data.frame(select(flights, tailnum), arr_delay)

3. arrange_rows(.data, dots = dots, locale = .locale)

4. mutate(data, `:=`("{name}", !!dot), .keep = "none")

5. mutate.data.frame(data, `:=`("{name}", !!dot), .keep = "none")

6. mutate_cols(.data, dplyr_quosures(...), by)

7. withCallingHandlers(for (i in seq_along(dots)) {
 .     poke_error_context(dots, i, mask = mask)
 .     context_poke("column", old_current_column)
 .     new_columns <- mutate_col(dots[[i]], data, mask, new_columns)
 . }, error = dplyr_error_handler(dots = dots, mask = mask, bullets = mutate_bullets, 
 .     error_call = error_call, error_class = "dplyr:::mutate_error"), 
 .     warning = dplyr_warning_handler(state = warnings_state, mask = mask, 
 .         error_call = error_call))

8. mutate_col(dots[[i]], data, mask, new_columns)

9. mask$eval_all_mutate(quo)

10. eval()

11. .handleSimpleError(function (cnd) 
  . {
  .     local_error_context(dots, i = frame[[i_sym]], mask = mask)
  .     if (inherits(cnd, "dplyr:::internal_error")) {
  .         parent <- error_cnd(message = bullets(cnd))
  .     }
  .     else {
  .         parent <- cnd
  .     }
  .     message <- c(cnd_bullet_header(action), i = if (has_active_group_context(mask)) cnd_bullet_cur_group_label())
  .     abort(message, class = error_class, parent = parent, call = error_call)
  . }, "object 'arr_delay' not found", base::quote(NULL))

12. h(simpleError(msg, call))

13. abort(message, class = error_class, parent = parent, call = error_call)

14. signal_abort(cnd, .file)

The pipe

flights |> 
  filter(dest == "IAH") |> 
  mutate(speed = distance / air_time * 60) |> 
  select(year:day, dep_time, carrier, flight, speed) |> 
  arrange(desc(speed))
A tibble: 7198 x 7
yearmonthdaydep_timecarrierflightspeed
<int><int><int><int><chr><int><dbl>
20137 9 707UA 226521.7391
20138271850UA1128521.2270
2013828 902UA1711518.5185
20138282122UA1022518.5185
20136111628UA1178515.3374
20138271017UA 333515.3374
20138271205UA1421515.3374
20138271758UA 302515.3374
2013927 521UA 252515.3374
2013828 625UA 559514.9091
20138281505UA1695512.1951
20139 72025UA1022512.1951
20139 81830UA 404512.1951
20138271651UA 525511.8072
20139 4 542UA1714511.8072
20139 62006UA1128511.8072
20139 81849UA1128511.8072
20136112017UA 350509.0909
20137 82131UA 260509.0909
20137 82213UA1740509.0909
20137121718UA 268509.0909
2013719 634UA 226509.0909
20139272003UA1712509.0909
20138271854UA 315508.7425
20136111517UA1611506.0241
2013615 852UA 215506.0241
20137 71649UA1254506.0241
20137 9 851UA1711506.0241
20137 92001UA 299506.0241
20137 92331UA1178506.0241
.....................
2013610 NAUA 318NA
20136152037UA1611NA
2013617 NAUA 318NA
2013617 NAUA1178NA
2013619 NAUA 430NA
2013619 NAUA 531NA
20136201256UA 636NA
2013622 NAUA 251NA
2013625 NAUA 453NA
20136271951UA1611NA
2013628 NAUA 768NA
20137101905AA1901NA
2013710 NAUA1661NA
2013711 NAUA 508NA
20137161302UA 824NA
2013717 NAUA 226NA
2013722 NAUA1259NA
2013722 NAUA1661NA
2013723 NAUA1454NA
2013728 NAUA 408NA
20138 2 NAUA 349NA
2013813 NAUA 220NA
20138141537UA1178NA
20138161456UA1257NA
20139 2 NAUA1128NA
20139121815UA1695NA
2013912 NAUA 404NA
2013912 NAUA 525NA
2013912 NAUA1128NA
2013920 752UA1591NA

What would happen if we didn’t have the pipe? We could nest each function call inside the previous call:

arrange(
  select(
    mutate(
      filter(
        flights, 
        dest == "IAH"
      ),
      speed = distance / air_time * 60
    ),
    year:day, dep_time, carrier, flight, speed
  ),
  desc(speed)
)
A tibble: 7198 x 7
yearmonthdaydep_timecarrierflightspeed
<int><int><int><int><chr><int><dbl>
20137 9 707UA 226521.7391
20138271850UA1128521.2270
2013828 902UA1711518.5185
20138282122UA1022518.5185
20136111628UA1178515.3374
20138271017UA 333515.3374
20138271205UA1421515.3374
20138271758UA 302515.3374
2013927 521UA 252515.3374
2013828 625UA 559514.9091
20138281505UA1695512.1951
20139 72025UA1022512.1951
20139 81830UA 404512.1951
20138271651UA 525511.8072
20139 4 542UA1714511.8072
20139 62006UA1128511.8072
20139 81849UA1128511.8072
20136112017UA 350509.0909
20137 82131UA 260509.0909
20137 82213UA1740509.0909
20137121718UA 268509.0909
2013719 634UA 226509.0909
20139272003UA1712509.0909
20138271854UA 315508.7425
20136111517UA1611506.0241
2013615 852UA 215506.0241
20137 71649UA1254506.0241
20137 9 851UA1711506.0241
20137 92001UA 299506.0241
20137 92331UA1178506.0241
.....................
2013610 NAUA 318NA
20136152037UA1611NA
2013617 NAUA 318NA
2013617 NAUA1178NA
2013619 NAUA 430NA
2013619 NAUA 531NA
20136201256UA 636NA
2013622 NAUA 251NA
2013625 NAUA 453NA
20136271951UA1611NA
2013628 NAUA 768NA
20137101905AA1901NA
2013710 NAUA1661NA
2013711 NAUA 508NA
20137161302UA 824NA
2013717 NAUA 226NA
2013722 NAUA1259NA
2013722 NAUA1661NA
2013723 NAUA1454NA
2013728 NAUA 408NA
20138 2 NAUA 349NA
2013813 NAUA 220NA
20138141537UA1178NA
20138161456UA1257NA
20139 2 NAUA1128NA
20139121815UA1695NA
2013912 NAUA 404NA
2013912 NAUA 525NA
2013912 NAUA1128NA
2013920 752UA1591NA

Or we could use a bunch of intermediate objects:

flights1 <- filter(flights, dest == "IAH")
flights2 <- mutate(flights1, speed = distance / air_time * 60)
flights3 <- select(flights2, year:day, dep_time, carrier, flight, speed)
arrange(flights3, desc(speed))
A tibble: 7198 x 7
yearmonthdaydep_timecarrierflightspeed
<int><int><int><int><chr><int><dbl>
20137 9 707UA 226521.7391
20138271850UA1128521.2270
2013828 902UA1711518.5185
20138282122UA1022518.5185
20136111628UA1178515.3374
20138271017UA 333515.3374
20138271205UA1421515.3374
20138271758UA 302515.3374
2013927 521UA 252515.3374
2013828 625UA 559514.9091
20138281505UA1695512.1951
20139 72025UA1022512.1951
20139 81830UA 404512.1951
20138271651UA 525511.8072
20139 4 542UA1714511.8072
20139 62006UA1128511.8072
20139 81849UA1128511.8072
20136112017UA 350509.0909
20137 82131UA 260509.0909
20137 82213UA1740509.0909
20137121718UA 268509.0909
2013719 634UA 226509.0909
20139272003UA1712509.0909
20138271854UA 315508.7425
20136111517UA1611506.0241
2013615 852UA 215506.0241
20137 71649UA1254506.0241
20137 9 851UA1711506.0241
20137 92001UA 299506.0241
20137 92331UA1178506.0241
.....................
2013610 NAUA 318NA
20136152037UA1611NA
2013617 NAUA 318NA
2013617 NAUA1178NA
2013619 NAUA 430NA
2013619 NAUA 531NA
20136201256UA 636NA
2013622 NAUA 251NA
2013625 NAUA 453NA
20136271951UA1611NA
2013628 NAUA 768NA
20137101905AA1901NA
2013710 NAUA1661NA
2013711 NAUA 508NA
20137161302UA 824NA
2013717 NAUA 226NA
2013722 NAUA1259NA
2013722 NAUA1661NA
2013723 NAUA1454NA
2013728 NAUA 408NA
20138 2 NAUA 349NA
2013813 NAUA 220NA
20138141537UA1178NA
20138161456UA1257NA
20139 2 NAUA1128NA
20139121815UA1695NA
2013912 NAUA 404NA
2013912 NAUA 525NA
2013912 NAUA1128NA
2013920 752UA1591NA

Groups

group_by()

divide your dataset into groups meaningful for your analysis:

flights |> 
  group_by(month)
A grouped_df: 336776 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
.........................................................
201393021232125 -222232247-24EV5489N712EVLGACHO 45 30521252013-09-30 21:00:00
201393021272129 -223142323 -9EV3833N16546EWRCLT 72 52921292013-09-30 21:00:00
201393021282130 -223282359-31B6 97N807JBJFKDEN213162621302013-09-30 21:00:00
201393021292059 3022302232 -2EV5048N751EVLGARIC 45 29220592013-09-30 20:00:00
201393021312140 -922252255-30MQ3621N807MQJFKDCA 36 21321402013-09-30 21:00:00
201393021402140 0 10 40-30AA 185N335AAJFKLAX298247521402013-09-30 21:00:00
201393021422129 1322502239 11EV4509N12957EWRPWM 47 28421292013-09-30 21:00:00
201393021452145 0 115 140-25B61103N633JBJFKSJU192159821452013-09-30 21:00:00
201393021472137 10 30 27 3B61371N627JBLGAFLL139107621372013-09-30 21:00:00
201393021492156 -722452308-23UA 523N813UAEWRBOS 37 20021562013-09-30 21:00:00
201393021502159 -922502306-16EV3842N10575EWRMHT 39 20921592013-09-30 21:00:00
201393021591845194234420301949E3320N906XJJFKBUF 50 30118452013-09-30 18:00:00
201393022032205 -223392331 8EV5311N722EVLGABGR 61 37822 52013-09-30 22:00:00
201393022072140 2722572250 7MQ3660N532MQLGABNA 97 76421402013-09-30 21:00:00
201393022112059 7223392242 57EV4672N12145EWRSTL120 87220592013-09-30 20:00:00
201393022312245-1423352356-21B6 108N193JBJFKPWM 48 27322452013-09-30 22:00:00
201393022332113 80 112 30 42UA 471N578UAEWRSFO318256521132013-09-30 21:00:00
201393022352001154 592249130B61083N804JBJFKMCO123 94420 12013-09-30 20:00:00
201393022372245 -823452353 -8B6 234N318JBJFKBTV 43 26622452013-09-30 22:00:00
201393022402245 -523342351-17B61816N354JBJFKSYR 41 20922452013-09-30 22:00:00
201393022402250-102347 7-20B62002N281JBJFKBUF 52 30122502013-09-30 22:00:00
201393022412246 -52345 1-16B6 486N346JBJFKROC 47 26422462013-09-30 22:00:00
201393023072255 1223592358 1B6 718N565JBJFKBOS 33 18722552013-09-30 22:00:00
201393023492359-10 325 350-25B6 745N516JBJFKPSE196161723592013-09-30 23:00:00
2013930 NA1842 NA NA2019 NAEV5274N740EVLGABNA NA 76418422013-09-30 18:00:00
2013930 NA1455 NA NA1634 NA9E3393NA JFKDCA NA 21314552013-09-30 14:00:00
2013930 NA2200 NA NA2312 NA9E3525NA LGASYR NA 19822 02013-09-30 22:00:00
2013930 NA1210 NA NA1330 NAMQ3461N535MQLGABNA NA 76412102013-09-30 12:00:00
2013930 NA1159 NA NA1344 NAMQ3572N511MQLGACLE NA 41911592013-09-30 11:00:00
2013930 NA 840 NA NA1020 NAMQ3531N839MQLGARDU NA 431 8402013-09-30 08:00:00

summarize()

The most important grouped operation is a summary, which, if being used to calculate a single summary statistic, reduces the data frame to have a single row for each group.

flights |> 
  group_by(month) |> 
  summarize(
    avg_delay = mean(dep_delay)
  )
A tibble: 12 x 2
monthavg_delay
<int><dbl>
1NA
2NA
3NA
4NA
5NA
6NA
7NA
8NA
9NA
10NA
11NA
12NA
flights |> 
  group_by(month) |> 
  summarize(
    avg_delay = mean(dep_delay, na.rm = TRUE)
  )
A tibble: 12 x 2
monthavg_delay
<int><dbl>
110.036665
210.816843
313.227076
413.938038
512.986859
620.846332
721.727787
812.611040
9 6.722476
10 6.243988
11 5.435362
1216.576688
flights |> 
  group_by(month) |> 
  summarize(
    avg_delay = mean(dep_delay, na.rm = TRUE), 
    n = n()
  )
A tibble: 12 x 3
monthavg_delayn
<int><dbl><int>
110.03666527004
210.81684324951
313.22707628834
413.93803828330
512.98685928796
620.84633228243
721.72778729425
812.61104029327
9 6.72247627574
10 6.24398828889
11 5.43536227268
1216.57668828135

The slice_ functions

  • df |> slice_head(n = 1) takes the first row from each group.
  • df |> slice_tail(n = 1) takes the last row in each group.
  • df |> slice_min(x, n = 1) takes the row with the smallest value of column x.
  • df |> slice_max(x, n = 1) takes the row with the largest value of column x.
  • df |> slice_sample(n = 1) takes one random row.

You can vary n to select more than one row, or instead of n =, you can use prop = 0.1 to select (e.g.) 10% of the rows in each group

flights |> 
  group_by(dest) |> 
  slice_max(arr_delay, with_ties = FALSE, n = 1) |>
  relocate(dest)
A grouped_df: 105 x 19
destyearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumoriginair_timedistancehourminutetime_hour
<chr><int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><dbl><dbl><dbl><dbl><dttm>
ABQ2013 72221452007 98 1322259 153B61505N784JBJFK259182620 72013-07-22 20:00:00
ACK2013 7231139 800 2191250 909 221B61491N192JBJFK 44 199 8 02013-07-23 08:00:00
ALB2013 125 1232000 323 2292101 328EV4309N13908EWR 30 14320 02013-01-25 20:00:00
ANC2013 81717401625 7520422003 39UA 887N528UAEWR404337016252013-08-17 16:00:00
ATL2013 7222257 759 898 1211026 895DL2047N6716CLGA109 762 7592013-07-22 07:00:00
AUS2013 71020561505 35123471758 349UA 503N803UAEWR192150415 52013-07-10 15:00:00
AVL2013 8131156 832 20414171029 228EV4175N13538EWR108 583 8322013-08-13 08:00:00
BDL2013 22117281316 25218391413 266EV4103N16976EWR 26 11613162013-02-21 13:00:00
BGR201312 115041056 24816281230 238EV5309N615QXLGA 57 37810562013-12-01 10:00:00
BHM2013 410 251900 325 1362045 291EV5038N713EVLGA115 86619 02013-04-10 19:00:00
BNA2013 12520201527 29323041700 364EV3835N14920EWR124 74815272013-01-25 15:00:00
BOS2013 3 818191200 37920051303 422B61174N329JBEWR 47 20012 02013-03-08 12:00:00
BQN2013 613 62034 212 358 30 208UA1071N77296EWR216158520342013-06-13 20:00:00
BTV2013 91221251459 38622581622 396B61734N178JBJFK 52 26614592013-09-12 14:00:00
BUF201311271503 815 4081628 952 3969E2906N930XJJFK 56 301 8152013-11-27 08:00:00
BUR2013 62721461800 226 1092102 247B6 359N635JBJFK321246518 02013-06-27 18:00:00
BWI2013 1 1 8481835 85310011950 851MQ3944N942MQJFK 41 18418352013-01-01 18:00:00
BZN201312281104 819 16513531119 154UA 568N436UAEWR2671882 8192013-12-28 08:00:00
CAE2013 41421481846 18224002016 224EV4471N15983EWR100 60218462013-04-14 18:00:00
CAK201312181702 950 43218361123 433FL 160N972ATLGA 70 397 9502013-12-18 09:00:00
CHO2013 3 823551940 255 1092121 228EV5325N611QXLGA 42 30519402013-03-08 19:00:00
CHS2013 3 81202 751 2511530 959 331EV4532N15555EWR 88 628 7512013-03-08 07:00:00
CLE2013122222361430 486 151626 469EV4159N15912LGA 71 41914302013-12-22 14:00:00
CLT2013 216 7571930 74710132149 7449E3798N8940EJFK 85 54119302013-02-16 19:00:00
CMH2013 615143219351137160721201127MQ3535N504MQJFK 74 48319352013-06-15 19:00:00
CRW2013 41821581845 19323392030 189MQ4517N713MQLGA 73 44418452013-04-18 18:00:00
CVG2013 722 8451600100510441815 989MQ3075N665MQJFK 96 58916 02013-07-22 16:00:00
DAY201312 51504 955 30917091217 2929E4060N8877ALGA 96 549 9552013-12-05 09:00:00
DCA2013 2271529 845 40416391015 3849E3405N922XJJFK 49 213 8452013-02-27 08:00:00
DEN2013 2102243 830 853 1001106 834F9 835N203FRLGA2331620 8302013-02-10 08:00:00
.........................................................
PIT2013 6251421 805 3761602 950 3729E3611N8458AJFK 64 340 8 52013-06-25 08:00:00
PSE2013 628 3172359 198 652 350 182B6 745N568JBJFK199161723592013-06-28 23:00:00
PSP2013 31610461055 -914121355 17VX 55N839VAJFK365237810552013-03-16 10:00:00
PVD2013 318 2002115 285 2592214 285EV4404N11547EWR 27 16021152013-03-18 21:00:00
PWM2013 3 81300 829 2711505 955 310EV4224N12540EWR 52 284 8292013-03-08 08:00:00
RDU2013 61318331250 34321351425 430MQ3361N804MQJFK 76 42712502013-06-13 12:00:00
RIC2013 722 411638 483 1541811 463EV4300N13970EWR 47 27716382013-07-22 16:00:00
ROC2013 62723341635 419 321805 387EV4322N11191EWR 40 24616352013-06-27 16:00:00
RSW2013 22721381658 280 392007 272B6 139N228JBJFK162107416582013-02-27 16:00:00
SAN2013 627 6151705 790 8532004 769DL 503N372DAJFK312244617 52013-06-27 17:00:00
SAT201312 1 6571930 68710102249 681DL1091N342NWJFK211158719302013-12-01 19:00:00
SAV201312 517351024 43120091246 443EV4495N14977EWR111 70810242013-12-05 10:00:00
SBN2013112220131905 6822242131 53EV5383N398CALGA110 65119 52013-11-22 19:00:00
SDF2013 71021111615 296 571900 3579E3926N800AYJFK110 66216152013-07-10 16:00:00
SEA2013 624 1591735 504 4322108 444DL1543N727TWJFK307242217352013-06-24 17:00:00
SFO2013 920113918451014145722101007AA 177N338AAJFK354258618452013-09-20 18:00:00
SJC2013 71022271855 212 2022216 226B6 669N621JBJFK338256918552013-07-10 18:00:00
SJU2013 71821311455 396 1161905 371AA1635N5FNAAJFK198159814552013-07-18 14:00:00
SLC20131219 7341725 84910462039 847DL1223N375NCEWR290196917252013-12-19 17:00:00
SMF2013 41923461845 301 2482217 271B6 171N706JBJFK335252118452013-04-19 18:00:00
SNA2013 7231034 715 1991307 958 189UA 288N514UAEWR3072434 7152013-07-23 07:00:00
SRQ2013 62417131055 37822051411 474DL1903N345NBLGA154104710552013-06-24 10:00:00
STL2013 627 7531830 803 9372015 802AA2019N571AALGA134 88818302013-06-27 18:00:00
STT2013 2111416 810 36618451315 330AA 655N625AAJFK1801623 8102013-02-11 08:00:00
SYR2013 71022571705 352 1161845 391B61516N238JBJFK 42 20917 52013-07-10 17:00:00
TPA2013 41011001900 96013422211 931DL2391N959DLJFK139100519 02013-04-10 19:00:00
TUL2013 12023471936 251 2302208 262EV4333N14177EWR208121519362013-01-20 19:00:00
TVC2013 8101120 730 2301323 943 220EV5144N718EVLGA 96 655 7302013-08-10 07:00:00
TYS2013 412 512000 291 2502209 281EV3826N15986EWR103 63120 02013-04-12 20:00:00
XNA2013 3 81311 822 28916151045 330EV4140N18556EWR1681131 8222013-03-08 08:00:00

Grouping by multiple variables

daily <- flights |>  
  group_by(year, month, day)
daily
A grouped_df: 336776 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
.........................................................
201393021232125 -222232247-24EV5489N712EVLGACHO 45 30521252013-09-30 21:00:00
201393021272129 -223142323 -9EV3833N16546EWRCLT 72 52921292013-09-30 21:00:00
201393021282130 -223282359-31B6 97N807JBJFKDEN213162621302013-09-30 21:00:00
201393021292059 3022302232 -2EV5048N751EVLGARIC 45 29220592013-09-30 20:00:00
201393021312140 -922252255-30MQ3621N807MQJFKDCA 36 21321402013-09-30 21:00:00
201393021402140 0 10 40-30AA 185N335AAJFKLAX298247521402013-09-30 21:00:00
201393021422129 1322502239 11EV4509N12957EWRPWM 47 28421292013-09-30 21:00:00
201393021452145 0 115 140-25B61103N633JBJFKSJU192159821452013-09-30 21:00:00
201393021472137 10 30 27 3B61371N627JBLGAFLL139107621372013-09-30 21:00:00
201393021492156 -722452308-23UA 523N813UAEWRBOS 37 20021562013-09-30 21:00:00
201393021502159 -922502306-16EV3842N10575EWRMHT 39 20921592013-09-30 21:00:00
201393021591845194234420301949E3320N906XJJFKBUF 50 30118452013-09-30 18:00:00
201393022032205 -223392331 8EV5311N722EVLGABGR 61 37822 52013-09-30 22:00:00
201393022072140 2722572250 7MQ3660N532MQLGABNA 97 76421402013-09-30 21:00:00
201393022112059 7223392242 57EV4672N12145EWRSTL120 87220592013-09-30 20:00:00
201393022312245-1423352356-21B6 108N193JBJFKPWM 48 27322452013-09-30 22:00:00
201393022332113 80 112 30 42UA 471N578UAEWRSFO318256521132013-09-30 21:00:00
201393022352001154 592249130B61083N804JBJFKMCO123 94420 12013-09-30 20:00:00
201393022372245 -823452353 -8B6 234N318JBJFKBTV 43 26622452013-09-30 22:00:00
201393022402245 -523342351-17B61816N354JBJFKSYR 41 20922452013-09-30 22:00:00
201393022402250-102347 7-20B62002N281JBJFKBUF 52 30122502013-09-30 22:00:00
201393022412246 -52345 1-16B6 486N346JBJFKROC 47 26422462013-09-30 22:00:00
201393023072255 1223592358 1B6 718N565JBJFKBOS 33 18722552013-09-30 22:00:00
201393023492359-10 325 350-25B6 745N516JBJFKPSE196161723592013-09-30 23:00:00
2013930 NA1842 NA NA2019 NAEV5274N740EVLGABNA NA 76418422013-09-30 18:00:00
2013930 NA1455 NA NA1634 NA9E3393NA JFKDCA NA 21314552013-09-30 14:00:00
2013930 NA2200 NA NA2312 NA9E3525NA LGASYR NA 19822 02013-09-30 22:00:00
2013930 NA1210 NA NA1330 NAMQ3461N535MQLGABNA NA 76412102013-09-30 12:00:00
2013930 NA1159 NA NA1344 NAMQ3572N511MQLGACLE NA 41911592013-09-30 11:00:00
2013930 NA 840 NA NA1020 NAMQ3531N839MQLGARDU NA 431 8402013-09-30 08:00:00
# daily_flights <- daily |> 
#   summarize(n = n())
#> `summarise()` has grouped output by 'year', 'month'. You can override using
#> the `.groups` argument.
daily_flights <- daily |> 
  summarize(
    n = n(), 
    .groups = "drop_last"
  )

Ungrouping

You might also want to remove grouping from a data frame without using summarize(). You can do this with ungroup()

daily |> 
  ungroup()
A tibble: 336776 x 19
yearmonthdaydep_timesched_dep_timedep_delayarr_timesched_arr_timearr_delaycarrierflighttailnumorigindestair_timedistancehourminutetime_hour
<int><int><int><int><int><dbl><int><int><dbl><chr><int><chr><chr><chr><dbl><dbl><dbl><dbl><dttm>
201311517515 2 830 819 11UA1545N14228EWRIAH22714005152013-01-01 05:00:00
201311533529 4 850 830 20UA1714N24211LGAIAH22714165292013-01-01 05:00:00
201311542540 2 923 850 33AA1141N619AAJFKMIA16010895402013-01-01 05:00:00
201311544545-110041022-18B6 725N804JBJFKBQN18315765452013-01-01 05:00:00
201311554600-6 812 837-25DL 461N668DNLGAATL116 7626 02013-01-01 06:00:00
201311554558-4 740 728 12UA1696N39463EWRORD150 7195582013-01-01 05:00:00
201311555600-5 913 854 19B6 507N516JBEWRFLL15810656 02013-01-01 06:00:00
201311557600-3 709 723-14EV5708N829ASLGAIAD 53 2296 02013-01-01 06:00:00
201311557600-3 838 846 -8B6 79N593JBJFKMCO140 9446 02013-01-01 06:00:00
201311558600-2 753 745 8AA 301N3ALAALGAORD138 7336 02013-01-01 06:00:00
201311558600-2 849 851 -2B6 49N793JBJFKPBI14910286 02013-01-01 06:00:00
201311558600-2 853 856 -3B6 71N657JBJFKTPA15810056 02013-01-01 06:00:00
201311558600-2 924 917 7UA 194N29129JFKLAX34524756 02013-01-01 06:00:00
201311558600-2 923 937-14UA1124N53441EWRSFO36125656 02013-01-01 06:00:00
201311559600-1 941 910 31AA 707N3DUAALGADFW25713896 02013-01-01 06:00:00
201311559559 0 702 706 -4B61806N708JBJFKBOS 44 1875592013-01-01 05:00:00
201311559600-1 854 902 -8UA1187N76515EWRLAS33722276 02013-01-01 06:00:00
201311600600 0 851 858 -7B6 371N595JBLGAFLL15210766 02013-01-01 06:00:00
201311600600 0 837 825 12MQ4650N542MQLGAATL134 7626 02013-01-01 06:00:00
201311601600 1 844 850 -6B6 343N644JBEWRPBI14710236 02013-01-01 06:00:00
201311602610-8 812 820 -8DL1919N971DLLGAMSP17010206102013-01-01 06:00:00
201311602605-3 821 805 16MQ4401N730MQLGADTW105 5026 52013-01-01 06:00:00
201311606610-4 858 910-12AA1895N633AAEWRMIA15210856102013-01-01 06:00:00
201311606610-4 837 845 -8DL1743N3739PJFKATL128 7606102013-01-01 06:00:00
201311607607 0 858 915-17UA1077N53442EWRMIA15710856 72013-01-01 06:00:00
201311608600 8 807 735 32MQ3768N9EAMQEWRORD139 7196 02013-01-01 06:00:00
20131161160011 945 931 14UA 303N532UAJFKSFO36625866 02013-01-01 06:00:00
201311613610 3 925 921 4B6 135N635JBJFKRSW17510746102013-01-01 06:00:00
201311615615 010391100-21B6 709N794JBJFKSJU18215986152013-01-01 06:00:00
201311615615 0 833 842 -9DL 575N326NBEWRATL120 7466152013-01-01 06:00:00
.........................................................
201393021232125 -222232247-24EV5489N712EVLGACHO 45 30521252013-09-30 21:00:00
201393021272129 -223142323 -9EV3833N16546EWRCLT 72 52921292013-09-30 21:00:00
201393021282130 -223282359-31B6 97N807JBJFKDEN213162621302013-09-30 21:00:00
201393021292059 3022302232 -2EV5048N751EVLGARIC 45 29220592013-09-30 20:00:00
201393021312140 -922252255-30MQ3621N807MQJFKDCA 36 21321402013-09-30 21:00:00
201393021402140 0 10 40-30AA 185N335AAJFKLAX298247521402013-09-30 21:00:00
201393021422129 1322502239 11EV4509N12957EWRPWM 47 28421292013-09-30 21:00:00
201393021452145 0 115 140-25B61103N633JBJFKSJU192159821452013-09-30 21:00:00
201393021472137 10 30 27 3B61371N627JBLGAFLL139107621372013-09-30 21:00:00
201393021492156 -722452308-23UA 523N813UAEWRBOS 37 20021562013-09-30 21:00:00
201393021502159 -922502306-16EV3842N10575EWRMHT 39 20921592013-09-30 21:00:00
201393021591845194234420301949E3320N906XJJFKBUF 50 30118452013-09-30 18:00:00
201393022032205 -223392331 8EV5311N722EVLGABGR 61 37822 52013-09-30 22:00:00
201393022072140 2722572250 7MQ3660N532MQLGABNA 97 76421402013-09-30 21:00:00
201393022112059 7223392242 57EV4672N12145EWRSTL120 87220592013-09-30 20:00:00
201393022312245-1423352356-21B6 108N193JBJFKPWM 48 27322452013-09-30 22:00:00
201393022332113 80 112 30 42UA 471N578UAEWRSFO318256521132013-09-30 21:00:00
201393022352001154 592249130B61083N804JBJFKMCO123 94420 12013-09-30 20:00:00
201393022372245 -823452353 -8B6 234N318JBJFKBTV 43 26622452013-09-30 22:00:00
201393022402245 -523342351-17B61816N354JBJFKSYR 41 20922452013-09-30 22:00:00
201393022402250-102347 7-20B62002N281JBJFKBUF 52 30122502013-09-30 22:00:00
201393022412246 -52345 1-16B6 486N346JBJFKROC 47 26422462013-09-30 22:00:00
201393023072255 1223592358 1B6 718N565JBJFKBOS 33 18722552013-09-30 22:00:00
201393023492359-10 325 350-25B6 745N516JBJFKPSE196161723592013-09-30 23:00:00
2013930 NA1842 NA NA2019 NAEV5274N740EVLGABNA NA 76418422013-09-30 18:00:00
2013930 NA1455 NA NA1634 NA9E3393NA JFKDCA NA 21314552013-09-30 14:00:00
2013930 NA2200 NA NA2312 NA9E3525NA LGASYR NA 19822 02013-09-30 22:00:00
2013930 NA1210 NA NA1330 NAMQ3461N535MQLGABNA NA 76412102013-09-30 12:00:00
2013930 NA1159 NA NA1344 NAMQ3572N511MQLGACLE NA 41911592013-09-30 11:00:00
2013930 NA 840 NA NA1020 NAMQ3531N839MQLGARDU NA 431 8402013-09-30 08:00:00
daily |> 
  ungroup() |>
  summarize(
    avg_delay = mean(dep_delay, na.rm = TRUE), 
    flights = n()
  )
A tibble: 1 x 2
avg_delayflights
<dbl><int>
12.63907336776

.by

Instead of using group_by() and ungroup()

flights |> 
  summarize(
    delay = mean(dep_delay, na.rm = TRUE), 
    n = n(),
    .by = c(origin, dest)
  )
A tibble: 224 x 4
origindestdelayn
<chr><chr><dbl><int>
EWRIAH11.826551 3973
LGAIAH 9.058986 2951
JFKMIA 9.336571 3314
JFKBQN 6.665546 599
LGAATL11.44862110263
EWRORD14.644163 6100
EWRFLL13.532606 3793
LGAIAD16.650421 1803
JFKMCO10.601583 5464
LGAORD10.740758 8857
JFKPBI14.420290 1739
JFKTPA12.012855 2987
JFKLAX 8.52250811262
EWRSFO14.326394 5127
LGADFW 5.806151 4858
JFKBOS11.694953 5898
EWRLAS11.230075 2010
LGAFLL12.876453 4008
EWRPBI13.962329 2351
LGAMSP 9.752957 3713
LGADTW 7.770528 5040
EWRMIA11.615385 2633
JFKATL10.465974 1930
JFKSFO11.952691 8204
JFKRSW10.082017 1339
JFKSJU10.334039 4752
EWRATL15.501738 5022
EWRPHX10.635789 2723
LGAMIA 7.361747 5781
EWRMSP15.313163 2377
............
JFKACK 6.4566038265
LGABGR19.4750000375
LGAMSN27.1179245218
LGAORF16.5538462206
JFKIAH15.6176471274
JFKMCI23.0859375276
LGAOMA27.3695652 95
LGADSM17.2222222158
LGAGSP 8.2346939104
JFKABQ13.7401575254
LGAILM19.4166667110
LGASYR19.1107011293
JFKMVY 7.0516432221
LGASBN31.3333333 6
JFKSTL20.0000000 1
LGALEX-9.0000000 1
EWRSBN 5.7500000 4
LGAMHT24.0370370142
LGACAE29.5000000 12
JFKJAC 5.0000000 2
JFKSDF23.9767442 46
LGACHO21.3913043 52
JFKMKE19.8604651183
LGAAVL-2.6000000 10
JFKBHM 7.0000000 1
LGATVC23.4109589 77
LGAMYR 0.6666667 3
EWRTVC17.8695652 24
EWRANC12.8750000 8
EWRLGA NaN 1
# group by multiple variables
flights |> 
  summarize(
    delay = mean(dep_delay, na.rm = TRUE), 
    n = n(),
    .by = c(origin, dest)
  )
A tibble: 224 x 4
origindestdelayn
<chr><chr><dbl><int>
EWRIAH11.826551 3973
LGAIAH 9.058986 2951
JFKMIA 9.336571 3314
JFKBQN 6.665546 599
LGAATL11.44862110263
EWRORD14.644163 6100
EWRFLL13.532606 3793
LGAIAD16.650421 1803
JFKMCO10.601583 5464
LGAORD10.740758 8857
JFKPBI14.420290 1739
JFKTPA12.012855 2987
JFKLAX 8.52250811262
EWRSFO14.326394 5127
LGADFW 5.806151 4858
JFKBOS11.694953 5898
EWRLAS11.230075 2010
LGAFLL12.876453 4008
EWRPBI13.962329 2351
LGAMSP 9.752957 3713
LGADTW 7.770528 5040
EWRMIA11.615385 2633
JFKATL10.465974 1930
JFKSFO11.952691 8204
JFKRSW10.082017 1339
JFKSJU10.334039 4752
EWRATL15.501738 5022
EWRPHX10.635789 2723
LGAMIA 7.361747 5781
EWRMSP15.313163 2377
............
JFKACK 6.4566038265
LGABGR19.4750000375
LGAMSN27.1179245218
LGAORF16.5538462206
JFKIAH15.6176471274
JFKMCI23.0859375276
LGAOMA27.3695652 95
LGADSM17.2222222158
LGAGSP 8.2346939104
JFKABQ13.7401575254
LGAILM19.4166667110
LGASYR19.1107011293
JFKMVY 7.0516432221
LGASBN31.3333333 6
JFKSTL20.0000000 1
LGALEX-9.0000000 1
EWRSBN 5.7500000 4
LGAMHT24.0370370142
LGACAE29.5000000 12
JFKJAC 5.0000000 2
JFKSDF23.9767442 46
LGACHO21.3913043 52
JFKMKE19.8604651183
LGAAVL-2.6000000 10
JFKBHM 7.0000000 1
LGATVC23.4109589 77
LGAMYR 0.6666667 3
EWRTVC17.8695652 24
EWRANC12.8750000 8
EWRLGA NaN 1

Exercises

3.5.1. Which carrier has the worst average delays? Challenge: can you disentangle the effects of bad airports vs. bad carriers? Why/why not? (Hint: think about flights |> group_by(carrier, dest) |> summarize(n()))

3.5.2. Find the flights that are most delayed upon departure from each destination.

3.5.3. How do delays vary over the course of the day. Illustrate your answer with a plot.

3.5.4. What happens if you supply a negative n to slice_min() and friends?

3.5.5. Explain what count() does in terms of the dplyr verbs you just learned. What does the sort argument to count() do?

3.5.6. Suppose we have the following tiny data frame:

df <- tibble(
  x = 1:5,
  y = c("a", "b", "a", "a", "b"),
  z = c("K", "K", "L", "L", "K")
)

a. Write down what you think the output will look like, then check if you were correct, and describe what group_by() does.

df |>
  group_by(y)
A grouped_df: 5 x 3
xyz
<int><chr><chr>
1aK
2bK
3aL
4aL
5bK

b. Write down what you think the output will look like, then check if you were correct, and describe what arrange() does. Also comment on how it’s different from the group_by() in part (a)?

df |>
  arrange(y)
A tibble: 5 x 3
xyz
<int><chr><chr>
1aK
3aL
4aL
2bK
5bK

c. Write down what you think the output will look like, then check if you were correct, and describe what the pipeline does.

df |>
  group_by(y) |>
  summarize(mean_x = mean(x))
A tibble: 2 x 2
ymean_x
<chr><dbl>
a2.666667
b3.500000

d. Write down what you think the output will look like, then check if you were correct, and describe what the pipeline does. Then, comment on what the message says.

df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x))
`summarise()` has grouped output by 'y'. You can override using the `.groups`
argument.
A grouped_df: 3 x 3
yzmean_x
<chr><chr><dbl>
aK1.0
aL3.5
bK3.5

e. Write down what you think the output will look like, then check if you were correct, and describe what the pipeline does. How is the output different from the one in part (d).

df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x), .groups = "drop")
A tibble: 3 x 3
yzmean_x
<chr><chr><dbl>
aK1.0
aL3.5
bK3.5

f. Write down what you think the outputs will look like, then check if you were correct, and describe what each pipeline does. How are the outputs of the two pipelines different?

df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x))
df |>
  group_by(y, z) |>
  mutate(mean_x = mean(x))
`summarise()` has grouped output by 'y'. You can override using the `.groups`
argument.
A grouped_df: 3 x 3
yzmean_x
<chr><chr><dbl>
aK1.0
aL3.5
bK3.5
A grouped_df: 5 x 4
xyzmean_x
<int><chr><chr><dbl>
1aK1.0
2bK3.5
3aL3.5
4aL3.5
5bK3.5

Case study: aggregates and sample size

Whenever you do any aggregation, it’s always a good idea to include a count (n()). That way, you can ensure that you’re not drawing conclusions based on very small amounts of data. We’ll demonstrate this with some baseball data from the Lahman package. Specifically, we will compare what proportion of times a player gets a hit (H) vs. the number of times they try to put the ball in play (AB):

batters <- Lahman::Batting |> 
  group_by(playerID) |> 
  summarize(
    performance = sum(H, na.rm = TRUE) / sum(AB, na.rm = TRUE),
    n = sum(AB, na.rm = TRUE)
  )
batters
A tibble: 20469 x 3
playerIDperformancen
<chr><dbl><int>
aardsda010.00000000 4
aaronha010.3049983812364
aaronto010.22881356 944
aasedo01 0.00000000 5
abadan01 0.09523810 21
abadfe01 0.11111111 9
abadijo010.22448980 49
abbated010.25361367 3044
abbeybe010.16888889 225
abbeych010.28075171 1756
abbotco010.33333333 3
abbotda010.14285714 7
abbotfr010.20857700 513
abbotgl01 NaN 0
abbotje010.26342282 596
abbotji010.09523810 21
abbotku010.25587084 2044
abbotky010.09677419 31
abbotod010.18571429 70
abbotpa010.25000000 20
aberal01 0.14000000 100
abercda010.00000000 4
abercre010.22279793 386
abernbi010.00000000 1
abernbr010.24423963 868
abernte010.20000000 5
abernte020.13812155 181
abernwo010.00000000 8
aberscl010.25139665 179
ablesha010.00000000 26
.........
zimmero010.27551020 98
zimmery010.277427116654
zinkch01 NaN 0
zinkwa01 0.00000000 1
zinnfr01 0.00000000 7
zinngu01 0.269265641103
zinnji01 0.28333333 120
zinsebi01 NaN 0
zinteal010.16666667 78
zipfebu010.22033898 354
ziskri01 0.287130645144
zitoba01 0.10174419 344
zitzmbi010.266932271004
zmiched010.05882353 17
zobribe010.266326535880
zoccope010.10810811 37
zoldasa010.17482517 286
zoskyed010.16000000 50
zuberbi010.13537118 229
zuberjo010.25000000 136
zuberty010.00000000 1
zuletju010.24712644 174
zumayjo01 NaN 0
zuninmi010.200074792674
zupcibo010.25031447 795
zupofr01 0.16666667 18
zuvelpa010.22199593 491
zuverge010.14788732 142
zwilldu010.284375001280
zychto01 NaN 0
batters |> 
  filter(n > 100) |> 
  ggplot(aes(x = n, y = performance)) +
  geom_point(alpha = 1 / 10) + 
  geom_smooth(se = FALSE)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

png

batters |> 
  arrange(desc(performance))
A tibble: 20469 x 3
playerIDperformancen
<chr><dbl><int>
abramge0111
alberan0111
banisje0111
bartocl0111
bassdo01 11
birasst0112
bruneju0111
burnscb0111
cammaer0111
campsh01 11
crockcl0111
davenjo0111
davidda0111
devinha0112
dupremi0111
eddydo01 11
egeco01 11
eubanue0111
fryjo01 11
gallaja0111
garcimi0211
garcira0111
gideobr0111
gleasro0111
gowella0111
grasmlo0111
grayda01 11
hancory0111
hesslke0111
holloji0111
.........
yardler01NaN0
yarnaed01NaN0
yeabsbe01NaN0
yettri01 NaN0
ynoami01 NaN0
yochira01NaN0
youngcl01NaN0
youngda01NaN0
youngda02NaN0
youngki01NaN0
youngma03NaN0
youngti01NaN0
yountla01NaN0
zabalan01NaN0
zagurmi01NaN0
zamorda01NaN0
zaratma01NaN0
zaskeje01NaN0
zavadcl01NaN0
zavarcl01NaN0
zelleba01NaN0
zerpaan01NaN0
ziemst01 NaN0
zimmeje02NaN0
zimmejo01NaN0
zimmeky01NaN0
zinkch01 NaN0
zinsebi01NaN0
zumayjo01NaN0
zychto01 NaN0

Summary

In this chapter, you’ve learned the tools that dplyr provides for working with data frames. The tools are roughly grouped into three categories: those that manipulate the rows (like filter() and arrange()), those that manipulate the columns (like select() and mutate()), and those that manipulate groups (like group_by() and summarize()). In this chapter, we’ve focused on these “whole data frame” tools, but you haven’t yet learned much about what you can do with the individual variable.

Share: Twitter Facebook LinkedIn