vefce.blogg.se - Tidyverse summary

EDIT: also, you can give summarise_all a set of named functions using the funs wrapper, if you'd rather avoid the pain of renaming all those aggregated columns (although you'll still get a concatenation of the column name and the renamed aggregation function).

#> # ... with 2 more variables: Petal.Length_mean, Petal.Width_mean
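In current dplyr, funs() is deprecated and the scoped verbs such as summarise_all are superseded by across(). A minimal sketch of the same naming idea, assuming dplyr 1.0.0 or later; the function name `avg` and the object `petal_means` are illustrative and not part of the original answer:

```r
library(dplyr)

# A named list of functions: the list names feed into the output column names.
# .names is a glue template: {.col} is the input column, {.fn} is the list name.
petal_means <- iris %>%
  group_by(Species) %>%
  summarise(
    across(starts_with("Petal"), list(avg = mean), .names = "{.fn}_{.col}"),
    .groups = "drop"
  )

names(petal_means)
```

With this template the function name is placed before the column name; dropping the `.names` argument should fall back to the default `{.col}_{.fn}` pattern.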

# Note that output variable name now includes the function name, in order to ...

summarise_all operates on all columns except the grouping ones, so you don't get the control of using select helpers like vars(matches("blah")). But since the summarise_* functions aggregate the output, you can't have any other columns left anyway, so you could just use select_* to drop unwanted columns beforehand.

For example, you could select the grouping variable(s) and columns of interest with select_at, then group_by the grouping variable(s), then summarise_all:

iris %>%
  select_at(vars("Species", starts_with("Petal"))) %>%
  group_by(Species) %>%
  summarise_all(funs(min, max))
#> Species Petal.Length_min Petal.Width_min Petal.Length_max Petal.Width_max
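The select-then-summarise pattern can also be written without a separate select step, because across() accepts selection helpers directly. This is a sketch assuming dplyr 1.0.0 or later, not the original answer's code; `petal_range` is an illustrative name:

```r
library(dplyr)

# across() takes the column selection itself, so unwanted columns
# never need to be dropped before summarising.
petal_range <- iris %>%
  group_by(Species) %>%
  summarise(
    across(starts_with("Petal"), list(min = min, max = max)),
    .groups = "drop"
  )

petal_range
```

The column order differs from the summarise_all version: across() emits every function for one column before moving on to the next column.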

#> Species Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min
#> # ... with 4 more variables: Sepal.Length_max, Sepal.Width_max, ...

Hi tidyverse community, I am wondering if there is a recommended tidyverse workflow when you want to summarise multiple columns in a tibble using multiple arbitrary summary functions. In addition, the results should be contained in a 'tidy' tibble.

For example, I can summarise one column multiple ways (e.g. using min() and anyNA()):

library(tidyverse)
iris %>% summarise_at("Petal.Width", funs(min, anyNA))

I can then extend the previous example to summarise multiple columns:

iris %>% summarise_at(vars(Sepal.Length:Petal.Width), funs(min, anyNA))
#> Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min
#> Sepal.Length_anyNA Sepal.Width_anyNA Petal.Length_anyNA ...

Unfortunately, the above result isn't tidy. Perhaps I should use gather and spread to get the desired output:

iris %>%
  summarise_at(vars(Sepal.Length:Petal.Width), funs(min, anyNA)) %>%
  ...

This is where I wonder if I'm heading in the wrong direction. One downside of this approach is that the logical result for anyNA() is now coerced to numeric. Is there another tidyverse way I should do this?

My current workaround is to ditch summarise_at() completely and define a function which returns a one-row tibble. The first column returned is the original tibble column name. Each remaining column relates to an arbitrary summary function.

report <- ...
iris %>% select(Sepal.Length:Petal.Width) %>% imap_dfr(report)

This seems OK to me and was the approach I suggested as an answer to a recent question. But I'm not sure if the workaround is necessary and I've missed an easy step somewhere. Is the workaround a good way to go or am I in danger of getting into some bad habits? Are there other tidyverse approaches recommended in this situation?

Some of the examples in the scoped summarise docs use summarise_all to apply multiple functions to multiple columns. For example:

by_species %>% summarise_all(funs(min, max))

starwars %>% summarise(across(where(is.character), n_distinct))
#> # A tibble: 1 × 8
#>   name hair_color skin_color eye_color sex gender homeworld species
#> 1   87         13         31        15   5      3        49      38

starwars %>%
  group_by(species) %>%
  filter(n() > 1) %>%
  summarise(across(c(sex, gender, homeworld), n_distinct))
#> # A tibble: 9 × 4
#>   species  sex gender homeworld
#> 1 Droid      1      2         3
#> 2 Gungan     1      1         1
#> 3 Human      2      2        16
#> 4 Kaminoan   2      2         1
#> # ℹ 5 more rows

starwars %>%
  group_by(homeworld) %>%
  filter(n() > 1) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
#> # A tibble: 10 × 4
#>   homeworld height mass birth_year
#> 1 Alderaan    176. ...
#> ... 83.1 31.
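For the question above (several summary functions over several columns, returned as a tidy tibble), one possible approach is to reshape to long form first and summarise per variable. This is only a sketch, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later; the names `tidy_summary`, `variable` and `value` are illustrative and do not come from the thread:

```r
library(dplyr)
library(tidyr)

# Stack the measurement columns into (variable, value) pairs, then
# summarise each variable. Each summary keeps its own type, so the
# anyNA column stays logical instead of being coerced to numeric.
tidy_summary <- iris %>%
  select(Sepal.Length:Petal.Width) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  summarise(min = min(value), anyNA = anyNA(value), .groups = "drop")

tidy_summary
```

The result has one row per original column and one column per summary function, which matches the shape the one-row-tibble workaround produces. The trade-off is that all stacked columns must share a compatible type, which holds for the four numeric iris measurements.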