Data Manipulation
Joins
#Base.join — Function.
join(strings, delim, [last])
Join an array of strings into a single string, inserting the given delimiter between adjacent strings. If last is given, it will be used instead of delim between the last two strings. For example
join(["apples", "bananas", "pineapples"], ", ", " and ") == "apples, bananas and pineapples"
strings can be any iterable over elements x which are convertible to strings via print(io::IOBuffer, x).
Join two DataFrames
join(df1::AbstractDataFrame, df2::AbstractDataFrame; on::Union{Symbol, Vector{Symbol}} = Symbol[], kind::Symbol = :inner)
Arguments
df1,df2: the two AbstractDataFrames to be joined
Keyword Arguments
-
on: a Symbol or Vector{Symbol}, the column(s) used as keys when joining; required argument except forkind = :cross -
kind: the type of join, options include: -
:inner: only include rows with keys that match in bothdf1anddf2, the default :outer: include all rows fromdf1anddf2:left: include all rows fromdf1:right: include all rows fromdf2:semi: return rows ofdf1that match with the keys indf2:anti: return rows ofdf1that do not match with the keys indf2:cross: a full Cartesian product of the key combinations; every row ofdf1is matched with every row ofdf2
NAs are filled in where needed to complete joins.
Result
::DataFrame: the joined DataFrame
Examples
name = DataFrame(ID = [1, 2, 3], Name = ["John Doe", "Jane Doe", "Joe Blogs"]) job = DataFrame(ID = [1, 2, 4], Job = ["Lawyer", "Doctor", "Farmer"]) join(name, job, on = :ID) join(name, job, on = :ID, kind = :outer) join(name, job, on = :ID, kind = :left) join(name, job, on = :ID, kind = :right) join(name, job, on = :ID, kind = :semi) join(name, job, on = :ID, kind = :anti) join(name, job, kind = :cross)
Reshaping
#DataFrames.melt — Function.
Stacks a DataFrame; convert from a wide to long format; see stack.
#DataFrames.stack — Function.
Stacks a DataFrame; convert from a wide to long format
stack(df::AbstractDataFrame, measure_vars, id_vars) stack(df::AbstractDataFrame, measure_vars) stack(df::AbstractDataFrame) melt(df::AbstractDataFrame, id_vars, measure_vars) melt(df::AbstractDataFrame, id_vars)
Arguments
-
df: the AbstractDataFrame to be stacked -
measure_vars: the columns to be stacked (the measurement variables), a normal column indexing type, like a Symbol, Vector{Symbol}, Int, etc.; formelt, defaults to all variables that are notid_vars -
id_vars: the identifier columns that are repeated during stacking, a normal column indexing type; forstackdefaults to all variables that are notmeasure_vars
If neither measure_vars or id_vars are given, measure_vars defaults to all floating point columns.
Result
::DataFrame: the long-format dataframe with column:valueholding the values of the stacked columns (measure_vars), with column:variablea Vector of Symbols with themeasure_varsname, and with columns for each of theid_vars.
See also stackdf and meltdf for stacking methods that return a view into the original DataFrame. See unstack for converting from long to wide format.
Examples
d1 = DataFrame(a = repeat([1:3;], inner = [4]), b = repeat([1:4;], inner = [3]), c = randn(12), d = randn(12), e = map(string, 'a':'l')) d1s = stack(d1, [:c, :d]) d1s2 = stack(d1, [:c, :d], [:a]) d1m = melt(d1, [:a, :b, :e])
#DataFrames.unstack — Function.
Unstacks a DataFrame; convert from a long to wide format
unstack(df::AbstractDataFrame, rowkey, colkey, value) unstack(df::AbstractDataFrame, colkey, value) unstack(df::AbstractDataFrame)
Arguments
-
df: the AbstractDataFrame to be unstacked -
rowkey: the column with a unique key for each row, if not given, find a key by grouping on anything not acolkeyorvalue -
colkey: the column holding the column names in wide format, defaults to:variable -
value: the value column, defaults to:value
Result
::DataFrame: the wide-format dataframe
Examples
wide = DataFrame(id = 1:12, a = repeat([1:3;], inner = [4]), b = repeat([1:4;], inner = [3]), c = randn(12), d = randn(12)) long = stack(wide) wide0 = unstack(long) wide1 = unstack(long, :variable, :value) wide2 = unstack(long, :id, :variable, :value)
Note that there are some differences between the widened results above.
#DataFrames.stackdf — Function.
A stacked view of a DataFrame (long format)
Like stack and melt, but a view is returned rather than data copies.
stackdf(df::AbstractDataFrame, measure_vars, id_vars) stackdf(df::AbstractDataFrame, measure_vars) meltdf(df::AbstractDataFrame, id_vars, measure_vars) meltdf(df::AbstractDataFrame, id_vars)
Arguments
-
df: the wide AbstractDataFrame -
measure_vars: the columns to be stacked (the measurement variables), a normal column indexing type, like a Symbol, Vector{Symbol}, Int, etc.; formelt, defaults to all variables that are notid_vars -
id_vars: the identifier columns that are repeated during stacking, a normal column indexing type; forstackdefaults to all variables that are notmeasure_vars
Result
::DataFrame: the long-format dataframe with column:valueholding the values of the stacked columns (measure_vars), with column:variablea Vector of Symbols with themeasure_varsname, and with columns for each of theid_vars.
The result is a view because the columns are special AbstractVectors that return indexed views into the original DataFrame.
Examples
d1 = DataFrame(a = repeat([1:3;], inner = [4]), b = repeat([1:4;], inner = [3]), c = randn(12), d = randn(12), e = map(string, 'a':'l')) d1s = stackdf(d1, [:c, :d]) d1s2 = stackdf(d1, [:c, :d], [:a]) d1m = meltdf(d1, [:a, :b, :e])
#DataFrames.meltdf — Function.
A stacked view of a DataFrame (long format); see stackdf