Importing and Exporting (I/O)

Importing data from tabular data files

To read data from a CSV-like file, use the readtable function:

DataFrames.readtable: Function.


Read data from a tabular-file format (CSV, TSV, ...)

readtable(filename;
          header::Bool = true,
          separator::Char = getseparator(filename),
          quotemark::Vector{Char} = ['"'],
          decimal::Char = '.',
          nastrings::Vector = ASCIIString["", "NA"],
          truestrings::Vector = ASCIIString["T", "t", "TRUE", "true"],
          falsestrings::Vector = ASCIIString["F", "f", "FALSE", "false"],
          makefactors::Bool = false,
          nrows::Integer = -1,
          names::Vector = Symbol[],
          eltypes::Vector{DataType} = DataType[],
          allowcomments::Bool = false,
          commentmark::Char = '#',
          ignorepadding::Bool = true,
          skipstart::Integer = 0,
          skiprows::AbstractVector{Int} = Int[],
          skipblanks::Bool = true,
          encoding::Symbol = :utf8,
          allowescapes::Bool = false,
          normalizenames::Bool = true)

Arguments

  • filename : the filename to be read

Keyword Arguments

  • header::Bool – Use the information from the file's header line to determine column names. Defaults to true.
  • separator::Char – Assume that fields are split by the separator character. If not specified, it will be guessed from the filename: .csv defaults to ',', .tsv defaults to '\t', .wsv defaults to ' '.
  • quotemark::Vector{Char} – Assume that fields contained inside of two quotemark characters are quoted, which disables processing of separators and linebreaks. Set to Char[] to disable this feature and slightly improve performance. Defaults to ['"'].
  • decimal::Char – Assume that the decimal place in numbers is written using the decimal character. Defaults to '.'.
  • nastrings::Vector{ASCIIString} – Translate any of the strings in this vector into an NA. Defaults to ["", "NA"].
  • truestrings::Vector{ASCIIString} – Translate any of the strings in this vector into a Boolean true. Defaults to ["T", "t", "TRUE", "true"].
  • falsestrings::Vector{ASCIIString} – Translate any of the strings in this vector into a Boolean false. Defaults to ["F", "f", "FALSE", "false"].
  • makefactors::Bool – Convert string columns into PooledDataVectors for use as factors. Defaults to false.
  • nrows::Int – Read only nrows from the file. Defaults to -1, which indicates that the entire file should be read.
  • names::Vector{Symbol} – Use the values in this array as the names for all columns instead of the names in the file's header. Defaults to [], which indicates that the header should be used if present or that numeric names should be invented if there is no header.
  • eltypes::Vector{DataType} – Specify the types of all columns. Defaults to [].
  • allowcomments::Bool – Ignore all text inside comments. Defaults to false.
  • commentmark::Char – Specify the character that starts comments. Defaults to '#'.
  • ignorepadding::Bool – Ignore all whitespace on left and right sides of a field. Defaults to true.
  • skipstart::Int – Specify the number of initial rows to skip. Defaults to 0.
  • skiprows::Vector{Int} – Specify the indices of lines in the input to ignore. Defaults to [].
  • skipblanks::Bool – Skip any blank lines in input. Defaults to true.
  • encoding::Symbol – Specify the file's encoding as either :utf8 or :latin1. Defaults to :utf8.
  • normalizenames::Bool – Ensure that column names are valid Julia identifiers. For instance, this renames a column named "a b" to "a_b", which can then be accessed with :a_b instead of symbol("a b"). Defaults to true.
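
The extension-based guess described for the separator keyword can be sketched as follows. This is only an illustration: guess_separator is a hypothetical helper, not the library's getseparator, and the comma fallback for unrecognized extensions is an assumption.

```julia
# Hypothetical helper, not part of DataFrames: guess a field separator
# from a filename's extension, following the rule described above.
function guess_separator(filename::AbstractString)
    ext = lowercase(splitext(filename)[2])  # e.g. ".csv"
    if ext == ".csv"
        return ','
    elseif ext == ".tsv"
        return '\t'
    elseif ext == ".wsv"
        return ' '
    else
        # Assumption: fall back to a comma for unrecognized extensions.
        return ','
    end
end
```

For a file whose extension says nothing about its layout, such as data.txt, passing separator explicitly avoids relying on any guess.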

Result

  • ::DataFrame

Examples

df = readtable("data.csv")
df = readtable("data.tsv")
df = readtable("data.wsv")
df = readtable("data.txt", separator = '\t')
df = readtable("data.txt", header = false)
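
The nastrings, truestrings and falsestrings keywords amount to a lookup applied to each raw field. A rough sketch of that idea follows, assuming a hypothetical helper translate_field and using missing as a stand-in for NA. The actual parser may apply these lists only when inferring column types, so this should not be read as its implementation.

```julia
# Hypothetical helper: classify one raw field using the three token lists.
# `missing` stands in for NA here (an assumption made for this sketch).
function translate_field(field::AbstractString;
                         nastrings = ["", "NA"],
                         truestrings = ["T", "t", "TRUE", "true"],
                         falsestrings = ["F", "f", "FALSE", "false"])
    field in nastrings && return missing
    field in truestrings && return true
    field in falsestrings && return false
    return field  # left as a string; numeric parsing is out of scope
end
```

Overriding a list, for example nastrings = ["-"], changes which tokens count as missing without affecting the other two lists.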

Exporting data to a tabular data file

To write data to a CSV file, use the writetable function:

DataFrames.writetable: Function.


Write data to a tabular-file format (CSV, TSV, ...)

writetable(filename::AbstractString,
           df::AbstractDataFrame;
           header::Bool = true,
           separator::Char = getseparator(filename),
           quotemark::Char = '"',
           nastring::AbstractString = "NA",
           append::Bool = false)

Arguments

  • filename : the filename to be created
  • df : the AbstractDataFrame to be written

Keyword Arguments

  • separator::Char – The separator character that you would like to use. Defaults to the output of getseparator(filename), which uses commas for files that end in .csv, tabs for files that end in .tsv and a single space for files that end in .wsv.
  • quotemark::Char – The character used to delimit string fields. Defaults to '"'.
  • header::Bool – Whether the file should contain a header line that specifies the column names from df. Defaults to true.
  • nastring::AbstractString – What to write in place of missing data. Defaults to "NA".
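
How separator, quotemark and nastring combine on a single output line can be sketched as below. format_row is a hypothetical function, not the writetable implementation. It uses missing as a stand-in for missing data and does not escape quotemark characters embedded in strings; both are assumptions of this sketch.

```julia
# Hypothetical sketch, not the DataFrames implementation: format one row.
# `missing` stands in for a missing value (an assumption).
function format_row(values; separator::Char = ',', quotemark::Char = '"',
                    nastring::AbstractString = "NA")
    fields = map(values) do v
        if v === missing
            nastring                          # missing data, written unquoted
        elseif v isa AbstractString
            string(quotemark, v, quotemark)   # string fields are delimited
        else
            string(v)                         # other values written as-is
        end
    end
    return join(fields, separator)
end
```

Under these assumptions, format_row((1, "a", missing)) yields the line 1,"a",NA, and passing separator = '\t' produces the tab-separated equivalent.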

Result

  • ::DataFrame

Examples

df = DataFrame(A = 1:10)
writetable("output.csv", df)
writetable("output.dat", df, separator = ',', header = false)
writetable("output.dat", df, quotemark = '\'', separator = ',')
writetable("output.dat", df, header = false)