It comes as a single binary, we just have to download and unpack:
wget https://github.com/duckdb/duckdb/releases/download/v0.9.2/duckdb_cli-linux-amd64.zip
unzip duckdb_cli-linux-amd64.zip
./duckdb
D select tx_id, item_price from read_csv('./data/compras_tiny.csv', AUTO_DETECT=TRUE) where tx_id = 'YMEVOKU194';
┌────────────┬────────────┐
│ tx_id │ item_price │
│ varchar │ double │
├────────────┼────────────┤
│ YMEVOKU194 │ 28.85 │
│ YMEVOKU194 │ 4.12 │
└────────────┴────────────┘
./duckdb -c "select * from read_csv('data/compras_tiny.csv', AUTO_DETECT=TRUE)"
Just append the filename when running duckdb
. For instance, this will create a table named compras
in a file at data/compras_tiny.duckdb
:
duckdb data/compras_tiny.duckdb -c "CREATE TABLE compras AS SELECT * FROM read_csv('data/compras_tiny.csv', header=true, auto_detect=true)"
After that, you can open the file and query the compras
table:
./duckdb data/compras_tiny.duckdb
v0.9.2 3c695d7ba9
Enter ".help" for usage hints.
D describe compras;
┌────────────────────┬─────────────┬─────────┬─────────┬─────────┬───────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ int32 │
├────────────────────┼─────────────┼─────────┼─────────┼─────────┼───────┤
│ tx_id │ VARCHAR │ YES │ │ │ │
│ tx_time │ TIMESTAMP │ YES │ │ │ │
│ buyer │ VARCHAR │ YES │ │ │ │
│ currency_code │ VARCHAR │ YES │ │ │ │
│ payment_type │ VARCHAR │ YES │ │ │ │
│ credit_card_number │ VARCHAR │ YES │ │ │ │
│ country │ VARCHAR │ YES │ │ │ │
│ department │ VARCHAR │ YES │ │ │ │
│ product │ VARCHAR │ YES │ │ │ │
│ item_price │ DOUBLE │ YES │ │ │ │
│ coupon_code │ VARCHAR │ YES │ │ │ │
│ was_returned │ VARCHAR │ YES │ │ │ │
├────────────────────┴─────────────┴─────────┴─────────┴─────────┴───────┤
│ 12 rows 6 columns │
└────────────────────────────────────────────────────────────────────────┘
D .mode line
D select count(*) as con_descuento from compras where coupon_code is not null;
con_descuento = 414
Use the .mode csv
, also -csv
or -json
command line options. You can check the available output formats in the documentation.
D .mode csv
D select tx_id, item_price from read_csv('./data/compras_tiny.csv', AUTO_DETECT=TRUE) where tx_id = 'YMEVOKU194';
tx_id,item_price
YMEVOKU194,28.85
YMEVOKU194,4.12
Remember you can always save the data to a different file:
./duckdb -jsonlines -c "select tx_id, item_price from read_csv('./data/compras_tiny.csv', AUTO_DETECT=TRUE) where tx_id = 'YMEVOKU194';" > ./data/compras_tiny.jsonl
Use >>
instead of >
to append to the file.
For instance, using compras_tiny.csv
:
./duckdb -c "copy (select * from read_csv('data/compras_tiny.csv', AUTO_DETECT=TRUE)) to 'data/compras_tiny.parquet' (format 'parquet')"
You need first to install the library:
pip install duckdb
Use the method .sql
to run SQL queries as you'll normally do in the console:
df = duckdb.sql("select * from read_csv('./data/prices.csv', AUTO_DETECT=TRUE)")
Which by default will print something like this in the console or Jupyter Lab:
┌─────────┬────────────┬────────────────────┬───┬───────────────────┬───────────────────┬─────────────┐
│ Symbol │ Date │ Adj Close │ … │ Low │ Open │ Volume │
│ varchar │ date │ double │ │ double │ double │ double │
├─────────┼────────────┼────────────────────┼───┼───────────────────┼───────────────────┼─────────────┤
│ aapl │ 2010-01-04 │ 6.535084247589111 │ … │ 7.585000038146973 │ 7.622499942779541 │ 493729600.0 │
│ msft │ 2010-01-04 │ 23.800220489501953 │ … │ 30.59000015258789 │ 30.6200008392334 │ 38409100.0 │
│ goog │ 2010-01-04 │ 312.20477294921875 │ … │ 310.9544677734375 │ 312.3044128417969 │ 3927065.0 │
├─────────┴────────────┴────────────────────┴───┴───────────────────┴───────────────────┴─────────────┤
│ 3 rows 8 columns (6 shown) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────┘
You can use the method .df()
to convert any result into a Pandas Dataframe:
df = duckdb.sql("select * from read_csv('./data/prices.csv', AUTO_DETECT=TRUE)").df()
df[df['Symbol'] == 'msft']
df.set_index(["Symbol", "Date"])
All Dataframes you create will be directly accesible from DuckDB as well:
import pandas as pd
goog = pd.read_csv("./data/goog.csv")
duckdb.sql("select * from goog")