DataFrames.jl
This section of the documentation will help you understand how to work with SpectralIndices.jl using DataFrames.jl as input.
This tutorial relies on the sample data stored in the data folder of the package. To access it we are going to use the following:
using SpectralIndices, DataFrames
df = load_dataset("spectral", DataFrame)
first(df, 5)
| Row | SR_B5 | ST_B10 | SR_B2 | SR_B6 | class | SR_B4 | SR_B7 | SR_B3 | SR_B1 |
|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | String | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.269054 | 297.328 | 0.100795 | 0.306206 | Urban | 0.165764 | 0.251949 | 0.132227 | 0.08985 |
| 2 | 0.281264 | 297.108 | 0.08699 | 0.267596 | Urban | 0.160979 | 0.217917 | 0.124404 | 0.0738588 |
| 3 | 0.28422 | 297.436 | 0.0860275 | 0.258384 | Urban | 0.140203 | 0.200098 | 0.120994 | 0.0729375 |
| 4 | 0.254479 | 297.204 | 0.103916 | 0.25958 | Urban | 0.163976 | 0.216735 | 0.135981 | 0.0877325 |
| 5 | 0.269535 | 297.098 | 0.109306 | 0.273234 | Urban | 0.18126 | 0.219554 | 0.15035 | 0.0905925 |
The SR_B columns of this dataset hold Landsat 8 Surface Reflectance values for samples from three different classes. The samples were taken over Oporto. The data is taken from spyndex, and this tutorial is meant to closely mirror the Python version.
This dataset specifically contains three different classes:
unique(df[!, "class"])
3-element Vector{String}:
"Urban"
"Water"
"Vegetation"
So, to reflect that, we are going to calculate three different indices: NDVI for vegetation, NDWI for water and NDBI for urban.
NDVI
NDVI: Normalized Difference Vegetation Index
* Application Domain: vegetation
* Bands/Parameters: Any["N", "R"]
* Formula: (N-R)/(N+R)
* Reference: https://ntrs.nasa.gov/citations/19740022614
NDWI
NDWI: Normalized Difference Water Index
* Application Domain: water
* Bands/Parameters: Any["G", "N"]
* Formula: (G-N)/(G+N)
* Reference: https://doi.org/10.1080/01431169608948714
NDBI
NDBI: Normalized Difference Built-Up Index
* Application Domain: urban
* Bands/Parameters: Any["S1", "N"]
* Formula: (S1-N)/(S1+N)
* Reference: http://dx.doi.org/10.1080/01431160304987
We have multiple ways to feed this data to SpectralIndices.jl to generate our indices. We will try to cover most of them here.
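The three formulas listed above share the same normalized-difference shape, so they can be checked by hand before involving the package at all. The following is a minimal sketch in plain Julia, using the band values of the first row of the dataset preview; the helper normalized_difference is a hypothetical name introduced here for illustration and is not part of SpectralIndices.jl.

```julia
# Band values hand-copied from row 1 of the dataset preview.
G  = 0.132227   # SR_B3, green
R  = 0.165764   # SR_B4, red
N  = 0.269054   # SR_B5, near infrared
S1 = 0.306206   # SR_B6, shortwave infrared 1

# Hypothetical helper (not a SpectralIndices.jl function):
# the common (a - b) / (a + b) shape of all three formulas.
normalized_difference(a, b) = (a - b) / (a + b)

ndvi = normalized_difference(N, R)    # (N - R) / (N + R)
ndwi = normalized_difference(G, N)    # (G - N) / (G + N)
ndbi = normalized_difference(S1, N)   # (S1 - N) / (S1 + N)

println((ndvi, ndwi, ndbi))
```

Because the inputs are the rounded values printed in the preview, the results should agree with the package output only up to the last displayed digits.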
From DataFrame to DataFrame
A straightforward way to compute the indices is to feed a DataFrame to compute_index. To do this, we first need to build a new DataFrame that contains the required bands. We can check which bands each index needs by accessing its bands field:
NDVI.bands
2-element Vector{Any}:
"N"
"R"
NDWI.bands
2-element Vector{Any}:
"G"
"N"
NDBI.bands
2-element Vector{Any}:
"S1"
"N"
In this case we are going to need only the Green, Red, NIR and SWIR1 bands. Since compute_index expects the bands to have the same names as they have in the bands field, we need to select the specific columns that we want out of the dataset and rename them. We can do this easily with select:
params = select(df, :SR_B3=>:G, :SR_B4=>:R, :SR_B5=>:N, :SR_B6=>:S1)
first(params, 5)
| Row | G | R | N | S1 |
|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.132227 | 0.165764 | 0.269054 | 0.306206 |
| 2 | 0.124404 | 0.160979 | 0.281264 | 0.267596 |
| 3 | 0.120994 | 0.140203 | 0.28422 | 0.258384 |
| 4 | 0.135981 | 0.163976 | 0.254479 | 0.25958 |
| 5 | 0.15035 | 0.18126 | 0.269535 | 0.273234 |
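The renaming step can be tried in isolation on a tiny table. The sketch below assumes only DataFrames.jl; df_small holds two rows hand-copied from the dataset preview, and the required vector is typed in by hand from the bands fields rather than read from the package. It also adds a small guard, which is an illustration and not something the tutorial itself requires.

```julia
using DataFrames

# Two rows hand-copied from the dataset preview (illustrative subset).
df_small = DataFrame(
    SR_B3 = [0.132227, 0.124404],
    SR_B4 = [0.165764, 0.160979],
    SR_B5 = [0.269054, 0.281264],
    SR_B6 = [0.306206, 0.267596],
    class = ["Urban", "Urban"],
)

# Band names typed in by hand from the bands fields of NDVI, NDWI and NDBI.
required = ["N", "R", "G", "S1"]

# select keeps only the listed columns, in the listed order, under the new names.
params_small = select(df_small, :SR_B3 => :G, :SR_B4 => :R, :SR_B5 => :N, :SR_B6 => :S1)

# Guard: stop early if a required band is absent after renaming.
missing_bands = setdiff(required, names(params_small))
isempty(missing_bands) || error("missing bands: $(missing_bands)")

println(names(params_small))
```

Note that select drops every column that is not listed, so the class column is no longer part of the result.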
Now our dataset is ready, and we just need to call the compute_index function:
idx = compute_index(["NDVI", "NDWI", "NDBI"], params)
first(idx, 5)
| Row | NDVI | NDWI | NDBI |
|---|---|---|---|
| | Float64 | Float64 | Float64 |
| 1 | 0.237548 | -0.340973 | 0.0645838 |
| 2 | 0.271989 | -0.386671 | -0.0249016 |
| 3 | 0.339326 | -0.402815 | -0.0476153 |
| 4 | 0.216278 | -0.303482 | 0.00992348 |
| 5 | 0.195821 | -0.283852 | 0.0068146 |
The result is a new DataFrame with the desired indices as columns.
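As a cross-check of the table above, the same three columns can be rebuilt by hand with DataFrames.jl alone. This is a sketch under stated assumptions: params_small contains the first two rows of the renamed table typed in by hand, and nd is a hypothetical helper; it is not how SpectralIndices.jl computes the indices internally.

```julia
using DataFrames

# First two rows of the renamed table, hand-copied.
params_small = DataFrame(
    G  = [0.132227, 0.124404],
    R  = [0.165764, 0.160979],
    N  = [0.269054, 0.281264],
    S1 = [0.306206, 0.267596],
)

# Hypothetical helper implementing the shared (a - b) / (a + b) formula.
nd(a, b) = (a - b) / (a + b)

# Each pair of source columns is passed row by row to nd and stored under the index name.
manual = select(
    params_small,
    [:N, :R]  => ByRow(nd) => :NDVI,
    [:G, :N]  => ByRow(nd) => :NDWI,
    [:S1, :N] => ByRow(nd) => :NDBI,
)

println(manual)
```

The values should match the first two rows of the compute_index result up to the rounding of the hand-copied inputs.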
Another way to obtain this is to feed single-column DataFrames as kwargs, one for each band:
idx = compute_index(["NDVI", "NDWI", "NDBI"];
    G = select(df, :SR_B3=>:G),
    N = select(df, :SR_B5=>:N),
    R = select(df, :SR_B4=>:R),
    S1 = select(df, :SR_B6=>:S1))
first(idx, 5)
| Row | NDVI | NDWI | NDBI |
|---|---|---|---|
| | Float64 | Float64 | Float64 |
| 1 | 0.237548 | -0.340973 | 0.0645838 |
| 2 | 0.271989 | -0.386671 | -0.0249016 |
| 3 | 0.339326 | -0.402815 | -0.0476153 |
| 4 | 0.216278 | -0.303482 | 0.00992348 |
| 5 | 0.195821 | -0.283852 | 0.0068146 |
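Since the result only holds the index columns, it can be useful to put the class labels back next to them. The sketch below assumes only DataFrames.jl and uses values hand-copied from the first three rows shown above; labels, idx_small, labelled and built_up are illustrative names.

```julia
using DataFrames

# Class labels and index values hand-copied from the first three rows.
labels = DataFrame(class = ["Urban", "Urban", "Urban"])
idx_small = DataFrame(
    NDVI = [0.237548, 0.271989, 0.339326],
    NDWI = [-0.340973, -0.386671, -0.402815],
    NDBI = [0.0645838, -0.0249016, -0.0476153],
)

# hcat places the columns side by side; both tables must have the same number of rows.
labelled = hcat(labels, idx_small)

# Keep only the rows where the built-up index is positive.
built_up = filter(:NDBI => >(0), labelled)

println(labelled)
println(built_up)
```

This relies on the rows of the index table being in the same order as the rows of the input, which is what the matching values in the tables above suggest.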
From DataFrame to Vector
Alternatively, you can build a Dict that maps each band name to the corresponding column of the DataFrame, going back to an example we saw on the previous page:
params = Dict("G" => df[!, "SR_B3"], "N" => df[!, "SR_B5"], "R" => df[!, "SR_B4"], "S1" => df[!, "SR_B6"])
Dict{String, Vector{Float64}} with 4 entries:
"S1" => [0.306206, 0.267596, 0.258384, 0.25958, 0.273234, 0.32954, 0.271721, …
"N" => [0.269054, 0.281264, 0.28422, 0.254479, 0.269535, 0.277153, 0.26563, …
"G" => [0.132227, 0.124404, 0.120994, 0.135981, 0.15035, 0.152303, 0.135885,…
"R" => [0.165764, 0.160979, 0.140203, 0.163976, 0.18126, 0.19754, 0.170026, …
The computation is done in the same way:
ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"], params)
3-element Vector{Any}:
[0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802 … 0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
[-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846 … -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
[0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355 … -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]
Just be careful with the naming: SpectralIndices.jl brings into the namespace all the indices defined in indices. The all-caps names are reserved for them, which is why the results above are bound to lowercase names. NDVI still refers to the index definition, as we illustrated at the beginning of this tutorial:
NDVI
NDVI: Normalized Difference Vegetation Index
* Application Domain: vegetation
* Bands/Parameters: Any["N", "R"]
* Formula: (N-R)/(N+R)
* Reference: https://ntrs.nasa.gov/citations/19740022614
The two steps can be merged by providing the values directly as kwargs:
ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"];
    G = df[!, "SR_B3"],
    N = df[!, "SR_B5"],
    R = df[!, "SR_B4"],
    S1 = df[!, "SR_B6"])
3-element Vector{Any}:
[0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802 … 0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
[-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846 … -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
[0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355 … -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]
You are free to choose whichever method you prefer; there is no meaningful difference in speed between them:
@time ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"], params)
3-element Vector{Any}:
[0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802 … 0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
[-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846 … -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
[0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355 … -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]
@time ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"];
    G = df[!, "SR_B3"],
    N = df[!, "SR_B5"],
    R = df[!, "SR_B4"],
    S1 = df[!, "SR_B6"])
3-element Vector{Any}:
[0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802 … 0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
[-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846 … -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
[0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355 … -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]
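When comparing timings like the ones above, keep in mind that in Julia the first call of a method includes compilation time, so a single @time measurement can be dominated by compilation rather than by the computation itself. The following sketch illustrates the effect in plain Julia; nd is a hypothetical stand-in for an index computation and the inputs are random data, so it says nothing about the actual timings of compute_index.

```julia
# Hypothetical stand-in for an index computation (not the package's implementation).
nd(a, b) = (a .- b) ./ (a .+ b)

# Random inputs; the offset keeps every denominator strictly positive.
N = rand(10_000) .+ 1.0
R = rand(10_000)

# The first call triggers compilation of nd for these argument types.
t_first = @elapsed nd(N, R)

# The second call reuses the compiled code, so it mostly measures run time.
t_second = @elapsed nd(N, R)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

For a fair comparison between the Dict and kwargs variants, it is therefore reasonable to run each call once before timing it, or to time several repetitions.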