I am given a data set that looks something like this:

[image: data]

and I am trying to graph all the points with a 1 on the first column separate from the points with a 0, but I want to put them in the same chart.

I know the final result should look something like this:

[image]

But I can't find a way to filter the points in Julia. I'm using LinearAlgebra, CSV, Plots, and DataFrames for my project, and so far I haven't found a way to make DataFrames storage types work nicely with Plots functions. I keep running into errors like `Cannot convert Float64 to series data for plotting` when I try plotting the points one at a time, using a for loop as a filter, as shown in the code below:

filter = select(data, :1)
newData = select(data, 2:3)

#graph one initial point to create the plot
plot(newData[1,1], newData[1,2], seriestype = :scatter, title = "My Scatter Plot")

#add the additional points with the 1 in front
for i in 2:size(newData)
    if filter[i] == 1
        plot!(newData[i, 1], newData[i, 2], seriestype = :scatter, title = "My Scatter Plot")
    end
end

Other approaches have given me other errors, but I haven't recorded those.

I'm using Julia 1.4.0 and the latest versions of all of the packages mentioned.

Quick Edit:

It might help to know that I am trying to replicate the Nonlinear dimensionality reduction section of this article https://sebastianraschka.com/Articles/2014_kernel_pca.html#principal-component-analysis

Bogumił Kamiński
  • Why not just (optional: sort the table by the first column, and) plot 2nd and 3rd column as x-y, with the color of the dot depending on the first column? – Kasey Chang May 07 '20 at 06:20

1 Answer

With Plots.jl you can do the following (here is a fully reproducible example):

julia> using DataFrames, Plots

julia> df = DataFrame(c=rand(Bool, 100), x = 2 .* rand(100) .- 1);

julia> df.y = ifelse.(df.c, 1, -1) .* df.x .^ 2;

julia> plot(df.x, df.y, color=ifelse.(df.c, "blue", "red"), seriestype=:scatter, legend=nothing)

However, in this case I would additionally use StatsPlots.jl as then you can just write:

julia> using StatsPlots

julia> @df df plot(:x, :y, group=:c, seriestype=:scatter, legend=nothing)

If you want to do it manually by groups it is easiest to use the groupby function:

julia> gdf = groupby(df, :c);

julia> summary(gdf) # check that we have 2 groups in data
"GroupedDataFrame with 2 groups based on key: c"

julia> plot(gdf[1].x, gdf[1].y, seriestype=:scatter, legend=nothing)

julia> plot!(gdf[2].x, gdf[2].y, seriestype=:scatter)

Note that the gdf variable is bound to a GroupedDataFrame object, from which you can get the groups defined by the grouping column (:c in this case).
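If there are more than two groups, indexing gdf[1], gdf[2], and so on by hand gets tedious. A minimal sketch of the same idea with a loop, assuming the df built above; the label text and the variable name p are only illustrative choices:

```julia
using DataFrames, Plots

# Same toy data as above: a Bool class column c and x/y coordinates
df = DataFrame(c = rand(Bool, 100), x = 2 .* rand(100) .- 1)
df.y = ifelse.(df.c, 1, -1) .* df.x .^ 2

gdf = groupby(df, :c)

# Start from an empty plot and add one scatter series per group
p = plot(title = "My Scatter Plot")
for g in gdf
    # every row in g shares the same value of c, so first(g.c) identifies the group
    scatter!(p, g.x, g.y, label = "c = $(first(g.c))")
end
p
```

Each group becomes its own series, so Plots.jl assigns a separate color and legend entry to each one.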

Bogumił Kamiński
  • What if my data contains no headers? How does one reference a column in DataFrames without a header? – KeyboardHunter May 07 '20 at 15:15
  • Can you tell me two things: 1) which version of DataFrames.jl you are on, 2) what does `names(df)` print? – Bogumił Kamiński May 07 '20 at 16:03
  • I'm on the latest version of DataFrames.jl (I reinstalled it yesterday because it was acting buggy) and this is what `names(df)` prints: `3-element Array{Symbol,1}: :Column1 :Column2 :Column3` – KeyboardHunter May 07 '20 at 17:27
  • If it prints a `Vector{Symbol}`, this means you are not on the latest version of DataFrames.jl (the latest returns a `Vector{String}`). Anyway, as you can see, your data frame does have column names: they are `:Column1`, `:Column2` and `:Column3`, and you can use these names to access specific columns. – Bogumił Kamiński May 07 '20 at 17:43
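
To tie the comments above back to the original question, here is a minimal sketch of plotting headerless data by its auto-generated column names. The inline sample values are made up and stand in for the asker's file, which is not shown; the variable names raw and mask are likewise only illustrative. The key point is that plotting functions expect vectors, so the rows are selected with a Boolean mask instead of being plotted one scalar at a time:

```julia
using CSV, DataFrames, Plots

# Hypothetical headerless data (label, x, y); replace with your own file
raw = """
1,0.5,0.25
0,-0.3,-0.09
1,-0.8,0.64
0,0.9,-0.81
"""

# header=false makes CSV.jl generate the names Column1, Column2, Column3
df = CSV.File(IOBuffer(raw); header=false) |> DataFrame

# Boolean mask marking the rows whose first column equals 1
mask = df.Column1 .== 1

# Plot each class as a vector of points, both in the same chart
scatter(df.Column2[mask], df.Column3[mask], label = "1", title = "My Scatter Plot")
scatter!(df.Column2[.!mask], df.Column3[.!mask], label = "0")
```

With a file on disk, passing its path to CSV.File in place of IOBuffer(raw) should behave the same way.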