I have two data sets which I'd like to compare in one graph (Ethereum price and transaction volume). I plotted a graph, but I think something is wrong with the scale of the y-axis:

ETH_price <- read.table(file = '~/R/export-EtherPrice.csv' , header = T, sep=";")

transaction_volume <- read.csv(file = '~/R/export-TxGrowth.csv', header = T, sep=";")

head(ETH_price)

head(transaction_volume)

ETH_price$Date.UTC. <- as.Date(ETH_price$Date.UTC., format = "%m/%d/%Y")

str(ETH_price) # verify the date format

transaction_volume$Date.UTC. <- as.Date(transaction_volume$Date.UTC., format = "%m/%d/%Y") 

str(transaction_volume) # verify the date format

ggplot(ETH_price,aes(x = Date.UTC.,y = Value)) + 
  geom_point()+
  geom_line(aes(color="ETH_price")) +
  geom_line(data=transaction_volume,aes(x = Date.UTC.,y = Value, color="transaction_volume")) +
  labs(color="Legend") +
  scale_colour_manual("", breaks = c("ETH_price", "transaction_volume"),
                      values = c("blue", "brown")) +
  ggtitle("Correlation of ETH price and transaction volume") + 
  theme(plot.title = element_text(lineheight=.7, face="bold"))

The following error occurs:

Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

The data looks like this (transaction_volume):

> head(transaction_volume)

   Date.UTC. UnixTimeStamp Value
1 03.03.2017    1488499200 64294
2 04.03.2017    1488585600 58756
3 05.03.2017    1488672000 57031
4 06.03.2017    1488758400 57020
5 07.03.2017    1488844800 62589
6 08.03.2017    1488931200 55386

The plot looks like this:

[Image: new_wrong_plot_edited]

Does someone have an idea what could be wrong?

I'm happy about every hint!:)

MAiniak

/Code updated

  • Hi, in this part `geom_line(data=transaction_volume,aes(color="transaction_volume"))`, don't you need to pass the x and y arguments in `aes` so that geom_line knows what to plot? Or are these arguments the same as in the ETH price dataset? (Not sure if it is related) – dc37 Nov 08 '19 at 00:19
  • Hey, thanks for the fast reply! The x and y arguments are "Date.UTC." and "Value" in both cases. Should I repeat them for transaction_volume? I thought it was sufficient to define them once at the beginning. I get an error saying: geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic? – MAiniak Nov 08 '19 at 00:31
  • I would try to use `ylim(...)` with different values according to the range of your variable `Value`. – Zhiqiang Wang Nov 08 '19 at 00:36
  • Thanks, but to be honest I have no clue how to implement ylim with the two data sets I mentioned! There are more than 3000 observations in the dataset. – MAiniak Nov 08 '19 at 00:45
  • When you reset the data source, you need to specify the x and y aesthetics. You could first combine your data frames into a single frame, then ggplot will know what to look for. Or specify x and y for the second line. You'll get better help if you can post examples of both data sets, particularly in a format that lets people easily read them into their system. – Brian Fisher Nov 08 '19 at 01:00
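
A minimal sketch of one common cause of the `geom_path` message quoted in the comments above, assuming the dates are still stored as text when the plot is built. The three-row samples are hypothetical stand-ins that reuse the column names from the question; the price values are illustrative only.

```r
library(ggplot2)

# Hypothetical three-row samples with dates still stored as text.
price <- data.frame(Date.UTC. = c("03.03.2017", "04.03.2017", "05.03.2017"),
                    Value = c(19.54, 19.45, 20.45))
volume <- data.frame(Date.UTC. = price$Date.UTC.,
                     Value = c(64294, 58756, 57031))

# A character x is discrete, so each date forms its own one-point group.
# Converting to Date makes x continuous and lets the line connect the points.
price$Date.UTC. <- as.Date(price$Date.UTC., format = "%d.%m.%Y")
volume$Date.UTC. <- as.Date(volume$Date.UTC., format = "%d.%m.%Y")

# x and y are spelled out again for the second layer, as the comments suggest.
p <- ggplot(price, aes(x = Date.UTC., y = Value)) +
  geom_line(aes(color = "ETH_price")) +
  geom_line(data = volume,
            aes(x = Date.UTC., y = Value, color = "transaction_volume"))
```

Layers inherit the mappings from `ggplot()` by default (`inherit.aes = TRUE`), so repeating `x` and `y` is harmless when both data frames use the same column names; it mainly makes the second layer easier to read.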

2 Answers

To summarize the critical steps for solving your problem:

1) You have to convert the dates to the correct format so that ggplot can plot them properly.

2) As your ETH_price values and transaction_volume values are not on the same scale, you have to use the trick described by @r2evans in this post to plot them on a single graph: two y-axes with different scales for two datasets in ggplot2 [duplicate].

So your code should look something like this:
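
For step 1, a small sketch of why the format string matters. It uses only the first three rows shown in the question and assumes the dates are day.month.year, which the `UnixTimeStamp` column appears to confirm. A format with the wrong separator yields `NA` for every row, which is one likely source of the `'to' must be a finite number` error; swapping `%d` and `%m` parses without complaint here but silently produces the wrong month.

```r
# First three rows from the question's head() output.
raw_dates <- c("03.03.2017", "04.03.2017", "05.03.2017")
unix_ts <- c(1488499200, 1488585600, 1488672000)

# Wrong separator: nothing matches, so every date becomes NA.
wrong_sep <- as.Date(raw_dates, format = "%m/%d/%Y")

# Month first: parses, but reads 04.03.2017 as April 3rd.
month_first <- as.Date(raw_dates, format = "%m.%d.%Y")

# Day first: matches the layout of the data.
day_first <- as.Date(raw_dates, format = "%d.%m.%Y")

# Independent check: derive the dates from the Unix timestamps (UTC).
from_ts <- as.Date(as.POSIXct(unix_ts, origin = "1970-01-01", tz = "UTC"))

print(all(is.na(wrong_sep)))
print(all(day_first == from_ts))
print(all(month_first == from_ts))
```

Running `anyNA()` on the converted column right after `as.Date()` is a cheap way to catch a mismatched format before plotting.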

# Here I re-created a small part of your dataset just for the example
Date.UTC. = c("03.03.2017","04.03.2017","05.03.2017","06.03.2017","07.03.2017","08.03.2017")
Value = c(64294,58756,57031,57020,62589,55386)
transaction_volume = data.frame(Date.UTC.,Value)

Value = c(19.54,19.45,20.45,22.67,23.34,21.89)
ETH_price = data.frame(Date.UTC.,Value)

# Managing Date format (the dates are day.month.year, hence %d.%m.%Y)
ETH_price$Date.UTC. = as.Date(ETH_price$Date.UTC., format = "%d.%m.%Y")
transaction_volume$Date.UTC. = as.Date(transaction_volume$Date.UTC., format = "%d.%m.%Y")
str(ETH_price) # to check the correct format of your dataset
str(transaction_volume) # to check the correct format of your dataset

# Tagging each dataset so they can be told apart after merging
ETH_price$z = "ETH_price"
transaction_volume$z = "transaction_volume"

# Defining the scale factor (you can adapt this part according to your preferences for plotting)
scale_factor = mean(transaction_volume$Value / ETH_price$Value)
# Rescaling transaction_volume to the ETH_price range, then merging both datasets
df_temp = within(transaction_volume, {Value = Value / scale_factor})
df = rbind(ETH_price,df_temp)
df

# Plotting both datasets (geom_line alone is enough to draw one line per dataset)
library(ggplot2)
mycolors = c("ETH_price" = "blue", "transaction_volume" = "red")
ggplot(df, aes(x = Date.UTC., y = Value, group = z, color = z)) +
  geom_line() +
  scale_y_continuous(name = "ETH_price", sec.axis = sec_axis(~scale_factor*., name = "transaction_volume")) +
  scale_color_manual(name = "Datasets", values = mycolors) +
  theme(
    axis.title.y = element_text(color = mycolors["ETH_price"]),
    axis.text.y = element_text(color = mycolors["ETH_price"]),
    axis.title.y.right = element_text(color = mycolors["transaction_volume"]),
    axis.text.y.right = element_text(color = mycolors["transaction_volume"])
  )

You should then get a plot showing both series, with ETH_price on the left y-axis and transaction_volume on the right.

I think this should solve your problem ;)
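
If a secondary axis feels like too much machinery, stacked panels with independent y-axes are a possible alternative to the approach above. This is a sketch under assumptions, not part of the linked post: `df_long` and `p_facets` are hypothetical names, and the values are the small sample used earlier in this answer.

```r
library(ggplot2)

# Small sample reusing the column names from the answer.
dates <- as.Date(c("2017-03-03", "2017-03-04", "2017-03-05"))
ETH_price <- data.frame(Date.UTC. = dates,
                        Value = c(19.54, 19.45, 20.45),
                        z = "ETH_price")
transaction_volume <- data.frame(Date.UTC. = dates,
                                 Value = c(64294, 58756, 57031),
                                 z = "transaction_volume")

# One long data frame; no rescaling is needed for this approach.
df_long <- rbind(ETH_price, transaction_volume)

# One panel per dataset, each with its own y-axis range.
p_facets <- ggplot(df_long, aes(x = Date.UTC., y = Value, color = z)) +
  geom_line() +
  facet_wrap(~ z, ncol = 1, scales = "free_y") +
  theme(legend.position = "none")
```

The panels share the date axis, so movements can still be compared visually, and each series keeps its real units.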

dc37

Thanks for your replies!

I checked the dataset and there were a few corrupted lines, which I threw out. Now I have a very basic problem (sorry, I'm just getting started with R). The data in Excel looks like this: Excel_data

If I merge it back into the first column, the date is lost: because the column does not have the date format, a seemingly random number appears instead. Until now I only had datasets with all the data in the first column, which I then imported into R. I'd like to try the original code with the new data, which currently looks like this in R:

    > head(transaction_volume)

   Date.UTC. UnixTimeStamp Value
1 03.03.2017    1488499200 64294
2 04.03.2017    1488585600 58756
3 05.03.2017    1488672000 57031
4 06.03.2017    1488758400 57020
5 07.03.2017    1488844800 62589
6 08.03.2017    1488931200 55386

How can I read in the data so that R recognizes it the same way it did when all the data was in the first column of the .csv?

Sorry for the hassle.

MAiniak
  • From Excel, did you export your data as a csv file ? – dc37 Nov 08 '19 at 01:26
  • yes, I read it in as csv file: `ETH_price <- read.csv('~/R/export-EtherPrice.csv')` – MAiniak Nov 08 '19 at 01:30
  • maybe you can try: `ETH_price <- read.table(file = '~/R/export-EtherPrice.csv' , header = T, sep=";")`. – dc37 Nov 08 '19 at 01:33
  • great, that worked! I tried a slightly different version before. unfortunately now I still get the error of group aesthetic - I'm not sure where to include your date format adjustment in the code of my original post, any idea? – MAiniak Nov 08 '19 at 01:40
  • Can you edit your first post to reflect your slightly different version? Then I will be able to edit my code to show you where to include the date format adjustment – dc37 Nov 08 '19 at 01:41
  • I updated to the exact code I currently try to run. – MAiniak Nov 08 '19 at 01:43
  • thank you for your effort, I really appreciate it! I get an error `in seq.int(0, to0 - from, by) : 'to' must be a finite number` - could it have something to do with missing packages? – MAiniak Nov 08 '19 at 01:53
  • Add the error output to your first post so that people can follow up on your code without reading the whole page ;) Also, please report the output of the `head` and `str` commands from your code (just to check that your data has the right format) – dc37 Nov 08 '19 at 01:56
  • Sorry, I just realized that the new format of your dates is `%m.%d.%Y` instead of `%m/%d/%Y`. I corrected the corresponding part of the code in my first answer. Try with it – dc37 Nov 08 '19 at 02:07
  • I added the outcomes of `str` and `head`, it seems that the date format is not recognized anymore - your edit fixed that! We have some progress regarding the plot, it shows a graph for transaction volume, nothing for ETH price though. I updated the plot picture in my original post! The y-axis scale seems to be wrong? – MAiniak Nov 08 '19 at 02:07
  • Your plot is correct. Just ETH price and transaction volume are not in the same range at all. You need to plot it at different scales. – dc37 Nov 08 '19 at 02:20
  • That makes sense, sorry it's late in Scotland!:) would you suggest to plot it individually and just put it next to each other? or is there an easy (I really doubt that) solution to show it in one plot? – MAiniak Nov 08 '19 at 02:22
  • it's late for me too. You can check on this post: [two y-axes with different scales for two datasets in ggplot2](https://stackoverflow.com/questions/49185583/two-y-axes-with-different-scales-for-two-datasets-in-ggplot2). If you don't figure out tomorrow, I will try to help you with that ;) – dc37 Nov 08 '19 at 02:28
  • thanks for helping me out, I can utilise the current state for sure! – MAiniak Nov 08 '19 at 02:29
  • Hi, I provided the full code you will need to both adjust the date format of your datasets and plot them in a single graph. If you are satisfied with this answer, please delete your second post (regarding the formatting of the csv files) so that it does not confuse other people. – dc37 Nov 08 '19 at 07:05
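
The import fix worked out in the comments above can be sketched as follows. The inline `csv_text` is a hypothetical stand-in for `export-TxGrowth.csv`, and the header name `Date(UTC)` is an assumption inferred from the `Date.UTC.` column that R reports. The dates are parsed day first, assuming they are day.month.year as the `UnixTimeStamp` values suggest.

```r
# Hypothetical stand-in for the semicolon-separated export file.
csv_text <- "Date(UTC);UnixTimeStamp;Value
03.03.2017;1488499200;64294
04.03.2017;1488585600;58756
05.03.2017;1488672000;57031"

# Default separator (comma): each row lands in a single column.
one_col <- read.csv(text = csv_text)

# Semicolon separator: three proper columns.
transaction_volume <- read.csv(text = csv_text, sep = ";",
                               stringsAsFactors = FALSE)

# R rewrites the header "Date(UTC)" into the syntactic name "Date.UTC.".
print(names(transaction_volume))

# Convert the text dates and check that nothing failed to parse.
transaction_volume$Date.UTC. <- as.Date(transaction_volume$Date.UTC.,
                                        format = "%d.%m.%Y")
stopifnot(!anyNA(transaction_volume$Date.UTC.))
```

With a real file, `read.csv2()` is another option for semicolon-separated data, but it also treats `,` as the decimal mark, so it is worth checking the numeric columns afterwards.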