0

I want to plot one separate histogram for each variable in the column. The data is imported from a CSV file (sample.csv) and looks like this:

ip_addr_player_id,  event_name, level, points_earned, stars_earned, moves
118.93.180.241, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,   2 
118.93.180.241, Puzzle Complete, Botany Lab Puzzle 2, 1000, 2,   2 
118.93.180.241, Puzzle Complete, Botany Lab Puzzle 3, 1000, 2,   2 
203.166.252.219, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,  2     
54.166.252.324, Puzzle Complete, Botany Lab Puzzle 5, 1000, 2,  2

Given each ip_addr_player_id is unique, I want to plot histograms (for each ip_addr_player_id) for points_earned, stars_earned and moves.

x axis: level; y axis: points_earned/stars_earned/moves (one at a time)

I tried this, based on an example I found online:

 library(readr)
 dataIn <- read.csv("sample.csv")
 #View(dataIn)
 library(ggplot2)
 plot <- ggplot(dataIn, aes(level, points_earned, fill=points_earned))+ 
              geom_histogram() + facet_wrap(~ip_addr_player_id)
 plot

But this code gives me no output.

Shweta Sisodiya

2 Answers

0
    library(data.table)

    # sample data; strip.white trims the spaces around each comma-separated field
    dataIn = read.table(text="
    ip_addr_player_id,  event_name, level, points_earned, stars_earned, moves
    118.93.180.241, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,   2 
    118.93.180.241, Puzzle Complete, Botany Lab Puzzle 2, 800, 2,   2 
    118.93.180.241, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,   2 
    203.166.252.219, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,  2     
    54.166.252.324, Puzzle Complete, Botany Lab Puzzle 5, 1000, 2,  2
    ", header=TRUE, sep=",", strip.white=TRUE)
    dataIn

    # get unique players
    players = unique(dataIn$ip_addr_player_id)
    players

    # loop over players
    for (i in players) {
      # select rows for this ip_addr_player_id
      index = which(dataIn$ip_addr_player_id == i)

      # get the data frame rows for the corresponding index
      p1 = dataIn[index, ]

      # convert to a data.table
      DT <- data.table(p1)

      # total points per level (the sum is stored in column V1)
      dt1 = DT[, sum(points_earned), by = level]

      # open the png device before plotting, so each plot is saved to a file
      png(filename = sprintf("%s.png", i))
      # use the ip as the title of the graph
      barplot(dt1$V1, names.arg = dt1$level, main = i)
      # do the same for the other variables (stars_earned, moves)
      dev.off()
    }

Review a partial result online Example
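
The comment "do the same for the other variables" in the loop above could be handled with an inner loop over the column names. The following is only a sketch in base R, assuming the same columns as the sample data; `measures`, `out_dir` and `saved_files` are hypothetical names introduced here, and the files are written to a temporary directory purely for illustration.

```r
# Sample data in the same shape as the question's CSV (assumption).
dataIn <- read.csv(text = "ip_addr_player_id,event_name,level,points_earned,stars_earned,moves
118.93.180.241,Puzzle Complete,Botany Lab Puzzle 1,1000,2,2
118.93.180.241,Puzzle Complete,Botany Lab Puzzle 2,800,2,2
118.93.180.241,Puzzle Complete,Botany Lab Puzzle 1,1000,2,2
203.166.252.219,Puzzle Complete,Botany Lab Puzzle 1,1000,2,2
54.166.252.324,Puzzle Complete,Botany Lab Puzzle 5,1000,2,2",
  strip.white = TRUE, stringsAsFactors = FALSE)

measures <- c("points_earned", "stars_earned", "moves")
out_dir <- tempdir()          # hypothetical output location; change as needed
saved_files <- character(0)   # keeps track of what was written

for (player in unique(dataIn$ip_addr_player_id)) {
  rows <- dataIn[dataIn$ip_addr_player_id == player, ]
  for (m in measures) {
    # total of this measure per level, for the current player only
    totals <- tapply(rows[[m]], rows$level, sum)

    # open the device first, then plot, then close it
    out_file <- file.path(out_dir, sprintf("%s_%s.png", player, m))
    png(filename = out_file)
    barplot(totals, main = paste(player, m, sep = ": "),
            xlab = "level", ylab = m)
    dev.off()

    saved_files <- c(saved_files, out_file)
  }
}
```

This writes one file per player and variable, so checking `length(saved_files)` afterwards is a quick way to see whether the loop really visited every player.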

M.Hassan
  • Hey! Thanks, @M.Hassan, this works perfectly. I do have a question, though. I have 899 unique player ids, but I don't know why this loop only considers the first player id. The line `index=which(dataIn$ip_addr_player_id ==i)` (under `#select rows for uniq ip_addr_player_id`) selects rows for i=1 only. – Shweta Sisodiya Jul 09 '17 at 12:33
  • index is a vector that selects all rows matching ip_addr_player_id (the subset of the data for the matched ip). E.g., for i = "118.93.180.241", index = (1 2 3), so 3 rows are selected, not one. Try printing index and review the results. – M.Hassan Jul 09 '17 at 12:52
  • Here you can find a partial result as a proof of concept: http://rextester.com/IYYD19716 – M.Hassan Jul 09 '17 at 13:16
  • I mean that after it has completed all actions for i=1, the loop should move on to i=2, and index should select the rows for i=2, and so on until i=899. Therefore it should plot 899 bar plots. But that's not happening. – Shweta Sisodiya Jul 09 '17 at 13:52
  • If you are working in RStudio, you will see only the last plot. I will modify my code to save each plot to a separate file. Let me know if you find 899 images. – M.Hassan Jul 09 '17 at 13:57
  • I'm a beginner. I started working with R studio a few weeks ago. – Shweta Sisodiya Jul 09 '17 at 14:00
  • Sorry for troubling you. It's working, but it just saves blank png files with no plots. Here is the link to my code: http://rextester.com/UYSY92726 – Shweta Sisodiya Jul 09 '17 at 14:17
  • For the sample in my code, I get three plots. Check the order of the statements: `png`, `barplot` and `dev.off`, as in my example code. I assume that your data is in the same format as the data frame in the sample code. – M.Hassan Jul 09 '17 at 14:28
  • Move the png(..) statement above the plotting call; having it after the plot is why you get a blank image :) See the modified code: http://rextester.com/RNDJO26130 – M.Hassan Jul 09 '17 at 14:35
  • It is still saving only one graph, with the error "Error in switch(units, `in` = res, cm = res/2.54, mm = res/25.4, px = 1) * : non-numeric argument to binary operator". Should I try some other platform? Please suggest some. – Shweta Sisodiya Jul 09 '17 at 14:50
  • The following code with sample.csv is working: rextester.com/edit/RNDJO26130 . I added a function to save the image, and I get 3 images. Download the working example with plots: https://1drv.ms/u/s!AksPHIlLGo1CgRkFpFRqzU9dF0rf – M.Hassan Jul 09 '17 at 15:49
  • I don't know what's going wrong. I'm getting just one graph, for i=1. Checking line by line, I found that if I run the same program only up to "dt1 <- DT[, points_earned, by = level]", I get DT: 26 obs. of 9 variables; dt1: 26 obs. of 9 variables; p1: 26 obs. of 9 variables. The moment I add the code for the graph and run it, I get DT: 3 obs. of 9 variables; dt1: 3 obs. of 9 variables; p1: 3 obs. of 9 variables. How is that possible? – Shweta Sisodiya Jul 09 '17 at 15:58
  • In the test data you provided I find only 6 variables, so why do you get 9 variables? dt1 groups the data by level, so you get only 2 variables (level, points_earned), which you plot as a bar chart for a given ip. You can then add extra code to group whatever other variables you need (e.g. stars_earned, moves) and plot their charts. You can also combine all these plots using ggplot2. – M.Hassan Jul 10 '17 at 13:32
  • Here, I just provided a sample data set. I figured out that the problem was with my platform. I restarted it and then the program worked correctly. Thanks for the help! – Shweta Sisodiya Jul 17 '17 at 17:54
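
The ggplot2 route mentioned in the comments could look roughly like the sketch below. It assumes the same columns as the sample data and that ggplot2 (2.2.0 or later, for `geom_col()`) is installed; `totals` and `p` are names introduced here for illustration. Since the question asks for level on the x axis and points_earned on the y axis, this draws a bar chart rather than a histogram. `geom_histogram()` bins a single variable, which is a likely reason the original attempt with both x and y mapped did not produce a plot.

```r
library(ggplot2)

# Sample data in the same shape as the question's CSV (assumption).
dataIn <- read.csv(text = "ip_addr_player_id,event_name,level,points_earned,stars_earned,moves
118.93.180.241,Puzzle Complete,Botany Lab Puzzle 1,1000,2,2
118.93.180.241,Puzzle Complete,Botany Lab Puzzle 2,800,2,2
118.93.180.241,Puzzle Complete,Botany Lab Puzzle 1,1000,2,2
203.166.252.219,Puzzle Complete,Botany Lab Puzzle 1,1000,2,2
54.166.252.324,Puzzle Complete,Botany Lab Puzzle 5,1000,2,2",
  strip.white = TRUE, stringsAsFactors = FALSE)

# total points per player and level
totals <- aggregate(points_earned ~ ip_addr_player_id + level,
                    data = dataIn, FUN = sum)

# one panel per player, one bar per level
p <- ggplot(totals, aes(x = level, y = points_earned)) +
  geom_col() +
  facet_wrap(~ ip_addr_player_id)

# In an interactive session, print(p) draws the plot;
# ggsave("points_by_level.png", p) would write it to a file.
```

With 899 players, a single faceted figure is probably too crowded to read, so saving one file per player (as in the loop in the answer) may still be the more practical choice. Swapping `points_earned` for `stars_earned` or `moves` in both the `aggregate()` formula and `aes()` gives the other two charts.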
-1

You could use loops, for example:

# your_dataframe and column_of_your_level are placeholders for your own
# data frame and the name (or index) of its level column
X = your_dataframe

# hist() needs numeric input, so loop over the numeric columns only
columns_to_plot = which(sapply(X, is.numeric))

subset_level_1 = X[which(X[,column_of_your_level] == "Botany Lab Puzzle 1"),]
subset_level_2 = X[which(X[,column_of_your_level] == "Botany Lab Puzzle 2"),]
subset_level_3 = X[which(X[,column_of_your_level] == "Botany Lab Puzzle 3"),]

for (col in columns_to_plot) {
    hist(subset_level_1[,col])
    hist(subset_level_2[,col])
    hist(subset_level_3[,col])
}
meow
  • Oops! Sorry, my bad. I forgot to add that each ip_addr_player_id can have multiple rows for different levels (Botany Lab Puzzle 1 or Botany Lab Puzzle 2 or Botany Lab Puzzle 3 and so on). – Shweta Sisodiya Jul 08 '17 at 16:57
  • Edited the code to adjust. This is most likely not the best possible way to do it, but I think it should work. Also, don't worry too much about the loop; it only becomes a problem once you have hundreds of thousands of columns, in which case R is not really suitable anyway, IMO. – meow Jul 08 '17 at 18:18