1

I want to convert the data frame below (dataset) into a xyz format, getting the longitude (x variable, the first 36 values in each column of the first row), latitude (y variable, the first 30 values in each row of the first column) and the current value of each x-y ordered pair (z variable). Is it possible to do it using r?

The data comprises spatially distributed rainfall values (most values are equal or close to zero). Longitude ranges from -47.8 to -36.2, and latitude ranges from -21.2 to -7.2.

> dataset <- read.csv("rainfall_data.csv", header = FALSE, sep = ",")
> headTail(dataset)

    V1    V2    V3  V4    V5    V6    V7    V8  V9   V10   V11   V12   V13 V14   V15   V16   V17   V18 V19   V20
1    <NA> -47.8 -47.4 -47 -46.6 -46.2 -45.8 -45.4 -45 -44.6 -44.2 -43.8 -43.4 -43 -42.6 -42.2 -41.8 -41.4 -41 -40.6
2   -21.2     0     0   0     0     0     0     0   0     0     0     0     0   0     0     0     0     0   0     0
3   -20.8     0     0   0     0     0     0     0   0     0     0     0     0   0     0     0     0     0   0     0
4   -20.4     0     0   0     0     0     0     0   0     0     0     0     0   0     0     0     0     0   0     0
...   ...   ...   ... ...   ...   ...   ...   ... ...   ...   ...   ...   ... ...   ...   ...   ...   ... ...   ...
34   -8.4     0     0   0     0     0     0     0   0     0     0     0     0   0     0     0     0     0   0     0
35     -8     0     0   0     0     0     0     0   0     0     0     0     0   0     0     0     0     0   0     0
36   -7.6     0     0   0     0     0     0     0   0     0     0     0     0   0     0     0     0     0   0     0
37   -7.2     0     0   0     0     0     0     0   0     0     0     0     0   0     0     0     0     0   0     0
      V21   V22   V23 V24   V25   V26   V27   V28 V29   V30   V31
1   -40.2 -39.8 -39.4 -39 -38.6 -38.2 -37.8 -37.4 -37 -36.6 -36.2
2       0     0     0   0     0     0     0     0   0     0     0
3       0     0     0   0     0     0     0     0   0     0     0
4       0     0     0   0     0     0     0     0   0     0     0
...   ...   ...   ... ...   ...   ...   ...   ... ...   ...   ...
34      0     0     0   0     0     0     0     0   0     0     0
35      0     0     0   0     0     0     0     0   0     0     0
36      0     0     0   0     0     0     0     0   0     0     0
37      0     0     0   0     0     0     0     0   0     0     0

Possible outcome:

     x     y     z
1    -47.8 -21.1 0
2    -47.4 -21.1 0
3    -47.0 -21.1 0
4    -46.6 -21.1 0
...   ...  ...  ...
1077 -37.4 -7.2  0
1078 -37.0 -7.2  0
1079 -36.6 -7.2  0
1080 -36.2 -7.2  0

Data are available here: https://drive.google.com/file/d/1IQqoV1WsrEXQoorZmuE4DEA8wzJkI3GL/view?usp=sharing

Anyone
  • 47
  • 1
  • 5

2 Answers2

0

With base R, we can create a new dataframe without the first column and first row. Then, we can assign those as the row and column names (excluding the NA). Then, I convert to matrix in order to convert to a table, then into a data frame with three columns.

result <- df[-1, -1]
row.names(result) <- df[-1,1]
colnames(result) <- df[1,-1]

as.data.frame(as.table(as.matrix(result)))

Working from your .csv, we can use read.delim to specify the first column as rownames and the first row as the header, and we add check.names = FALSE, so that we return the correct numbers as the header (if you put TRUE, then it appends a X. to the beginning of the name).

df <- read.delim(file = "~/Downloads/rainfall_data.csv", header = TRUE, row.names = 1, sep = ",", check.names=FALSE)

as.data.frame(as.table(as.matrix(df)))

Another option would be to convert the V1 to row names and row 1 to the column names using a combination of tibble::column_to_rownames and janitor::row_to_names. Then, follow the same steps as above with base R.

library(tidyverse)
library(janitor)

results <- df %>%
  tibble::column_to_rownames("V1") %>%
  janitor::row_to_names(1) %>%
  as.matrix() %>%
  as.table() %>%
  as.data.frame()

Output

 head(results)

   Var1  Var2 Freq
1 -21.2 -47.8    0
2 -20.8 -47.8    0
3 -20.4 -47.8    0
4 -21.2 -47.4    0
5 -20.8 -47.4    0
6 -20.4 -47.4    0

Data

df <- structure(list(V1 = c("<NA>", "-21.2", "-20.8", "-20.4"), V2 = c(-47.8, 
0, 0, 0), V3 = c(-47.4, 0, 0, 0), V4 = c(-47L, 0L, 0L, 0L), V5 = c(-46.6, 
0, 0, 0), V6 = c(-46.2, 0, 0, 0), V7 = c(-45.8, 0, 0, 0), V8 = c(-45.4, 
0, 0, 0), V9 = c(-45L, 0L, 0L, 0L), V10 = c(-44.6, 0, 0, 0), 
    V11 = c(-44.2, 0, 0, 0), V12 = c(-43.8, 0, 0, 0), V13 = c(-43.4, 
    0, 0, 0), V14 = c(-43L, 0L, 0L, 0L), V15 = c(-42.6, 0, 0, 
    0), V16 = c(-42.2, 0, 0, 0), V17 = c(-41.8, 0, 0, 0), V18 = c(-41.4, 
    0, 0, 0), V19 = c(-41L, 0L, 0L, 0L), V20 = c(-40.6, 0, 0, 
    0)), class = "data.frame", row.names = c(NA, -4L))
AndrewGB
  • 16,126
  • 5
  • 18
  • 49
  • This solution did not work. The following error appears when I tried to create the "results" object: "Error in ``.rowNamesDF<-`(x, value = value): missing values in 'row.names' are not allowed`. A NA value show up when a import the dataset to R. I updated the question with a link to download the original data. – Anyone Mar 08 '22 at 00:27
  • 1
    @Anyone I'm not sure, as I wasn't able to replicate it (even just using base R). Maybe a difference in version or something. But I did add in making the packages explicit, as maybe that was causing the issue. Regardless, I updated my answer to have a cleaner way in base R, which should work. – AndrewGB Mar 08 '22 at 03:40
  • 1
    @Anyone I also just added in a different way to read in the data that will allow you to skip having to move the first column and first row. – AndrewGB Mar 08 '22 at 03:58
0

Please check this.

rainfall_file <- "path/to/csv/file"
dataset <- read.csv(rainfall_file)
library(tidyverse)
results <- dataset %>%
    pivot_longer(cols = !X,names_to = "Variable", values_to = "Count", names_prefix = "X.", names_transform = list(Variable = as.double), values_drop_na = TRUE)
Eva
  • 663
  • 5
  • 13
  • If you want the new columns named Y and Z instead, you can change the code results <- dataset %>% pivot_longer(cols = !X,names_to = "Y", values_to = "Z", names_prefix = "X.", names_transform = list(Y = as.double), values_drop_na = TRUE) – Eva Mar 08 '22 at 01:04