
I have a tbl_df read from a CSV file whose columns have different lengths, but the data frame displays every column at the same length, with NA wherever there is no data. For example, the first column has data only up to row 120, while the second column has data up to row 200, so dim(df) reports 200 rows and the missing values in the first column are NA.

How can I see the real length of each column? I have tried dim, length, and nrow on each column, but without success: they always return the length of the longest column.

zx8754
TeoK

1 Answer


You can try

colSums(!is.na(df))
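As a minimal sketch of why this works, assuming a small made-up data frame (the columns `a` and `b` are hypothetical and not taken from the question): `!is.na(df)` gives a logical matrix, and `colSums` counts the `TRUE` values in each column.

```r
# Hypothetical example data: column a has 3 real values, column b has 5.
df <- data.frame(
  a = c(1, 2, 3, NA, NA),
  b = c(10, 20, 30, 40, 50)
)

# nrow() reports the padded length shared by all columns.
n_rows <- nrow(df)

# !is.na(df) is a logical matrix (TRUE where a value is present);
# colSums() adds up the TRUEs per column, i.e. the non-NA count.
counts <- colSums(!is.na(df))
print(counts)
```

Note that this counts every non-`NA` cell, so a column with `NA` gaps in the middle would report fewer values than the position of its last filled row.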
Leonardo
  • But does it work across the whole df? `for (i in 1:length(train_set)){ print(colSums(!is.na(train_set[i])))}` does not look good – TeoK Nov 03 '21 at 09:04
  • @TeoK no need for a `for` loop: `!is.na` works on the full data frame and returns a logical matrix, then `colSums` sums each column. – zx8754 Nov 03 '21 at 09:06
  • I don't know what `train_set` is. If `train_set` is a data frame, the `for` loop is not needed: `colSums(!is.na(df))` returns the number of non-`NA` values in each column of the data frame – Leonardo Nov 03 '21 at 09:06
  • `train_set` is `df`, my apologies. But in the console it does not look good: each column has a very long name, and there are 40+ columns. Can I make some kind of table/pivot table with dplyr? – TeoK Nov 03 '21 at 09:11
  • I think it is not necessary to use `pivot_longer`. Maybe just do `as.data.frame(colSums(!is.na(df)))` – Leonardo Nov 03 '21 at 09:15
  • @TeoK See this [dplyr answer](https://stackoverflow.com/a/50357736/680068) from the linked post. – zx8754 Nov 03 '21 at 09:19
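For wide data frames with long column names, the per-column counts discussed in these comments can be turned into a two-column table that prints one row per column. This is a rough base R sketch rather than the dplyr approach from the linked answer; the data frame and the names `summary_tbl`, `column`, and `non_missing` are made up for illustration.

```r
# Hypothetical data frame with long column names and NA padding.
df <- data.frame(
  first_very_long_column_name  = c(1, 2, NA, NA),
  second_very_long_column_name = c("w", "x", "y", "z")
)

# Named vector: number of non-NA values per column.
counts <- colSums(!is.na(df))

# Reshape the named vector into a long table, one row per column.
summary_tbl <- data.frame(
  column      = names(counts),
  non_missing = unname(counts),
  row.names   = NULL
)

# Optional: list the shortest columns first.
summary_tbl <- summary_tbl[order(summary_tbl$non_missing), ]
print(summary_tbl)
```

Keeping the column names in their own `column` field, instead of as row names, makes the result easier to sort, filter, or export.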