Edit: Questions about leading 0's don't help me, because 1) I need to add 0's in the middle of the string; 2) each string needs a different number of 0's (I don't know this number in advance, although I know how to compute it); and 3) I need to automate this because of the file size.

I have a really big file that contains three columns of id's. The file looks like this (in R it's read as a data frame):

   ABCDEFG005504315643 ABCDEFG001504336101 ABCDEFG005392630156
   ABCDEFG005328208783 ABCDEFG000360175030 ABCDEFG005347352265
   ABCDEFG000117796830 ABCDEFG003145429820 ABCDEFG000330889848

Every id should be 19 characters long, but some of them are shorter. I have to correct them so that they are 19 characters long by adding 0's after the 7th character (after the letters).

The problem is that I don't know how many 0's I need to add in each case, because some id's have 17 characters, some have 15, and so on.

I know which rows have shorter id's (I used the which() and nchar() functions to find them) and I have an idea of how to find how many 0's I need to add (19 - nchar()), but there are two issues:

  1. How do I use this to correct the id's if I only know the position where the 0's should be inserted? (Because the lengths differ, I don't know exactly between which characters they go; I only know the "start".)
  2. How do I do this for a really big data frame? Maybe something from the apply family?

Thank you for all the help!

jakim_M
  • Do the letters have the same nchar for every ID? – zx8754 Apr 07 '22 at 10:20
  • Yes, I always have 7 letters at the beginning. – jakim_M Apr 07 '22 at 10:21
  • Great, then substr the first 7 letters into x1, then substr 8 to nchar(id) into x2, then use the linked post to prefix n 0s to x2. Finally, paste0(x1, x2). – zx8754 Apr 07 '22 at 10:24
  • But how do I do it for all the cases? I have shorter id's in all three columns and in many rows. Something from apply? And finally, I need to modify the original file in this way. – jakim_M Apr 07 '22 at 10:28
  • Step by step: first make a function that takes an ID and returns a clean ID. Then loop through the columns and [apply that function](https://stackoverflow.com/questions/18503177/r-apply-function-on-specific-dataframe-columns): result <- data.frame(lapply(mydata, myfunction)). Finally, write.table(result, ...). – zx8754 Apr 07 '22 at 10:31
  • Thanks, your guidance helped me a lot. The post is closed so I can't accept your answers, so thanks! – jakim_M Apr 07 '22 at 11:36
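
For anyone landing here later, the steps suggested in the comments could be sketched roughly as below. This is only a minimal illustration, not the code the asker ended up with: the function name pad_id, the column names V1 and V2, the sample ids (shortened versions of the ones in the question), and the output file name are all assumptions made for the example. It relies on base R only; strrep() is vectorized over its times argument, so no explicit loop over rows should be needed.

```r
# Hypothetical helper (not from the thread): inserts 0's after the letter
# prefix so that every id reaches total_len characters.
pad_id <- function(id, prefix_len = 7, total_len = 19) {
  id <- as.character(id)
  x1 <- substr(id, 1, prefix_len)               # the 7 leading letters
  x2 <- substr(id, prefix_len + 1, nchar(id))   # the remaining digits
  n_zeros <- pmax(total_len - nchar(id), 0)     # 19 - nchar(id), never negative
  out <- paste0(x1, strrep("0", n_zeros), x2)
  out[is.na(id)] <- NA_character_               # keep missing ids missing
  out
}

# Made-up sample data: some ids are shorter than 19 characters.
mydata <- data.frame(
  V1 = c("ABCDEFG5504315643", "ABCDEFG005328208783"),
  V2 = c("ABCDEFG001504336101", "ABCDEFG360175030"),
  stringsAsFactors = FALSE
)

# Apply the function to every column, as suggested in the comments.
result <- data.frame(lapply(mydata, pad_id), stringsAsFactors = FALSE)

# Writing the corrected table back out (file name is an assumption):
# write.table(result, "ids_fixed.txt", quote = FALSE,
#             row.names = FALSE, col.names = FALSE)
```

Ids that are already 19 characters long (or longer) pass through unchanged, so the function can be applied to whole columns without first locating the short rows with which().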
