I have an ASCII text file that contains one week of data and has no header names. I have nearly completed a smaller task using R, and have made some attempts with Python as well. Being a pro at neither, it's been a steep learning curve. Below are my data and the R code I wrote to paste rows together based on a specific sequence of characters; the code is not working.

Each column holds different data, but the row data is what matters most. For example:

    column 1       column 2     column 3   column 4
Row 1 Name         Age           YR Birth    Date 
Row 2 Middle Name School name    siblings    # of siblings 
Row 3 Last Name     street number  street address
Row 4 Name         Age           YR Birth    Date 
Row 5 Middle Name School name    siblings    # of siblings 
Row 6 Last Name     street number  street address
Row 7 Name         Age           YR Birth    Date 
Row 8 Middle Name School name    siblings    # of siblings 
Row 9 Last Name     street number  street address 

I have a folder of files to loop over; some files hold hundreds of rows and others hold thousands. I have written code that drops all the rows I don't need and writes the rest to a new .csv. However, pasting and/or merging isn't producing the desired results.

What I need is code that selects only the Name and Last Name rows (and their adjacent data) from the entire file and pastes each Last Name row onto the end of the matching Name row. Each file has the same number of columns but a different number of rows.

I have read the file into a data frame and have tried merging, pasting, and binding (`rbind` and `cbind`) the rows and columns, and the result is still just shy of what I need. `rbind` works best so far, but instead of pasting the rows one after another on the same line, it places them beside each other in columns, like this:

Name Last Name        Name   Last Name     Name    Last Name 
Age   Street Num      Age    Street Num     Age   Street Num
YR    Street address  YR    Street address  YR    Street address
Birth    NA            Birth    NA           Birth    NA
Date     NA            Date     NA           Date     NA

I have tried `rbind` and `family[c(Name, Age, YR Birth...)]` without success. I also checked how many columns I have and tried adding more columns to make room for the pasted data, but they get populated with the data from row 1 instead.

I'm really at a loss here and if anyone can provide some insight I'd really appreciate it. I'm newer than some, but not as new as others. The results I am achieving look like:

Name Age  YR Birth date Last Name Street Num Street Address NA NA
Name Age  YR Birth date Last Name Street Num Street Address NA NA
Name Age  YR Birth date Last Name Street Num Street Address NA NA

codes tried:

rowData <- rbind(name$Name, name$Age, name$YRBirth, name$Date)

colData <- cbind(name$V1 == "Name", name$V1 == "Last Name")

`merge` and `paste` do not work either. I have also tried creating each variable as a new data frame and am still not getting the results I am looking for. Does anyone have any insight?

ThomasIsCoding
Livin2020

1 Answer

OK, so if I understand your situation correctly, you want to first slice your data: pull out every third row starting with the 1st row, and then every third row starting with the 3rd row. I'd do it like this (assuming your data is in `df`):

# Rows 1, 4, 7, ... (the "Name" rows)
df1 <- df[3 * (1:(nrow(df) / 3)) - 2, ]
# Rows 3, 6, 9, ... (the "Last Name" rows)
df2 <- df[3 * (1:(nrow(df) / 3)), ]

Once you have these, you can just slap them together, but instead of using `rbind` you want to use `cbind`. Then you can drop the NA column and rename the rest.

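# Put each "Last Name" row beside its "Name" row
df3 <- cbind(df1, df2)
# Keep the first 7 columns (the 8th is the empty NA column)
df3 <- df3[1:7]
# Names follow the four-column layout shown in the question
colnames(df3) <- c("Name", "Age", "YR Birth", "Date", "Last Name", "Street Num", "Street Address")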
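
As a quick check of the approach above, here is a minimal sketch on a toy data frame. The values and the default column names `V1` to `V4` are made up for illustration, and the sketch assumes the file repeats in strict groups of three rows, which may not hold for the real data. The output column names follow the four-column layout shown in the question.

```r
# Toy stand-in for the real file: strict groups of three rows, four columns.
# All values are invented for illustration only.
df <- data.frame(
  V1 = c("Ann", "Marie", "Smith", "Bob", "Lee", "Jones"),
  V2 = c("30", "Central High", "12", "41", "North High", "7"),
  V3 = c("1990", "yes", "Main St", "1979", "no", "Oak Ave"),
  V4 = c("2020-10-27", "2", NA, "2020-10-27", "0", NA),
  stringsAsFactors = FALSE
)

idx <- 3 * (1:(nrow(df) / 3))   # 3, 6, ...
df1 <- df[idx - 2, ]            # rows 1, 4, ... (first row of each group)
df2 <- df[idx, ]                # rows 3, 6, ... (third row of each group)

# Bind side by side, then keep the first 7 columns (the 8th is all NA here).
df3 <- cbind(df1, df2)[1:7]
colnames(df3) <- c("Name", "Age", "YR Birth", "Date",
                   "Last Name", "Street Num", "Street Address")
rownames(df3) <- NULL
print(df3)
```

Each group of three input rows ends up as a single output row, with the middle row dropped.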
SirTain
  • Hi thank you. What I need is to pull each row (and associated columns) that match either Name or Last Name, then paste last name after the final row of Name. Sorry, I didn't make that super clear. Do you have thoughts on how to select since the df will have a range of inputs for Name and Last name (one file will have 300 rows another file will have 1000). I feel like the challenge is between selecting every 3rd row and then the range of 1:7. Am I reading that correctly? – Livin2020 Oct 27 '20 at 14:27
  • I may not understand how your data is structured, but it won't matter if there are 300 rows or 1000 rows in the data file the way I wrote it. As long as rows 1, 4, 7, etc. have Name and rows 3, 6, 9, etc. have Last Name, the code I wrote will just count through them and strip them out. Then `cbind` binds the columns of the first data frame to the columns of the second data frame. `df3` in the example has all the columns of `df1` next to all the columns of `df2`. Then, you can just rename them. – SirTain Oct 27 '20 at 15:00
  • Thank you. There are occasions where rows 1, 4, 7 etc has name, but on other occasions it will be 1, 4, 7, 8 (potentially all in the same file). That's why I thought I had to parse it out via the name instead of row placement. – Livin2020 Oct 27 '20 at 15:24
  • Unfortunately the code did not successfully pull the correct files. It continued to pull them all and place them in a single row, rather than binding each one beside the starting "names" row. – Livin2020 Oct 27 '20 at 15:43
  • I think, then, your data might be a little less structured than I can effectively help you with without access to the data itself (which I assume you can't share in its raw format because of privacy concerns). If you have access to Microsoft Office and Excel in particular, you might be able to solve your data import problems there with its built-in functions. You could then save it in a different format (such as a CSV or an Excel file) that is easier to load into and manipulate in R. – SirTain Oct 29 '20 at 11:49
  • Continued: In Excel, you can go to the Data tab and then on the left is a button "From Text". Open your file there, and the Text Import Wizard will let you pick Delimiters on the second step. It will also preview what the import will look like as you go. Once you pick the appropriate delimiter to separate it into rows and columns the way you want, you can hit Finish and it will import all the data. This could help with your data import issues where it's putting everything in a single row. – SirTain Oct 29 '20 at 11:55
  • Thank you for this, but converting in Excel is not ideal (hundreds of files so to do this manually would be too time consuming). The code isn't writing the data so that one row populates with the data into new columns (populated by data in row 1 ie: Name, middle, Last to all appear on one line with each variable into a new column. -> so that each "Name, Middle, Last" starts on a new row.. does that make sense? – Livin2020 Oct 29 '20 at 13:22
  • No, I'm sorry. It doesn't. – SirTain Oct 29 '20 at 14:53
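
Since the comments say the records are not always exactly three rows long, one alternative is to pair rows by label rather than by position. The sketch below is only an illustration under stated assumptions: the first column (assumed to be `V1`) is taken to hold the literal labels "Name" and "Last Name", and `pair_name_rows` is a hypothetical helper, not something from the thread. The real files may be laid out differently.

```r
# Hypothetical helper: pair each "Name" row with the first "Last Name" row
# that follows it, using labels in the first column (assumed to be V1).
pair_name_rows <- function(df, label_col = "V1") {
  labels  <- df[[label_col]]
  is_name <- labels %in% "Name"        # %in% treats NA labels as FALSE
  is_last <- labels %in% "Last Name"
  group   <- cumsum(is_name)           # record id: increases at each "Name" row

  name_rows <- which(is_name)
  # For each record, the first "Last Name" row inside it (NA if there is none).
  last_rows <- vapply(seq_along(name_rows), function(g) {
    hits <- which(is_last & group == g)
    if (length(hits) > 0) hits[1] else NA_integer_
  }, integer(1))

  left  <- df[name_rows, , drop = FALSE]
  right <- df[last_rows, , drop = FALSE]   # an NA index yields an all-NA row
  names(right) <- paste0("last_", names(right))
  rownames(left)  <- NULL
  rownames(right) <- NULL
  cbind(left, right)
}

# Toy data (invented): the second record has an extra row, so it is 4 rows long.
df <- data.frame(
  V1 = c("Name", "Middle Name", "Last Name",
         "Name", "Middle Name", "Extra", "Last Name"),
  V2 = c("30", "Central High", "12", "41", "North High", "x", "7"),
  V3 = c("1990", "yes", "Main St", "1979", "no", "y", "Oak Ave"),
  stringsAsFactors = FALSE
)
paired <- pair_name_rows(df)
print(paired)
```

Because the pairing is driven by `cumsum()` over the "Name" rows, a record of any length still yields one output row. Applying the helper to every file in a folder could be done with `list.files()` and `lapply()`, with the reading step depending on the files' actual delimiter, which the post does not state.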