I apologize if this seems simple, but I have spent an absurd amount of time chasing Google searches in circles and I can't seem to find an answer to this problem...
I have a series of excel files (of .csv type) that I need to import. They are composed of data consisting of 9 columns separated by |. Looks something like this:
1 CustID|Animal Adopted|Description|Year|Adoption date|dob|Fee|Paid
2 123|Horse|I got a Horse|2011|2011-01-31|1991-03-01|8.00|NO
3 0101|Horse|I got a Horse|2006|2006-01-31|1987-21-01|50.00|NO
4 1929|Fish|I adopted a fish and for some reason this requires a very long description |2001|2001-12-31|1991-10-01|5.00|YES
5 199|Fish|I adopted a fish and for some reason this requires a very long description |2001|2001-12-31|1991-10-01|5.00|YES
6 0101|Dog|another really long description for adopting a dog because they are awesome
7 |2003|2003-05-11|1999-04-01|50.00|(null)
8 0102|Dog|another really long description for adopting a dog because they are awesome
9 |2003|2003-05-11|1989-04-01|51.00|(null)
10 9330|Cat|Mr.Kitty is a kitty who is so fat and cute and furry|2010|2010-10-11|1986-10-20|10000.00|YES
11 1933|Hamster|Doing a dance in tube city.
12 |2011|2011-12-05|2005-06-01|4.00|(null)
Thus when imported into R it does this
CustID Animal.Adopted Description Year Adoption.date dob Fee Paid..
1 123 Horse I got a Horse 2011 2011-01-31 1991-03-01 8 NO,,
2 101 Horse I got a Horse 2006 2006-01-31 1987-21-01 50 NO,,
3 1929 Fish I adopted a fish,and for some reason,this requires a very long description 2001 2001-12-31 1991-10-01 5 YES
4 199 Fish I adopted a fish,and for some reason,this requires a very long description 2001 2001-12-31 1991-10-01 5 YES
5 101 Dog another really long description,for adopting a dog because they are awesome, NA
6 NA 2003 2003-05-11 1999-04-01 50.00 (null),, NA
7 102 Dog another really long description,for adopting a dog because they are awesome, NA
8 NA 2003 2003-05-11 1989-04-01 51.00 (null),, NA
9 9330 Cat Mr.Kitty is a kitty who is so fat,and cute,and furry 2010 2010-10-11 1986-10-20 10000 YES
10 1933 Hamster Doing a dance,in tube city., NA
11 NA 2011 2011-12-05 2005-06-01 4.00 (null),, NA
>
This severely impacts the accuracy of my results later when I try to categorize my data. Lines 5 & 6, 7 & 8, and 10 & 11 represent the problem, as half of the column data is on the first row and the other half is in the second.
I have found this question @Layla from a couple years ago regarding reading in text files line by line and am attempting to blend the accepted answer with a personalized for loop. However, I don't understand why the while loop condition is TRUE...
I was wondering if there was a way to code it so that every time r encounters the row with only three columns, it bumps up the next row to fill in the last two columns. But I would only want that to occur with perfectly matching data because I may be missing random names or locations throughout the data as I don't want to mismatch the entire data set as previously mentioned.
I don't know how to instruct the loop to look out for the missing data. I tried something like if line(ncol = 3) {Merge()}
but lets be honest, that makes no sense...
I've tried readLines()
... but that is still reading everything in the exact same way so I figure this needs to be combine with other code. And all of my Google searches just turn up the same basic "readLines reads the connection in one line at a time" and then that's it.
I feel like I have the pieces but I still can't figure out how to put them together.
As a caveat, I am quite new to this. Even a helpful article that explains how to write a routine that reads in code manually would be helpful. There is so much information on how to read in files using read.csv, read.tsv, read.xlsx
, etc. but very very little on how to handle reading in data manually.